linux-stable.git - Linux kernel stable tree

Age	Commit message (Collapse)	Author
3 days	Merge tag 'drm-fixes-2026-07-18-1' of https://gitlab.freedesktop.org/drm/kernel	Linus Torvalds
	Pull drm fixes from Dave Airlie: "Weekly drm fixes, there is amdgpu, xe and i915 and then a lot of scattered fixes. Looks about the right level for the new right. ttm: - Handle NULL pages and backup handles in ttm_pool_backup() correctly gpusvm: - Improve unmap and error handling on gpusvm udmabuf: - Always synchronize for CPU in begin_cpu_udmabuf xe: - Fix BO prefetch with CONSULT_MEM_ADVISE_PREF_LOCK - Hold a dma-buf reference for imported BOs - Fix writable override for CRI - Fix VF CCS attach/detach race with in-flight BO moves - Fix WOPCM size for LNL+ - Reset current_op in xe_pt_update_ops_init - Keep scheduler timeline name alive - Hold device ref until queue teardown completes - Disable display in admin only PF mode i915: - NV12 display fix for bigjoiner - clear watermark on plane disable - GT selftest fixes host1x: - Fix UAF amdxdna - Fix UAF - Reject more invalid amdxdna command submissions ivpu: - Fix wrong read - Handle invalid firmware log in ivpu panthor: - Fix error handling virtio: - Fix virtio deadlock - Fix invalid gem detach amdgpu: - DCN 4.2 fixes - NUTMEG fixes - 8K panel fix - Backlight fixes - UserQ fix - Fix bo->pin leaking in amdgpu_bo_create_reserved() - VFCT fixes - devcoredump fixes - Display fixes - SMU7 DPM fix - AC/DC fixes for SMU7 and SI - Queue reset fix - PCIe DPM fix - XHCI/GPU resume ordering fix - Pageflip timeout fix amdkfd: - Fix potential overflow in CWSR size calculation - DQM error clean up fixes * tag 'drm-fixes-2026-07-18-1' of https://gitlab.freedesktop.org/drm/kernel: (61 commits) Revert "drm/amd/display: Restore 5s vbl offdelay for NV3x+ DGPUs" drm/amd/display: check GRPH_FLIP status before sending event drm/amd/display: consolidate DCN vblank/flip handling onto vupdate_no_lock drm/amd: Create a device link between APU display and XHCI devices drm/amd/display: wire DCN42B mcache programming callback drm/amd/display: set new_stream to NULL after release drm/amd/display: Force PWM backlight on Lenovo Legion 5 15ARH05
drm/amdkfd: free MQD managers on DQM init failures drm/amdgpu/ttm: Consider concurrent VM flushes for buffer entities drm/amd/pm/smu7: Fix AC/DC switch notification drm/amdgpu: Disable PCIe dynamic speed switching on Ryzen Pinnacle Ridge drm/amdgpu: always emit the job vm fence drm/amd/pm/si: Fix AC/DC switch notification drm/amd/pm/si: Don't schedule thermal work when queue isn't initialized drm/amd/display: dce100: skip non-DP stream encoders for DP MST drm/amd/display: Set native cursor mode for disabled CRTCs drm/amd/pm/ci: Don't disable MCLK DPM on Bonaire 0x6658 (R7 260X) drm/amd/display: fix __udivdi3 link error drm/amdgpu: Reserve space for IB contents in devcoredumps drm/amdgpu: Print vmid, pasid and more task info in devcoredump ...
3 days	Merge tag 'amd-drm-fixes-7.2-2026-07-17' of ↵	Dave Airlie
	https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-7.2-2026-07-17: amdgpu: - DCN 4.2 fixes - NUTMEG fixes - 8K panel fix - Backlight fixes - UserQ fix - Fix bo->pin leaking in amdgpu_bo_create_reserved() - VFCT fixes - devcoredump fixes - Display fixes - SMU7 DPM fix - AC/DC fixes for SMU7 and SI - Queue reset fix - PCIe DPM fix - XHCI/GPU resume ordering fix - Pageflip timeout fix amdkfd: - Fix potential overflow in CWSR size calculation - DQM error clean up fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patch.msgid.link/20260717215008.998399-1-alexander.deucher@amd.com
3 days	Revert "drm/amd/display: Restore 5s vbl offdelay for NV3x+ DGPUs"	Leo Li
	Now that proper fixes have been found, let's revert this workaround. This reverts commit a1fc7bf6677eb547167cb72b3bcafdc34b976692. Tested-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit f64a9be5653689ff43e148cd8a6483077488c8e5) Cc: stable@vger.kernel.org # 8382cd234981: drm/amd/display: consolidate DCN vblank/flip handling onto vupdate_no_lock Cc: stable@vger.kernel.org # 48ab86360af1: drm/amd/display: check GRPH_FLIP status before sending event Cc: stable@vger.kernel.org
3 days	drm/amd/display: check GRPH_FLIP status before sending event	Leo Li
	[Why] After unifying DCN interrupt sources under VUPDATE_NO_LOCK, we have two remaining issues to clean up: 1. On DCN, flip completion is now delivered from VUPDATE_NO_LOCK (dm_crtc_high_irq_handler) instead of GRPH_PFLIP. But VUPDATE_NO_LOCK fires every frame, regardless of whether a flip has latched. 2. There is a window during commit where a flip is armed (pflip_status = SUBMITTED) but not yet programmed into HW. If the VUPDATE_NO_LOCK fires in that window, its handler would deliver a flip event to userspace before HW has latched to it. If userspace then renders to what it believes is now the back buffer (but HW is still latched to it!), it will cause display corruption. This issue seemed to have been introduced by: commit 1159898a88db ("drm/amd/display: Handle commit plane with no FB.") Enabling replay or psr extended the duration of this window, and hence made corruption more likely to be observed. [How] * Move acrtc->event/pflip_status arming to after update_planes_and_stream_adapter() has programmed the flip into HW. This closes the window where pflip_status is SUBMITTED but the flip is not yet programmed. * Add dc_get_flip_pending_on_otg(), which reads the HUBP flip-pending status straight from HW for the pipe(s) bound to an OTG instance. It is keyed only by otg_inst and does not take or mutate a dc_plane_state, so it is safe to call from the OTG interrupt handler without racing a concurrent commit that may be modifying plane state. * Optimistically query for flip-pending after programming, in the event that HW latched to the new fb between programming start and arming event. If it latched, send the vblank event immediately, rather than wait for the next vblank IRQ. * In the VUPDATE_NO_LOCK handler, only deliver flip completion once dc_get_flip_pending_on_otg() reports the flip is no longer pending. Otherwise leave the flip armed and retry on the next vupdate. 
* For DCE, maintain the existing behavior of arming flips before programming, and relying on GRPH_FLIP to fire at HW latch. v2: * Drop flip_programmed completion object, instead move event/pflip_status arming after programming. * For DCN, optimistically query for flip pending immediately after programming, and if it latched, send event right away. v3: * Fix event timestamps on optimistic flip latch detection, where it's possible for it to run before the vupdate IRQ updates the timestamp. * Add more docstrings for DCN vblank handling. * Clean up if conditions in dm_arm_vblank_event(). * Code style cleanup on braces surrounding multi-line statements. Fixes: 9b47278cec98 ("drm/amd/display: temp w/a for dGPU to enter idle optimizations") Link: https://gitlab.freedesktop.org/drm/amd/-/work_items/3787 Link: https://gitlab.freedesktop.org/drm/amd/-/work_items/4141 Assisted-by: Copilot:claude-opus-4.8 Tested-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit f64a9be5653689ff43e148cd8a6483077488c8e5) Cc: stable@vger.kernel.org # 8382cd234981: drm/amd/display: consolidate DCN vblank/flip handling onto vupdate_no_lock Cc: stable@vger.kernel.org
3 days	drm/amd/display: consolidate DCN vblank/flip handling onto vupdate_no_lock	Leo Li
	[Why] On DCN, vblank events were delivered from VSTARTUP/VUPDATE (dm_crtc_high_irq/dm_vupdate_high_irq) and pageflip completion from GRPH_PFLIP (dm_pflip_high_irq). These signals can be masked by hardware by a few things: * DPG - DCN can Dynamically Power Gate parts of the display pipe when a self-refresh capable eDP is connected. DPG is engaged when there's enough static frames (detected through drm_vblank_off). Once gated, even though the OTG (output timing generator) is still enabled, VSTARTUP and GRPH_FLIP are masked. * GSL - Driver can use the Global Sync Lock to block HW from latching onto double-buffered registers during programming, to prevent HW from latching onto a partially programmed state. This will mask VSTARTUP, GRPH_FLIP, and VUPDATE. See dcn20_pipe_control_lock(). * MALL - A DCN accessible cache introduced in DCN32+ DGPUs that can store fb data to allow for longer DRAM sleep. When scanning out from MALL, VSTARTUP is masked. When masked, events are never delivered, which can show up as flip_done timeouts in the wild. However, there is an interrupt source on DCN that is never masked: VUPDATE_NO_LOCK. It's simply an unmasked variant of VUPDATE, which fires while the OTG is active, at the exact point hardware latches double-buffered registers. It is therefore the natural single signal for delivering both vblank and flip-completion events on DCN, and the correct point to timestamp both VRR and non-VRR vblanks. DCE's interrupt sources are different, it does not have an unmaskable VUPDATE_NO_LOCK. The only unmaskable DCE interrupt is VLINE0, but it can only be programmed as a vline offset from vsync_start, making it unsuitable for VRR. Thus, we keep DCE untouched and use the existing mix of interrupt sources. [How] For DCN1 and newer only: * Factor the body of dm_crtc_high_irq() into dm_crtc_high_irq_handler() and drive it from dm_vupdate_high_irq() (VUPDATE_NO_LOCK). 
DCE keeps using dm_crtc_high_irq() (VSTARTUP) and dm_pflip_high_irq() (GRPH_PFLIP) unchanged. * Stop registering VSTARTUP (crtc_irq) and GRPH_PFLIP (pageflip_irq) on DCN, and stop enabling them in amdgpu_dm_crtc_set_vblank() / manage_dm_interrupts(). Enable VUPDATE whenever vblank is enabled on DCN (previously only in VRR mode). The secure-display vline0 interrupt is left untouched. * VUPDATE_NO_LOCK does not early-fire on an immediate (tearing / async) flip, since HW latches the new address right away. Deliver the flip completion event immediately after programming such flips in amdgpu_dm_commit_planes(), and clear pflip_status so the next vupdate handler does not double-send. v2: Do not gate VUPDATE_NO_LOCK on DCN in dm_handle_vrr_transition() Also toggle VUPDATE_NO_LOCK on DCN in dm_gpureset_toggle_interrupts() Re-cook vblank event count and timestamp for immediate flips Fixes: 9b47278cec98 ("drm/amd/display: temp w/a for dGPU to enter idle optimizations") Link: https://gitlab.freedesktop.org/drm/amd/-/work_items/3787 Link: https://gitlab.freedesktop.org/drm/amd/-/work_items/4141 Assisted-by: Copilot:claude-opus-4.8 Co-developed-by: Matthew Schwartz <matthew.schwartz@linux.dev> Signed-off-by: Matthew Schwartz <matthew.schwartz@linux.dev> Tested-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit c87e6635d2db02c88ae8d09529362da672d34770) Cc: stable@vger.kernel.org
3 days	drm/amd: Create a device link between APU display and XHCI devices	Mario Limonciello
	Some AMD APU multi-function devices expose an integrated USB xHCI controller. In some circumstances (such as larger VRAM), the PM core resume can fail when the xHCI controller is resuming in parallel with the GPU/display function. On affected systems, the xHCI controller can complete pci_pm_resume and start resuming USB devices while the GPU is still in its much longer resume path. This race condition leads to USB device resume failures followed by: xhci_hcd ...: xHCI host not responding to stop endpoint command xhci_hcd ...: HC died; cleaning up Create a device link from any xHCI controller sharing the same PCIe root port as the APU display function. The link uses DL_FLAG_STATELESS and DL_FLAG_PM_RUNTIME to ensure the GPU completes its resume before the xHCI controller begins resuming USB devices. This device link is done specifically in amdgpu so that if the platform firmware has been modified such that this issue doesn't happen the version can be detected and the workaround skipped. Suggested-by: Aaron Ma <aaron.ma@canonical.com> Reported-by: mrh@frame.work Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221073 Acked-by: Alex Deucher <alexander.deucher@amd.com> Tested-by: Mark Pearson <mpearson-lenovo@squebb.ca> Tested-by: Alexander F <superveridical@gmail.com> Tested-by: Francis DB <francisdb@gmail.com> Link: https://patch.msgid.link/20260713195313.1739762-1-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 07c93d7eeb0d990bc1b8e3b1eafa464bc9feee97) Cc: stable@vger.kernel.org
3 days	drm/amd/display: wire DCN42B mcache programming callback	Pengpeng Hou
	DCN42B enables DML2 and DML21 by default and defines dcn42b_prepare_mcache_programming(), but the resource function table only wires the callback when CONFIG_DRM_AMD_DC_DML21 is defined. There is no in-tree Kconfig symbol named DRM_AMD_DC_DML21, so the preprocessor always removes the callback entry. Sibling DCN42 and DCN401 resource tables wire their prepare_mcache_programming callbacks unconditionally, and the core DC code already checks whether the callback pointer is present before calling it. Remove the stale guard so DCN42B exposes the callback relation that its source and DML21 build world already provide. This is an RFC patch draft from static conditional callback legality auditing. It needs AMD display maintainer review before submission as a final fix. Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn> Reviewed-by: George Zhang <george.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 85453fb4ff726e1ddb9984ee83dca260903c5353)
3 days	drm/amd/display: set new_stream to NULL after release	WenTao Liang
	In dm_update_crtc_state(), the skip_modeset path releases new_stream via dc_stream_release() but does not set the pointer to NULL. If a later error (e.g., color management failure) triggers the fail label, the error path calls dc_stream_release() again on the same dangling pointer, causing a double release and potential use-after-free. Fix this by setting new_stream to NULL after the initial release. Fixes: 9b690ef3c704 ("drm/amd/display: Avoid full modeset when not required") Signed-off-by: WenTao Liang <vulab@iscas.ac.cn> Reviewed-by: George Zhang <george.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 99f3af19073b3ddbfd96e789124cce12c4277b28) Cc: stable@vger.kernel.org
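The double-release pattern described above can be sketched in plain C. This is only an illustration under stated assumptions: the struct, the refcount field and every sketch_* name are hypothetical stand-ins and not the driver's code, and the release helper is assumed to tolerate a NULL argument.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted stream object. */
struct sketch_stream {
    int refcount;
};

/* Hypothetical release helper; assumed to ignore NULL pointers. */
static void sketch_stream_release(struct sketch_stream *s)
{
    if (!s)
        return;
    assert(s->refcount > 0); /* trips on a double release */
    s->refcount--;
}

/* Mirrors the shape of the fix: drop the reference on the skip path,
 * then clear the local pointer so the shared error path cannot drop
 * the same reference a second time. */
static int sketch_update_state(struct sketch_stream *new_stream,
                               int skip_modeset, int later_error)
{
    if (skip_modeset) {
        sketch_stream_release(new_stream);
        new_stream = NULL; /* the one-line fix */
    }

    if (later_error)
        goto fail;

    return 0;

fail:
    sketch_stream_release(new_stream); /* no-op when already released */
    return -1;
}
```

Without the assignment to NULL, the call under the fail label would hit the assertion in this sketch, which is the user-space analogue of the double release the commit describes.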
3 days	drm/amd/display: Force PWM backlight on Lenovo Legion 5 15ARH05	Alessandro Rinaldi
	The Lenovo Legion 5 15ARH05 (Renoir) ships a BOE 0x08DF eDP panel that advertises AUX/DPCD backlight control, so amdgpu's automatic detection (amdgpu_backlight == -1) selects AUX. On this panel the AUX backlight path has no effect: brightness writes are accepted but the panel level never changes, the display is stuck at a fixed brightness and max_brightness is reported as a bogus 511000. As a result neither the desktop brightness slider nor the brightness hotkeys do anything. Forcing PWM backlight (amdgpu.backlight=0) restores working control: max_brightness becomes 65535 and the level tracks writes. This has long been applied by users as a manual kernel-parameter workaround. Extend the generic panel backlight quirk with a force_pwm flag, add an entry for the Legion 5 15ARH05 / BOE 0x08DF panel, and have amdgpu disable AUX backlight (use PWM) when the quirk matches and the user lets the driver auto-select the backlight type. Signed-off-by: Alessandro Rinaldi <ale@alerinaldi.it> Tested-by: Alessandro Rinaldi <ale@alerinaldi.it> Reviewed-by: George Zhang <george.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 81b39f43e7e53589491e2eef6bad5389626b4b9c) Cc: stable@vger.kernel.org
3 days	drm/amdkfd: free MQD managers on DQM init failures	Guangshuo Li
	The change referenced by the Fixes tag releases the HIQ SDMA MQD trunk buffer when device_queue_manager_init() fails after it has been allocated. However, the same failure path can also be reached after init_mqd_managers() has succeeded. At that point dqm->mqd_mgrs[] contains per-type MQD manager objects owned by the device queue manager. The normal teardown path frees those objects from uninitialize(), but the initialization error path only frees dqm itself. Free the MQD managers from the initialization error path as well. This is safe for earlier failures because dqm is zeroed when allocated and init_mqd_managers() clears the entries it rolls back internally. Fixes: b7cccc8286bb ("drm/amdkfd: fix a memory leak in device_queue_manager_init()") Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com> Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1fff2e07b6670bc5b8f7344a8708c136259cb176) Cc: stable@vger.kernel.org
3 days	drm/amdgpu/ttm: Consider concurrent VM flushes for buffer entities	Timur Kristóf
	Allow using multiple SDMA schedulers only on GPUs where we are allowed to do concurrent VM flushes. This consideration is necessary because all GART windows are mapped in VMID 0 (the kernel VMID) so each buffer entity would flush VMID 0 concurrently. Practically this means that we can't use multiple SDMA engines for TTM on GFX6-8 and Navi 1x. Fixes: 01c836788b37 ("drm/amdgpu: pass all the sdma scheds to amdgpu_mman") Fixes: e4029f7a9474 ("drm/amdgpu: only use working sdma schedulers for ttm") Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit a8171229bc836607fbc225d323ebc4d14489cfbb)
3 days	drm/amd/pm/smu7: Fix AC/DC switch notification	Timur Kristóf
	There were two mistakes in the previous implementation: The check for AutomaticDCTransition should be inverted. We recently learned that the kernel should send PPSMC_MSG_RunningOnAC when the flag is set, and not the other way around. The clocks also need to be recomputed, because the code in the smu7_apply_state_adjust_rules() function selects different limits on AC and DC. Fixes: 96da0d86614e ("drm/amd/pm/smu7: Notify SMU7 of DC->AC switch") Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 516f8fc30a1b56af03f39e93c18707d13419fb1f) Cc: stable@vger.kernel.org
3 days	drm/amdgpu: Disable PCIe dynamic speed switching on Ryzen Pinnacle Ridge	Mario Limonciello
	AMD Ryzen Pinnacle Ridge (Zen+, family 0x17 model 0x08) CPUs have PCI controllers that don't support PCIe dynamic speed switching, causing system freezes during GPU initialization when enabled. Disable dynamic speed switching when this CPU is detected. Assisted-by: Claude:sonnet Fixes: 466a7d115326 ("drm/amd: Use the first non-dGPU PCI device for BW limits") Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5436 Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Link: https://patch.msgid.link/20260709031520.841611-1-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 9ceb4e034a327a04155f32f1cd1a5031dfa5fe02) Cc: stable@vger.kernel.org
3 days	drm/amdgpu: always emit the job vm fence	Alex Deucher
	We need the fence to reemit the gds switch or spm update after a queue reset. Fixes: a17ef941212b ("drm/amdgpu: rework ring reset backup and reemit v9") Cc: timur.kristof@gmail.com Cc: christian.koenig@amd.com Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit bc639a9eadc75822f7f15a4315c198a4b5513bd2) Cc: stable@vger.kernel.org
3 days	drm/amd/pm/si: Fix AC/DC switch notification	Timur Kristóf
	There were two mistakes in the previous implementation: The check for ATOM_PP_PLATFORM_CAP_HARDWAREDC should be inverted. We recently learned that the kernel should send PPSMC_MSG_RunningOnAC when the flag is set, and not the other way around. The clocks also need to be recomputed, because the code in the si_apply_state_adjust_rules() function selects different limits on AC and DC. Fixes: 2d071f6457af ("drm/amd/pm/si: Notify the SMC when switching to AC") Tested-by: Jeremy Klarenbeek <jeremy.klarenbeek99@gmail.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 358dd0a9ce66d898fa934887385327547d599d88) Cc: stable@vger.kernel.org
3 days	drm/amd/pm/si: Don't schedule thermal work when queue isn't initialized	Timur Kristóf
	When DPM is turned off with the amdgpu.dpm=0 module parameter, the thermal work queue isn't initialized so we shouldn't schedule any work on it. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit bd018d36171a695952c6d391471c279c9e05c8b2)
3 days	drm/amd/display: dce100: skip non-DP stream encoders for DP MST	Andriy Korud
	On DCE8-class ASICs (e.g. Bonaire), the resource pool contains digital DIG stream encoders plus one analog DAC encoder. When assigning a stream encoder for a second DisplayPort MST stream, if the preferred digital encoder is already acquired, dce100_find_first_free_match_stream_enc_for_link() falls back to the first free pool entry. That entry may be the analog encoder, whose funcs table lacks DP hooks such as dp_set_stream_attribute. The subsequent atomic commit then dereferences NULL function pointers in link_set_dpms_on() and crashes. Skip encoders without dp_set_stream_attribute when the stream uses a DP signal (including MST). Use dc_is_dp_signal(stream->signal) for the MST fallback path instead of checking only the link connector signal. Tested on: - GPU: AMD Radeon R7 260X (Bonaire / DCE8) - Board: Supermicro C9X299-PG300 - Setup: DP MST daisy chain, hotplug second monitor or have it connected on boot - Kernel: 7.1.3 (issue observed since 6.19) - Result: kernel oops without patch; dual monitors stable with patch Signed-off-by: Andriy Korud <a.korud@gmail.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5162 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 28ec64943e3ee4d9b8d30cea61e380f1429953a8) Cc: stable@vger.kernel.org
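The fallback search described above can be sketched as follows. This is a simplified illustration, not the driver's implementation: the types, the pool layout and all sketch_* names are hypothetical, and only the idea of skipping encoders whose function table lacks the DP hook is taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical function table: analog encoders leave the DP hook NULL. */
struct sketch_enc_funcs {
    void (*dp_set_stream_attribute)(void);
};

struct sketch_enc {
    const struct sketch_enc_funcs *funcs;
    bool acquired;
};

static void sketch_dp_hook(void)
{
}

static const struct sketch_enc_funcs sketch_dig_funcs = {
    .dp_set_stream_attribute = sketch_dp_hook,
};

static const struct sketch_enc_funcs sketch_dac_funcs = {
    .dp_set_stream_attribute = NULL,
};

/* Return the index of the first free encoder usable for the stream,
 * or -1 if none. For DP streams, encoders without the DP hook are
 * skipped so that later code never calls through a NULL pointer. */
static int sketch_find_free_enc(const struct sketch_enc *pool, size_t count,
                                bool stream_is_dp)
{
    for (size_t i = 0; i < count; i++) {
        if (pool[i].acquired)
            continue;
        if (stream_is_dp && !pool[i].funcs->dp_set_stream_attribute)
            continue;
        return (int)i;
    }
    return -1;
}
```

With a pool ordered as analog, busy digital, free digital, a DP stream resolves to the free digital encoder, while a non-DP stream may still take the analog entry.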
3 days	drm/amd/display: Set native cursor mode for disabled CRTCs	Timur Kristóf
	Always set native cursor mode when the CRTC is disabled, to make sure it doesn't cause atomic commits to fail when they are trying to disable the CRTC. Fixes: 41af6215cdbc ("drm/amd/display: Reject cursor plane on DCE when scaled differently than primary") Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5432 Cc: Leo Li <sunpeng.li@amd.com> Cc: Michel Dänzer <michel.daenzer@mailbox.org> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Tested-by: Viktor Jägersküpper <viktor_jaegerskuepper@freenet.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2f79f0130f828cf26fe2dcf45291821616af7b47) Cc: stable@vger.kernel.org
3 days	drm/amd/pm/ci: Don't disable MCLK DPM on Bonaire 0x6658 (R7 260X)	Timur Kristóf
	The old radeon driver has a documented workaround in ci_dpm.c which claims that Bonaire 0x6658 with old memory controller firmware is unstable with MCLK DPM, so as a precaution I disabled MCLK DPM on this ASIC in amdgpu. Note that the old MC firmware is not actually used with amdgpu, but in theory it's possible that the VBIOS sets up the ASIC with an old MC firmware that is already running when amdgpu initializes (in which case amdgpu doesn't load its own firmware). What I expected to happen is that the GPU would simply use its maximum memory clock, and indeed this is what seemed to happen according to amdgpu_pm_info which reads the current MCLK value from the SMU. However, some users reported a huge perf regression and upon a closer look it seems that the GPU seems to not actually use the highest MCLK value, despite the SMU reporting that it does. Let's not disable MCLK DPM on Bonaire 0x6658 (R7 260X). Keep MCLK DPM disabled on R9 M380 in the 2015 iMac because that still hangs if we enable it. Fixes: 9851f29cb06c ("drm/amd/pm/ci: Disable MCLK DPM on problematic CI ASICs") Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit d34acad064ee7d82bd18f5d87592c422d4d323ac) Cc: stable@vger.kernel.org
3 days	drm/amd/display: fix __udivdi3 link error	yanglinlin
	When compiling the AMDGPU display driver for 32-bit architectures, the linker reports undefined reference to `__udivdi3` in functions get_dp_dto_frequency_100hz() and dcn401_get_dp_dto_frequency_100hz(). This is because the code uses 64-bit division (/) on 32-bit systems, which GCC cannot handle directly and instead tries to call the missing __udivdi3 helper function. Replace the raw division with div_u64(), the kernel's standard 64-bit division helper, to avoid the link error. Signed-off-by: Linlin Yang <yanglinlin@kylinos.cn> Reported-by: k2ci <kernel-bot@kylinos.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 0421fc6ab3a8514e99156ff3c2cee13ee9af3fa7) Cc: stable@vger.kernel.org
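The shape of this fix can be sketched in user space. The kernel helper div_u64() (declared in linux/math64.h) divides a 64-bit dividend by a 32-bit divisor; sketch_div_u64() below is a hypothetical stand-in so the example builds outside the kernel, and sketch_scaled_ratio() with its arguments is invented for illustration rather than taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the kernel's div_u64(u64, u32).
 * In the kernel, routing the division through the helper keeps the
 * compiler from emitting a call to __udivdi3 on 32-bit builds; here
 * it is simply a plain division. */
static inline uint64_t sketch_div_u64(uint64_t dividend, uint32_t divisor)
{
    assert(divisor != 0);
    return dividend / divisor;
}

/* Illustrative only: widen to 64 bits before multiplying, then divide
 * through the helper instead of using a raw '/' on a 64-bit value. */
static inline uint64_t sketch_scaled_ratio(uint32_t value, uint32_t divisor,
                                           uint32_t scale)
{
    uint64_t scaled = (uint64_t)value * scale;

    return sketch_div_u64(scaled, divisor);
}
```

The point of the design is that every 64-bit division goes through one named helper, so a 32-bit kernel build can supply its own implementation instead of depending on a libgcc routine that the kernel does not link.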
3 days	drm/amdgpu: Reserve space for IB contents in devcoredumps	Timur Kristóf
	Currently the contents of IBs are abruptly cut off and don't show the full contents. This patch makes sure to reserve space for those contents too so they may be printed. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 4e2c0821509fed754e8c31d5053d152fbb3484a5) Cc: stable@vger.kernel.org
3 days	drm/amdgpu: Print vmid, pasid and more task info in devcoredump	Timur Kristóf
	These are in the dmesg logs but are missing from devcoredumps. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit fed7aa36d79802c3e02acd05aeae8b0a877e47c2) Cc: stable@vger.kernel.org
3 days	drm/amdgpu: Release VFCT ACPI table reference	Mario Limonciello
	amdgpu_acpi_vfct_bios() fetches the VFCT table with acpi_get_table() but never releases it. acpi_get_table() takes a reference on the table (incrementing its validation_count and mapping it on the 0->1 transition); without a paired acpi_put_table() the mapping is leaked on every call, whether or not a matching VBIOS image is found. Route all exit paths after the table is acquired through a common acpi_put_table(). The VBIOS image is copied out with kmemdup() before the table is released, so it remains valid for the caller. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patch.msgid.link/20260708193518.702584-3-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ca5988682b4cba4cd125a0fa99b2de1239164ae4) Cc: stable@vger.kernel.org
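The "single exit path that always drops the reference" structure described above can be sketched in user space. Everything here is a hypothetical stand-in: sketch_get_table() and sketch_put_table() only imitate the reference counting of acpi_get_table() and acpi_put_table(), and malloc() plus memcpy() stand in for kmemdup().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table object with a reference count. */
struct sketch_table {
    int refs;
    const unsigned char *image;
    size_t image_len;
};

static struct sketch_table *sketch_get_table(struct sketch_table *t)
{
    t->refs++;
    return t;
}

static void sketch_put_table(struct sketch_table *t)
{
    assert(t->refs > 0);
    t->refs--;
}

/* Return a heap copy of the image, or NULL if none is found. Every
 * path after the table is acquired leaves through 'out', so the
 * reference is dropped whether or not an image was copied. */
static unsigned char *sketch_fetch_image(struct sketch_table *src,
                                         size_t *len_out)
{
    unsigned char *copy = NULL;
    struct sketch_table *t = sketch_get_table(src);

    if (!t->image || t->image_len == 0)
        goto out;

    copy = malloc(t->image_len);
    if (!copy)
        goto out;

    /* Copy before releasing the table so the caller's data stays valid. */
    memcpy(copy, t->image, t->image_len);
    *len_out = t->image_len;

out:
    sketch_put_table(t);
    return copy;
}
```

The caller owns the returned buffer and frees it; the table's reference count is back at its starting value on both the success and the failure path.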
3 days	drm/amdgpu: Fix VFCT bus number matching with soft filter	Mario Limonciello
	On systems where PCI bus renumbering occurs (e.g. pci=realloc, resource conflicts), the runtime bus number may differ from the BIOS POST bus number recorded in the VFCT table. This causes amdgpu_acpi_vfct_bios() to fail finding the VBIOS even though the correct device entry exists. Introduce amdgpu_acpi_vfct_match() which treats the bus number as a soft filter: vendor/device/function identity is the hard requirement, while exact bus match is the preferred path. When bus numbers disagree but device identity matches, accept the VFCT entry and log a dev_notice for diagnostics. Reported-by: Oz Tiram <oz@shift-computing.de> Closes: https://lore.kernel.org/amd-gfx/20260621173211.28443-1-oz@shift-computing.de/ Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patch.msgid.link/20260708193518.702584-2-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 11c141672045ffc0187aa604f2c0f597bc334fb2) Cc: stable@vger.kernel.org
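The soft-filter idea can be sketched as a small comparison routine. This is a simplified illustration under assumptions: the struct layout, the enum and the sketch_* names are hypothetical and do not reproduce the signature of amdgpu_acpi_vfct_match(); only the rule (identity is mandatory, bus number is preferred but not required) comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result codes for the sketch. */
enum sketch_match {
    SKETCH_NO_MATCH,
    SKETCH_MATCH_EXACT,        /* identity and bus number agree */
    SKETCH_MATCH_BUS_MISMATCH, /* identity agrees, bus was renumbered */
};

/* Hypothetical subset of the fields recorded for a device. */
struct sketch_pci_id {
    uint16_t vendor;
    uint16_t device;
    uint8_t bus;
    uint8_t function;
};

static enum sketch_match sketch_vfct_match(const struct sketch_pci_id *entry,
                                           const struct sketch_pci_id *dev)
{
    /* Hard requirement: vendor, device and function must agree. */
    if (entry->vendor != dev->vendor ||
        entry->device != dev->device ||
        entry->function != dev->function)
        return SKETCH_NO_MATCH;

    /* Soft filter: an exact bus match is preferred, but a mismatch is
     * still accepted so the caller can log a notice and carry on. */
    if (entry->bus == dev->bus)
        return SKETCH_MATCH_EXACT;

    return SKETCH_MATCH_BUS_MISMATCH;
}
```

A caller would treat both match results as success and emit a diagnostic message only for the bus-mismatch case.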
3 days	drm/amdgpu: fix bo->pin leaking in amdgpu_bo_create_reserved	Zhu Lingshan
	amdgpu_bo_create_reserved() only allocates a new BO when *bo_ptr is NULL (bo_ptr being the struct amdgpu_bo ** input parameter); it simply skips creation when *bo_ptr is non-NULL. It nevertheless unconditionally reserves, pins, GART-allocates and maps the BO afterwards. When the same non-NULL BO pointer is passed in again, for example for firmware buffers that live in adev and are re-loaded on every resume / cp_resume / start under AMDGPU_FW_LOAD_DIRECT, amdgpu_bo_pin() increases pin_count each time, whereas the matching teardown only unpins once. pin_count therefore never drops to zero and TTM is not able to move, swap or evict the BO, causing BO leaks. This commit fixes the issue by pinning the BO only once, at creation; repeated calls no longer take additional pin references. Signed-off-by: Zhu Lingshan <lingshan.zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 3ddc0ae76202c447b6aec61e907b852bc94671cf) Cc: stable@vger.kernel.org
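The pin-count imbalance and its fix can be sketched with a toy object. This is an illustration under assumptions, not the driver's code: the struct, the caller-supplied storage argument and all sketch_* names are hypothetical, and reservation, GART allocation and mapping are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical buffer object that tracks only its pin count. */
struct sketch_bo {
    int pin_count;
};

/* Create-or-reuse helper. 'storage' is a hypothetical stand-in for a
 * real allocation. The pin reference is taken only when the object is
 * created, so calling this again with an existing object does not
 * stack additional pins. */
static int sketch_bo_create_reserved(struct sketch_bo **bo_ptr,
                                     struct sketch_bo *storage)
{
    bool created = false;

    if (!*bo_ptr) {
        storage->pin_count = 0;
        *bo_ptr = storage;
        created = true;
    }

    if (created)
        (*bo_ptr)->pin_count++;

    return 0;
}

/* Teardown unpins exactly once, matching the single pin above. */
static void sketch_bo_free(struct sketch_bo **bo_ptr)
{
    if (!*bo_ptr)
        return;
    assert((*bo_ptr)->pin_count > 0);
    (*bo_ptr)->pin_count--;
    *bo_ptr = NULL;
}
```

If the increment were unconditional, three calls followed by one teardown would leave pin_count at 2, which is the stuck state the commit message describes.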
3 days	drm/amdgpu/userq: fix indefinite fence wait during GPU reset	Jesse Zhang
	pre_reset only force-completes fences of MAPPED queues. A queue in any other state (e.g. mid-eviction) keeps its last_fence pending; after a GPU reset that fence never signals, so the eviction/suspend worker and process teardown (amdgpu_evf_mgr_flush_suspend) wait on it forever and wedge the machine: INFO: task kworker/6:28 blocked for more than 120 seconds. Workqueue: events amdgpu_eviction_fence_suspend_worker [amdgpu] Call Trace: dma_fence_wait_timeout+0x7e/0x130 amdgpu_userq_evict+0x67/0x140 [amdgpu] amdgpu_eviction_fence_suspend_worker+0xd8/0x160 [amdgpu] process_scheduled_works+0xa6/0x420 Force-complete every queue's fence regardless of state. The unmap and mark-hung step stays gated on MAPPED, since unmapping a queue that is not mapped is invalid. Fixes: 290f46cf5726 ("drm/amdgpu: Implement user queue reset functionality") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 9102b39fa924dcc3dc75a3137bfa9633c40b88c0) Cc: stable@vger.kernel.org
3 days	drm/amd/display: fix dcn42b det allocation order	Dmytro Laktyushkin
	set_pipe_unlock_order needs to be set to true for the pipes to be unlocked in correct order to avoid det overallocation Reviewed-by: Charlene Liu <charlene.liu@amd.com> Signed-off-by: Dmytro Laktyushkin <dmytro.laktyushkin@amd.com> Signed-off-by: George Zhang <george.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 183bbded999a70c5996e8f399fa8790568d71112)
3 days	drm/amd/display: fix dcn42 det allocation order	Dmytro Laktyushkin
	set_pipe_unlock_order needs to be set to true for the pipes to be unlocked in correct order to avoid det overallocation Reviewed-by: Taimur Hassan <syed.hassan@amd.com> Signed-off-by: Dmytro Laktyushkin <dmytro.laktyushkin@amd.com> Signed-off-by: George Zhang <george.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 198663d035cc439eb48844a2da66f6ae1b0de303)
3 days	drm/amd/display: Fix backlight max_brightness to match exported range	Mario Limonciello
	[Why] FWTS autobrightness fails on eDP panels because actual_brightness can read higher than the advertised max_brightness (e.g. 63576 vs 62451). The conversion helpers expose the firmware PWM range to userspace as [0..max]. But max_brightness is advertised as (max - min), which is smaller. So reading the level can return a value above max_brightness. This regressed in commit 4b61b8a39051 ("drm/amd/display: Add debugging message for brightness caps"), which changed max_brightness to (max - min) and undid commit 8dbd72cb7900 ("drm/amd/display: Export full brightness range to userspace"). [How] Advertise max_brightness as max, and scale the initial AC/DC brightness against max too. Update the KUnit expectations to match. Fixes: 4b61b8a39051 ("drm/amd/display: Add debugging message for brightness caps") Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: George Zhang <george.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit bd9e2b5b0473c75abc0f4134dfe79ecbfb16610d) Cc: stable@vger.kernel.org
3 days	drm/amd/display: Fix 8K Mode Not Parsed by EDID	Fangzhi Zuo
	[why] The 8K120/8K240 timings live in DisplayID extension blocks 2 and 3 of this EDID. The EDID is a 4-block (512-byte) HDMI 2.1 EDID that uses HF-EEODB. drm core reads and parses this correctly, but amdgpu rebuilds its own copy. Only 2 of the 4 blocks were copied into sink->dc_edid, so drm_edid_connector_add_modes() never sees blocks 2 and 3. [how] Directly populate edid_blob_ptr with a blob whose length is the full, HF-EEODB-aware size. Reviewed-by: Sun peng (Leo) Li <sunpeng.li@amd.com> Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com> Signed-off-by: George Zhang <george.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 11a90eaf5c808ba800249dda0d481c35d0888589)
3 days	drm/amd/display: Add dp_skip_rbr flag for NUTMEG	Timur Kristóf
	No functional changes. Just clean up a conceptual mismatch. Based on feedback on the NUTMEG code in DC, the preferred_link_setting is meant to force the DP link to a specific setting, meaning both the link rate and lane count should be locked to an exact value. What NUTMEG needs is a lower bound on the link rate, which is not the same concept. Implement this as a HW workaround flag instead. Suggested-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 871ceb853841bcaa4e6cec3723b16c4887a760be) Cc: stable@vger.kernel.org
3 days	drm/amd/display: Fix preferred link rate for NUTMEG	Timur Kristóf
	When there is a preferred link rate setting, it needs to be applied to both the current and initial link rate. This was regressed by a "coding style" fix, which caused the current link rate to not respect the preferred value. This commit restores the functionality of NUTMEG, the DP bridge encoder found on old APUs such as Kaveri. Fixes: a62346043a89 ("drm/amd/display: Fix coding style issue") Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5465 Cc: Chuanyu Tseng <Chuanyu.Tseng@amd.com> Reviewed-by: Fangzhi Zuo <Jerry.Zuo@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit e78b0a367f8690b682029d90e75308dc84ed51de) Cc: stable@vger.kernel.org
3 days	drm/amdkfd: fix 32-bit overflow in CWSR total size calculation	Yongqiang Sun
	total_cwsr_size was computed in 32-bit before being used as a BO/SVM allocation size. With large ctx_save_restore_area_size and debug_memory_size multiplied by the XCC count, the product can wrap, yielding an undersized CWSR save area that firmware later overruns. Promote total_cwsr_size to u64 and use check_add_overflow()/ check_mul_overflow() in both kfd_queue_acquire_buffers() and kfd_queue_release_buffers(). Signed-off-by: Yongqiang Sun <Yongqiang.Sun@amd.com> Reviewed-by: Philip Yang <philip.yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 319f7e13423ae3f486b9aea82f9ad2d6af0ee608) Cc: stable@vger.kernel.org
3 days	drm/amd/display: Fix DCN42B null registers & register masks	Matthew Stewart
	[why] DCN42B is missing some register masks, which are causing errors in dmesg. [how] Make DCN42B reuse the DCN42 register lists, and add the missing defines manually. Fixes: 64142f9d51af ("drm/amd/display: Fix DCN42 null registers & register masks") Reviewed-by: Ovidiu (Ovi) Bunea <ovidiu.bunea@amd.com> Signed-off-by: Matthew Stewart <Matthew.Stewart2@amd.com> Signed-off-by: George Zhang <george.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit b7d69145907cdefcbd39a70a31eefd30919af9f1)
3 days	drm/amdgpu/discovery: Fix device family for DCN42	Roman Li
	GC 11.7.0 and 11.7.1 should map to AMDGPU_FAMILY_GC_11_5_4 for DCN42. Fixes: cf591e67c095 ("drm/amdgpu: add support for GC IP version 11.7.0") Fixes: a928d8d81ec5 ("drm/amdgpu: add support for GC IP version 11.7.1") Signed-off-by: Roman Li <Roman.Li@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit f8ee6447e7ec1d75d6663c817e45566dd01f440b)
3 days	Merge tag 'drm-intel-fixes-2026-07-17' of ↵	Dave Airlie
	https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes Couple of display fixes (NV12 for bigjoiner and Watermark clear on plane disable) along with couple of GT selftests fixes. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patch.msgid.link/alp3ks0K1ZsxUC05@intel.com
3 days	Merge tag 'drm-xe-fixes-2026-07-17' of ↵	Dave Airlie
	https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes Driver Changes: - Fix BO prefetch with CONSULT_MEM_ADVISE_PREF_LOCK (Himal) - Hold a dma-buf reference for imported BOs (Nitin) - Fix writable override for CRI (Alexander) - Fix VF CCS attach/detach race with in-flight BO moves (Matthew Brost) - Fix WOPCM size for LNL+ (Daniele) - Reset current_op in xe_pt_update_ops_init (Zongyao Bai) - Keep scheduler timeline name alive (Arvind) - Hold device ref until queue teardown completes (Arvind) - Disable display in admin only PF mode (Satyanarayana) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Link: https://patch.msgid.link/aln1tRUXZJ_qzD65@fedora
4 days	Merge tag 'soc-fixes-7.2-1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC fixes from Arnd Bergmann: "There are only three devicetree fixes this time: one critical memory corruption fix for Renesas and three minor corrections for Tegra. The MAINTAINERS file is updated for a new maintainer of the CIX platform and two address changes. The rest is all driver fixes, mostly firmware: - multiple runtime issues in ARM SCMI and FF-A firmware code, dealing with error handling for corner cases in firmware. - multiple fixes for reset drivers, dealing with individual platform specific mistakes and more error handling - minor build and runtime fixes for the Tegra SoC drivers" * tag 'soc-fixes-7.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: arm64: dts: renesas: ironhide: Describe inline ECC carveouts MAINTAINERS: Update maintainer and git tree for CIX SoC ARM: Don't let ARMv5 platforms select USE_OF MAINTAINERS: Update SpacemiT SoC git tree repository firmware: arm_scmi: Rate-limit queue-full warnings in IRQ context firmware: arm_scmi: Use 64-bit division for clock rate rounding reset: imx7: Correct polarity of MIPI CSI resets on i.MX8MQ reset: sunxi: fix memory region leak on ioremap failure dt-bindings: reset: altr: add COMBOPHY_RESET for Agilex5 reset: spacemit: k3: fix USB2 ahb reset firmware: arm_scmi: Grammar s/may needed/may be needed/ firmware: arm_ffa: Fix NULL dereference in ffa_partition_info_get() firmware: arm_ffa: Respect firmware advertised RX/TX buffer size limits arm64: tegra: Fix CPU1 node unit-address on Tegra264 arm64: tegra: Fix CPU compatible string to cortex-a78ae on Tegra234 MAINTAINERS: .mailmap: update Jens Wiklander's email address soc/tegra: fuse: Fix spurious straps warning on SMCCC platforms soc/tegra: pmc: fix #ifdef block in header drm/tegra: Fix a strange error handling path arm64: tegra: Remove fallback compatible for GPCDMA
5 days	gpu: host1x: Fix use-after-free in host1x_bo_clear_cached_mappings	Mikko Perttunen
	__host1x_bo_unpin() drops the last reference to the mapping and frees it, so we can't dereference mapping afterwards. The cache itself outlives the mapping, so use the cache local variable instead. Reported-by: Dan Carpenter <error27@gmail.com> Closes: https://lore.kernel.org/linux-tegra/ah6ErK6f4kVudVIA@stanley.mountain/T/#u Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com> Link: https://patch.msgid.link/20260603-host1x-bocache-leak-fix-v1-1-494101dbfd30@nvidia.com
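The pattern behind this fix can be sketched with a toy refcount. All types and function names below are hypothetical and are not the host1x API; the point is only the ordering: read what you need from the object into a local before dropping a reference that may free it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical cache that outlives the mappings it tracks. */
struct bo_cache {
	int live_mappings;
	int cleared;
};

/* Hypothetical refcounted mapping that points back at its cache. */
struct bo_mapping {
	int refcount;
	struct bo_cache *cache;
};

static struct bo_mapping *mapping_new(struct bo_cache *cache)
{
	struct bo_mapping *m = malloc(sizeof(*m));

	if (!m)
		return NULL;
	m->refcount = 1;
	m->cache = cache;
	cache->live_mappings++;
	return m;
}

/* Drops one reference; frees the mapping when it was the last one. */
static void mapping_put(struct bo_mapping *m)
{
	if (--m->refcount == 0) {
		m->cache->live_mappings--;
		free(m);
	}
}

static void clear_cached_mapping(struct bo_mapping *m)
{
	struct bo_cache *cache;

	assert(m != NULL && m->refcount > 0);

	/* Take the cache pointer while the mapping is still valid. */
	cache = m->cache;

	mapping_put(m);		/* may free m */

	/* Only the local is used from here on; m->cache would be a UAF. */
	cache->cleared++;
}
```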
5 days	drm/i915/selftests: Fix GT PM sort comparators	Emre Cecanpunar
	Compare the sampled clock values instead of their addresses. Comparing addresses leaves the samples unsorted, preventing the code from discarding the minimum and maximum samples. Fixes: 1a5392479207 ("drm/i915/selftests: Measure CS_TIMESTAMP") Signed-off-by: Emre Cecanpunar <emreleno@gmail.com> Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net> Link: https://lore.kernel.org/r/20260714220430.238433-1-emreleno@gmail.com (cherry picked from commit 682ea2d28d18bb06f9fc663cb5ab7e80dc0e606a) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
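A value-based comparator of the kind described above might look like the following userspace sketch. The names are illustrative and the trimmed mean is an assumed example of "discarding the minimum and maximum samples", not the selftest's actual averaging code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Compare the pointed-to values. Comparing the pointers themselves
 * (a < b) would order elements by their position in memory, which
 * tells qsort() nothing about the samples.
 */
static int cmp_u64(const void *A, const void *B)
{
	const uint64_t *a = A, *b = B;

	return (*a > *b) - (*a < *b);
}

/* Sort the samples, drop the smallest and largest, average the rest. */
static uint64_t trimmed_mean(uint64_t *samples, size_t n)
{
	uint64_t sum = 0;
	size_t i;

	assert(n >= 3);
	qsort(samples, n, sizeof(*samples), cmp_u64);

	for (i = 1; i + 1 < n; i++)
		sum += samples[i];

	return sum / (n - 2);
}
```

The `(x > y) - (x < y)` form avoids the truncation that a plain subtraction of two 64-bit values would suffer when cast to int.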
5 days	drm/i915/wm: clear the plane ddb_y entries on plane disable	Vinod Govindapillai
	The UV/Y plane DDB entries are never cleared in skl_wm_plane_disable_noatomic(), which can leave stale DDB state for NV12 planes on pre-Gen11 devices. Fixes: d34b59d5ba41 ("drm/i915: Add skl_wm_plane_disable_noatomic()") Assisted-by: Copilot:claude-sonnet-4.6 Signed-off-by: Vinod Govindapillai <vinod.govindapillai@intel.com> Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com> Link: https://patch.msgid.link/20260615203355.218578-2-vinod.govindapillai@intel.com (cherry picked from commit 60f68a6ba298fd1e971a2d91576304bee89a16fc) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
5 days	drm/xe/pf: Disable display in admin only PF mode	Satyanarayana K V P
	Admin-only PF mode does not expose media or 3D execution capabilities to userspace, so display pipelines cannot receive rendered content. Fixes: d88c4bac8c2a ("drm/xe/pf: Restrict device query responses in admin-only PF mode") Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Piotr Piórkowski <piotr.piorkowski@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://patch.msgid.link/20260714053259.504308-2-satyanarayana.k.v.p@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit 7ef55ae582eba2b0a7a7441bd3b9aefd38a26bb9) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
5 days	drm/xe/guc: Hold device ref until queue teardown completes	Arvind Yadav
	GuC exec queue destruction can run asynchronously. If the final device put happens from a destroy worker, drmm cleanup can end up draining the same workqueue and deadlock. Hold a drm_device reference for the queue lifetime and drop it after queue teardown completes. This keeps drmm cleanup from running while async destroy work is still pending. Move GuC destroy work to a module-lifetime Xe workqueue and flush it on PCI remove so hot-unbind/rebind still waits for pending destroy work. With queue-held device refs, guc_submit_sw_fini() cannot run with live GuC IDs. Replace the fini wait with an assertion and remove the unused fini_wq. v2: - Rebase v3: - Switch to queue-lifetime drm_dev_get()/drm_dev_put() model. (Matt) - Queue async teardown on system_dfl_wq instead of xe->destroy_wq. (Matt) - Drop separate deferred drm_dev_put worker. - Remove stale drain_workqueue(xe->destroy_wq) from guc_submit_sw_fini(). v4: - Replace the guc_submit_sw_fini() wait with an assertion and remove the now-unused fini_wq. (sashiko) v5: - Move destroy work to a module-lifetime Xe workqueue instead of system_dfl_wq. (Matt) - Flush the module-lifetime destroy workqueue during PCI remove to preserve the old device-remove wait semantics. v6: - Keep SVM pagemap destroy work on the per-device destroy_wq to avoid letting it outlive the xe_device/drm_device. (Sashiko) - Use WQ_MEM_RECLAIM for xe->destroy_wq because SVM pagemap destroy work can be queued from the reclaim path. v7: - Drop the per-device xe->destroy_wq and use the module-level destroy WQ for SVM pagemap destroy as well. (Matt) - Rename xe_exec_queue_destroy_wq_() helpers to xe_destroy_wq_() helpers because the WQ is no longer exec-queue specific. (Matt) v8: - Rebase. v9: - Keep SVM pagemap destroy work on the per-device WQ_MEM_RECLAIM destroy_wq because it can be queued from reclaim and embeds the dev_pagemap used by devres teardown. (Sashiko) - Keep the module-level destroy WQ GuC-only and drop WQ_MEM_RECLAIM from it. 
- Update the module-WQ kdoc to document the GuC/SVM split. v10: - Keep xe->destroy_wq per-cpu while adding WQ_MEM_RECLAIM to fix the workqueue allocation warning. v11: - Drop the SVM pagemap destroy comment as it was revision-specific. (Thomas) v12: - Rebase. Fixes: 2d2be279f1ca ("drm/xe: fix UAF around queue destruction") Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: Tejas Upadhyay <tejas.upadhyay@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Arvind Yadav <arvind.yadav@intel.com> Link: https://patch.msgid.link/20260716062624.211396-1-arvind.yadav@intel.com Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com> (cherry picked from commit da1124abac689cc2b1d8995e5f0a816f8a122edb) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
5 days	drm/xe/guc: Keep scheduler timeline name alive	Arvind Yadav
	The scheduler keeps a pointer to the timeline name, but q->name is freed with the exec queue while scheduler fences can still reference it. Store the name in struct xe_guc_exec_queue so it shares the scheduler's RCU-deferred lifetime. Fixes: 6bd90e700b42 ("drm/xe: Make dma-fences compliant with the safe access rules") Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Arvind Yadav <arvind.yadav@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Acked-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260714064402.2457257-1-arvind.yadav@intel.com Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com> (cherry picked from commit 41075f0eb5dcbd3b065d15f15ef7bbe9315188e8) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
5 days	drm/xe/pt: Reset current_op in xe_pt_update_ops_init()	Zongyao Bai
	xe_pt_update_ops_init() fails to reset current_op to 0. On the vm_bind path, ops_execute() calls xe_pt_update_ops_prepare() inside the xe_validation_guard() / drm_exec_until_all_locked() loop. When that loop retries due to lock contention or OOM eviction (drm_exec_retry_on_contention() / xe_validation_retry_on_oom()), xe_pt_update_ops_prepare() runs again on the same vops, and each call to bind_op_prepare() increments current_op without resetting it. After N retries current_op exceeds the array size allocated by xe_vma_ops_alloc(), causing an out-of-bounds write into SLUB-poisoned memory and a subsequent UAF crash in xe_migrate_update_pgtables_cpu() when reading the corrupted pt_op->bind. Also reset needs_svm_lock and needs_invalidation which are derived in the same prepare pass and would otherwise cause wrong migrate ops selection and redundant TLB invalidation on retry. Fix this by resetting current_op, needs_svm_lock and needs_invalidation in xe_pt_update_ops_init(). v2 (Matt): - Add details in commit message. - Add Fixes tag and Cc to stable@vger.kernel.org Fixes: e8babb280b5e ("drm/xe: Convert multiple bind ops into single job") Suggested-by: Matthew Auld <matthew.auld@intel.com> Cc: stable@vger.kernel.org Assisted-by: GitHub-Copilot:claude-sonnet-4.6 Signed-off-by: Zongyao Bai <zongyao.bai@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260714232433.2737533-1-zongyao.bai@intel.com (cherry picked from commit 046045543e530605c441063535e7dca0075369a6) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
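The retry hazard can be modelled with a small cursor-into-array sketch. The struct, field layout and helper names here are hypothetical simplifications, and the sketch assumes the init step runs at the start of every prepare pass; it only shows why per-pass state must be reset before a retried pass refills the same array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_OPS 4

/* Hypothetical stand-in for the per-bind update state. */
struct update_ops {
	size_t num_ops;			/* slots allocated up front */
	size_t current_op;		/* cursor advanced by each prepare */
	bool needs_invalidation;	/* derived during the prepare pass */
	int slots[MAX_OPS];
};

/* Reset everything that a prepare pass derives, so a retry starts clean. */
static void update_ops_init(struct update_ops *u)
{
	u->current_op = 0;
	u->needs_invalidation = false;
}

/* Fills the next slot; refuses instead of writing out of bounds. */
static bool prepare_one(struct update_ops *u, int value)
{
	if (u->current_op >= u->num_ops)
		return false;
	u->slots[u->current_op++] = value;
	u->needs_invalidation = true;
	return true;
}

/* Models a lock loop that reruns the whole prepare pass on contention. */
static bool prepare_with_retries(struct update_ops *u, unsigned int retries)
{
	unsigned int attempt;
	size_t i;

	assert(u->num_ops <= MAX_OPS);

	for (attempt = 0; attempt <= retries; attempt++) {
		update_ops_init(u);
		for (i = 0; i < u->num_ops; i++)
			if (!prepare_one(u, (int)i))
				return false;
	}
	return true;
}
```

If `update_ops_init()` left `current_op` untouched, the second pass would start at `num_ops` and every further write would land past the allocated slots, which is the overflow the commit describes.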
5 days	drm/xe/wopcm: fix WOPCM size for LNL+	Daniele Ceraolo Spurio
	Starting on LNL the WOPCM size is 8MB instead of 4MB, so we need to avoid using the [0, 8MB) range of the GGTT as that can be inaccessible from the microcontrollers. Note that the proper long-term fix here is to read the WOPCM size from the HW, but that is a more serious rework that would be difficult to backport, so we can do that as a follow-up. Fixes: 9c57bc08652a ("drm/xe/lnl: Drop force_probe requirement") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Shuicheng Lin <shuicheng.lin@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://patch.msgid.link/20260713221758.3285744-2-daniele.ceraolospurio@intel.com (cherry picked from commit 3033b0b24ed0e2f5e56bdd4d9c183417c365a45b) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
5 days	drm/xe/vf: Fix VF CCS attach/detach race with in-flight BO moves	Matthew Brost
	xe_bo_move() attaches VF CCS read/write batch buffers (BBs) to a BO after it transitions NULL/SYSTEM -> TT, and detaches them after it transitions TT -> SYSTEM. Both operations were done synchronously on the CPU immediately after building the move's copy/clear fence, without waiting for that fence to signal. This creates two races with VF migration: - Attach happens too late relative to the copy job it is meant to protect. If the copy job is submitted before the CCS BBs are attached, a VF migration event that pauses execution mid-copy can observe partially copied CCS metadata without the attach state needed to correctly save/restore it. - Detach happens too early relative to the copy job that moves data out of TT. The CCS BBs are torn down right after the copy fence is obtained, while the actual blit may still be in flight. A VF migration event that pauses execution mid-copy can then race the save/restore path against the still-running blit, and the CCS BBs it would need to make sense of the paused state have already been removed. Fix both races: - Move the attach call to before the copy/clear job is submitted, so the CCS BBs are already registered by the time the copy runs. On attach failure, unwind and bail out of the move. xe_migrate_ccs_rw_copy() now takes the destination resource explicitly, since bo->ttm.resource is not updated to the new resource until after the move commits. - Detach only after explicitly waiting for the copy fence to signal, instead of tearing down the CCS BBs immediately after obtaining it. While here, also fix xe_sriov_vf_ccs_attach_bo() to properly unwind and propagate errors: the per-context loop previously never broke out on error, silently discarding earlier failures. Unwind by clearing each attached context directly via xe_migrate_ccs_rw_copy_clear() instead of reusing xe_sriov_vf_ccs_detach_bo(), which requires both contexts to be attached before it will clean up either one. 
Fixes: 864690cf4dd6 ("drm/xe/vf: Attach and detach CCS copy commands with BO") Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Assisted-by: GitHub_Copilot:claude-sonnet-5 Signed-off-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20260714062440.3421225-1-matthew.brost@intel.com (cherry picked from commit d45ad0aa7a1eb5d7288b5ed948b05695611dc39e) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
5 days	drm/xe/nvm: fix writable override for CRI	Alexander Usyskin
	The writable override should be set when the FDO_MODE bit is enabled. Fix the comparison to distinguish this case from legacy systems, where the bit should be disabled to have the override. Cc: stable@vger.kernel.org Fixes: 9dde74fd9e65 ("drm/xe/nvm: enable cri platform") Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patch.msgid.link/20260714-cri_nvm_fdo_flip-v2-1-14580e71b58e@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit 2007be18d2318a59748da5da1b8968042213d5f1) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
5 days	drm/xe: Hold a dma-buf reference for imported BOs	Nitin Gote
	An imported dma-buf BO is created as a ttm_bo_type_sg BO whose reservation object is the exporter's dma_buf->resv. The importer, however, only takes a dma-buf reference after a successful dma_buf_dynamic_attach(). Until then nothing keeps the exporter alive, so if the exporter is freed while the BO still references its resv, a later access to that resv is a use-after-free: Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b9c Workqueue: ttm ttm_bo_delayed_delete [ttm] RIP: 0010:mutex_can_spin_on_owner+0x3f/0xc0 This can be reached on two paths: - dma_buf_dynamic_attach() fails, or - ttm_bo_init_reserved() fails during BO creation. In both cases the BO already has bo->base.resv pointing at the exporter resv, and sg BOs are always torn down via ttm_bo_delayed_delete(), which locks bo->base.resv asynchronously - potentially after the exporter has been freed. Take the dma-buf reference in xe_bo_init_locked(), before ttm_bo_init_reserved(), so it also covers a creation failure there, and release it in xe_ttm_bo_destroy(). The reference is held for the whole BO lifetime, keeping the shared resv alive on every path. v2: - Reworked the fix to avoid creating the imported sg BO before dma_buf_dynamic_attach() succeeds. - Attach with importer_priv == NULL and make invalidate_mappings ignore incomplete imports. v3: - Dropped the xe-side reordering approach since importer_priv must be valid when dma_buf_dynamic_attach() publishes the attachment. - Per Christian's suggestion on the v1 thread, keyed the check on import_attach rather than removing the sg guard entirely. - Fixes both xe and amdgpu in a single TTM patch. v4: - Moved import_attach check to after dma_resv_copy_fences() so fences are copied before returning for successful imports (Thomas). - Removed exporter-alive claim from commit message (Thomas). 
v5: - Add drm/xe patch to keep imported sg BOs off the LRU before attach succeeds; the TTM fix alone is not sufficient for xe if the BO is already LRU-visible. (Thomas) v4 patch: https://patchwork.freedesktop.org/patch/736663/?series=169129&rev=2 - Patch 1 (drm/ttm) carries Christian's Reviewed-by from v4. v6: - Reworked the fix based on Thomas' suggestion. Instead of the TTM resv individualization (v1-v5) plus the xe off-LRU/placement handling (v5), just hold a dma-buf reference for the imported BO lifetime so the shared resv can never be freed while the BO still references it. Single xe patch, no TTM change. (Thomas) - Take the reference in xe_bo_init_locked() before ttm_bo_init_reserved() so a TTM creation failure is covered too (Thomas). - Dropped the v5 series (drm/ttm + drm/xe off-LRU); the off-LRU approach also regressed in CI BAT via ttm_bo_pipeline_gutting() creating a ghost BO that outlived the exporter. Link to v5: https://patchwork.freedesktop.org/series/169984/ v7: - Move changelog above --- so it stays in the commit message. - Reorder changelog entries oldest-to-newest. (Thomas) Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/8023 Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: stable@vger.kernel.org Cc: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Cc: Christian Konig <christian.koenig@amd.com> Cc: Matthew Auld <matthew.auld@intel.com> Suggested-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Assisted-by: GitHub_Copilot:claude-sonnet-4.6 Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Nitin Gote <nitin.r.gote@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20260710191027.260160-2-nitin.r.gote@intel.com (cherry picked from commit 3516f3fae6be35642f8f06f8a218da6425c0306a) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
5 days	drm/xe/vm: Fix BO prefetch with CONSULT_MEM_ADVISE_PREF_LOC	Himal Prasad Ghimiray
	When prefetch region is DRM_XE_CONSULT_MEM_ADVISE_PREF_LOC for a BO VMA, the code used it as an index into region_to_mem_type[], causing an out-of-bounds access since the value is -1. Resolve the preferred location for BO VMAs directly: local VRAM on dGFX (using the BO's tile placement) or system memory on iGPU. Discovered using AI-assisted static analysis confirmed by Intel Product Security. v2: -Fix null dereference Reported-by: Martin Hodo <martin.hodo@intel.com> Fixes: c1bb69a2e8e2 ("drm/xe/svm: Consult madvise preferred location in prefetch") Cc: Matthew Brost <matthew.brost@intel.com> Cc: stable@vger.kernel.org Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20260624174943.2808767-2-himal.prasad.ghimiray@intel.com Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> (cherry picked from commit d9a4906ac03be9f6ed3f3b45c56c866b867fd75b) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
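The sentinel-as-index bug can be sketched as follows. The constant, enum and table below are hypothetical stand-ins (only the sentinel value -1 and the table name come from the commit message), and the two-tile VRAM mapping is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for DRM_XE_CONSULT_MEM_ADVISE_PREF_LOC, which is -1. */
#define CONSULT_PREF_LOC (-1)

enum mem_type { MEM_SYSTEM = 0, MEM_VRAM0, MEM_VRAM1 };

/* Hypothetical lookup table indexed by a non-negative region number. */
static const enum mem_type region_to_mem_type[] = {
	MEM_SYSTEM, MEM_VRAM0, MEM_VRAM1,
};

#define NUM_REGIONS \
	(sizeof(region_to_mem_type) / sizeof(region_to_mem_type[0]))

/*
 * Resolve the sentinel before it can reach the table. Returns a
 * mem_type value, or -1 for a region outside the table.
 */
static int resolve_prefetch_mem_type(int region, bool is_dgfx,
				     unsigned int bo_tile)
{
	if (region == CONSULT_PREF_LOC) {
		if (!is_dgfx)
			return MEM_SYSTEM;	/* integrated: system memory */
		assert(bo_tile < 2);		/* sketch assumes two tiles */
		return bo_tile == 0 ? MEM_VRAM0 : MEM_VRAM1;
	}

	if (region < 0 || (size_t)region >= NUM_REGIONS)
		return -1;

	return region_to_mem_type[region];
}
```

Handling the sentinel first means `region_to_mem_type[-1]` is never evaluated, and the explicit range check covers any other value a caller might pass.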