linux-stable.git - Linux kernel stable tree

Age	Commit message (Collapse)	Author
2026-01-19	Merge tag 'amd-drm-next-6.20-2026-01-16' of ↵	Dave Airlie
	https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.20-2026-01-16: amdgpu: - SR-IOV fixes - Rework SMU mailbox handling - Drop MMIO_REMAP domain - UserQ fixes - MES cleanups - Panel Replay updates - HDMI fixes - Backlight fixes - SMU 14.x fixes - SMU 15 updates amdkfd: - Fix a memory leak - Fixes for systems with non-4K pages - LDS/Scratch cleanup - MES process eviction fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patch.msgid.link/20260116202609.23107-1-alexander.deucher@amd.com
2026-01-15	Merge tag 'amd-drm-next-6.20-2026-01-09' of ↵	Dave Airlie
	https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.20-2026-01-09: amdgpu: - GPUVM updates - Initial support for larger GPU address spaces - Initial SMUIO 15.x support - Documentation updates - Initial PSP 15.x support - Initial IH 7.1 support - Initial IH 6.1.1 support - SMU 13.0.12 updates - RAS updates - Initial MMHUB 3.4 support - Initial MMHUB 4.2 support - Initial GC 12.1 support - Initial GC 11.5.4 support - HDMI fixes - Panel replay improvements - DML updates - DC FP fixes - Initial SDMA 6.1.4 support - Initial SDMA 7.1 support - Userq updates - DC HPD refactor - SwSMU cleanups and refactoring - TTM memory ops parallelization - DCN 3.5 fixes - DP audio fixes - Clang fixes - Misc spelling fixes and cleanups - Initial SDMA 7.11.4 support - Convert legacy DRM logging helpers to new drm logging helpers - Initial JPEG 5.3 support - Add support for changing UMA size via the driver - DC analog fixes - GC 9 gfx queue reset support - Initial SMU 15.x support amdkfd: - Reserved SDMA rework - Refactor SPM - Initial GC 12.1 support - Initial GC 11.5.4 support - Initial SDMA 7.1 support - Initial SDMA 6.1.4 support - Increase the kfd process hash table - Per context support - Topology fixes radeon: - Convert legacy DRM logging helpers to new drm logging helpers - Use devm for i2c adapters - Variable sized array fix - Misc cleanups UAPI: - KFD context support. Proposed userspace: https://github.com/ROCm/rocm-systems/pull/1705 https://github.com/ROCm/rocm-systems/pull/1701 - Add userq metadata queries for more queue types. Proposed userspace: https://gitlab.freedesktop.org/yogeshmohan/mesa/-/commits/userq_query From: Alex Deucher <alexander.deucher@amd.com> Link: https://patch.msgid.link/20260109154713.3242957-1-alexander.deucher@amd.com Signed-off-by: Dave Airlie <airlied@redhat.com>
2026-01-14	drm/amd/display: Add an hdmi_hpd_debounce_delay_ms module	Ivan Lipski
	[Why&How] Right now, the HDMI HPD filter is enabled by default at 1500ms. We want to disable it by default, as most modern displays with HDMI do not require it for DPMS mode. The HPD can instead be enabled as a driver parameter with a custom delay value in ms (up to 5000ms). Fixes: c918e75e1ed9 ("drm/amd/display: Add an HPD filter for HDMI") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4859 Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 6a681cd9034587fe3550868bacfbd639d1c6891f)
2026-01-14	drm/amdgpu/userq: Fix fence reference leak on queue teardown v2	Srinivasan Shanmugam
	The user mode queue keeps a pointer to the most recent fence in userq->last_fence. This pointer holds an extra dma_fence reference. When the queue is destroyed, we free the fence driver and its xarray, but we forgot to drop the last_fence reference. Because of the missing dma_fence_put(), the last fence object can stay alive when the driver unloads. This leaves an allocated object in the amdgpu_userq_fence slab cache and triggers This is visible during driver unload as: BUG amdgpu_userq_fence: Objects remaining on __kmem_cache_shutdown() kmem_cache_destroy amdgpu_userq_fence: Slab cache still has objects Call Trace: kmem_cache_destroy amdgpu_userq_fence_slab_fini amdgpu_exit __do_sys_delete_module Fix this by putting userq->last_fence and clearing the pointer during amdgpu_userq_fence_driver_free(). This makes sure the fence reference is released and the slab cache is empty when the module exits. v2: Update to only release userq->last_fence with dma_fence_put() (Christian) Fixes: edc762a51c71 ("drm/amdgpu/userq: move some code around") Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 8e051e38a8d45caf6a866d4ff842105b577953bb)
2026-01-14	Revert "drm/amdgpu: don't attach the tlb fence for SI"	Prike Liang
	This reverts commit 820b3d376e8a102c6aeab737ec6edebbbb710e04. It’s better to validate VM TLB flushes in the flush‑TLB backend rather than in the generic VM layer. Reverting this patch depends on commit fa7c231fc2b0 ("drm/amdgpu: validate the flush_gpu_tlb_pasid()") being present in the tree. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 9163fe4d790fb4e16d6b0e23f55b43cddd3d4a65)
2026-01-14	drm/amdgpu: validate the flush_gpu_tlb_pasid()	Prike Liang
	Validate flush_gpu_tlb_pasid() availability before flushing tlb. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit f4db9913e4d3dabe9ff3ea6178f2c1bc286012b8)
2026-01-14	drm/amdgpu: make sure userqs are enabled in userq IOCTLs	Alex Deucher
	These IOCTLs shouldn't be called when userqs are not enabled. Make sure they are enabled before executing the IOCTLs. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit d967509651601cddce7ff2a9f09479f3636f684d) Cc: stable@vger.kernel.org
2026-01-14	drm/amdgpu: Use correct address to setup gart page table for vram access	Xiaogang Chen
	Use dst input parameter to setup gart page table entries instead of using fixed location. Fixes: 237d623ae659 ("drm/amdgpu/gart: Add helper to bind VRAM pages (v2)") Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ca5d4db8db843be7ed35fc9334737490c2b58d32)
2026-01-14	Revert duplicate "drm/amdgpu: disable peer-to-peer access for DCC-enabled ↵	Peter Colberg
	GC12 VRAM surfaces" This reverts commit 22a36e660d014925114feb09a2680bb3c2d1e279 once, which was merged twice due to an incorrect backmerge resolution. Fixes: ce0478b02ed2 ("Merge tag 'v6.18-rc6' into drm-next") Signed-off-by: Peter Colberg <pcolberg@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 38a0f4cf8c6147fd10baa206ab349f8ff724e391)
2026-01-14	drm/amd: Clean up kfd node on surprise disconnect	Mario Limonciello (AMD)
	When an eGPU is unplugged the KFD topology should also be destroyed for that GPU. This never happens because the fini_sw callbacks never get to run. Run them manually before calling amdgpu_device_ip_fini_early() when a device has already been disconnected. This location is intentionally chosen to make sure that the kfd locking refcount doesn't get incremented unintentionally. Cc: kent.russell@amd.com Closes: https://community.frame.work/t/amd-egpu-on-linux/8691/33 Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 6a23e7b4332c10f8b56c33a9c5431b52ecff9aab) Cc: stable@vger.kernel.org
2026-01-14	drm/amdgpu: fix drm panic null pointer when driver not support atomic	Lu Yao
	When driver not support atomic, fb using plane->fb rather than plane->state->fb. Fixes: fe151ed7af54 ("drm/amdgpu: add generic display panic helper code") Signed-off-by: Lu Yao <yaolu@kylinos.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2f2a72de673513247cd6fae14e53f6c40c5841ef)
2026-01-14	drm/amdgpu: Fix gfx9 update PTE mtype flag	Philip Yang
	Fix copy&paste error, that should have been an assignment instead of an or, otherwise MTYPE_UC 0x3 can not be updated to MTYPE_RW 0x1. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit fc1366016abe4103c0f0fac882811aea961ef213) Cc: stable@vger.kernel.org
2026-01-14	drm/amd/display: Add an hdmi_hpd_debounce_delay_ms module	Ivan Lipski
	[Why&How] Right now, the HDMI HPD filter is enabled by default at 1500ms. We want to disable it by default, as most modern displays with HDMI do not require it for DPMS mode. The HPD can instead be enabled as a driver parameter with a custom delay value in ms (up to 5000ms). Fixes: c918e75e1ed9 ("drm/amd/display: Add an HPD filter for HDMI") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4859 Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-14	drm/amdkfd: Add domain parameter to alloc kernel BO	Philip Yang
	To allocate kernel BO from VRAM domain for MQD in the following patch. No functional change because kernel BO allocate all from GTT domain. Rename amdgpu_amdkfd_alloc_gtt_mem to amdgpu_amdkfd_alloc_kernel_mem Rename amdgpu_amdkfd_free_gtt_mem to amdgpu_amdkfd_free_kernel_mem Rename mem_kfd_mem_obj gtt_mem to mem Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-14	drm/amdgpu/userq: Fix fence reference leak on queue teardown v2	Srinivasan Shanmugam
	The user mode queue keeps a pointer to the most recent fence in userq->last_fence. This pointer holds an extra dma_fence reference. When the queue is destroyed, we free the fence driver and its xarray, but we forgot to drop the last_fence reference. Because of the missing dma_fence_put(), the last fence object can stay alive when the driver unloads. This leaves an allocated object in the amdgpu_userq_fence slab cache and triggers This is visible during driver unload as: BUG amdgpu_userq_fence: Objects remaining on __kmem_cache_shutdown() kmem_cache_destroy amdgpu_userq_fence: Slab cache still has objects Call Trace: kmem_cache_destroy amdgpu_userq_fence_slab_fini amdgpu_exit __do_sys_delete_module Fix this by putting userq->last_fence and clearing the pointer during amdgpu_userq_fence_driver_free(). This makes sure the fence reference is released and the slab cache is empty when the module exits. v2: Update to only release userq->last_fence with dma_fence_put() (Christian) Fixes: edc762a51c71 ("drm/amdgpu/userq: move some code around") Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-14	Revert "drm/amdgpu: don't attach the tlb fence for SI"	Prike Liang
	This reverts commit 820b3d376e8a102c6aeab737ec6edebbbb710e04. It’s better to validate VM TLB flushes in the flush‑TLB backend rather than in the generic VM layer. Reverting this patch depends on commit fa7c231fc2b0 ("drm/amdgpu: validate the flush_gpu_tlb_pasid()") being present in the tree. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-14	drm/amdgpu: validate the flush_gpu_tlb_pasid()	Prike Liang
	Validate flush_gpu_tlb_pasid() availability before flushing tlb. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-14	drm/amdkfd: kfd driver supports hot unplug/replug amdgpu devices	Xiaogang Chen
	This patch allows kfd driver function correctly when AMD gpu devices got unplug/replug at run time. When an AMD gpu device got unplug kfd driver gracefully terminates existing kfd processes after stops all queues by sending SIGBUS to user process. After that user space can still use remaining AMD gpu devices. When all AMD gpu devices at system got removed kfd driver will not response new requests. Unplugged AMD gpu devices can be re-plugged. kfd driver will use added devices to function as usual. The purpose of this patch is having kfd driver behavior as expected during and after AMD gpu devices unplug/replug at run time. Signed-off-by: Xiaogang Chen <Xiaogang.Chen@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-10	drm/amdgpu/mes: Simplify hqd mask initialization	Lang Yu
	"adev->mes.compute_hqd_mask[i] = adev->gfx.disable_kq ? 0xF" is actually incorrect for MEC with 8 queues per pipe. Let's get rid of version check and hardcode, calculate hqd mask with number of queues per pipe and number of gfx/compute queues kernel used. Currently, only MEC1 is used for both kernel/user compute queue. To enable other MEC, we need to redistribute queues per pipe and adjust queue resource shared with kfd that needs a separate patch. Just skip other MEC for now to avoid potential issues. v2: Force reserved queues to 0 if kernel queue is explicitly disabled. Signed-off-by: Lang Yu <lang.yu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-10	drm/amdgpu: Refactor amdgpu_gem_va_ioctl for Handling Last Fence Update and ↵	Srinivasan Shanmugam
	Timeline Management v7 When GPU memory mappings are updated, the driver returns a fence so userspace knows when the update is finished. The previous refactor could pick the wrong fence or rely on checks that are not safe for GPU mappings that stay valid even when memory is missing. In some cases this could return an invalid fence or cause fence reference counting problems. Fix this by (v5,v6, per Christian): - Starting from the VM’s existing last update fence, so a valid and meaningful fence is always returned even when no new work is required. - Selecting the VM-level fence only for always-valid / PRT mappings using the required combined bo_va + bo guard. - Using the per-BO page table update fence for normal MAP and REPLACE operations. - For UNMAP and CLEAR, returning the fence provided by amdgpu_vm_clear_freed(), which may remain unchanged when nothing needs clearing. - Keeping fence reference counting balanced. v7: Drop the extra bo_va/bo NULL guard since amdgpu_vm_is_bo_always_valid() handles NULL BOs correctly (including PRT). (Christian) This makes VM timeline fences correct and prevents crashes caused by incorrect fence handling. Fixes: bd8150a1b337 ("drm/amdgpu: Refactor amdgpu_gem_va_ioctl for Handling Last Fence Update and Timeline Management v4") Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-10	drm/amdgpu: make sure userqs are enabled in userq IOCTLs	Alex Deucher
	These IOCTLs shouldn't be called when userqs are not enabled. Make sure they are enabled before executing the IOCTLs. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-10	drm/amdgpu: Slightly simplify base_addr_show()	Christophe JAILLET
	sysfs_emit_at() never returns a negative error code. It returns 0 or the number of characters written in the buffer. Remove the useless tests. This simplifies the logic and saves a few lines of code. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-10	drm/amdgpu: Drop MMIO_REMAP domain bit and keep it Internal	Christian König
	"AMDGPU_GEM_DOMAIN_MMIO_REMAP" - Never activated as UAPI and it turned out that this was to inflexible. Allocate the MMIO_REMAP buffer object as a regular GEM BO and explicitly move it into the fixed AMDGPU_PL_MMIO_REMAP placement at the TTM level. This avoids relying on GEM domain bits for MMIO_REMAP, keeps the placement purely internal, and makes the lifetime and pinning of the global MMIO_REMAP BO explicit. The BO is pinned in TTM so it cannot be migrated or evicted. The corresponding free path relies on normal DRM teardown ordering, where no further user ioctls can access the global BO once TTM teardown begins. v2 (Srini): - Updated patch title. - Drop use of AMDGPU_GEM_DOMAIN_MMIO_REMAP in amdgpu_ttm.c. The MMIO_REMAP domain bit is removed from UAPI, so keep the MMIO_REMAP BO allocation domain-less (bp.domain = 0) and rely on the TTM placement (AMDGPU_PL_MMIO_REMAP) for backing/pinning. - Keep fdinfo/mem-stats visibility for MMIO_REMAP by classifying BOs based on bo->tbo.resource->mem_type == AMDGPU_PL_MMIO_REMAP, since the domain bit is removed. v3: Squash patches #1 & #3 Fixes: 056132483724 ("drm/amdgpu/uapi: Introduce AMDGPU_GEM_DOMAIN_MMIO_REMAP") Fixes: 2a7a794eb82c ("drm/amdgpu/ttm: Allocate/Free 4K MMIO_REMAP Singleton") Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Leo Liu <leo.liu@amd.com> Cc: Ruijing Dong <ruijing.dong@amd.com> Cc: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-10	drm/amdgpu: Use correct address to setup gart page table for vram access	Xiaogang Chen
	Use dst input parameter to setup gart page table entries instead of using fixed location. Fixes: 237d623ae659 ("drm/amdgpu/gart: Add helper to bind VRAM pages (v2)") Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-10	drm/amdgpu: Skip loading SDMA_RS64 in VF	YuBiao Wang
	VFs use the PF SDMA ucode and are unable to load SDMA_RS64. Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com> Signed-off-by: Victor Skvortsov <Victor.Skvortsov@amd.com> Reviewed-by: Gavin Wan <gavin.wan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-08	Revert duplicate "drm/amdgpu: disable peer-to-peer access for DCC-enabled ↵	Peter Colberg
	GC12 VRAM surfaces" This reverts commit 22a36e660d014925114feb09a2680bb3c2d1e279 once, which was merged twice due to an incorrect backmerge resolution. Fixes: ce0478b02ed2 ("Merge tag 'v6.18-rc6' into drm-next") Signed-off-by: Peter Colberg <pcolberg@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-08	drm/amd: Clean up kfd node on surprise disconnect	Mario Limonciello (AMD)
	When an eGPU is unplugged the KFD topology should also be destroyed for that GPU. This never happens because the fini_sw callbacks never get to run. Run them manually before calling amdgpu_device_ip_fini_early() when a device has already been disconnected. This location is intentionally chosen to make sure that the kfd locking refcount doesn't get incremented unintentionally. Cc: kent.russell@amd.com Closes: https://community.frame.work/t/amd-egpu-on-linux/8691/33 Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-08	drm/amdgpu: Extend psp_skip_tmr for bare-metal and sriov	Hawking Zhang
	In SRIOV, guest drivers no longer setup/destory VMR starting from mp0 v11_0_7. In bare-metal, if boot-time TMR is enabled, some generation (e.g., mp0 v13_0_x) don’t need runtime TMR allocation but still require SETUP_TMR command with tmr address 0 for backward compatibility. some newer generations require neither SETUP_TMR nor DESTROY_TMR and will return errors if they are sent. Driver relies on boot_time_tmr and autoload_supported to handle these cases correctly. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-08	drm/amdgpu: Add helper to alloc GART entries	Philip Yang
	Add helper amdgpu_gtt_mgr_alloc/free_entries, define GART_ENTRY_WITHOUT_BO_COLOR color for GART node not allocated with GTT bo, then amdgpu_gtt_mgr_recover skip those mm_node. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-08	drm/amdgpu: fix drm panic null pointer when driver not support atomic	Lu Yao
	When driver not support atomic, fb using plane->fb rather than plane->state->fb. Fixes: fe151ed7af54 ("drm/amdgpu: add generic display panic helper code") Signed-off-by: Lu Yao <yaolu@kylinos.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-08	drm/amd: Enable SMU 15_0_0 support	Pratik Vishwakarma
	Add SMU 15_0_0 v2: rebase (Alex) v3: fix clang build (Alex) Signed-off-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-08	drm/amd: Enable SMUIO 15_0_0 support	Pratik Vishwakarma
	Add SMUIO 15_0_0. Signed-off-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-08	drm/amdgpu: Fix gfx9 update PTE mtype flag	Philip Yang
	Fix copy&paste error, that should have been an assignment instead of an or, otherwise MTYPE_UC 0x3 can not be updated to MTYPE_RW 0x1. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-07	Reapply "Revert "drm/amd: Skip power ungate during suspend for VPE""	Mario Limonciello (AMD)
	Skipping power ungate exposed some scenarios that will fail like below: ``` amdgpu: Register(0) [regVPEC_QUEUE_RESET_REQ] failed to reach value 0x00000000 != 0x00000001n amdgpu 0000:c1:00.0: amdgpu: VPE queue reset failed ... amdgpu: [drm] ERROR wait_for_completion_timeout timeout! ``` The underlying s2idle issue that prompted this commit is going to be fixed in BIOS. This reverts commit 2a6c826cfeedd7714611ac115371a959ead55bda. This was lost in the 6.19 merge so reapply it. Fixes: 2a6c826cfeed ("drm/amd: Skip power ungate during suspend for VPE") Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reported-by: Konstantin <answer2019@yandex.ru> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220812 Reported-by: Matthew Schwartz <matthew.schwartz@linux.dev> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 3925683515e93844be204381d2d5a1df5de34f31)
2026-01-07	drm/amd/pm: Disable MMIO access during SMU Mode 1 reset	Perry Yuan
	During Mode 1 reset, the ASIC undergoes a reset cycle and becomes temporarily inaccessible via PCIe. Any attempt to access MMIO registers during this window (e.g., from interrupt handlers or other driver threads) can result in uncompleted PCIe transactions, leading to NMI panics or system hangs. To prevent this, set the `no_hw_access` flag to true immediately after triggering the reset. This signals other driver components to skip register accesses while the device is offline. A memory barrier `smp_mb()` is added to ensure the flag update is globally visible to all cores before the driver enters the sleep/wait state. Signed-off-by: Perry Yuan <perry.yuan@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 7edb503fe4b6d67f47d8bb0dfafb8e699bb0f8a4)
2026-01-05	drm/amdgpu: Fix query for VPE block_type and ip_count	Alan Liu
	[Why] Query for VPE block_type and ip_count is missing. [How] Add VPE case in ip_block_type and hw_ip_count query. Reviewed-by: Lang Yu <lang.yu@amd.com> Signed-off-by: Alan Liu <haoping.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit a6ea0a430aca5932b9c75d8e38deeb45665dd2ae) Cc: stable@vger.kernel.org
2026-01-05	drm/amd/amdgpu: Fix SMU warning during isp suspend-resume	Pratap Nirujogi
	ISP mfd child devices are using genpd and the system suspend-resume operations between genpd and amdgpu parent device which uses only runtime suspend-resume are not in sync. Linux power manager during suspend-resume resuming the genpd devices earlier than the amdgpu parent device. This is resulting in the below warning as SMU is in suspended state when genpd attempts to resume ISP. WARNING: CPU: 13 PID: 5435 at drivers/gpu/drm/amd/amdgpu/../pm/swsmu/amdgpu_smu.c:398 smu_dpm_set_power_gate+0x36f/0x380 [amdgpu] To fix this warning isp suspend-resume is handled as part of amdgpu parent device suspend-resume instead of genpd sequence. Each ISP MFD child device is marked as dev_pm_syscore_device to skip genpd suspend-resume and use pm_runtime_force api's to suspend-resume the devices when callbacks from amdgpu are received. Co-developed-by: Gjorgji Rosikopulos <grosikop@amd.com> Signed-off-by: Gjorgji Rosikopulos <grosikop@amd.com> Signed-off-by: Bin Du <bin.du@amd.com> Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 0288a345f19b2162546352161509bb24614729e1)
2026-01-05	drm/amdgpu: always backup and reemit fences	Alex Deucher
	If when we backup the ring contents for reemit before a ring reset, we skip jobs associated with the bad context, however, we need to make sure the fences are reemited as unprocessed submissions may depend on them. v2: clean up fence handling, make helpers static Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 155a748f14bc0b72783994dea7c5a12276730342)
2026-01-05	drm/amdgpu: don't reemit ring contents more than once	Alex Deucher
	If we cancel a bad job and reemit the ring contents, and we get another timeout, cancel everything rather than reemitting. The wptr markers are only relevant for the original emit. If we reemit, the wptr markers are no longer correct. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit fb62a2067ca4555a6572d911e05919a311c010aa)
2026-01-05	drm/amdgpu: update outdated comment	Julia Lawall
	The function amdgpu_amdkfd_gpuvm_import_dmabuf() was split into import_obj_create() and amdgpu_amdkfd_gpuvm_import_dmabuf_fd() in commit 0188006d7c79 ("drm/amdkfd: Import DMABufs for interop through DRM"). import_obj_create() now does the allocation for the mem variable discussed in the comment. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-05	drm/amd/pm: Disable MMIO access during SMU Mode 1 reset	Perry Yuan
	During Mode 1 reset, the ASIC undergoes a reset cycle and becomes temporarily inaccessible via PCIe. Any attempt to access MMIO registers during this window (e.g., from interrupt handlers or other driver threads) can result in uncompleted PCIe transactions, leading to NMI panics or system hangs. To prevent this, set the `no_hw_access` flag to true immediately after triggering the reset. This signals other driver components to skip register accesses while the device is offline. A memory barrier `smp_mb()` is added to ensure the flag update is globally visible to all cores before the driver enters the sleep/wait state. Signed-off-by: Perry Yuan <perry.yuan@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-05	drm/amdgpu: Refactor amdgpu_gem_va_ioctl for Handling Last Fence Update and ↵	Srinivasan Shanmugam
	Timeline Management v4 This commit simplifies the amdgpu_gem_va_ioctl function, key updates include: - Moved the logic for managing the last update fence directly into amdgpu_gem_va_update_vm. - Introduced checks for the timeline point to enable conditional replacement or addition of fences. v2: Addressed review comments from Christian. v3: Updated comments (Christian). v4: The previous version selected the fence too early and did not manage its reference correctly, which could lead to stale or freed fences being used. This resulted in refcount underflows and could crash when updating GPU timelines. The fence is now chosen only after the VA mapping work is completed, and its reference is taken safely. After exporting it to the VM timeline syncobj, the driver always drops its local fence reference, ensuring balanced refcounting and avoiding use-after-free on dma_fence. Crash signature: [ 205.828135] refcount_t: underflow; use-after-free. [ 205.832963] WARNING: CPU: 30 PID: 7274 at lib/refcount.c:28 refcount_warn_saturate+0xbe/0x110 ... [ 206.074014] Call Trace: [ 206.076488] <TASK> [ 206.078608] amdgpu_gem_va_ioctl+0x6ea/0x740 [amdgpu] [ 206.084040] ? __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu] [ 206.089994] drm_ioctl_kernel+0x86/0xe0 [drm] [ 206.094415] drm_ioctl+0x26e/0x520 [drm] [ 206.098424] ? __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu] [ 206.104402] amdgpu_drm_ioctl+0x4b/0x80 [amdgpu] [ 206.109387] __x64_sys_ioctl+0x96/0xe0 [ 206.113156] do_syscall_64+0x66/0x2d0 ... [ 206.553351] BUG: unable to handle page fault for address: ffffffffc0dfde90 ... [ 206.553378] RIP: 0010:dma_fence_signal_timestamp_locked+0x39/0xe0 ... [ 206.553405] Call Trace: [ 206.553409] <IRQ> [ 206.553415] ? __pfx_drm_sched_fence_free_rcu+0x10/0x10 [gpu_sched] [ 206.553424] dma_fence_signal+0x30/0x60 [ 206.553427] drm_sched_job_done.isra.0+0x123/0x150 [gpu_sched] [ 206.553434] dma_fence_signal_timestamp_locked+0x6e/0xe0 [ 206.553437] dma_fence_signal+0x30/0x60 [ 206.553441] amdgpu_fence_process+0xd8/0x150 [amdgpu] [ 206.553854] sdma_v4_0_process_trap_irq+0x97/0xb0 [amdgpu] [ 206.554353] edac_mce_amd(E) ee1004(E) [ 206.554270] amdgpu_irq_dispatch+0x150/0x230 [amdgpu] [ 206.554702] amdgpu_ih_process+0x6a/0x180 [amdgpu] [ 206.555101] amdgpu_irq_handler+0x23/0x60 [amdgpu] [ 206.555500] __handle_irq_event_percpu+0x4a/0x1c0 [ 206.555506] handle_irq_event+0x38/0x80 [ 206.555509] handle_edge_irq+0x92/0x1e0 [ 206.555513] __common_interrupt+0x3e/0xb0 [ 206.555519] common_interrupt+0x80/0xa0 [ 206.555525] </IRQ> [ 206.555527] <TASK> ... [ 206.555650] RIP: 0010:dma_fence_signal_timestamp_locked+0x39/0xe0 ... [ 206.555667] Kernel panic - not syncing: Fatal exception in interrupt Link: https://patchwork.freedesktop.org/patch/654669/ Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-05	drm/amdgpu: only check critical address when it is not reserved	Gangliang Xie
	when an address is reserved already, no need to check if it is in critical or not, to save time Signed-off-by: Gangliang Xie <ganglxie@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-05	drm/amdgpu: Fix query for VPE block_type and ip_count	Alan Liu
	[Why] Query for VPE block_type and ip_count is missing. [How] Add VPE case in ip_block_type and hw_ip_count query. Reviewed-by: Lang Yu <lang.yu@amd.com> Signed-off-by: Alan Liu <haoping.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-05	drm/amdgpu/gfx9: Implement KGQ ring reset	Alex Deucher
	GFX ring resets work differently on pre-GFX10 hardware since there is no MQD managed by the scheduler. For ring reset, you need issue the reset via CP_VMID_RESET via KIQ or MMIO and submit the following to the gfx ring to complete the reset: 1. EOP packet with EXEC bit set 2. WAIT_REG_MEM to wait for the fence 3. Clear CP_VMID_RESET to 0 4. EVENT_WRITE ENABLE_LEGACY_PIPELINE 5. EOP packet with EXEC bit set 6. WAIT_REG_MEM to wait for the fence Once those commands have completed the reset should be complete and the ring can accept new packets. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Tested-by: Jiqian Chen <Jiqian.Chen@amd.com> (v1) Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-05	drm/amdgpu/gfx9: rework pipeline sync packet sequence	Alex Deucher
	Replace WAIT_REG_MEM with EVENT_WRITE flushes for all shader types and ACQUIRE_MEM. That should accomplish the same thing and avoid having to wait on a fence preventing any issues with pipeline syncs during queue resets. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-05	drm/amdgpu: avoid a warning in timedout job handler	Alex Deucher
	Only set an error on the fence if the fence is not signalled. We can end up with a warning if the per queue reset path signals the fence and sets an error as part of the reset, but fails to recover. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-05	drm/amd/amdgpu: Fix SMU warning during isp suspend-resume	Pratap Nirujogi
	ISP mfd child devices are using genpd and the system suspend-resume operations between genpd and amdgpu parent device which uses only runtime suspend-resume are not in sync. Linux power manager during suspend-resume resuming the genpd devices earlier than the amdgpu parent device. This is resulting in the below warning as SMU is in suspended state when genpd attempts to resume ISP. WARNING: CPU: 13 PID: 5435 at drivers/gpu/drm/amd/amdgpu/../pm/swsmu/amdgpu_smu.c:398 smu_dpm_set_power_gate+0x36f/0x380 [amdgpu] To fix this warning isp suspend-resume is handled as part of amdgpu parent device suspend-resume instead of genpd sequence. Each ISP MFD child device is marked as dev_pm_syscore_device to skip genpd suspend-resume and use pm_runtime_force api's to suspend-resume the devices when callbacks from amdgpu are received. Co-developed-by: Gjorgji Rosikopulos <grosikop@amd.com> Signed-off-by: Gjorgji Rosikopulos <grosikop@amd.com> Signed-off-by: Bin Du <bin.du@amd.com> Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-05	drm/amdgpu: use dma_fence_get_status() for adapter reset	Alex Deucher
	We need to check if the fence was signaled without an error as the per queue resets may have signalled the fence while attempting to reset the queue. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-05	Documentation/amdgpu: Add UMA carveout details	Yo-Jung Leo Lin (AMD)
	Add documentation for the uma/carveout_options and uma/carveout attributes in sysfs Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Yo-Jung Leo Lin (AMD) <Leo.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>