linux-stable.git - Linux kernel stable tree

Age	Commit message	Author
5 days	drm/amdgpu: cap GTT size to physical RAM on APUs	Harkirat Gill
	On APUs, the GTT pool is backed by system RAM, but its size is not bound to the non-carveout memory that actually backs it. A user can end up with GTT + VRAM exceeding total physical memory through the following sequence: - Have a large non-carveout memory space (~128GB) and accordingly set a large GTT (~100GB) via the ttm module parameter. - Lower the non-carveout memory space in BIOS by increasing the UMA Frame Buffer Size (VRAM) to 64GB. - The previously set GTT value (~100GB) persists, even though the new non-carveout space (64GB) can no longer back it. This leads to a case where the kernel reports GTT (100GB) + VRAM (64GB) even though the sum is greater than total physical memory (128GB). Cap the GTT size to totalram_pages() on APUs. totalram_pages() already excludes the VRAM carveout, so the resulting GTT can never exceed the system RAM that actually backs it. Signed-off-by: Harkirat Gill <harkirat.gill@amd.com> Reviewed-by: David Francis <David.Francis@amd.com> Assisted-by: Claude:claude-opus-4 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 5dafdd649280c7dc6c22c8f877da3f54fcc441e1) Cc: stable@vger.kernel.org
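The capping rule in this commit can be illustrated with a small arithmetic sketch. This is not the driver code: `cap_gtt_bytes` and its parameters are hypothetical names, and sizes are plain byte counts rather than the page counts that `totalram_pages()` returns.

```python
GIB = 1 << 30

def cap_gtt_bytes(requested_gtt: int, system_ram: int) -> int:
    """Clamp a requested GTT size to the RAM that can actually back it.

    Hypothetical helper: system_ram stands in for the non-carveout memory
    (what totalram_pages() reports, expressed here in bytes).
    """
    return min(requested_gtt, system_ram)

# Scenario from the commit message: 128 GiB physical, 64 GiB carved out as
# VRAM, and a stale 100 GiB GTT setting left over from the earlier setup.
physical = 128 * GIB
vram_carveout = 64 * GIB
system_ram = physical - vram_carveout
gtt = cap_gtt_bytes(100 * GIB, system_ram)
```

With the cap applied, GTT plus VRAM in this scenario no longer exceeds physical memory, and a request smaller than system RAM passes through unchanged.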
5 days	drm/amd/pm: use milliwatts for GPU power sensors	Yang Wang
	GPU average and input power backends report a mix of whole watts, milliwatts, Q24.8 watts and decimal-packed fractions. Q24.8 is inherited from the legacy PowerPlay sensor format. Milliwatts are a more natural unit for the hwmon and pm_info consumers in amdgpu_pm.c. A common decoder cannot distinguish these formats, and converting native milliwatts through Q24.8 also loses precision. Use milliwatts as the internal unit across all PPT and PowerPlay backends. Decode Q24.8 only at the legacy smu7 input boundary and encode it only for the raw amdgpu_sensors debugfs interface. This gives hwmon, pm_info and the sensor ioctl one unambiguous unit while preserving the format used by UMR. Fixes: 5b79d0482f3c ("drm/amd/pp: Remove struct pp_gpu_power") Fixes: 01992b121fb6 ("drm/amd/pm: fix amdgpu_pm_info power display units") Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Reported-by: Lars Nieradzik <l.nieradzik@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 757ba0790bafec47a507e9662bf380f2e027d420) Cc: stable@vger.kernel.org
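The precision argument can be made concrete with a short sketch of Q24.8 conversion. It assumes Q24.8 means a watt value with 8 fractional bits and that conversions truncate; the function names are hypothetical and the rounding the driver actually uses is not stated in this log.

```python
def q24_8_to_mw(raw: int) -> int:
    """Decode Q24.8 watts (8 fractional bits) into whole milliwatts."""
    return (raw * 1000) >> 8

def mw_to_q24_8(mw: int) -> int:
    """Encode milliwatts as Q24.8 watts, truncating the remainder."""
    return (mw << 8) // 1000

# 0x1980 is 25.5 W in Q24.8 and decodes exactly.
exact = q24_8_to_mw(0x1980)

# A native milliwatt reading pushed through Q24.8 and back loses precision.
round_trip = q24_8_to_mw(mw_to_q24_8(12345))
```

Under these assumptions 12345 mW comes back as 12343 mW, which is the kind of loss that keeping milliwatts as the internal unit avoids.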
5 days	drm/amdgpu: restore UMD profile pstate after runtime resume	Candice Li
	Runtime suspend runs GFX hw_fini and clears perfmon clock gating while the UMD profile DPM level remains set in software. Re-apply stable pstate after a successful runtime resume when a profile mode is active. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 138531c8850cc247aa12b104bb29ea387bcdcbb1) Cc: stable@vger.kernel.org
5 days	drm/amdgpu: enable mode2 reset for SMU IP v15.0.5	Kanala Ramalingeswara Reddy
	Set the default reset method to mode2 for SMU 15.0.5. Signed-off-by: Kanala Ramalingeswara Reddy <Kanala.RamalingeswaraReddy@amd.com> Reviewed-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 314d49abe315cd0d0a872a43f68f08be43a305c8)
5 days	drm/amdgpu: Fix NBIO 7.11.5 offsets	Shubhankar Milind Sardeshpande
	Fix NBIO 7.11.5 related offsets Signed-off-by: Shubhankar Milind Sardeshpande <Shubhankar.MilindSardeshpande@amd.com> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit dcc27ae3092211c913a1bea04618c4faf1234d48)
5 days	drm/amdgpu: Enable support for PSP 15_0_5	Shubhankar Milind Sardeshpande
	Add PSP 15.0.5 related offsets for GFX to KMD interface and enable support for it. Co-developed-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com> Signed-off-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com> Signed-off-by: Shubhankar Milind Sardeshpande <Shubhankar.MilindSardeshpande@amd.com> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit b01e244c82c5d033d7424a64abe4079f3fceb869)
5 days	drm/amdgpu: move debug_vm handling to amdgpu_cs_parser_fini	Pierre-Eric Pelloux-Prayer
	The commit referenced below restarts the CS if the validation is still in progress. When debug_vm is enabled, all BOs from the CS are invalidated so we will hit an infinite loop. To avoid that, defer BO invalidation to amdgpu_cs_parser_fini. Fixes: 59720bfd8c6d ("drm/amdgpu: restart the CS if some parts of the VM are still invalidated") Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 8c990ee9daa295462df24982ce6878db997a380a) Cc: stable@vger.kernel.org
5 days	drm/amdgpu: Update message IDs to PMFW to correctly gather GFXOFF residency logs	Fares Soliman
	Updates PPSMC_MSGs and set/get functions for gathering GFXOFF logs on Van Gogh. Logs are now gathered live rather than by starting and stopping logging and reading an average value afterwards. This is in accordance with changes made in PMFW. With regard to message ID 0x52, the old interface uses a start/stop parameter and the new one does not. The firmware is checked to determine which method to use. v2: added firmware guard to new interface, old interface kept as fallback Signed-off-by: Fares Soliman <Fares.Soliman@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 482e2cadea8c34ae4e733f269a640d6b04fc9262)
5 days	drm/amdgpu: Pack nested ucode_info struct	Alex Hung
	Building for ARCH=um with W=1 C=1 makes the "amd_sriov_msg_vf2pf_info must be 1 KB" static assertion in amdgv_sriovmsg.h fail under sparse, exposed after UML builds were enabled. Sparse does not honor #pragma pack(push, 1) for the nested ucode_info struct, so it sizes each element as 8 bytes instead of 5 and computes the surrounding structure as larger than 1 KB. The compilers get this right via the enclosing pragma, but the annotation should be explicit. Fixes: af3f2f5db265 ("drm/amdgpu: Remove UML build exclusion from Kconfig") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202607091659.SHEscT0c-lkp@intel.com/ Cc: Harry Wentland <harry.wentland@amd.com> Assisted-by: Copilot:Claude-Opus-4.8 Signed-off-by: Alex Hung <alex.hung@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1d8cfeb69daa863a70134b8ed6df8055c418a5b0)
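The 5-byte versus 8-byte discrepancy can be reproduced outside the kernel. The sketch below assumes, purely for illustration, an element made of one u8 followed by one u32; the real fields of the nested ucode_info struct are not given in this log.

```python
import struct

# Hypothetical element layout: one u8 followed by one u32 (5 bytes of payload).
packed_size = struct.calcsize("<BI")   # standard sizes, no alignment padding
aligned_size = struct.calcsize("@BI")  # native alignment pads the u8 up to the u32
```

On common platforms the packed layout is 5 bytes and the naturally aligned one is 8, which matches the sizes sparse computed once it ignored the enclosing pragma.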
5 days	drm/amdgpu: Fix __rcu fence pointer accesses	Alex Hung
	Building for ARCH=um with W=1 C=1 makes sparse report "incompatible types in comparison expression (different address spaces)" warnings in the KFD code, exposed after UML builds were enabled: - amdgpu_amdkfd_fence.c compares the __rcu-annotated dma_fence.ops pointer directly in to_amdgpu_amdkfd_fence(). - amdgpu_amdkfd_gpuvm.c compares the __rcu eviction fence pointer directly in amdgpu_amdkfd_gpuvm_restore_process_bos(). Fixes: af3f2f5db265 ("drm/amdgpu: Remove UML build exclusion from Kconfig") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202607091659.SHEscT0c-lkp@intel.com/ Cc: Harry Wentland <harry.wentland@amd.com> Assisted-by: Copilot:Claude-Opus-4.8 Signed-off-by: Alex Hung <alex.hung@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 764f241ad227bb942e5b0b8b4d9898f1a4175605)
5 days	drm/amdgpu: skip clearing empty freed VM list on GEM close	Bob Zhou
	amdgpu_vm_clear_freed() allocates an amdgpu_sync object and walks the VM reservation fences via amdgpu_sync_resv() before checking whether vm->freed has anything to clear. Return early when the list is empty to skip this overhead on a hot path (every GEM close and command submission). Signed-off-by: Bob Zhou <bobzhou2@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 8ba869e852d4f1b1c0e5ae9225c77f7ceccbe056)
5 days	drm/amdgpu: dont pin wptr bo instead use eviction fence	Sunil Khatri
	Instead of pinning the wptr BO, attach the eviction fence to the BO to make sure it remains valid all the time. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 7264bc10c7c657a54603c7fc058bf8e15f18ce12)
5 days	drm/amdgpu : update mmhub eco sec lvl for vcn5_3	Suresh Guttula
	This patch requests PSP to set the sec lvl for vcn and jpeg. Signed-off-by: Suresh Guttula <suresh.guttula@amd.com> Reviewed-by: McRae Geoffrey <Geoffrey.McRae@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 4c8b8472f85a730a6853ab68474f210143f42b5a)
2026-07-17	drm/amdgpu/ttm: Consider concurrent VM flushes for buffer entities	Timur Kristóf
	Allow using multiple SDMA schedulers only on GPUs where we are allowed to do concurrent VM flushes. This consideration is necessary because all GART windows are mapped in VMID 0 (the kernel VMID) so each buffer entity would flush VMID 0 concurrently. Practically this means that we can't use multiple SDMA engines for TTM on GFX6-8 and Navi 1x. Fixes: 01c836788b37 ("drm/amdgpu: pass all the sdma scheds to amdgpu_mman") Fixes: e4029f7a9474 ("drm/amdgpu: only use working sdma schedulers for ttm") Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit a8171229bc836607fbc225d323ebc4d14489cfbb)
2026-07-17	drm/amdgpu: Disable PCIe dynamic speed switching on Ryzen Pinnacle Ridge	Mario Limonciello
	AMD Ryzen Pinnacle Ridge (Zen+, family 0x17 model 0x08) CPUs have PCI controllers that don't support PCIe dynamic speed switching, causing system freezes during GPU initialization when enabled. Disable dynamic speed switching when this CPU is detected. Assisted-by: Claude:sonnet Fixes: 466a7d115326 ("drm/amd: Use the first non-dGPU PCI device for BW limits") Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5436 Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Link: https://patch.msgid.link/20260709031520.841611-1-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 9ceb4e034a327a04155f32f1cd1a5031dfa5fe02) Cc: stable@vger.kernel.org
2026-07-17	drm/amdgpu: always emit the job vm fence	Alex Deucher
	We need the fence to reemit the gds switch or spm update after a queue reset. Fixes: a17ef941212b ("drm/amdgpu: rework ring reset backup and reemit v9") Cc: timur.kristof@gmail.com Cc: christian.koenig@amd.com Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit bc639a9eadc75822f7f15a4315c198a4b5513bd2) Cc: stable@vger.kernel.org
2026-07-17	drm/amdgpu: Reserve space for IB contents in devcoredumps	Timur Kristóf
	Currently the contents of IBs are abruptly cut off and don't show the full contents. This patch makes sure to reserve space for those contents too so they may be printed. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 4e2c0821509fed754e8c31d5053d152fbb3484a5) Cc: stable@vger.kernel.org
2026-07-17	drm/amdgpu: Print vmid, pasid and more task info in devcoredump	Timur Kristóf
	These are in the dmesg logs but are missing from devcoredumps. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit fed7aa36d79802c3e02acd05aeae8b0a877e47c2) Cc: stable@vger.kernel.org
2026-07-17	drm/amdgpu: Release VFCT ACPI table reference	Mario Limonciello
	amdgpu_acpi_vfct_bios() fetches the VFCT table with acpi_get_table() but never releases it. acpi_get_table() takes a reference on the table (incrementing its validation_count and mapping it on the 0->1 transition); without a paired acpi_put_table() the mapping is leaked on every call, whether or not a matching VBIOS image is found. Route all exit paths after the table is acquired through a common acpi_put_table(). The VBIOS image is copied out with kmemdup() before the table is released, so it remains valid for the caller. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patch.msgid.link/20260708193518.702584-3-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ca5988682b4cba4cd125a0fa99b2de1239164ae4) Cc: stable@vger.kernel.org
2026-07-17	drm/amdgpu: Fix VFCT bus number matching with soft filter	Mario Limonciello
	On systems where PCI bus renumbering occurs (e.g. pci=realloc, resource conflicts), the runtime bus number may differ from the BIOS POST bus number recorded in the VFCT table. This causes amdgpu_acpi_vfct_bios() to fail finding the VBIOS even though the correct device entry exists. Introduce amdgpu_acpi_vfct_match() which treats the bus number as a soft filter: vendor/device/function identity is the hard requirement, while exact bus match is the preferred path. When bus numbers disagree but device identity matches, accept the VFCT entry and log a dev_notice for diagnostics. Reported-by: Oz Tiram <oz@shift-computing.de> Closes: https://lore.kernel.org/amd-gfx/20260621173211.28443-1-oz@shift-computing.de/ Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patch.msgid.link/20260708193518.702584-2-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 11c141672045ffc0187aa604f2c0f597bc334fb2) Cc: stable@vger.kernel.org
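The soft-filter idea can be sketched as a small matching function. The field names, the result strings and the exact set of identity fields compared here (vendor ID, device ID, function number) are assumptions for illustration; this is not the body of amdgpu_acpi_vfct_match().

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PciId:
    """Hypothetical subset of the identity recorded in a VFCT entry."""
    bus: int
    function: int
    vendor_id: int
    device_id: int

def vfct_match(entry: PciId, dev: PciId) -> str:
    """Identity is a hard requirement; the bus number is only a preference."""
    identity_matches = (
        entry.vendor_id == dev.vendor_id
        and entry.device_id == dev.device_id
        and entry.function == dev.function
    )
    if not identity_matches:
        return "no_match"
    if entry.bus == dev.bus:
        return "exact"
    # Bus renumbered since POST: accept, and let the caller log a notice.
    return "bus_mismatch_accepted"

posted = PciId(bus=0x03, function=0, vendor_id=0x1002, device_id=0x1234)
runtime = PciId(bus=0x05, function=0, vendor_id=0x1002, device_id=0x1234)
```

Here `posted` and `runtime` describe the same device seen before and after bus renumbering, so the entry is accepted despite the differing bus numbers.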
2026-07-17	drm/amdgpu: fix bo->pin leaking in amdgpu_bo_create_reserved	Zhu Lingshan
	amdgpu_bo_create_reserved() only allocates a new BO when bo_ptr (struct amdgpu_bo bo_ptr as input parameter) is NULL; it simply skips creation when bo_ptr is non-NULL. However, it unconditionally reserves, pins, GART-allocates and maps the BO afterwards. When the same non-NULL BO pointer is passed in again, for example for firmware buffers that live in adev and are re-loaded on every resume / cp_resume / start under AMDGPU_FW_LOAD_DIRECT, amdgpu_bo_pin() increases pin_count unconditionally while the matching teardown only unpins once. pin_count therefore never drops to zero and TTM is not able to move, swap or evict the BO, causing BO leaks. Fix this by pinning the BO only once, at creation, so that repeated calls no longer take additional pin references. Signed-off-by: Zhu Lingshan <lingshan.zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 3ddc0ae76202c447b6aec61e907b852bc94671cf) Cc: stable@vger.kernel.org
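The pin-count imbalance can be modelled with a toy counter. The class and function names below are hypothetical stand-ins, not the driver's API; the model only tracks how many pins are taken versus released.

```python
class Bo:
    """Toy buffer object that only tracks its pin count."""
    def __init__(self):
        self.pin_count = 0

def create_reserved_old(bo):
    """Old behaviour: skip allocation for an existing BO but always pin."""
    if bo is None:
        bo = Bo()
    bo.pin_count += 1
    return bo

def create_reserved_fixed(bo):
    """Fixed behaviour: pin only when the BO is first created."""
    if bo is None:
        bo = Bo()
        bo.pin_count += 1
    return bo

def pins_left_after_teardown(create, reloads: int) -> int:
    """Call create() once per firmware (re)load, then unpin once."""
    bo = None
    for _ in range(reloads):
        bo = create(bo)
    bo.pin_count -= 1  # the matching teardown unpins a single time
    return bo.pin_count
```

With three loads, the old behaviour leaves two pins outstanding after teardown while the fixed behaviour leaves none.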
2026-07-17	drm/amdgpu/userq: fix indefinite fence wait during GPU reset	Jesse Zhang
	pre_reset only force-completes fences of MAPPED queues. A queue in any other state (e.g. mid-eviction) keeps its last_fence pending; after a GPU reset that fence never signals, so the eviction/suspend worker and process teardown (amdgpu_evf_mgr_flush_suspend) wait on it forever and wedge the machine: INFO: task kworker/6:28 blocked for more than 120 seconds. Workqueue: events amdgpu_eviction_fence_suspend_worker [amdgpu] Call Trace: dma_fence_wait_timeout+0x7e/0x130 amdgpu_userq_evict+0x67/0x140 [amdgpu] amdgpu_eviction_fence_suspend_worker+0xd8/0x160 [amdgpu] process_scheduled_works+0xa6/0x420 Force-complete every queue's fence regardless of state. The unmap and mark-hung step stays gated on MAPPED, since unmapping a queue that is not mapped is invalid. Fixes: 290f46cf5726 ("drm/amdgpu: Implement user queue reset functionality") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 9102b39fa924dcc3dc75a3137bfa9633c40b88c0) Cc: stable@vger.kernel.org
2026-07-17	drm/amdgpu/discovery: Fix device family for DCN42	Roman Li
	GC 11.7.0 and 11.7.1 should map to AMDGPU_FAMILY_GC_11_5_4 for DCN42. Fixes: cf591e67c095 ("drm/amdgpu: add support for GC IP version 11.7.0") Fixes: a928d8d81ec5 ("drm/amdgpu: add support for GC IP version 11.7.1") Signed-off-by: Roman Li <Roman.Li@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit f8ee6447e7ec1d75d6663c817e45566dd01f440b)
2026-07-10	Merge tag 'amd-drm-fixes-7.2-2026-07-09' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes	Dave Airlie
	amd-drm-fixes-7.2-2026-07-09: amdgpu: - PSP 15.0.9 update - SMU 15.0.9 update - VCN 5.3 fix - VI ASPM fix - Userq fix - lifetime fix for amdgpu_vm_get_task_info_pasid() - Gfx10 fix - SMU 14 fix amdkfd: - CRIU bounds checking fixes - secondary context id fix - Event bounds checking fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patch.msgid.link/20260709212303.15913-1-alexander.deucher@amd.com
2026-07-10	Merge tag 'drm-misc-fixes-2026-07-09' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes	Dave Airlie
	drm-misc-fixes for v7.2-rc3: - Fix uaf in amdxdna mmap failure path. - A lot of deadlocks, access races and return value fixes in amdxdna. - Fix analogix_dp bitshifts during link training. - Use direct label in drm_exec. - Fix absent indirect bo handling in v3d. - Sync on first active crtc in fb_dirty, rather than first crtc. - Rework try_harder in the buddy allocator. - Make imagination function static to solve compiler warning. - Fix imagination error checking. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patch.msgid.link/71e5b48b-307f-47f5-8fd5-b60ea43e4196@linux.intel.com
2026-07-08	drm/gfx10: Program DB_RING_CONTROL	Alex Deucher
	This is needed to allocate occlusion counters across both gfx pipes. Fixes: b7a1a0ef12b8 ("drm/amd/amdgpu: add pipe1 hardware support") Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 6807352cbabb74b61ba42888769283af72191f66) Cc: stable@vger.kernel.org
2026-07-08	drm/amdgpu: fix lifetime issue of amdgpu_vm_get_task_info_pasid()	Shahyan Soltani
	The vm pointer returned from amdgpu_vm_get_vm_from_pasid() is only valid while the lock is held. Once xa_unlock_irqrestore() has been called, the pointer is no longer protected by the lock and is subject to modification. Since the caller still dereferences vm->task_info in amdgpu_vm_get_task_info_vm() after the lock is released, this causes a use-after-unlock problem. Remove the lifetime issue in amdgpu_vm_get_task_info_pasid() by removing the amdgpu_vm_get_vm_from_pasid() function from amdgpu_vm.c and inlining the relevant code so that the lock is held while the pointer is in use. Signed-off-by: Shahyan Soltani <shahyan.soltani@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 9d01579f3f868b333acc901815972685989092c7) Cc: stable@vger.kernel.org
2026-07-08	drm/amdgpu: trigger GPU recovery when userq destroy fails to unmap a hung queue	Jesse Zhang
	Destroying a hung user queue issues a MES REMOVE_QUEUE that times out. The destroy path only logged the error and freed the queue, so the next userq submission failed and forced a GPU reset attributed to an innocent workload. Kick the userq reset work when unmap fails so the GPU is recovered at destroy time. Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 8396b9de4198a54ec4760a94a179347540a9764d) Cc: stable@vger.kernel.org
2026-07-08	drm/amd/amdgpu: disable ASPM on VI if pcie dpm is disabled	Kenneth Feng
	Disable ASPM on VI if PCIE dpm is disabled. Fixes: bb00bf17328d ("drm/amd/amdgpu: decouple ASPM with pcie dpm") Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5370 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 873a8d6b3c0a386408c891e4ff1c684fa11783e1) Cc: stable@vger.kernel.org
2026-07-08	drm/amdgpu: Disable JDPG on VCN5_3	Suresh Guttula
	JDPG is not supported on VCN5. Disable JDPG, because DPG does not correctly copy the JRBC read/write pointers (R/WPTR) from the PG (Power Gating) block to JRBC. Signed-off-by: Suresh Guttula <suresh.guttula@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ea3fdd1eda088030d8925f023613728969f55955)
2026-07-08	drm/amdgpu: add support for SMU version 15.0.9	Kanala Ramalingeswara Reddy
	Initialize SMU Version 15_0_9 Signed-off-by: Kanala Ramalingeswara Reddy <Kanala.RamalingeswaraReddy@amd.com> Signed-off-by: Granthali Vinodkumar Dhandar <granthali.vinodkumardhandar@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1dfd4e84b5beec353a81d61af9eaf4e5a56e0c57)
2026-07-08	drm/amdgpu: add support for PSP version 15.0.9	Kanala Ramalingeswara Reddy
	Initialize PSP Version 15_0_9 Signed-off-by: Kanala Ramalingeswara Reddy <Kanala.RamalingeswaraReddy@amd.com> Signed-off-by: Granthali Vinodkumar Dhandar <granthali.vinodkumardhandar@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ef71f00173228904763552b7405169023f8034a8)
2026-07-07	drm/drm_exec: avoid indirect goto	Christian König
	The drm_exec component uses a variable with scope limited to the for() and an indirect goto to allow instantiating multiple macros in the same function. This unfortunately doesn't work well with certain compilers when the indirect goto can't be lowered to a direct jump. Switch the indirect goto to a direct goto. The drawback is that we now can't use the drm_exec_until_all_locked() macro in the same function multiple times. There is currently only one user of this, and only as a hacky workaround which is about to be removed. So document that the __label__ statement should be used when the macro is used multiple times and fix the tests and the only use case where that is necessary. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: 9920249a5288 ("drm/amdgpu: convert amdgpu_vm_lock_by_pasid() to drm_exec") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202606231854.7LeCtlLe-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202606232356.gwHMAJAW-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202606240753.kYjobJVl-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202606241110.iUga5vVw-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202607031446.1PWG18mN-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202607031837.HSmBj8pr-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202607040159.GopyEswS-lkp@intel.com/ Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Link: https://lore.kernel.org/r/20260704084133.122053-1-christian.koenig@amd.com
2026-07-01	drm/amdgpu/jpeg: fix jpeg_v4_0_3_is_idle detection	Boyuan Zhang
	jpeg_v4_0_3_is_idle() initializes ret to false and then accumulates ring idle status using &=. Since false & condition always remains false, the function can never report the JPEG block as idle. Initialize ret to true so the function returns true only when all JPEG rings report RB_JOB_DONE. Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit e9df8e9d04e0593d17ddb069f3b7958991cd18c9) Cc: stable@vger.kernel.org
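The accumulator bug is easy to reproduce in isolation. The sketch below mirrors only the `&=` pattern from the commit message; the function names and the list of per-ring "job done" flags are illustrative stand-ins for the register reads in the driver.

```python
def is_idle_buggy(ring_done):
    """Starts from False, so AND-ing can never produce True."""
    ret = False
    for done in ring_done:
        ret &= done
    return ret

def is_idle_fixed(ring_done):
    """Starts from True, so the result is True only if every ring is done."""
    ret = True
    for done in ring_done:
        ret &= done
    return ret
```

Even when every ring reports done, the buggy version returns False, which is why the block could never be reported idle.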
2026-07-01	drm/amdgpu: Fix kernel panic during driver load failure	Harish Kasiviswanathan
	Avoid kernel panic if MES init fails during driver load. The KIQ ring is falsely marked as ready; on ASICs that use MES, the KIQ is owned by MES. BUG: kernel NULL pointer dereference, address: 0000000000000000 RIP: 0010:gfx_v12_1_wait_reg_mem+0x5a/0x1f0 [amdgpu] Call Trace: gfx_v12_1_ring_emit_reg_write_reg_wait+0x1f/0x30 [amdgpu] amdgpu_gmc_fw_reg_write_reg_wait+0xb2/0x190 [amdgpu] amdgpu_gmc_flush_gpu_tlb+0x1cc/0x230 [amdgpu] amdgpu_gart_invalidate_tlb+0x81/0xa0 [amdgpu] amdgpu_gart_unbind+0x72/0x90 [amdgpu] amdgpu_ttm_backend_unbind+0xa4/0xb0 [amdgpu] amdgpu_ttm_tt_unpopulate+0x13/0xd0 [amdgpu] amdttm_tt_unpopulate+0x29/0x70 [amdttm] ttm_bo_put+0x1eb/0x360 [amdttm] amdgpu_bo_free_kernel+0xf9/0x1f0 [amdgpu] amdgpu_ih_ring_fini+0x5a/0x90 [amdgpu] amdgpu_irq_fini_hw+0x58/0x80 [amdgpu] amdgpu_device_fini_hw+0x4e0/0x5b0 [amdgpu] amdgpu_driver_load_kms+0x60/0xa0 [amdgpu] amdgpu_pci_probe+0x28e/0x6d0 [amdgpu] pci_device_probe+0x19f/0x220 really_probe+0x1ed/0x340 driver_probe_device+0x1e/0x80 __driver_attach+0xd3/0x1a0 bus_for_each_dev+0x68/0xa0 bus_add_driver+0x19f/0x270 driver_register+0x5d/0xf0 do_one_initcall+0xac/0x200 do_init_module+0x1ec/0x280 __se_sys_finit_module+0x2de/0x310 do_syscall_64+0x6a/0x250 entry_SYSCALL_64_after_hwframe+0x4b/0x53 Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 4623b958dd6da0f4c3026afdf330626a09ecb0f0) Cc: stable@vger.kernel.org
2026-07-01	drm/amdgpu: flush pending RCU callbacks on module unload	Perry Yuan
	Call rcu_barrier() in module exit to wait for outstanding call_rcu() callbacks before freeing module text, preventing late callback execution in freed memory. BUG: unable to handle page fault for address: ffffffffc1d59c40 PGD 6a12067 P4D 6a12067 PUD 6a14067 PMD 13698b067 PTE 0 Oops: 0010 [#1] SMP NOPTI RIP: 0010:0xffffffffc1d59c40 Code: Unable to access opcode bytes at RIP 0xffffffffc1d59c16. RSP: 0018:ffffc900198c0f28 EFLAGS: 00010286 RAX: ffffffffc1d59c40 RBX: ffff897c7d6b61c0 RCX: ffff88826aff4590 RDX: ffff8884d8b35490 RSI: ffffc900198c0f30 RDI: ffff88812af67290 RBP: 000000000000000a (DONE segment entries) R08: 0000000000000000 R09: 0000000000000100 R10: 0000000000000000 R11: ffffffff82a06100 R12: ffff88811a4e3700 R13: 0000000000000000 R14: ffff897c7d6b6270 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff897c7d680000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffc1d59c16 CR3: 00000104a980a001 CR4: 0000000002770ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <IRQ> ? rcu_do_batch+0x163/0x450 ? rcu_core+0x177/0x1c0 ? __do_softirq+0xc1/0x280 ? asm_call_irq_on_stack+0xf/0x20 </IRQ> ? do_softirq_own_stack+0x37/0x50 ? irq_exit_rcu+0xc4/0x100 ? sysvec_apic_timer_interrupt+0x36/0x80 ? asm_sysvec_apic_timer_interrupt+0x12/0x20 ? cpuidle_enter_state+0xd4/0x360 ? cpuidle_enter+0x29/0x40 ? cpuidle_idle_call+0x108/0x1a0 ? do_idle+0x77/0xf0 ? cpu_startup_entry+0x19/0x20 ? secondary_startup_64_no_verify+0xbf/0xcb Signed-off-by: Perry Yuan <perry.yuan@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit feaa5039f6c12acc9aa934c2d45dcd251a12c69f)
2026-07-01	drm/amdgpu: Fix AMDGPU_GTT_MAX_TRANSFER_SIZE for non-4K systems	Donet Tom
	Running RCCL unit tests on a system with a 64K PAGE_SIZE triggers the following warning and causes the test to terminate on latest upstream kernel: WARNING: drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:1335 at amdgpu_bo_release_notify+0x1bc/0x280 [amdgpu], CPU#18: rccl-UnitTests/33151 Call trace: amdgpu_bo_release_notify ttm_bo_release amdgpu_gem_object_free drm_gem_object_free amdgpu_bo_unref amdgpu_bo_create amdgpu_bo_create_user amdgpu_gem_object_create amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu kfd_ioctl_alloc_memory_of_gpu kfd_ioctl sys_ioctl The warning is triggered because amdgpu_ttm_next_clear_entity() returns NULL when a clear buffer operation is requested. This happens because the GART window allocation for the default_entity, clear_entity and move_entity fails during initialization. Commit [1] introduced separate GART windows for the default_entity, clear_entity and move_entity of each SDMA instance. Their sizes are derived from AMDGPU_GTT_MAX_TRANSFER_SIZE, which is currently defined as 1024 pages. This implicitly assumes a 4K PAGE_SIZE, where 1024 pages correspond to a 4MB transfer. On a 64K PAGE_SIZE system, however, the same value expands to 64MB. The default_entity and clear_entity each allocate one AMDGPU_GTT_MAX_TRANSFER_SIZE GART window, while the move_entity allocates two such windows. This results in 16MB of GART space per SDMA instance on a 4K PAGE_SIZE system, but 256MB per SDMA instance on a 64K PAGE_SIZE system. On an MI210 system with five SDMA instances and a 512MB GART aperture, the total GART space required becomes 1.25GB, exceeding the available GART aperture. Consequently, GART window allocation fails, amdgpu_ttm_next_clear_entity() returns NULL, and the above warning is triggered. Redefine AMDGPU_GTT_MAX_TRANSFER_SIZE in bytes instead of page units. Where a page count is required, convert it using PAGE_SHIFT. 
This preserves the existing 4MB transfer size across all PAGE_SIZE configurations while keeping GART window allocations within the available GART aperture. [1] https://lore.kernel.org/all/20260408100327.1372-3-pierre-eric.pelloux-prayer@amd.com/#t Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5435 Fixes: 897ee11ec020 ("drm/amdgpu: create multiple clear/move ttm entities") Signed-off-by: Donet Tom <donettom@linux.ibm.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 27213b776a666d3030de5acc3cd75278197b0494) Cc: stable@vger.kernel.org
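The GART budget figures quoted in this commit message can be checked with a few lines of arithmetic. The constants come from the message itself (1024 pages per window, four windows per SDMA instance, five instances, 512MB aperture); the function names are hypothetical.

```python
MIB = 1 << 20
WINDOWS_PER_SDMA = 4  # default (1) + clear (1) + move (2), per the message

def gart_needed(page_size: int, sdma_instances: int, pages_per_window: int = 1024) -> int:
    """GART bytes needed when the window size is defined in pages."""
    return pages_per_window * page_size * WINDOWS_PER_SDMA * sdma_instances

def pages_per_window_fixed(page_shift: int, transfer_bytes: int = 4 * MIB) -> int:
    """Page count once the transfer size is defined in bytes instead."""
    return transfer_bytes >> page_shift

need_4k = gart_needed(4096, 5)     # 4K PAGE_SIZE
need_64k = gart_needed(65536, 5)   # 64K PAGE_SIZE
aperture = 512 * MIB
```

With 64K pages the page-based definition asks for 1280MB (1.25GB), well over the 512MB aperture, while the byte-based definition yields 64 pages per window and keeps each window at 4MB.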
2026-07-01	drm/amdgpu: add support for GC IP version 11.7.1	Granthali Vinodkumar Dhandar
	Initialize GC IP 11_7_1 Signed-off-by: Granthali Vinodkumar Dhandar <granthali.vinodkumardhandar@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit a928d8d81ec5cdb5a8944d08136720811efad0f6)
2026-07-01	drm/amdgpu: add support for GC IP version 11.7.0	Granthali Vinodkumar Dhandar
	Initialize GC IP 11_7_0 Signed-off-by: Granthali Vinodkumar Dhandar <granthali.vinodkumardhandar@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit cf591e67c095542a16475df293ec7bc9a118e4ee)
2026-07-01	drm/amdgpu: add the doorbell index input for suspending userq	Prike Liang
	The MES firmware requires the doorbell offset as input in order to preempt the userq; adding the doorbell offset also keeps the structure aligned with the union MESAPI__SUSPEND in the MES firmware. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit bc434335ab3c096a33a9e88c7951b4ac574db458) Cc: stable@vger.kernel.org
2026-07-01	drm/amdgpu/mes12: set doorbell offset for suspending userq	Prike Liang
	Update the union MESAPI__SUSPEND and union MESAPI__RESUME to add the doorbell offset for suspending userq. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 5b58a2c120063544869d0284d3b355527f9f04f5) Cc: stable@vger.kernel.org
2026-07-01	drm/amdgpu/mes11: set doorbell offset for suspending userq	Prike Liang
	Update the union MESAPI__SUSPEND and union MESAPI__RESUME to add the doorbell offset for suspending userq. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 30af09db33696f7e0de5c0c505cbb0cb92b6e25b) Cc: stable@vger.kernel.org
2026-07-01	drm/amdgpu: fix check in amdgpu_hmm_invalidate_gfx	Christian König
	For a short moment during alloc/free the userptr BO is not part of its VM, so bo->vm_bo can be NULL. Keep a reference to the VM root PD as parent of the userptr BO so that we can always use that to wait for all submissions of the VM instead of only the one involving the userptr BO. Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: 91250893cbaa ("drm/amdgpu: fix waiting for all submissions for userptrs") Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5399 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 631849ff5d603841e74f19f4a5e30fe1f7d7cf30) Cc: stable@vger.kernel.org
2026-07-01	drm/amdgpu/jpeg: fix jpeg_v5_0_1_is_idle detection	Boyuan Zhang
	jpeg_v5_0_1_is_idle() initializes ret to false and then accumulates ring idle status using &=. Since false & condition always remains false, the function can never report the JPEG block as idle. Initialize ret to true so the function returns true only when all JPEG rings report RB_JOB_DONE. Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 680adf5faeeabb4585f7aeb53681719e2d6c2f41) Cc: stable@vger.kernel.org
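The accumulator bug described in the jpeg_v5_0_1_is_idle() commit above can be reproduced in isolation. The sketch below is not the driver function: the names are hypothetical, and an array of booleans stands in for the per-ring RB_JOB_DONE checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Buggy shape: starting from false, "&=" can never turn ret into true,
 * so the block is never reported idle. */
bool all_rings_idle_buggy(const bool *ring_done, size_t n)
{
    assert(ring_done != NULL || n == 0);
    bool ret = false;
    for (size_t i = 0; i < n; i++)
        ret &= ring_done[i];
    return ret;
}

/* Fixed shape: start from true (the identity for AND), so the result is
 * true only when every ring reports done. */
bool all_rings_idle_fixed(const bool *ring_done, size_t n)
{
    assert(ring_done != NULL || n == 0);
    bool ret = true;
    for (size_t i = 0; i < n; i++)
        ret &= ring_done[i];
    return ret;
}
```

Given three rings that are all done, the buggy version still returns false while the fixed version returns true; a single busy ring makes both return false.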
2026-07-01	drm/amdgpu: Rename moved state to needs_update	Natalie Vock
	This state can be reached via other means than physical moves, like PRT bindings. Make the name match the actual purpose of the state. Signed-off-by: Natalie Vock <natalie.vock@gmx.de> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1f7a795fb9f8186bd81ca9c4a80f75482db53c9e)
2026-07-01	drm/amdgpu: Only set bo->moved when the BO was actually moved	Natalie Vock
	The "moved" VM state is a bit unfortunately named, because BOs can end up in this state without being physically moved. While we need to invalidate every mapping when BOs are physically moved, in some other cases like PRT binds/unbinds there is no need to refresh mappings except those affected by the bind. Full invalidation of all BO mappings manifested as severe regressions in PRT bind performance, which this patch fixes. The offending patch is 4cdbba5a16aa ("drm/amdgpu: restructure VM state machine v4") in the amd-staging-drm-next tree, although it has not yet propagated anywhere else. Fixes: 4cdbba5a16aa ("drm/amdgpu: restructure VM state machine v4") Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5437 Signed-off-by: Natalie Vock <natalie.vock@gmx.de> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 0b2fa33b4235991a100dd799c891cf5c242aaed1) Cc: stable@vger.kernel.org
2026-07-01	drm/amdgpu: invoke pm_genpd_remove() before freeing genpd	Ce Sun
	Call pm_genpd_remove() to unregister the genpd from the global list before releasing the acp_genpd memory, and clear the pointer after the free. Signed-off-by: Ce Sun <cesun102@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit cd8650d7a91ee8b768e202354672553faa5cc1f2) Cc: stable@vger.kernel.org
2026-07-01	drm/amdgpu: fix resource leak on ACP reset timeout	Ce Sun
	When the ACP soft reset poll times out, the original code returns early without cleanup, leaking MFD child devices, genpd links and all ACP heap allocations. Replace the early return with goto out so that all cleanup logic runs regardless of reset success, while preserving the timeout error code for the caller. Signed-off-by: Ce Sun <cesun102@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 98073e4328d7a8d75d03696ab27f6de70ef1aeda) Cc: stable@vger.kernel.org
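The "goto out" pattern from the ACP reset timeout commit above can be sketched in userspace C. Everything here is hypothetical (struct fake_acp, its fields and fake_acp_teardown()); the real teardown releases MFD children, genpd links and ACP allocations, which the two heap pointers only stand in for.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the resources the ACP teardown must release. */
struct fake_acp {
    void *mfd_cells;
    void *genpd;
};

/* Teardown that always runs its cleanup, even when the reset poll timed
 * out, and still reports the timeout to the caller. */
int fake_acp_teardown(struct fake_acp *acp, int reset_timed_out)
{
    int ret = 0;

    assert(acp != NULL);

    if (reset_timed_out) {
        ret = -ETIMEDOUT;
        goto out; /* an early "return ret;" here would leak both buffers */
    }

    /* ... further shutdown steps would run here on a successful reset ... */

out:
    free(acp->mfd_cells);
    acp->mfd_cells = NULL;
    free(acp->genpd);
    acp->genpd = NULL;
    return ret;
}
```

The caller sees -ETIMEDOUT on the timeout path and 0 otherwise, and in both cases the pointers come back freed and cleared.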
2026-07-01	drm/amdgpu: reject mapping a reserved doorbell to a new queue	Zhu Lingshan
	When creating a user queue, user space provides a doorbell BO handle and an offset within the BO to obtain a doorbell. However, the current implementation uses xa_store_irq() to store a doorbell, which allows a later queue created with the same BO and offset parameters to overwrite an existing queue and doorbell mapping. This can cause problems such as misrouting fence IRQ processing to the wrong queue, and can mislead the cleanup of one queue into erasing the mapping of another queue. This commit fixes the issue by replacing xa_store_irq() with xa_insert_irq(), which rejects mapping a reserved doorbell to a newly created queue. Signed-off-by: Zhu Lingshan <lingshan.zhu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 6244eae22966350db52faf9c1369d3b2ffc5de4e) Cc: stable@vger.kernel.org
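The difference between "store" and "insert" semantics that the doorbell commit above relies on can be illustrated without the kernel XArray. The sketch below uses a hypothetical fixed-size table (struct doorbell_table, table_store(), table_insert()); it only mirrors the behaviour that xa_insert() returns -EBUSY when an entry is already present, while a store silently replaces it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DOORBELL_SLOTS 8

/* Hypothetical doorbell-index -> queue mapping. */
struct doorbell_table {
    void *slot[DOORBELL_SLOTS];
};

/* Store semantics: overwrite whatever is there and return the old entry. */
void *table_store(struct doorbell_table *t, size_t idx, void *queue)
{
    assert(t != NULL && idx < DOORBELL_SLOTS);
    void *old = t->slot[idx];
    t->slot[idx] = queue;
    return old;
}

/* Insert semantics: refuse to replace an existing entry. */
int table_insert(struct doorbell_table *t, size_t idx, void *queue)
{
    assert(t != NULL && idx < DOORBELL_SLOTS);
    if (t->slot[idx] != NULL)
        return -EBUSY;
    t->slot[idx] = queue;
    return 0;
}
```

With insert semantics, a second queue that asks for an occupied doorbell index gets an error and the first mapping stays intact, so a later cleanup of one queue cannot erase another queue's entry.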
2026-07-01	drm/amdgpu/gfx12: fix EOP interrupt routing for KQ and userq	Jesse Zhang
	Try KQ by ring_id first (KCQ and UQ never share a HW slot); fall back to amdgpu_userq_process_fence_irq() on miss, since KCQ EOPs were misrouted into the userq fence path when enable_mes is true. Require a strict (me,pipe,queue) match in the gfx case, then userq gfx EOPs fall through to amdgpu_userq_process_fence_irq(). Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 6c1f4f7ff08448e0e18cd7fc4e59d6c96a36f25d) Cc: stable@vger.kernel.org