linux.git - Linux kernel source tree

Age	Commit message (Collapse)	Author
46 hours	drm/amdgpu/ttm: Consider concurrent VM flushes for buffer entities	Timur Kristóf
	Allow using multiple SDMA schedulers only on GPUs where we are allowed to do concurrent VM flushes. This consideration is necessary because all GART windows are mapped in VMID 0 (the kernel VMID) so each buffer entity would flush VMID 0 concurrently. Practically this means that we can't use multiple SDMA engines for TTM on GFX6-8 and Navi 1x. Fixes: 01c836788b37 ("drm/amdgpu: pass all the sdma scheds to amdgpu_mman") Fixes: e4029f7a9474 ("drm/amdgpu: only use working sdma schedulers for ttm") Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit a8171229bc836607fbc225d323ebc4d14489cfbb)
46 hours	drm/amdgpu: Disable PCIe dynamic speed switching on Ryzen Pinnacle Ridge	Mario Limonciello
	AMD Ryzen Pinnacle Ridge (Zen+, family 0x17 model 0x08) CPUs have PCI controllers that don't support PCIe dynamic speed switching, causing system freezes during GPU initialization when enabled. Disable dynamic speed switching when this CPU is detected. Assisted-by: Claude:sonnet Fixes: 466a7d115326 ("drm/amd: Use the first non-dGPU PCI device for BW limits") Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5436 Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Link: https://patch.msgid.link/20260709031520.841611-1-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 9ceb4e034a327a04155f32f1cd1a5031dfa5fe02) Cc: stable@vger.kernel.org
46 hours	drm/amdgpu: always emit the job vm fence	Alex Deucher
	We need the fence to reemit the gds switch or spm update after a queue reset. Fixes: a17ef941212b ("drm/amdgpu: rework ring reset backup and reemit v9") Cc: timur.kristof@gmail.com Cc: christian.koenig@amd.com Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit bc639a9eadc75822f7f15a4315c198a4b5513bd2) Cc: stable@vger.kernel.org
46 hours	drm/amdgpu: Reserve space for IB contents in devcoredumps	Timur Kristóf
	Currently the contents of IBs are abruptly cut off and don't show the full contents. This patch makes sure to reserve space for those contents too so they may be printed. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 4e2c0821509fed754e8c31d5053d152fbb3484a5) Cc: stable@vger.kernel.org
46 hours	drm/amdgpu: Print vmid, pasid and more task info in devcoredump	Timur Kristóf
	These are in the dmesg logs but are missing from devcoredumps. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit fed7aa36d79802c3e02acd05aeae8b0a877e47c2) Cc: stable@vger.kernel.org
46 hours	drm/amdgpu: Release VFCT ACPI table reference	Mario Limonciello
	amdgpu_acpi_vfct_bios() fetches the VFCT table with acpi_get_table() but never releases it. acpi_get_table() takes a reference on the table (incrementing its validation_count and mapping it on the 0->1 transition); without a paired acpi_put_table() the mapping is leaked on every call, whether or not a matching VBIOS image is found. Route all exit paths after the table is acquired through a common acpi_put_table(). The VBIOS image is copied out with kmemdup() before the table is released, so it remains valid for the caller. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patch.msgid.link/20260708193518.702584-3-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ca5988682b4cba4cd125a0fa99b2de1239164ae4) Cc: stable@vger.kernel.org
46 hours	drm/amdgpu: Fix VFCT bus number matching with soft filter	Mario Limonciello
	On systems where PCI bus renumbering occurs (e.g. pci=realloc, resource conflicts), the runtime bus number may differ from the BIOS POST bus number recorded in the VFCT table. This causes amdgpu_acpi_vfct_bios() to fail finding the VBIOS even though the correct device entry exists. Introduce amdgpu_acpi_vfct_match() which treats the bus number as a soft filter: vendor/device/function identity is the hard requirement, while exact bus match is the preferred path. When bus numbers disagree but device identity matches, accept the VFCT entry and log a dev_notice for diagnostics. Reported-by: Oz Tiram <oz@shift-computing.de> Closes: https://lore.kernel.org/amd-gfx/20260621173211.28443-1-oz@shift-computing.de/ Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patch.msgid.link/20260708193518.702584-2-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 11c141672045ffc0187aa604f2c0f597bc334fb2) Cc: stable@vger.kernel.org
46 hours	drm/amdgpu: fix bo->pin leaking in amdgpu_bo_create_reserved	Zhu Lingshan
	amdgpu_bo_create_reserved() only allocates a new BO when bo_ptr (struct amdgpu_bo bo_ptr as input parameter) is NULL, it simply skips creation when bo_ptr is non-NULL. But it unconditionally reserves, pins, gart allocates and maps the BO afterwards. When the same non-NULL BO pointer is passed in again, for example firmware buffers that live in adev and are re-loaded on every resume / cp_resume / start under AMDGPU_FW_LOAD_DIRECT, amdgpu_bo_pin() just increases pin_count unconditionally, however the matching teardown only unpins once, so pin_count never drops to zero, so TTM is not able to move, swap or evict a BO, causing BO leaks. This commit fixes this issue by only pinning the bo once at creation, and repeated calls no longer take additional pin references. Signed-off-by: Zhu Lingshan <lingshan.zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 3ddc0ae76202c447b6aec61e907b852bc94671cf) Cc: stable@vger.kernel.org
46 hours	drm/amdgpu/userq: fix indefinite fence wait during GPU reset	Jesse Zhang
	pre_reset only force-completes fences of MAPPED queues. A queue in any other state (e.g. mid-eviction) keeps its last_fence pending; after a GPU reset that fence never signals, so the eviction/suspend worker and process teardown (amdgpu_evf_mgr_flush_suspend) wait on it forever and wedge the machine: INFO: task kworker/6:28 blocked for more than 120 seconds. Workqueue: events amdgpu_eviction_fence_suspend_worker [amdgpu] Call Trace: dma_fence_wait_timeout+0x7e/0x130 amdgpu_userq_evict+0x67/0x140 [amdgpu] amdgpu_eviction_fence_suspend_worker+0xd8/0x160 [amdgpu] process_scheduled_works+0xa6/0x420 Force-complete every queue's fence regardless of state. The unmap and mark-hung step stays gated on MAPPED, since unmapping a queue that is not mapped is invalid. Fixes: 290f46cf5726 ("drm/amdgpu: Implement user queue reset functionality") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 9102b39fa924dcc3dc75a3137bfa9633c40b88c0) Cc: stable@vger.kernel.org
46 hours	drm/amdgpu/discovery: Fix device family for DCN42	Roman Li
	GC 11.7.0 and 11.7.1 should map to AMDGPU_FAMILY_GC_11_5_4 for DCN42. Fixes: cf591e67c095 ("drm/amdgpu: add support for GC IP version 11.7.0") Fixes: a928d8d81ec5 ("drm/amdgpu: add support for GC IP version 11.7.1") Signed-off-by: Roman Li <Roman.Li@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit f8ee6447e7ec1d75d6663c817e45566dd01f440b)
10 days	Merge tag 'amd-drm-fixes-7.2-2026-07-09' of ↵	Dave Airlie
	https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-7.2-2026-07-09: amdgpu: - PSP 15.0.9 update - SMU 15.0.9 update - VCN 5.3 fix - VI ASPM fix - Userq fix - lifetime fix for amdgpu_vm_get_task_info_pasid() - Gfx10 fix - SMU 14 fix amdkfd: - CRIU bounds checking fixes - secondary context id fix - Event bounds checking fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patch.msgid.link/20260709212303.15913-1-alexander.deucher@amd.com
10 days	Merge tag 'drm-misc-fixes-2026-07-09' of ↵	Dave Airlie
	https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes drm-misc-fixes for v7.2-rc3: - Fix uaf in amdxdna mmap failure path. - A lot of deadlocks, access races and return value fixes in amdxdna. - Fix analogix_dp bitshifts during link training. - Use direct label in drm_exec. - Fix absent indirect bo handling in v3d. - Sync on first active crtc in fb_dirty, rather than first crtc. - Rework try_harder in the buddy allocator. - Make imagination function static to solve compiler warning. - Fix imagination error checking. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patch.msgid.link/71e5b48b-307f-47f5-8fd5-b60ea43e4196@linux.intel.com
11 days	drm/gfx10: Program DB_RING_CONTROL	Alex Deucher
	This is needed to allocate occlusion counters across both gfx pipes. Fixes: b7a1a0ef12b8 ("drm/amd/amdgpu: add pipe1 hardware support") Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 6807352cbabb74b61ba42888769283af72191f66) Cc: stable@vger.kernel.org
11 days	drm/amdgpu: fix lifetime issue of amdgpu_vm_get_task_info_pasid()	Shahyan Soltani
	The vm pointer returned from amdgpu_vm_get_vm_from_pasid() is only valid while the lock is still being held. Once xa_unlock_irqrestore is called and returned, the pointer is no longer under lock and is subject to modification. Since, the caller still dereferences vm->task_info in amdgpu_vm_get_task_info_vm() after the lock is removed, this causes a use after unlock problem. Remove the lifetime issue present in amdgpu_vm_get_task_info_pasid() through removing the amdgpu_vm_get_vm_from_pasid() function from amdgpu_vm.c and making the relevant code inline to hold the lock while it is still in use. Signed-off-by: Shahyan Soltani <shahyan.soltani@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 9d01579f3f868b333acc901815972685989092c7) Cc: stable@vger.kernel.org
11 days	drm/amdgpu: trigger GPU recovery when userq destroy fails to unmap a hung queue	Jesse Zhang
	Destroying a hung user queue issues a MES REMOVE_QUEUE that times out, The destroy path only logged the error and freed the queue, so the next userq submission failed and forced a GPU reset attributed to an innocent workload. Kick the userq reset work when unmap fails so the GPU is recovered at destroy time. Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 8396b9de4198a54ec4760a94a179347540a9764d) Cc: stable@vger.kernel.org
11 days	drm/amd/amdgpu: disable ASPM on VI if pcie dpm is disabled	Kenneth Feng
	Disable ASPM on VI if PCIE dpm is disabled. Fixes: bb00bf17328d ("drm/amd/amdgpu: decouple ASPM with pcie dpm") Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5370 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 873a8d6b3c0a386408c891e4ff1c684fa11783e1) Cc: stable@vger.kernel.org
11 days	drm/amdgpu: Disable JDPG on VCN5_3	Suresh Guttula
	JDPG does not support on VCN5 This patch will disable JDPG, because DPG is not correctly copying the JRBC Read/Write Pointers (R/WPTR) from the PG (Power Gating) block to JRBC. Signed-off-by: Suresh Guttula <suresh.guttula@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ea3fdd1eda088030d8925f023613728969f55955)
11 days	drm/amdgpu: add support for SMU version 15.0.9	Kanala Ramalingeswara Reddy
	Initialize SMU Version 15_0_9 Signed-off-by: Kanala Ramalingeswara Reddy <Kanala.RamalingeswaraReddy@amd.com> Signed-off-by: Granthali Vinodkumar Dhandar <granthali.vinodkumardhandar@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1dfd4e84b5beec353a81d61af9eaf4e5a56e0c57)
11 days	drm/amdgpu: add support for PSP version 15.0.9	Kanala Ramalingeswara Reddy
	Initialize PSP Version 15_0_9 Signed-off-by: Kanala Ramalingeswara Reddy <Kanala.RamalingeswaraReddy@amd.com> Signed-off-by: Granthali Vinodkumar Dhandar <granthali.vinodkumardhandar@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ef71f00173228904763552b7405169023f8034a8)
12 days	drm/drm_exec: avoid indirect goto	Christian König
	The drm_exec component uses a variable with scope limited to the for() and an indirect goto to allow instantiating multiple macros in the same function. This unfortunately doesn't work well with certain compilers when the indirect goto can't be lowered to a direct jump. Switch the indirect goto to a direct goto, the drawback is that we now can't use the dma_exec_until_all_locked() macro in the same function multiple times. The is currently only one user of this and only as a hacky workaround which is about to be removed. So document that the __label__ statement should be used when the macro is used multiple times and fix the tests and the only use case where that is necessary. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: 9920249a5288 ("drm/amdgpu: convert amdgpu_vm_lock_by_pasid() to drm_exec") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202606231854.7LeCtlLe-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202606232356.gwHMAJAW-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202606240753.kYjobJVl-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202606241110.iUga5vVw-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202607031446.1PWG18mN-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202607031837.HSmBj8pr-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202607040159.GopyEswS-lkp@intel.com/ Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Link: https://lore.kernel.org/r/20260704084133.122053-1-christian.koenig@amd.com
2026-07-01	drm/amdgpu/jpeg: fix jpeg_v4_0_3_is_idle detection	Boyuan Zhang
	jpeg_v4_0_3_is_idle() initializes ret to false and then accumulates ring idle status using &=. Since false & condition always remains false, the function can never report the JPEG block as idle. Initialize ret to true so the function returns true only when all JPEG rings report RB_JOB_DONE. Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit e9df8e9d04e0593d17ddb069f3b7958991cd18c9) Cc: stable@vger.kernel.org
2026-07-01	drm/amdgpu: Fix kernel panic during driver load failure	Harish Kasiviswanathan
	Avoid kernel panic if MES init fails during driver load. The KIQ ring is falsely marked as ready as ASICs that use MES, KIQ is owned by MES. BUG: kernel NULL pointer dereference, address: 0000000000000000 RIP: 0010:gfx_v12_1_wait_reg_mem+0x5a/0x1f0 [amdgpu] Call Trace: gfx_v12_1_ring_emit_reg_write_reg_wait+0x1f/0x30 [amdgpu] amdgpu_gmc_fw_reg_write_reg_wait+0xb2/0x190 [amdgpu] amdgpu_gmc_flush_gpu_tlb+0x1cc/0x230 [amdgpu] amdgpu_gart_invalidate_tlb+0x81/0xa0 [amdgpu] amdgpu_gart_unbind+0x72/0x90 [amdgpu] amdgpu_ttm_backend_unbind+0xa4/0xb0 [amdgpu] amdgpu_ttm_tt_unpopulate+0x13/0xd0 [amdgpu] amdttm_tt_unpopulate+0x29/0x70 [amdttm] ttm_bo_put+0x1eb/0x360 [amdttm] amdgpu_bo_free_kernel+0xf9/0x1f0 [amdgpu] amdgpu_ih_ring_fini+0x5a/0x90 [amdgpu] amdgpu_irq_fini_hw+0x58/0x80 [amdgpu] amdgpu_device_fini_hw+0x4e0/0x5b0 [amdgpu] amdgpu_driver_load_kms+0x60/0xa0 [amdgpu] amdgpu_pci_probe+0x28e/0x6d0 [amdgpu] pci_device_probe+0x19f/0x220 really_probe+0x1ed/0x340 driver_probe_device+0x1e/0x80 __driver_attach+0xd3/0x1a0 bus_for_each_dev+0x68/0xa0 bus_add_driver+0x19f/0x270 driver_register+0x5d/0xf0 do_one_initcall+0xac/0x200 do_init_module+0x1ec/0x280 __se_sys_finit_module+0x2de/0x310 do_syscall_64+0x6a/0x250 entry_SYSCALL_64_after_hwframe+0x4b/0x53 Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 4623b958dd6da0f4c3026afdf330626a09ecb0f0) Cc: stable@vger.kernel.org
2026-07-01	drm/amdgpu: flush pending RCU callbacks on module unload	Perry Yuan
	Call rcu_barrier() in module exit to wait for outstanding call_rcu() callbacks before freeing module text, preventing late callback execution in freed memory. BUG: unable to handle page fault for address: ffffffffc1d59c40 PGD 6a12067 P4D 6a12067 PUD 6a14067 PMD 13698b067 PTE 0 Oops: 0010 [#1] SMP NOPTI RIP: 0010:0xffffffffc1d59c40 Code: Unable to access opcode bytes at RIP 0xffffffffc1d59c16. RSP: 0018:ffffc900198c0f28 EFLAGS: 00010286 RAX: ffffffffc1d59c40 RBX: ffff897c7d6b61c0 RCX: ffff88826aff4590 RDX: ffff8884d8b35490 RSI: ffffc900198c0f30 RDI: ffff88812af67290 RBP: 000000000000000a (DONE segment entries) R08: 0000000000000000 R09: 0000000000000100 R10: 0000000000000000 R11: ffffffff82a06100 R12: ffff88811a4e3700 R13: 0000000000000000 R14: ffff897c7d6b6270 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff897c7d680000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffc1d59c16 CR3: 00000104a980a001 CR4: 0000000002770ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <IRQ> ? rcu_do_batch+0x163/0x450 ? rcu_core+0x177/0x1c0 ? __do_softirq+0xc1/0x280 ? asm_call_irq_on_stack+0xf/0x20 </IRQ> ? do_softirq_own_stack+0x37/0x50 ? irq_exit_rcu+0xc4/0x100 ? sysvec_apic_timer_interrupt+0x36/0x80 ? asm_sysvec_apic_timer_interrupt+0x12/0x20 ? cpuidle_enter_state+0xd4/0x360 ? cpuidle_enter+0x29/0x40 ? cpuidle_idle_call+0x108/0x1a0 ? do_idle+0x77/0xf0 ? cpu_startup_entry+0x19/0x20 ? secondary_startup_64_no_verify+0xbf/0xcb Signed-off-by: Perry Yuan <perry.yuan@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit feaa5039f6c12acc9aa934c2d45dcd251a12c69f)
2026-07-01	drm/amdgpu: Fix AMDGPU_GTT_MAX_TRANSFER_SIZE for non-4K systems	Donet Tom
	Running RCCL unit tests on a system with a 64K PAGE_SIZE triggers the following warning and causes the test to terminate on latest upstream kernel: WARNING: drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:1335 at amdgpu_bo_release_notify+0x1bc/0x280 [amdgpu], CPU#18: rccl-UnitTests/33151 Call trace: amdgpu_bo_release_notify ttm_bo_release amdgpu_gem_object_free drm_gem_object_free amdgpu_bo_unref amdgpu_bo_create amdgpu_bo_create_user amdgpu_gem_object_create amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu kfd_ioctl_alloc_memory_of_gpu kfd_ioctl sys_ioctl The warning is triggered because amdgpu_ttm_next_clear_entity() returns NULL when a clear buffer operation is requested. This happens because the GART window allocation for the default_entity, clear_entity and move_entity fails during initialization. Commit [1] introduced separate GART windows for the default_entity, clear_entity and move_entity of each SDMA instance. Their sizes are derived from AMDGPU_GTT_MAX_TRANSFER_SIZE, which is currently defined as 1024 pages. This implicitly assumes a 4K PAGE_SIZE, where 1024 pages correspond to a 4MB transfer. On a 64K PAGE_SIZE system, however, the same value expands to 64MB. The default_entity and clear_entity each allocate one AMDGPU_GTT_MAX_TRANSFER_SIZE GART window, while the move_entity allocates two such windows. This results in 16MB of GART space per SDMA instance on a 4K PAGE_SIZE system, but 256MB per SDMA instance on a 64K PAGE_SIZE system. On an MI210 system with five SDMA instances and a 512MB GART aperture, the total GART space required becomes 1.25GB, exceeding the available GART aperture. Consequently, GART window allocation fails, amdgpu_ttm_next_clear_entity() returns NULL, and the above warning is triggered. Redefine AMDGPU_GTT_MAX_TRANSFER_SIZE in bytes instead of page units. Where a page count is required, convert it using PAGE_SHIFT. This preserves the existing 4MB transfer size across all PAGE_SIZE configurations while keeping GART window allocations within the available GART aperture. [1] https://lore.kernel.org/all/20260408100327.1372-3-pierre-eric.pelloux-prayer@amd.com/#t Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5435 Fixes: 897ee11ec020 ("drm/amdgpu: create multiple clear/move ttm entities") Signed-off-by: Donet Tom <donettom@linux.ibm.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 27213b776a666d3030de5acc3cd75278197b0494) Cc: stable@vger.kernel.org
2026-07-01	drm/amdgpu: add support for GC IP version 11.7.1	Granthali Vinodkumar Dhandar
	Initialize GC IP 11_7_1 Signed-off-by: Granthali Vinodkumar Dhandar <granthali.vinodkumardhandar@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit a928d8d81ec5cdb5a8944d08136720811efad0f6)
2026-07-01	drm/amdgpu: add support for GC IP version 11.7.0	Granthali Vinodkumar Dhandar
	Initialize GC IP 11_7_0 Signed-off-by: Granthali Vinodkumar Dhandar <granthali.vinodkumardhandar@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit cf591e67c095542a16475df293ec7bc9a118e4ee)
2026-07-01	drm/amdgpu: add the doorbell index input for suspending userq	Prike Liang
	It requires inputing the doorbell offset for MES firmware preempts the userq, and adding the doorbell offset also keep aliging with the union MESAPI__SUSPEND in MES firmware. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit bc434335ab3c096a33a9e88c7951b4ac574db458) Cc: stable@vger.kernel.org
2026-07-01	drm/amdgpu/mes12: set doorbell offset for suspending userq	Prike Liang
	Updating the union MESAPI__SUSPEND and union MESAPI__RESUME to add the doorbell offset for suspending userq. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 5b58a2c120063544869d0284d3b355527f9f04f5) Cc: stable@vger.kernel.org
2026-07-01	drm/amdgpu/mes11: set doorbell offset for suspending userq	Prike Liang
	Updating the union MESAPI__SUSPEND and union MESAPI__RESUME to add the doorbell offset for suspending userq. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 30af09db33696f7e0de5c0c505cbb0cb92b6e25b) Cc: stable@vger.kernel.org
2026-07-01	drm/amdgpu: fix check in amdgpu_hmm_invalidate_gfx	Christian König
	For a short moment during alloc/free the userptr BO is not part of his VM, so bo->vm_bo can be NULL. Keep a reference to the VM root PD as parent of the userptr BO so that we can always use that to wait for all submissions of the VM instead of only the one involving the userptr BO. Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: 91250893cbaa ("drm/amdgpu: fix waiting for all submissions for userptrs") Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5399 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 631849ff5d603841e74f19f4a5e30fe1f7d7cf30) Cc: stable@vger.kernel.org
2026-07-01	drm/amdgpu/jpeg: fix jpeg_v5_0_1_is_idle detection	Boyuan Zhang
	jpeg_v5_0_1_is_idle() initializes ret to false and then accumulates ring idle status using &=. Since false & condition always remains false, the function can never report the JPEG block as idle. Initialize ret to true so the function returns true only when all JPEG rings report RB_JOB_DONE. Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 680adf5faeeabb4585f7aeb53681719e2d6c2f41) Cc: stable@vger.kernel.org
2026-07-01	drm/amdgpu: Rename moved state to needs_update	Natalie Vock
	This state can be reached via other means than physical moves, like PRT bindings. Make the name match the actual purpose of the state. Signed-off-by: Natalie Vock <natalie.vock@gmx.de> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1f7a795fb9f8186bd81ca9c4a80f75482db53c9e)
2026-07-01	drm/amdgpu: Only set bo->moved when the BO was actually moved	Natalie Vock
	The "moved" VM state is a bit unfortunately named, because BOs can end up in this state without being physically moved. While we need to invalidate every mapping when BOs are physically moved, in some other cases like PRT binds/unbinds there is no need to refresh mappings except those affected by the bind. Full invalidation of all BO mappings manifested as severe regressions in PRT bind performance, which this patch fixes. The offending patch is 4cdbba5a16aa ("drm/amdgpu: restructure VM state machine v4") in the amd-staging-drm-next tree, although it has not yet propagated anywhere else. Fixes: 4cdbba5a16aa ("drm/amdgpu: restructure VM state machine v4") Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5437 Signed-off-by: Natalie Vock <natalie.vock@gmx.de> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 0b2fa33b4235991a100dd799c891cf5c242aaed1) Cc: stable@vger.kernel.org
2026-07-01	drm/amdgpu: invoke pm_genpd_remove() before freeing genpd	Ce Sun
	Call pm_genpd_remove() to unregister from global list prior to releasing acp_genpd memory, and clear the pointer after free. Signed-off-by: Ce Sun <cesun102@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit cd8650d7a91ee8b768e202354672553faa5cc1f2) Cc: stable@vger.kernel.org
2026-07-01	drm/amdgpu: fix resource leak on ACP reset timeout	Ce Sun
	When ACP soft reset poll times out, original code returns early without cleanup, leaking MFD child devices, genpd links and all ACP heap allocations. Replace direct early return with goto out to force run all cleanup logic regardless of reset success, preserve timeout error code for caller. Signed-off-by: Ce Sun <cesun102@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 98073e4328d7a8d75d03696ab27f6de70ef1aeda) Cc: stable@vger.kernel.org
2026-07-01	drm/amdgpu: reject mapping a reserved doorbell to a new queue	Zhu Lingshan
	When creating an user-queue, the user space provides a doorbell BO handle and an offset within the bo to obtain a doorbell. However current implementation using xa_store_irq() to store a doorbell, which allows a later queue created with the same BO and offset parameters to overwrite an existing queue and doorbell mapping. This can cause problems like misrouting fence IRQ processing to a wrong queue, and mislead the cleanup process of one queue erasing the mapping of another queue. This commit fixes this issue by replacing xa_store_irq with xa_insert_irq, which rejects mapping a reserved doorbell to a newly created queue Signed-off-by: Zhu Lingshan <lingshan.zhu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 6244eae22966350db52faf9c1369d3b2ffc5de4e) Cc: stable@vger.kernel.org
2026-07-01	drm/amdgpu/gfx12: fix EOP interrupt routing for KQ and userq	Jesse Zhang
	Try KQ by ring_id first (KCQ and UQ never share a HW slot); fall back to amdgpu_userq_process_fence_irq() on miss, since KCQ EOPs were misrouted into the userq fence path when enable_mes is true. Require a strict (me,pipe,queue) match in the gfx case, then userq gfx EOPs fall through to amdgpu_userq_process_fence_irq(). Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 6c1f4f7ff08448e0e18cd7fc4e59d6c96a36f25d) Cc: stable@vger.kernel.org
2026-07-01	drm/amdgpu/gfx11: fix EOP interrupt routing for KQ and userq	Jesse Zhang
	Try KQ by ring_id first (KCQ and UQ never share a HW slot); fall back to amdgpu_userq_process_fence_irq() on miss, since KQ EOPs were misrouted into the userq fence path when enable_mes is true. Require a strict (me,pipe,queue) match in the gfx case, then userq gfx EOPs fall through to amdgpu_userq_process_fence_irq(). Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 88e589cc811ba907209a426c426c469bcb4bb894) Cc: stable@vger.kernel.org
2026-07-01	drm/amdgpu: fix aperture mapping leak	Asad Kamal
	amdgpu_pci_remove() calls drm_dev_unplug() before invoking the driver fini routines. This causes drm_dev_enter() in amdgpu_ttm_fini() to always return false, so iounmap(aper_base_kaddr) never runs on normal driver unload, leaving an orphaned entry in the x86 PAT interval tree. On connected_to_cpu hardware, the aperture is mapped write-back (WB) via ioremap_cache(). On reload, IP discovery calls memremap(..., MEMREMAP_WC) over the same range. The WC vs WB conflict causes: ioremap error for 0x..., requested 0x1, got 0x0 amdgpu: discovery failed: -2 Fix by switching to devres-managed mappings so cleanup is guaranteed regardless of drm_dev_enter() state: - connected_to_cpu path: devm_memremap(MEMREMAP_WB). For IORESOURCE_SYSTEM_RAM ranges this takes the try_ram_remap() shortcut, returning __va(offset) from the existing kernel direct map. No new ioremap VA or PAT entry is created, so there is nothing to orphan. - dGPU path: devm_ioremap_wc() registers iounmap() as a devres action, guaranteeing cleanup at device_del() time. Also remove iounmap(aper_base_kaddr) from amdgpu_device_unmap_mmio() since the mapping is now devres-owned. v2: Remove redundant x86_64 guard (Lijo) Fixes: 9d0af8b4def0 ("drm/amdgpu: pre-map device buffer as cached for A+A config") Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit d871e99879cb5fd1fa798b006b4888887e63a17a) Cc: stable@vger.kernel.org
2026-07-01	drm/amdgpu/vce: fix integer overflow in image size	Boyuan Zhang
	Fix a security vulnerability where malicious VCE command streams with oversized dimensions (e.g. 65536×65536) cause 32-bit integer overflow, wrapping the calculated buffer size to 0. This bypasses validation and allows GPU firmware to perform out-of-bound memory access. The fix uses 64-bit arithmetic to detect overflow and rejects invalid dimensions before they reach the hardware. V2: remove redundant check V3: modify max height value V4: remove size64 Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit cbe408dba581755ad1279a487ec786d8927d778d) Cc: stable@vger.kernel.org
2026-07-01	drm/amdgpu/vcn4: avoid rereading IB param length	Boyuan Zhang
	Reuse the parameter length returned by vcn_v4_0_enc_find_ib_param() instead of rereading it from the IB. This avoids a potential TOCTOU issue if the IB contents change between reads. Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Reviewed-by: David Rosca <david.rosca@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit dbb02b4755f8c1f3773263f2d779872c1c0c073a) Cc: stable@vger.kernel.org
2026-07-01	drm/amdgpu: fix division by zero with invalid uvd dimensions	Boyuan Zhang
	When width or height is less than 16, width_in_mb or height_in_mb becomes 0, leading to fs_in_mb being 0. This causes a division by zero when calculating num_dpb_buffer in H264 and H264 Perf decode paths. Add validation to reject frames with width < 16 or height < 16 before performing any calculations that depend on these values. V2: Format change - move up all vaiable definitions. V3: Use warn_once to avoid spam. Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 3e41d26c70b0a459d041cc19482a226c4b7423cb) Cc: stable@vger.kernel.org
2026-07-01	drm/amdgpu/sdma7.1: replace BUG_ON() with WARN_ON()	Alex Deucher
	There's no need to crash the kernel for these cases. Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit c4f230b51cf2d3e7e8b1c800331f3dbed2a9e3f5) Cc: stable@vger.kernel.org
2026-07-01	drm/amdgpu/sdma7.0: replace BUG_ON() with WARN_ON()	Alex Deucher
	There's no need to crash the kernel for these cases. Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 9723a8bed3aa251a26bee4583bac9d8fb064dd44) Cc: stable@vger.kernel.org
2026-07-01	drm/amdgpu/sdma6.0: replace BUG_ON() with WARN_ON()	Alex Deucher
	There's no need to crash the kernel for these cases. Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit c17a508a7d652da3728f8bbc481bfffe96d65a87) Cc: stable@vger.kernel.org
2026-07-01	drm/amdgpu/sdma5.2: replace BUG_ON() with WARN_ON()	Alex Deucher
	There's no need to crash the kernel for these cases. Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ae658afc7f47f6147371ec42cc6b1a793dfdb5af) Cc: stable@vger.kernel.org
2026-07-01	drm/amdgpu/sdma5.0: replace BUG_ON() with WARN_ON()	Alex Deucher
	There's no need to crash the kernel for these cases. Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 8d144a0eb09537055841af48c9e7c2d4cd48e84d) Cc: stable@vger.kernel.org
2026-07-01	drm/amdgpu/sdma4.4.2: replace BUG_ON() with WARN_ON()	Alex Deucher
	There's no need to crash the kernel for these cases. Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit fa4f86a148271e325e95287630a3a15a9cd35fdc) Cc: stable@vger.kernel.org
2026-07-01	drm/amdgpu/gfx12.1: replace BUG_ON() with WARN_ON()	Alex Deucher
	There's no need to crash the kernel for these cases. Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit e4d99e04b2e9b13b97d3b17804c735f62689db23) Cc: stable@vger.kernel.org
2026-07-01	drm/amdgpu/gfx12: replace BUG_ON() with WARN_ON()	Alex Deucher
	There's no need to crash the kernel for these cases. Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit f952076f76d62f783e8ba4995a7c400d39354ccf) Cc: stable@vger.kernel.org