linux-stable.git/drivers/gpu/drm/amd, branch master

drm/amdgpu/jpeg: fix jpeg_v4_0_3_is_idle detection

2026-07-01T17:02:53+00:00

jpeg_v4_0_3_is_idle() initializes ret to false and then accumulates ring
idle status using &=. Since false & condition always remains false, the
function can never report the JPEG block as idle.

Initialize ret to true so the function returns true only when all JPEG
rings report RB_JOB_DONE.
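
A minimal userspace sketch of the accumulation pattern described above.
The mask value, the status array and the function name are hypothetical
stand-ins, not the driver's register accessors; the initial value is passed
as a parameter only so both behaviours can be compared.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mask; the real driver reads a per-ring status register. */
#define RB_JOB_DONE_MASK 0x1u

/* Accumulate per-ring idle status with &=, starting from 'initial'. */
static bool all_rings_idle(const unsigned int *ring_status, size_t num_rings,
                           bool initial)
{
    bool ret = initial;

    for (size_t i = 0; i < num_rings; i++)
        ret &= ((ring_status[i] & RB_JOB_DONE_MASK) == RB_JOB_DONE_MASK);

    return ret;
}
```

With every ring reporting done, a false seed still yields false, while a
true seed yields true and drops to false as soon as one ring is busy.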

Signed-off-by: Boyuan Zhang 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher 
(cherry picked from commit e9df8e9d04e0593d17ddb069f3b7958991cd18c9)
Cc: stable@vger.kernel.org

drm/amdgpu: Fix kernel panic during driver load failure

2026-07-01T17:02:32+00:00

Avoid a kernel panic if MES init fails during driver load. The KIQ ring is
falsely marked as ready, even though on ASICs that use MES the KIQ is owned
by MES.

BUG: kernel NULL pointer dereference, address: 0000000000000000
RIP: 0010:gfx_v12_1_wait_reg_mem+0x5a/0x1f0 [amdgpu]
Call Trace:
 gfx_v12_1_ring_emit_reg_write_reg_wait+0x1f/0x30 [amdgpu]
 amdgpu_gmc_fw_reg_write_reg_wait+0xb2/0x190 [amdgpu]
 amdgpu_gmc_flush_gpu_tlb+0x1cc/0x230 [amdgpu]
 amdgpu_gart_invalidate_tlb+0x81/0xa0 [amdgpu]
 amdgpu_gart_unbind+0x72/0x90 [amdgpu]
 amdgpu_ttm_backend_unbind+0xa4/0xb0 [amdgpu]
 amdgpu_ttm_tt_unpopulate+0x13/0xd0 [amdgpu]
 amdttm_tt_unpopulate+0x29/0x70 [amdttm]
 ttm_bo_put+0x1eb/0x360 [amdttm]
 amdgpu_bo_free_kernel+0xf9/0x1f0 [amdgpu]
 amdgpu_ih_ring_fini+0x5a/0x90 [amdgpu]
 amdgpu_irq_fini_hw+0x58/0x80 [amdgpu]
 amdgpu_device_fini_hw+0x4e0/0x5b0 [amdgpu]
 amdgpu_driver_load_kms+0x60/0xa0 [amdgpu]
 amdgpu_pci_probe+0x28e/0x6d0 [amdgpu]
 pci_device_probe+0x19f/0x220
 really_probe+0x1ed/0x340
 driver_probe_device+0x1e/0x80
 __driver_attach+0xd3/0x1a0
 bus_for_each_dev+0x68/0xa0
 bus_add_driver+0x19f/0x270
 driver_register+0x5d/0xf0
 do_one_initcall+0xac/0x200
 do_init_module+0x1ec/0x280
 __se_sys_finit_module+0x2de/0x310
 do_syscall_64+0x6a/0x250
 entry_SYSCALL_64_after_hwframe+0x4b/0x53

Signed-off-by: Harish Kasiviswanathan 
Reviewed-by: Kent Russell 
Signed-off-by: Alex Deucher 
(cherry picked from commit 4623b958dd6da0f4c3026afdf330626a09ecb0f0)
Cc: stable@vger.kernel.org

drm/amd/display: detect_link_and_local_sink: DP alt mode timeout path leaks prev_sink reference

2026-07-01T17:02:23+00:00

prev_sink is unconditionally retained via dc_sink_retain at function
entry, but the DP alt mode timeout path inside SIGNAL_TYPE_DISPLAY_PORT
returns false without releasing prev_sink. All other return paths in the
function correctly call dc_sink_release(prev_sink), making this the only
missing cleanup.
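
The retain/release imbalance can be sketched in userspace as follows. The
struct, the helpers and detect_sketch() are hypothetical simplifications and
do not reflect the real dc_sink or detect_link_and_local_sink() code; only
the shape of the early-return path is modelled.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a refcounted sink object. */
struct sink {
    int refcount;
};

static void sink_retain(struct sink *s)
{
    if (s)
        s->refcount++;
}

static void sink_release(struct sink *s)
{
    if (s)
        s->refcount--;
}

/* Every return path must drop the reference taken at function entry. */
static bool detect_sketch(struct sink *link_sink, bool alt_mode_timed_out)
{
    struct sink *prev_sink = link_sink;

    sink_retain(prev_sink);

    if (alt_mode_timed_out) {
        /* Without this release the early return leaks one reference. */
        sink_release(prev_sink);
        return false;
    }

    /* ... detection work would happen here ... */

    sink_release(prev_sink);
    return true;
}
```

In this sketch the reference count is the same before and after the call on
both paths, which is the invariant the fix restores.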

Fixes: 54618888d1ea ("drm/amd/display: break down dc_link.c")
Signed-off-by: WenTao Liang 
Reviewed-by: Mario Limonciello (AMD) 
Link: https://patch.msgid.link/20260626124555.36910-1-vulab@iscas.ac.cn
Signed-off-by: Mario Limonciello 
Signed-off-by: Alex Deucher 
(cherry picked from commit 45510cf662dcf46b5d8926d454f338809f107b9d)
Cc: stable@vger.kernel.org

drm/amd/pm: fix smu13 power limit range calculation

2026-07-01T17:02:15+00:00

SMU13 reports SocketPowerLimitAc/Dc as the default power limit, but
MsgLimits.Power may carry a different firmware bound for the same PPT
throttler. Using only the socket limit for both min and max can therefore
expose an incorrect power range.

Keep the socket limit as the default, but derive the range from both values:
use the lower value for the min base and the higher value for the max base
before applying OD percentages. Keep the current limit query independent
from the cap calculation.
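
A rough sketch of the range derivation described above. The struct, the
function name and the exact percentage formula are assumptions made for
illustration, not the SMU13 driver code; od_lower_pct is assumed to be at
most 100.

```c
#include <stdint.h>

/* Hypothetical result type: default limit plus the exposed range. */
struct power_range {
    uint32_t def;
    uint32_t min;
    uint32_t max;
};

/*
 * socket_limit: SocketPowerLimitAc/Dc (kept as the default).
 * msg_limit:    MsgLimits.Power bound for the same PPT throttler.
 */
static struct power_range power_range_sketch(uint32_t socket_limit,
                                             uint32_t msg_limit,
                                             uint32_t od_lower_pct,
                                             uint32_t od_upper_pct)
{
    /* Lower value is the min base, higher value is the max base. */
    uint32_t min_base = socket_limit < msg_limit ? socket_limit : msg_limit;
    uint32_t max_base = socket_limit > msg_limit ? socket_limit : msg_limit;
    struct power_range r;

    r.def = socket_limit;
    r.min = min_base * (100 - od_lower_pct) / 100;
    r.max = max_base * (100 + od_upper_pct) / 100;
    return r;
}
```

For example, a socket limit of 300 and a message limit of 330 with 10% OD
in both directions gives a default of 300 and a range of 270 to 363 under
these assumptions.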

Fixes: 1eaf26db9590 ("drm/amd/pm: fix smu13 power limit default/cap calculation")
Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5419
Signed-off-by: Yang Wang 
Reviewed-by: Kenneth Feng 
Signed-off-by: Alex Deucher 
(cherry picked from commit f45bbf0f62f266ed8422d84f347d75d5fca846a7)
Cc: stable@vger.kernel.org

drm/amdgpu: flush pending RCU callbacks on module unload

2026-07-01T17:02:09+00:00

Call rcu_barrier() in module exit to wait for outstanding call_rcu() callbacks
before freeing module text, preventing late callback execution in freed memory.
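
The ordering requirement can be illustrated with a userspace analogy. This
is not kernel code: the queue, defer_call() and barrier_drain() are
hypothetical, and unlike rcu_barrier(), which waits for callbacks that the
RCU core runs, this toy barrier runs the queued callbacks itself.

```c
#include <stddef.h>

typedef void (*deferred_fn)(void *arg);

struct deferred {
    deferred_fn fn;
    void *arg;
};

#define MAX_PENDING 8

static struct deferred pending[MAX_PENDING];
static size_t num_pending;

/* Queue a callback for later, loosely analogous to call_rcu(). */
static int defer_call(deferred_fn fn, void *arg)
{
    if (num_pending == MAX_PENDING)
        return -1;
    pending[num_pending].fn = fn;
    pending[num_pending].arg = arg;
    num_pending++;
    return 0;
}

/* Make sure no queued callback is still outstanding. */
static void barrier_drain(void)
{
    for (size_t i = 0; i < num_pending; i++)
        pending[i].fn(pending[i].arg);
    num_pending = 0;
}

/* Example callback: counts how often it ran. */
static void count_cb(void *arg)
{
    *(int *)arg += 1;
}

/* Exit path: drain first, only then is it safe to drop the callback code. */
static void module_exit_sketch(void)
{
    barrier_drain();
    /* ... releasing the code that count_cb lives in would go here ... */
}
```

The point of the sketch is only the ordering: nothing that a pending
callback points at may be freed before the barrier has completed.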

BUG: unable to handle page fault for address: ffffffffc1d59c40
PGD 6a12067 P4D 6a12067 PUD 6a14067 PMD 13698b067 PTE 0
Oops: 0010 [#1] SMP NOPTI
RIP: 0010:0xffffffffc1d59c40
Code: Unable to access opcode bytes at RIP 0xffffffffc1d59c16.
RSP: 0018:ffffc900198c0f28 EFLAGS: 00010286
RAX: ffffffffc1d59c40 RBX: ffff897c7d6b61c0 RCX: ffff88826aff4590
RDX: ffff8884d8b35490 RSI: ffffc900198c0f30 RDI: ffff88812af67290
RBP: 000000000000000a (DONE segment entries) R08: 0000000000000000 R09: 0000000000000100
R10: 0000000000000000 R11: ffffffff82a06100 R12: ffff88811a4e3700
R13: 0000000000000000 R14: ffff897c7d6b6270 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff897c7d680000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffc1d59c16 CR3: 00000104a980a001 CR4: 0000000002770ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <IRQ>
 ? rcu_do_batch+0x163/0x450
 ? rcu_core+0x177/0x1c0
 ? __do_softirq+0xc1/0x280
 ? asm_call_irq_on_stack+0xf/0x20
 </IRQ>
 ? do_softirq_own_stack+0x37/0x50
 ? irq_exit_rcu+0xc4/0x100
 ? sysvec_apic_timer_interrupt+0x36/0x80
 ? asm_sysvec_apic_timer_interrupt+0x12/0x20
 ? cpuidle_enter_state+0xd4/0x360
 ? cpuidle_enter+0x29/0x40
 ? cpuidle_idle_call+0x108/0x1a0
 ? do_idle+0x77/0xf0
 ? cpu_startup_entry+0x19/0x20
 ? secondary_startup_64_no_verify+0xbf/0xcb

Signed-off-by: Perry Yuan 
Reviewed-by: Yifan Zhang 
Reviewed-by: Christian König 
Signed-off-by: Alex Deucher 
(cherry picked from commit feaa5039f6c12acc9aa934c2d45dcd251a12c69f)

drm/amdgpu: Fix AMDGPU_GTT_MAX_TRANSFER_SIZE for non-4K systems

2026-07-01T17:02:00+00:00

Running RCCL unit tests on a system with a 64K PAGE_SIZE triggers
the following warning and causes the test to terminate on the latest
upstream kernel:

WARNING: drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:1335 at
amdgpu_bo_release_notify+0x1bc/0x280 [amdgpu],
CPU#18: rccl-UnitTests/33151

Call trace:
amdgpu_bo_release_notify
ttm_bo_release
amdgpu_gem_object_free
drm_gem_object_free
amdgpu_bo_unref
amdgpu_bo_create
amdgpu_bo_create_user
amdgpu_gem_object_create
amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu
kfd_ioctl_alloc_memory_of_gpu
kfd_ioctl
sys_ioctl

The warning is triggered because
amdgpu_ttm_next_clear_entity() returns NULL when a clear buffer
operation is requested. This happens because the GART window
allocation for the default_entity, clear_entity and move_entity
fails during initialization.

Commit [1] introduced separate GART windows for the
default_entity, clear_entity and move_entity of each SDMA
instance. Their sizes are derived from
AMDGPU_GTT_MAX_TRANSFER_SIZE, which is currently defined as 1024
pages. This implicitly assumes a 4K PAGE_SIZE, where 1024 pages
correspond to a 4MB transfer. On a 64K PAGE_SIZE system, however,
the same value expands to 64MB.

The default_entity and clear_entity each allocate one
AMDGPU_GTT_MAX_TRANSFER_SIZE GART window, while the move_entity
allocates two such windows. This results in 16MB of GART space
per SDMA instance on a 4K PAGE_SIZE system, but 256MB per SDMA
instance on a 64K PAGE_SIZE system.

On an MI210 system with five SDMA instances and a 512MB GART
aperture, the total GART space required becomes 1.25GB,
exceeding the available GART aperture. Consequently, GART window
allocation fails, amdgpu_ttm_next_clear_entity() returns NULL,
and the above warning is triggered.

Redefine AMDGPU_GTT_MAX_TRANSFER_SIZE in bytes instead of page
units. Where a page count is required, convert it using
PAGE_SHIFT. This preserves the existing 4MB transfer size across
all PAGE_SIZE configurations while keeping GART window
allocations within the available GART aperture.
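
The arithmetic above can be checked with a small userspace sketch. The
function and macro names are hypothetical; the window count (one default,
one clear, two move) and the sizes are taken from the description above.

```c
#include <stdint.h>

/* default_entity + clear_entity + 2 x move_entity windows per SDMA instance */
#define WINDOWS_PER_SDMA 4u

/* Old definition: 1024 pages, so the byte size scales with PAGE_SIZE. */
static uint64_t transfer_bytes_page_units(unsigned int page_shift)
{
    return (uint64_t)1024 << page_shift;
}

/* New definition: a fixed 4MB, independent of PAGE_SIZE. */
static uint64_t transfer_bytes_fixed(void)
{
    return (uint64_t)4 << 20;
}

/* Page count derived from the byte size where one is needed. */
static uint64_t transfer_pages_fixed(unsigned int page_shift)
{
    return transfer_bytes_fixed() >> page_shift;
}

/* Total GART space needed for all SDMA instances. */
static uint64_t gart_bytes_needed(uint64_t transfer_bytes,
                                  unsigned int sdma_instances)
{
    return transfer_bytes * WINDOWS_PER_SDMA * sdma_instances;
}
```

With a page shift of 16 (64K pages) and five SDMA instances, the page-unit
definition needs 1280MB, which exceeds a 512MB aperture, while the fixed
byte definition needs 80MB.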

[1] https://lore.kernel.org/all/20260408100327.1372-3-pierre-eric.pelloux-prayer@amd.com/#t

Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5435
Fixes: 897ee11ec020 ("drm/amdgpu: create multiple clear/move ttm entities")
Signed-off-by: Donet Tom 
Signed-off-by: Alex Deucher 
(cherry picked from commit 27213b776a666d3030de5acc3cd75278197b0494)
Cc: stable@vger.kernel.org

drm/amdkfd: Use kvcalloc to allocate arrays

2026-07-01T17:01:50+00:00

There were a few instances in kfd_chardev.c of kvzalloc being
used to allocate memory for an array.

Switch those to kvcalloc, which
- is the standard way of allocating a zero-initialized array
- checks the size multiplication for overflow
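
A userspace sketch of why the overflow check matters. The helper name is
hypothetical and this is not the kernel's kvcalloc implementation; it uses
the GCC/Clang builtin __builtin_mul_overflow to reject a count and element
size whose product does not fit in size_t.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Allocate a zeroed array of n elements, failing if n * size overflows. */
static void *zalloc_array_checked(size_t n, size_t size)
{
    size_t bytes;
    void *p;

    /* A plain n * size would silently wrap and under-allocate. */
    if (__builtin_mul_overflow(n, size, &bytes))
        return NULL;

    p = malloc(bytes ? bytes : 1);
    if (p)
        memset(p, 0, bytes);
    return p;
}
```

A request such as zalloc_array_checked(SIZE_MAX, 2) returns NULL here
instead of handing back a buffer far smaller than the caller expects.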

Signed-off-by: David Francis 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher 
(cherry picked from commit 60b048c93f7a3add39757ad65fe2bb6e58eeae23)
Cc: stable@vger.kernel.org

drm/amdgpu: add support for GC IP version 11.7.1

2026-07-01T17:01:48+00:00

Initialize GC IP 11_7_1

Signed-off-by: Granthali Vinodkumar Dhandar 
Reviewed-by: Mario Limonciello 
Signed-off-by: Alex Deucher 
(cherry picked from commit a928d8d81ec5cdb5a8944d08136720811efad0f6)

drm/amdgpu: add support for GC IP version 11.7.0

2026-07-01T17:01:42+00:00

Initialize GC IP 11_7_0

Signed-off-by: Granthali Vinodkumar Dhandar 
Reviewed-by: Mario Limonciello 
Signed-off-by: Alex Deucher 
(cherry picked from commit cf591e67c095542a16475df293ec7bc9a118e4ee)

drm/amdgpu: add the doorbell index input for suspending userq

2026-07-01T17:01:35+00:00

The MES firmware requires the doorbell offset as input in order to preempt
the userq. Adding the doorbell offset also keeps the driver aligned with
the union MESAPI__SUSPEND definition in MES firmware.

Signed-off-by: Prike Liang 
Acked-by: Christian König 
Signed-off-by: Alex Deucher 
(cherry picked from commit bc434335ab3c096a33a9e88c7951b4ac574db458)
Cc: stable@vger.kernel.org