linux.git/drivers/gpu/drm/amd/amdgpu, branch v7.2-rc1

drm/amdgpu: Use system unbound workqueue for soft IH ring

2026-06-17T22:36:38+00:00

Allow the kernel to dispatch the soft IH work on other CPUs.

Otherwise it can happen that the soft IH ring fills up
before it actually starts processing anything, which
can easily happen with retry page faults, in which case
the CP repeatedly spams the CPU with a lot of interrupts.

This significantly improves retry page fault handling on
GPUs that don't have the filter CAM and must rely on
software based filtering.

Reviewed-by: Tvrtko Ursulin 
Signed-off-by: Timur Kristóf 
Signed-off-by: Alex Deucher 
(cherry picked from commit 3cdff3c8b93c2834977224d9c2b201fc334dd184)

amdgpu/ih6.1: Fix minor version

2026-06-17T22:36:31+00:00

Report the correct version of IH v6.1 (previously it showed v6.0).

Reviewed-by: Tvrtko Ursulin 
Signed-off-by: Timur Kristóf 
Signed-off-by: Alex Deucher 
(cherry picked from commit 940d33ebbcdebaf095fade86e9c981ad8789aee2)

drm/amdgpu/gfx9: Fix Ring and IB test fail after mode2

2026-06-17T22:33:04+00:00

For Renior APU with gfx9, in some test scenarios with disabling
ring_reset, like accessing an unmapped invalid address, it can
trigger a gpu job timeout event, then driver uses Mode2 reset
to reset GPU, but after Mode2 compute Ring test and IB test fail
randomly. It because the HQDs of MECs are always active before or
after Mode2, that causes MECs use stale HQDs when MECs are unhalted
before driver restore MQDs, and causes CPC and CPF are still stuck
after Mode2, then causes compute Ring and IB tests fail.

So, add sequences to deactivate HQDs of MECs in suspend IP function
of the resetting process.

v2: Move all sequences into a new function gfx_v9_0_cp_mode2_clear_state (Ray Huang)
    To check reset Mode2 method in the if condition (Ray Huang)
v3: Move all sequences before Mode2 instead of after Mode2 (Timur Kristóf)
v4: Call amdgpu_gfx_rlc_enter/exit_safe_mode int the begin and end of
    gfx_v9_0_deactivate_kcq_hqd (Alex Deucher)

Signed-off-by: Jiqian Chen 
Reviewed-by: Huang Rui 
Reviewed-by: Timur Kristóf 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher 
(cherry picked from commit c3988a7ad4799514447294f04f063b422e0551df)
Cc: stable@vger.kernel.org

drm/amdgpu/uvd: Fix forcing MSG, FB BOs into VCPU segment when it isn't at 0 (v2)

2026-06-17T22:30:52+00:00

UVD 4.x and older can only access MSG, FEEDBACK buffers from a
specific 256M VRAM segment that the VCPU BO is also located in.
We already modify all placements of the given BO to ensure
the BO is placed within this segment.

Previously, it always assumed that the VCPU segment is
the first 256M of VRAM, even though under some conditions
the VCPU BO could be allocated outside this segment,
which made UVD non-functional as the BOs were
not inside the same segment as the UVD VCPU BO.

Solve that by using the segment where the VCPU BO actually is.

This fixes an issue with UVD failing to initialize on SI/CIK
when resizable BAR is enabled and the VCPU BO is allocated
in a different segment.

v2:
- For other BOs, keep using the same UVD segment as before.

Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/3851
Reviewed-by: Christian König 
Signed-off-by: Timur Kristóf 
Signed-off-by: Alex Deucher 
(cherry picked from commit cbfd4d3fc2061a1ec8e9d36e65973ac3e813358a)
Cc: stable@vger.kernel.org

drm/amdgpu/uvd: Place VCPU BO only in VRAM for UVD 4.x and older

2026-06-17T22:29:52+00:00

These UVD versions don't fully support GPUVM and are only
validated to work when their VCPU BO is placed in VRAM.

Signed-off-by: Timur Kristóf 
Reviewed-by: Christian König 
Signed-off-by: Alex Deucher 
(cherry picked from commit 01b8dfc0660db5d6cdd62c22dc20f774a26ce853)
Cc: stable@vger.kernel.org

drm/amdgpu: Fix amdgpu_bo_move() when old_mem and new_mem are both GTT

2026-06-17T22:28:41+00:00

The UVD code relies on GTT to GTT moves in order to ensure
that its BOs don't cross 256M segments.

Fixes: bfe5e585b44f ("drm/ttm: move last binding into the drivers.")
Signed-off-by: Timur Kristóf 
Reviewed-by: Christian König 
Signed-off-by: Alex Deucher 
(cherry picked from commit 21fd45e5e2628d00b478590bcc3d14d3de5d45b6)
Cc: stable@vger.kernel.org

drm/amdgpu: Respect placement requirements in amdgpu_gtt_mgr functions

2026-06-17T22:27:50+00:00

When testing intersection and compatibility, respect
the actual placement requirements. This is a pre-requisite
for ensuring that UVD CS BOs do not cross 256M segments.

Fixes: ded910f368a5 ("drm/amdgpu: Implement intersect/compatible functions")
Suggested-by: Christian König 
Signed-off-by: Timur Kristóf 
Reviewed-by: Christian König 
Signed-off-by: Alex Deucher 
(cherry picked from commit bc06579ca29dee9c245a41b12e39c7bb6938af5d)
Cc: stable@vger.kernel.org

drm/amdgpu: Fix context pstate override handling

2026-06-17T22:26:40+00:00

There are several problems in the context pstate handling code.

The most serious ones are potential use-after-free and NULL pointer
dereferences at context initialization time. Both are due
amdgpu_ctx_init() not holding the adev->pm.stable_pstate_ctx_lock, which
is otherwise used from both sysfs and the context code itself for
modifying and clearing the stored context pointer.

Second issue is that context fini can trample over the pstate
configuration set via sysfs. This is due the restore state
(ctx->stable_pstate) being saved at context init time, and not if, or when
the context actually changes the pstate. As the context exits it will
therefore incorrectly restore to what was set before the sysfs override
was requested.

The simplest fix is to drastically simplify how the state is tracked, by
clearly defining the points at which pstate ownership is taken and
released, and to handle all transitions under the correct lock.

Instead of at context init time, the previous state is saved only at the
point the context overrides the current state, and is restored on context
exit only if the context is still the owner of the current override state.

Signed-off-by: Tvrtko Ursulin 
Fixes: 79610d304133 ("drm/amdgpu: fix pstate setting issue")
Cc: Chengming Gui 
Cc: Alex Deucher 
Cc: "Christian König" 
Signed-off-by: Alex Deucher 
(cherry picked from commit 1b5e413713c0a93bc1818394d0ce49aaad21bd27)
Cc:  # v6.1+

drm/amdkfd: Let driver decide buffer size at AMDKFD_IOC_GET_DMABUF_INFO ioctl

2026-06-17T22:24:38+00:00

amdkfd driver needs allocate buffer to return bo metadata to user space. The
buffer size is controlled by user currently. It is a potential security issue
that hostile value (e.g. 2 GiB) lets any render-group user trigger order-MAX
allocation/OOM in kernel context.

This patch first finds bo metadata size. If the size is smaller than user
provided value drive can safely allocate buffer in kernel space and copy to
user space buffer. If not, driver will let user know, not allocate and copy.
User will redo with new buffer in user space.

This patch lets driver decide buffer allocation size to avoid potential hostile
size from user space.

Signed-off-by: Xiaogang Chen 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher 
(cherry picked from commit f54ce9e8cbd3abe0eda3a285f54dc4f572fe589a)

drm/amdgpu: fix recursive ww_mutex acquire in amdgpu_devcoredump_format

2026-06-17T22:23:43+00:00

When dumping IB contents from a hung job, amdgpu_devcoredump_format()
acquired the VM root PD's reservation via amdgpu_vm_lock_by_pasid() and
then, for each IB, called amdgpu_bo_reserve() on the BO backing the IB.
Both reservations are reservation_ww_class_mutex objects and neither
used a ww_acquire_ctx, which trips lockdep:

  WARNING: possible recursive locking detected
  --------------------------------------------
  kworker/u128:0 is trying to acquire lock:
  ffff88838b16e1f0 (reservation_ww_class_mutex){+.+.}-{4:4},
    at: amdgpu_devcoredump_format+0x1594/0x23f0 [amdgpu]

  but task is already holding lock:
  ffff8882f82681f0 (reservation_ww_class_mutex){+.+.}-{4:4},
    at: amdgpu_devcoredump_format+0x1594/0x23f0 [amdgpu]

   Possible unsafe locking scenario:
         CPU0
         ----
    lock(reservation_ww_class_mutex);
    lock(reservation_ww_class_mutex);

   *** DEADLOCK ***
   May be due to missing lock nesting notation

  Workqueue: events_unbound amdgpu_devcoredump_deferred_work [amdgpu]
  Call Trace:
   __ww_mutex_lock.constprop.0
   ww_mutex_lock
   amdgpu_bo_reserve
   amdgpu_devcoredump_format+0x1594 [amdgpu]
   amdgpu_devcoredump_deferred_work+0xea [amdgpu]

The two reservations are on different BOs in the captured trace, so the
splat is a lockdep-correctness warning, not an observed deadlock. It
becomes a real self-deadlock whenever the IB BO shares its dma_resv with
the root PD (the always-valid case, see amdgpu_vm_is_bo_always_valid()):
amdgpu_bo_reserve(abo) re-acquires the same ww_mutex without a ticket
and blocks forever. With amdgpu.gpu_recovery=0 the timeout handler
refires every ~2 s and each invocation produces this splat, drowning the
kernel ring buffer.

Now that amdgpu_vm_lock_by_pasid() takes a drm_exec context, move the IB
dumping into a separate helper that locks the root PD and every IB BO
together in a single drm_exec ticket. DRM_EXEC_IGNORE_DUPLICATES handles
IB BOs that share a dma_resv (e.g. always-valid BOs, or two IBs backed
by the same BO). Every lock is now a top-level acquire under one
ww_acquire_ctx, so the recursive ww_mutex condition is gone, and the
per-IB amdgpu_bo_reserve()/amdgpu_bo_unref() dance -- including a BO
refcount leak on the amdgpu_bo_reserve() failure path -- is removed.

Fixes: 7b15fc2d1f1a ("drm/amdgpu: dump job ibs in the devcoredump")
Suggested-by: Christian König 
Signed-off-by: Mikhail Gavrilov 
Reviewed-by: Christian König 
Signed-off-by: Alex Deucher 
(cherry picked from commit d6bf4242731219ee08ce54c365631e395486651e)