linux.git/drivers/gpu/drm/amd/amdgpu, branch v7.1-rc4

drm/amdgpu/gfx_v12_0: set gfx.rs64_enable from PFP header on GFX12

2026-05-11T21:54:44+00:00

gfx_v12_0_init_microcode() always loads RS64 CP ucode but never sets
adev->gfx.rs64_enable, so it stays false and code that branches on it
(e.g. MEC pipe reset) incorrectly uses the legacy CP_MEC_CNTL path.

Match GFX11: derive RS64 mode from the PFP firmware header (v2.0) via
amdgpu_ucode_hdr_version(). Log at debug when RS64 is enabled.
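
A minimal user-space sketch of the version check described above. The
struct and function names here (fw_header, fw_hdr_version_is,
gfx_detect_rs64) are hypothetical stand-ins, not the driver's types; only
the idea of comparing the PFP header version against 2.0 comes from the
commit text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a firmware common header. */
struct fw_header {
	uint16_t header_version_major;
	uint16_t header_version_minor;
};

/* Hypothetical stand-in for the relevant part of the GFX state. */
struct gfx_state {
	bool rs64_enable;
};

/* True only when the header carries exactly the requested version. */
static bool fw_hdr_version_is(const struct fw_header *hdr,
			      uint16_t major, uint16_t minor)
{
	return hdr->header_version_major == major &&
	       hdr->header_version_minor == minor;
}

/* Derive RS64 mode from the PFP header: v2.0 means RS64 ucode. */
static void gfx_detect_rs64(struct gfx_state *gfx,
			    const struct fw_header *pfp_hdr)
{
	gfx->rs64_enable = fw_hdr_version_is(pfp_hdr, 2, 0);
}
```

Deriving the flag from the header rather than hard-coding it keeps the
decision tied to the firmware that was actually loaded.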

Reviewed-by: Alex Deucher 
Signed-off-by: Jesse Zhang 
Signed-off-by: Alex Deucher 
(cherry picked from commit b03d53598b0d2048e8fa7303b8d0784768ec4fa6)

drm/amd/ras: Fix CPER ring debugfs read overflow

2026-05-11T21:54:28+00:00

The legacy CPER debugfs reader can reach the payload path without a
valid pointer snapshot. The remaining user byte count is also treated as
the ring occupancy in dwords, so reads past the header can copy more than
requested.

Take the CPER lock before sampling pointers. Resample rptr/wptr for
payload reads, bound the payload copy by available dwords and the
remaining user size, and advance the file position for each dword copied.
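
The bounding rule above can be sketched in user space as follows. This is
an illustration under assumptions, not the driver code: the ring layout,
RING_DWORDS, and ring_read_payload() are hypothetical, and the CPER lock
that the fix takes before sampling pointers is left out for brevity.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RING_DWORDS 8u /* hypothetical ring size, must be a power of two */

struct ring {
	uint32_t buf[RING_DWORDS];
	uint32_t rptr; /* next dword to read */
	uint32_t wptr; /* next dword to write */
};

/*
 * Copy payload dwords to dst. The copy is bounded both by the dwords
 * available in the ring and by the bytes the caller still wants, and
 * the file position advances once per dword copied.
 * Returns the number of bytes copied.
 */
static size_t ring_read_payload(struct ring *r, uint8_t *dst,
				size_t size, uint64_t *pos)
{
	/* Sample the pointers here, immediately before the payload copy. */
	uint32_t avail = (r->wptr - r->rptr) & (RING_DWORDS - 1);
	size_t copied = 0;

	while (avail > 0 && size - copied >= sizeof(uint32_t)) {
		memcpy(dst + copied, &r->buf[r->rptr], sizeof(uint32_t));
		r->rptr = (r->rptr + 1) & (RING_DWORDS - 1);
		avail--;
		copied += sizeof(uint32_t);
		*pos += sizeof(uint32_t);
	}
	return copied;
}
```

Keeping the dword count (avail) and the byte count (size) as separate
variables avoids the unit mix-up described in the first paragraph.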

Signed-off-by: Xiang Liu 
Reviewed-by: Tao Zhou 
Signed-off-by: Alex Deucher 
(cherry picked from commit 1e40ef87ffdc291e05ccdade8b9170cc9c1c4249)

drm/amdgpu: fix userq hang detection and reset

2026-05-11T21:47:11+00:00

Fix lock inversions pointed out by Prike and Sunil. The hang detection
timeout *CAN'T* grab locks under which we wait for fences, especially
not the userq_mutex lock.

Then, instead of the completely broken handling with the
hang_detect_fence, just cancel the work when fences are processed and
restart it if necessary.
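
A toy model of the cancel-and-restart pattern, for illustration only. The
names (hang_watch, on_fences_processed) are hypothetical; in the kernel
this role would typically be played by delayed work, but which calls the
patch actually uses is not stated here.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the hang detection timer. */
struct hang_watch {
	bool armed;         /* is a timeout currently pending? */
	unsigned arm_count; /* how often the timeout was (re)started */
};

static void hang_watch_start(struct hang_watch *w)
{
	w->armed = true;
	w->arm_count++;
}

static void hang_watch_cancel(struct hang_watch *w)
{
	w->armed = false;
}

/*
 * Called whenever fences have been processed. Progress was made, so the
 * old timeout is cancelled; a fresh one is started only while fences
 * are still outstanding. No lock is taken on the timeout side.
 */
static void on_fences_processed(struct hang_watch *w, unsigned pending)
{
	hang_watch_cancel(w);
	if (pending > 0)
		hang_watch_start(w);
}
```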

Signed-off-by: Christian König 
Reviewed-by: Sunil Khatri 
Signed-off-by: Alex Deucher 
(cherry picked from commit 1b62077f045ac6ffde7c97005c6659569ac5c1ec)

drm/amdgpu: remove almost all calls to amdgpu_userq_detect_and_reset_queues

2026-05-11T21:47:04+00:00

Well the reset handling seems broken on multiple levels.

As a first step toward fixing this, remove most calls to the hang
detection. That function should only be called after we run into a
timeout, and *NOT* as a random check spread over multiple places in the
code.

Signed-off-by: Christian König 
Reviewed-by: Sunil Khatri 
Signed-off-by: Alex Deucher 
(cherry picked from commit 71bea36b54ccfb14cbc90f94267af6369af4e702)

drm/amdgpu: rework amdgpu_userq_signal_ioctl v3

2026-05-11T21:46:43+00:00

Fortunately this one did not look as bad as the wait ioctl path, but
there were still a few things which could be fixed or improved:

1. Allocating with GFP_ATOMIC was quite unnecessary; we can allocate
   before taking the userq_lock.
2. Use a new mutex as protection for the fence_drv_xa so that we can do
   memory allocations while holding it.
3. Starting the reset timer is unnecessary when the fence is already
   signaled when we create it.
4. Clean up error handling; avoid trying to free the queue when we
   don't even have one.

v2: fix incorrect usage of xa_find, destroy the new mutex on error
v3: cleanup ref ordering

Signed-off-by: Christian König 
Reviewed-by: Sunil Khatri 
Signed-off-by: Alex Deucher 
(cherry picked from commit 1609eb0f81a609d350169839128cecf298c84e7a)

drm/amdgpu: remove deadlocks from amdgpu_userq_pre_reset

2026-05-11T21:46:34+00:00

The purpose of a GPU reset is to make sure that fences can be signaled
again and that the signal and resume workers can make progress again.

So waiting for the resume worker or any fence in the GPU reset path is
utter nonsense.

Signed-off-by: Christian König 
Reviewed-by: Prike Liang 
Signed-off-by: Alex Deucher 
(cherry picked from commit fcd5f065eab46993af43442fd77ee8d9eb9c5bdf)

drm/amdgpu: nuke amdgpu_userq_fence_slab v2

2026-05-05T14:23:06+00:00

As preparation for independent fences, remove the extra slab; kmalloc
should do just fine.

v2: use GFP_KERNEL instead of GFP_ATOMIC

Signed-off-by: Christian König 
Reviewed-by: Prike Liang 
Reviewed-by: Sunil Khatri 
Signed-off-by: Alex Deucher 
(cherry picked from commit 0d831487b5be0ae59cac865a0aa87b0acc3dc717)

drm/amdgpu/userq: fix access to stale wptr mapping

2026-05-05T14:22:13+00:00

Use drm_exec to take both locks, i.e. the VM root BO and the
wptr_obj BO, to access the mapping data properly.

This fixes the security issue of the wptr_obj being unmapped while
a queue creation is in progress and another BO being passed
at the same address.

Signed-off-by: Sunil Khatri 
Reviewed-by: Christian König 
Signed-off-by: Alex Deucher 
(cherry picked from commit 1fc6c8ab45dbee096469c08c13f6099d57a52d6c)
Cc: stable@vger.kernel.org

drm/amdgpu: zero-initialize GART table on allocation

2026-05-05T14:17:22+00:00

GART TLB is flushed after unmapping but not after mapping. Since
amdgpu_bo_create_kernel() does not zero-initialize the buffer, when a
single PTE is written the TLB may speculatively load other uninitialized
entries from the same cacheline. Those garbage entries can appear valid,
and a subsequent write to another PTE in the same cacheline may cause the
GPU to use a stale garbage PTE from the TLB.

Fix this by calling memset_io() to zero-initialize the GART table with
gart_pte_flags immediately after allocation.

Using AMDGPU_GEM_CREATE_VRAM_CLEARED will not work here: the SDMA-based
clear needs the GART to be initialized before it can run.
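
A user-space sketch of why the zero-fill matters. Everything here is an
assumption made for illustration: malloc() and memset() stand in for
amdgpu_bo_create_kernel() and memset_io(), PTE_VALID is assumed to be
bit 0, and the helper names are hypothetical. The sketch fills with plain
zeros and does not model gart_pte_flags.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PTE_VALID (1ULL << 0) /* assumed position of the valid bit */

/*
 * Allocate a page table and clear it right away, so that no entry can
 * look valid before it has been written explicitly.
 */
static uint64_t *gart_table_alloc(size_t num_entries)
{
	uint64_t *table = malloc(num_entries * sizeof(*table));

	if (!table)
		return NULL;
	memset(table, 0, num_entries * sizeof(*table));
	return table;
}

/* Write a single PTE; neighbours in the same cacheline stay invalid. */
static void gart_map(uint64_t *table, size_t idx, uint64_t addr)
{
	table[idx] = addr | PTE_VALID;
}
```

With the table cleared up front, a speculative load of the whole
cacheline can only pick up entries whose valid bit is clear.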

Suggested-by: Felix Kuehling 
Signed-off-by: Philip Yang 
Reviewed-by: Christian König 
Signed-off-by: Alex Deucher 
(cherry picked from commit d9af8263b82b6eaa60c5718e0c6631c5037e4b24)
Cc: stable@vger.kernel.org

drm/amdgpu/sdma4: replace BUG_ON with WARN_ON in fence emission

2026-05-05T14:16:09+00:00

sdma_v4_0_ring_emit_fence() contains two BUG_ON(addr & 0x3) assertions
that verify fence writeback addresses are dword-aligned.  These
assertions can be reached from unprivileged userspace via crafted
DRM_IOCTL_AMDGPU_CS submissions, causing a fatal kernel panic in a
scheduler worker thread.

Replace both BUG_ON() calls with WARN_ON() to log the condition without
crashing the kernel.  A misaligned fence address at this point indicates
a driver bug, but crashing the kernel is never the correct response when
the assertion is reachable from userspace.

The CS IOCTL path is the correct place to filter invalid submissions;
the ring emission callback is too late to do anything about it.
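
The split between early validation and late warning can be sketched as
below. This is a user-space illustration: warn_on() is a hypothetical
stand-in for the kernel's WARN_ON(), and validate_fence_addr() is a
hypothetical submission-time check showing where such filtering would
live, not a function this patch adds.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for WARN_ON(): report the condition, continue. */
static bool warn_on(bool cond, const char *what)
{
	if (cond)
		fprintf(stderr, "WARNING: %s\n", what);
	return cond;
}

/* Submission-time check: reject misaligned fence addresses early. */
static int validate_fence_addr(uint64_t addr)
{
	return (addr & 0x3) ? -EINVAL : 0;
}

/*
 * Emission-time check: it is too late to reject the submission, so
 * only log the problem. Returns true if a warning was raised.
 */
static bool emit_fence(uint64_t addr)
{
	bool misaligned = warn_on((addr & 0x3) != 0,
				  "fence address not dword-aligned");

	/* ... packet emission would follow here ... */
	return misaligned;
}
```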

Fixes: 2130f89ced2c ("drm/amdgpu: add SDMA v4.0 implementation (v2)")
Reviewed-by: Christian König 
Signed-off-by: John B. Moore 
Signed-off-by: Alex Deucher 
(cherry picked from commit b90250bd933afd1ba94d86d6b13821997b22b18e)
Cc: stable@vger.kernel.org