linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h, branch v7.1-rc4

drm/amdgpu: Add bounds checking to ib_{get,set}_value

2026-04-03T17:48:10+00:00

The uvd/vce/vcn code accesses the IB at predefined offsets without
checking that the IB is large enough. Check the bounds here. The caller
is responsible for making sure it can handle arbitrary return values.

Also make the idx a uint32_t to prevent overflows causing the condition
to fail.

Signed-off-by: Benjamin Cheng 
Reviewed-by: Christian König 
Reviewed-by: Ruijing Dong 
Signed-off-by: Alex Deucher 
Cc: stable@vger.kernel.org
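
As an illustration of the bounds check described in this commit, here is a minimal userspace sketch. The struct `ib_sketch`, the function names, and the choice of returning 0 for an out-of-range read are assumptions made for the example; they are not copied from the driver, and the endianness conversion the kernel helpers perform is left out.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the driver's IB structure. */
struct ib_sketch {
	uint32_t length_dw;	/* number of valid dwords in ptr[] */
	uint32_t *ptr;		/* IB contents */
};

/*
 * Return the dword at idx. An out-of-range read yields 0 here, which is
 * an assumed fallback; the caller must cope with arbitrary values anyway.
 */
static inline uint32_t ib_sketch_get_value(const struct ib_sketch *ib,
					   uint32_t idx)
{
	if (idx < ib->length_dw)
		return ib->ptr[idx];
	return 0;
}

/* Write value at idx; an out-of-range write is dropped. */
static inline void ib_sketch_set_value(struct ib_sketch *ib, uint32_t idx,
				       uint32_t value)
{
	if (idx < ib->length_dw)
		ib->ptr[idx] = value;
}
```

With an unsigned index, the single `idx < length_dw` comparison covers both ends of the valid range, so the helpers need no separate check for negative values.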

drm/amdgpu: rework ring reset backup and reemit v9

2026-02-23T19:33:11+00:00

Store the start wptr and ib size in the IB fence. On queue
reset, save the ring contents of all IBs.

For reemit, reemit the entire IB state for non-guilty contexts.
For guilty contexts, replace the IB submission with nops, but reemit
the rest.  Split the reemit per fence and when we reemit, update the
wptr with the new values from reemit.  This allows us to reemit jobs
repeatedly as the wptrs get properly updated each time.

v2: further simplify the logic
v3: reemit vm state, not just vm fence
v4: just nop the IB and possibly the VM portion of the submission
v5: simplify the vm fence check
v6: split the vm and ib fences
v7: fix commit message
v8: use wptr rather than count_dw to calculate offsets
v9: fix missing documentation update spotted by the kernel test robot

Reviewed-by: Jesse Zhang 
Signed-off-by: Alex Deucher
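
The backup step described in this commit could look roughly like the sketch below: copy one submission out of the ring, and for a guilty context overwrite only the IB portion with NOPs while keeping the rest. Every name here (`fence_sketch`, its fields, `backup_submission`, `SKETCH_NOP`) is hypothetical, the NOP value is a placeholder rather than a real packet encoding, and the sketch assumes wptrs count dwords on a power-of-two ring.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NOP 0x80000000u	/* placeholder, not a real packet encoding */

/* Hypothetical per-fence bookkeeping for one submission. */
struct fence_sketch {
	uint64_t start_wptr;	/* wptr where the submission starts */
	uint64_t end_wptr;	/* wptr just past the submission */
	uint32_t ib_start_dw;	/* offset of the IB packet in the submission */
	uint32_t ib_size_dw;	/* size of the IB packet in dwords */
	bool guilty;		/* submission belongs to the guilty context */
};

/*
 * Copy one submission from the ring into out[] and return the number of
 * dwords copied. For a guilty submission the IB packet is replaced with
 * NOPs; everything around it is copied unchanged. The caller must make
 * sure out[] is large enough.
 */
static size_t backup_submission(const uint32_t *ring, uint32_t buf_mask,
				const struct fence_sketch *f, uint32_t *out)
{
	size_t n = 0;

	for (uint64_t p = f->start_wptr; p != f->end_wptr; p++, n++) {
		bool in_ib = n >= f->ib_start_dw &&
			     n < (size_t)f->ib_start_dw + f->ib_size_dw;

		out[n] = (f->guilty && in_ib) ? SKETCH_NOP
					      : ring[p & buf_mask];
	}
	return n;
}
```

Because the copy is driven by the start and end wptrs of each fence, the same routine can run again after a reemit, provided the fence is updated with the new wptr values.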

drm/amdgpu: Make amdgpu_fence_emit() non-failing v2

2026-02-23T19:16:31+00:00

dma_fence_wait(old, false) is not interruptible and cannot return an
error. Drop the unreachable error handling in amdgpu_fence_emit().

Since the function can no longer fail, convert amdgpu_fence_emit() to
return void and remove return value handling from all callers.

v2:
- Add comment explaining why dma_fence_wait(..., false)
  return value is ignored (Alex)

Suggested-by: Christian König 
Cc: Alex Deucher 
Signed-off-by: Srinivasan Shanmugam 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher

drm/amdgpu: add a helper to calculate ring distance

2026-02-23T19:16:31+00:00

Add a helper to calculate the distance in DWs between
two wptrs.

Reviewed-by: Pierre-Eric Pelloux-Prayer 
Signed-off-by: Alex Deucher
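
A distance helper of this kind can be sketched as follows. The function name and signature are illustrative only, not the helper added by this commit. The sketch assumes a power-of-two ring whose `buf_mask` equals the ring size in dwords minus one, wptrs that count dwords, and two wptrs that are less than one full ring apart.

```c
#include <stdint.h>

/*
 * Distance in dwords from start_wptr to end_wptr on a power-of-two ring.
 * Works whether or not the wptrs have already been masked, because the
 * subtraction wraps and the mask keeps only the in-ring offset.
 */
static inline uint32_t ring_dw_distance(uint64_t start_wptr,
					uint64_t end_wptr,
					uint32_t buf_mask)
{
	return (uint32_t)(end_wptr - start_wptr) & buf_mask;
}
```

For an 8-dword ring (`buf_mask` of 7), a start of 6 and an end of 10 give a distance of 4, and so do a start of 6 and an already wrapped end of 2. A distance of exactly one full ring is indistinguishable from zero in this scheme.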

drm/amdgpu: rename amdgpu_fence_driver_guilty_force_completion()

2026-01-21T19:27:45+00:00

The function no longer signals the fence so rename it to
better match what it does.

Reviewed-by: Christian König 
Signed-off-by: Alex Deucher

drm/amdgpu: always backup and reemit fences

2026-01-05T21:59:58+00:00

When we back up the ring contents for reemit before a
ring reset, we skip jobs associated with the bad
context. However, we need to make sure the fences
are reemitted, as unprocessed submissions may depend on
them.

v2: clean up fence handling, make helpers static

Reviewed-by: Timur Kristóf 
Signed-off-by: Alex Deucher

drm/amdgpu: don't reemit ring contents more than once

2026-01-05T21:59:57+00:00

If we cancel a bad job and reemit the ring contents, and
we get another timeout, cancel everything rather than reemitting.
The wptr markers are only relevant for the original emit.  If
we reemit, the wptr markers are no longer correct.

Reviewed-by: Timur Kristóf 
Signed-off-by: Alex Deucher
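
The policy in this commit amounts to a one-shot decision, sketched below. The type names, the `reemitted` flag, and `on_timeout()` are hypothetical; the sketch also leaves open when such a flag would be cleared again.

```c
#include <stdbool.h>

enum reset_action_sketch {
	SKETCH_REEMIT_OTHERS,	/* cancel the bad job, reemit the rest */
	SKETCH_CANCEL_ALL,	/* wptr markers are stale, drop everything */
};

/* Hypothetical per-ring state for the example. */
struct ring_state_sketch {
	bool reemitted;		/* ring contents were already reemitted once */
};

/* Decide what to do when a job on this ring times out. */
static enum reset_action_sketch on_timeout(struct ring_state_sketch *r)
{
	if (r->reemitted)
		return SKETCH_CANCEL_ALL;

	r->reemitted = true;
	return SKETCH_REEMIT_OTHERS;
}
```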

drm/amdgpu: Expand kernel-doc in amdgpu_ring

2025-12-08T18:56:32+00:00

Expand the kernel-doc about amdgpu_ring and add some tiny improvements.

Cc: Alex Deucher 
Cc: Christian König 
Cc: Timur Kristóf 
Signed-off-by: Rodrigo Siqueira 
Reviewed-by: Alex Deucher 
Reviewed-by: Christian König 
Signed-off-by: Alex Deucher

drm/amdgpu: Implement user queue reset functionality

2025-11-04T16:53:05+00:00

This patch adds robust reset handling for user queues (userq) to improve
recovery from queue failures. The key components include:

1. Queue detection and reset logic:
- amdgpu_userq_detect_and_reset_queues() identifies failed queues
- Per-IP detect_and_reset callbacks for targeted recovery
- Falls back to full GPU reset when needed

2. Reset infrastructure:
- Adds userq_reset_work workqueue for async reset handling
- Implements pre/post reset handlers for queue state management
- Integrates with existing GPU reset framework

3. Error handling improvements:
- Enhanced state tracking with HUNG state
- Automatic reset triggering on critical failures
- VRAM loss handling during recovery

4. Integration points:
- Added to device init/reset paths
- Called during queue destroy, suspend, and isolation events
- Handles both individual queue and full GPU resets

The reset functionality works with both gfx/compute and sdma queues,
providing better resilience against queue failures while minimizing
disruption to unaffected queues.
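
The detect-and-reset flow outlined above can be sketched as a loop over queue types with a per-type callback and a fallback to a full reset. All names below are hypothetical and heavily simplified. The rule that a queue type with no callback is skipped, and the two sample callbacks, are assumptions for illustration rather than the driver's behaviour.

```c
#include <stdbool.h>
#include <stddef.h>

enum queue_type_sketch { Q_GFX, Q_COMPUTE, Q_SDMA, Q_TYPE_COUNT };

/* Per-type callback: returns 0 when hung queues were recovered. */
typedef int (*detect_and_reset_fn)(void *ctx);

/* Hypothetical manager state: one counter and one callback per type. */
struct userq_mgr_sketch {
	unsigned int queue_count[Q_TYPE_COUNT];
	detect_and_reset_fn detect_and_reset[Q_TYPE_COUNT];
	void *ctx;
};

/* Sample callbacks used only to exercise the loop. */
static int sketch_reset_ok(void *ctx)   { (void)ctx; return 0; }
static int sketch_reset_fail(void *ctx) { (void)ctx; return -1; }

/*
 * Walk every queue type that has live queues and run its callback.
 * Returns true when any per-type reset failed, meaning the caller
 * should fall back to a full GPU reset.
 */
static bool detect_and_reset_queues(struct userq_mgr_sketch *m)
{
	bool need_full_reset = false;

	for (size_t i = 0; i < Q_TYPE_COUNT; i++) {
		if (!m->queue_count[i] || !m->detect_and_reset[i])
			continue;
		if (m->detect_and_reset[i](m->ctx) != 0)
			need_full_reset = true;
	}
	return need_full_reset;
}
```

In the driver the equivalent walk is expected to run with the relevant userq mutex held, as the version notes below describe; the sketch omits locking entirely.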

v2: add detection and reset calls when preemption/unmap fails.
add a per device userq counter for each user queue type. (Alex)
v3: make sure we hold the adev->userq_mutex when we call amdgpu_userq_detect_and_reset_queues. (Alex)
warn if the adev->userq_mutex is not held.
v4: make sure we have all of the uqm->userq_mutex held.
warn if the uqm->userq_mutex is not held.

v5: Use array for user queue type counters.(Alex)
all of the uqm->userq_mutex need to be held when calling detect and reset. (Alex)

v6: fix lockdep warning in amdgpu_userq_fence_driver_process

v7: add the queue types in an array and use a loop in amdgpu_userq_detect_and_reset_queues (Lijo)
v8: remove atomic_set(&userq_mgr->userq_count[i], 0).
it should already be 0 since we kzalloc the structure (Alex)
v9: For consistency with kernel queues, We may want something like:
amdgpu_userq_is_reset_type_supported (Alex)

Signed-off-by: Jesse Zhang
Reviewed-by: Alex Deucher
Signed-off-by: Alex Deucher

drm/amdgpu: clean up and unify hw fence handling

2025-10-13T18:14:35+00:00

Decouple the amdgpu fence from the amdgpu_job structure.
This lets us clean up the separate fence ops for the embedded
fence and other fences.  This also allows us to allocate the
vm fence up front when we allocate the job.

v2: Additional cleanup suggested by Christian
v3: Additional cleanups suggested by Christian
v4: Additional cleanups suggested by David and
    vm fence fix
v5: cast seqno (David)

Cc: David.Wu3@amd.com
Cc: christian.koenig@amd.com
Tested-by: David (Ming Qiang) Wu 
Reviewed-by: David (Ming Qiang) Wu 
Reviewed-by: Christian König 
Signed-off-by: Alex Deucher