linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c, branch v7.1-rc4

drm/amdgpu: fix root reservation in amdgpu_vm_handle_fault

2026-04-24T15:10:12+00:00

svm_range_restore_pages might reserve the root bo so it must
be called after unreserving it.
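
The ordering constraint can be illustrated with a small userspace model. This is only a sketch: every toy_* name is hypothetical, the "reservation" is a plain flag rather than the kernel's dma_resv lock, and a nested reservation is modelled as returning -EBUSY, which is an assumption made for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the root BO; "reserved" models its reservation. */
struct toy_root_bo {
	bool reserved;
};

static int toy_reserve(struct toy_root_bo *root)
{
	if (root->reserved)
		return -EBUSY;	/* assumption: a nested reservation fails */
	root->reserved = true;
	return 0;
}

static void toy_unreserve(struct toy_root_bo *root)
{
	root->reserved = false;
}

/* Stand-in for a restore path that may itself reserve the root BO. */
static int toy_restore_pages(struct toy_root_bo *root)
{
	int r = toy_reserve(root);

	if (r)
		return r;
	/* ... restore work would happen here ... */
	toy_unreserve(root);
	return 0;
}

/* Problematic ordering: restore is called while the root is still reserved. */
static int toy_handle_fault_while_reserved(struct toy_root_bo *root)
{
	int r = toy_reserve(root);

	if (r)
		return r;
	r = toy_restore_pages(root);	/* fails, root is already reserved */
	toy_unreserve(root);
	return r;
}

/* Fixed ordering: unreserve the root first, then call the restore path. */
static int toy_handle_fault(struct toy_root_bo *root)
{
	int r = toy_reserve(root);

	if (r)
		return r;
	/* ... work that needs the reservation would happen here ... */
	toy_unreserve(root);
	return toy_restore_pages(root);
}
```

In this model, toy_handle_fault_while_reserved() returns -EBUSY, while toy_handle_fault() succeeds and leaves the root unreserved.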

Fixes: 1b135c6da061 ("drm/amdgpu: extract amdgpu_vm_lock_by_pasid from amdgpu_vm_handle_fault")
Signed-off-by: Pierre-Eric Pelloux-Prayer 
Reviewed-by: Christian König 
Signed-off-by: Alex Deucher 
(cherry picked from commit 5cdc219fe86a1720aa4b5b4f42f11913146e6a93)

drm/amdgpu: extract amdgpu_vm_lock_by_pasid from amdgpu_vm_handle_fault

2026-04-03T18:55:07+00:00

This is tricky to get right, and we are going to need it from the
devcoredump code as well.

Signed-off-by: Pierre-Eric Pelloux-Prayer 
Acked-by: Alex Deucher 
Signed-off-by: Alex Deucher

drm/amdgpu/userq: call dma_resv_wait_timeout without test for signalled

2026-04-03T17:59:15+00:00

In amdgpu_userq_gem_va_unmap_validate(), call dma_resv_wait_timeout()
directly instead of testing for signalled fences first. Since we wait
forever, there is no meaningful return value and therefore nothing
to handle.

Suggested-by: Christian König 
Signed-off-by: Sunil Khatri 
Reviewed-by: Christian König 
Signed-off-by: Alex Deucher

drm/amdgpu: Handle GPU page faults correctly on non-4K page systems

2026-03-24T17:33:49+00:00

During a GPU page fault, the driver restores the SVM range and then maps it
into the GPU page tables. The current implementation passes a GPU-page-size
(4K-based) PFN to svm_range_restore_pages() to restore the range.

SVM ranges are tracked using system-page-size PFNs. On systems where the
system page size is larger than 4K, using GPU-page-size PFNs to restore the
range causes two problems:

Range lookup fails:
Because the restore function receives PFNs in GPU (4K) units, the SVM
range lookup does not find the existing range. This will result in a
duplicate SVM range being created.

VMA lookup failure:
The restore function also tries to locate the VMA for the faulting address.
It converts the GPU-page-size PFN into an address using the system page
size, which results in an incorrect address on non-4K page-size systems.
As a result, the VMA lookup fails with the message: "address 0xxxx VMA is
removed".

This patch passes the system-page-size PFN to svm_range_restore_pages() so
that the SVM range is restored correctly on non-4K page systems.
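
The unit mismatch can be shown with a minimal standalone sketch. It is not the driver code: gpu_pfn_to_sys_pfn() and pfn_to_addr() are hypothetical helpers, and sys_page_shift is passed as a parameter (12 for 4K, 16 for 64K pages) in place of the kernel's PAGE_SHIFT.

```c
#include <assert.h>
#include <stdint.h>

/* GPU PFNs in this description are 4K-based, i.e. a shift of 12. */
#define GPU_PAGE_SHIFT 12u

/*
 * Convert a GPU-page-size (4K) PFN into a system-page-size PFN.
 * Hypothetical helper; sys_page_shift stands in for PAGE_SHIFT.
 */
static uint64_t gpu_pfn_to_sys_pfn(uint64_t gpu_pfn, unsigned int sys_page_shift)
{
	assert(sys_page_shift >= GPU_PAGE_SHIFT);
	return gpu_pfn >> (sys_page_shift - GPU_PAGE_SHIFT);
}

/* Address obtained when a PFN is interpreted with the given page shift. */
static uint64_t pfn_to_addr(uint64_t pfn, unsigned int page_shift)
{
	return pfn << page_shift;
}
```

For a fault at GPU PFN 0x12345 on a 64K-page system, interpreting the unconverted PFN with the system page size gives 0x123450000, which is not the faulting address. Converting first gives system PFN 0x1234, whose page starts at 0x12340000 and contains the real fault address 0x12345000.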

Acked-by: Christian König 
Signed-off-by: Donet Tom 
Signed-off-by: Alex Deucher

drm/amdgpu: prevent immediate PASID reuse case

2026-03-23T18:10:39+00:00

PASID reuse could cause interrupt issues when a new process
immediately runs into hardware state left behind by a previous
process that exited with the same PASID. Page faults may still
be pending in the IH ring buffer when the process exits and
frees its PASID. To prevent this, use the IDR cyclic allocator,
the same as the kernel does for PIDs.
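
The effect of cyclic allocation can be sketched with a toy userspace allocator. This is not the kernel IDR API: the toy_* names and the tiny ID space are hypothetical, chosen only to show that a freed ID is not handed out again until the search wraps around.

```c
#include <stdbool.h>

#define TOY_ID_MAX 8	/* hypothetical ID space: valid IDs are 1..7 */

struct toy_cyclic_ids {
	bool used[TOY_ID_MAX];
	int next;	/* ID at which the next search starts */
};

/* Return a free ID in [1, TOY_ID_MAX), or -1 if every ID is in use. */
static int toy_alloc_cyclic(struct toy_cyclic_ids *ids)
{
	int span = TOY_ID_MAX - 1;
	int start = ids->next;
	int i;

	/* A zero-initialised allocator starts searching at ID 1. */
	if (start < 1 || start >= TOY_ID_MAX)
		start = 1;

	for (i = 0; i < span; i++) {
		int id = 1 + (start - 1 + i) % span;

		if (!ids->used[id]) {
			ids->used[id] = true;
			/* Continue after this ID next time, wrapping at the end. */
			ids->next = (id + 1 >= TOY_ID_MAX) ? 1 : id + 1;
			return id;
		}
	}
	return -1;
}

static void toy_free_id(struct toy_cyclic_ids *ids, int id)
{
	if (id >= 1 && id < TOY_ID_MAX)
		ids->used[id] = false;
}
```

Allocating, freeing and allocating again yields ID 1 and then ID 2. A lowest-free-ID allocator would hand out ID 1 twice in a row, which is the immediate-reuse case the commit wants to avoid.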

Signed-off-by: Eric Huang 
Reviewed-by: Christian König 
Signed-off-by: Alex Deucher

drm/amdgpu: Move amdgpu_vm_is_bo_always_valid() before first use

2026-03-17T21:47:47+00:00

Smatch reports that 'bo' could be NULL in amdgpu_vm_bo_update(), even
though amdgpu_vm_is_bo_always_valid() already checks for a NULL BO.

Move amdgpu_vm_is_bo_always_valid() earlier in the file so the helper
definition appears before its first use. This allows static analysis
tools to see the NULL check performed by the helper and avoids the
warning.
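
The pattern can be sketched as follows. The toy_* types and functions are hypothetical and much simpler than the driver's; the sketch only shows a NULL-checking helper defined ahead of its caller, so the check is visible at the point of use.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer object. */
struct toy_bo {
	bool shares_root_resv;
};

/*
 * Helper defined before its first use. It checks for NULL itself, so
 * callers do not need a separate NULL test before dereferencing.
 */
static bool toy_bo_is_always_valid(const struct toy_bo *bo)
{
	return bo && bo->shares_root_resv;
}

/* Caller placed after the helper; bo may legitimately be NULL here. */
static int toy_bo_update(const struct toy_bo *bo)
{
	if (toy_bo_is_always_valid(bo))
		return 1;	/* bo is known to be non-NULL on this path */
	return 0;
}
```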

Suggested-by: Tvrtko Ursulin 
Cc: Dan Carpenter 
Cc: Tvrtko Ursulin 
Cc: Christian König 
Cc: Alex Deucher 
Signed-off-by: Srinivasan Shanmugam 
Reviewed-by: Tvrtko Ursulin 
Signed-off-by: Alex Deucher

drm/amdgpu: rework how we handle TLB fences

2026-03-17T21:43:17+00:00

Add a new VM flag to indicate whether or not we need
a TLB fence.  Userqs (KFD or KGD) require a TLB fence.
A TLB fence is not strictly required for kernel queues,
but it shouldn't hurt.  Enabling this unconditionally
should therefore be fine, but it seems to tickle some
issues in KIQ/MES.  Only enable TLB fences for KFD, or
when KGD userqs are enabled (currently via module
parameter).
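
The policy described above can be sketched as a per-VM flag that is set once and consulted on each page table update. All names here (toy_vm, is_kfd, need_tlb_fence, userq_enabled) are hypothetical and are not the driver's actual fields or parameters.

```c
#include <stdbool.h>

/* Hypothetical, simplified VM state. */
struct toy_vm {
	bool is_kfd;		/* VM belongs to a KFD (compute) process */
	bool need_tlb_fence;	/* set once, consulted on page table updates */
};

/*
 * Decide whether this VM needs a TLB fence. userq_enabled models the
 * "KGD userqs enabled via module parameter" condition.
 */
static void toy_vm_init_tlb_fence(struct toy_vm *vm, bool userq_enabled)
{
	vm->need_tlb_fence = vm->is_kfd || userq_enabled;
}

/* Page table update path: attach a TLB fence only when the flag says so. */
static bool toy_vm_update_wants_tlb_fence(const struct toy_vm *vm)
{
	return vm->need_tlb_fence;
}
```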

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4798
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4749
Fixes: f3854e04b708 ("drm/amdgpu: attach tlb fence to the PTs update")
Cc: Christian König 
Cc: Prike Liang 
Reviewed-by: Prike Liang 
Signed-off-by: Alex Deucher

Revert "drm/amdgpu: revert to old status lock handling v4"

2026-03-17T14:28:47+00:00

This reverts commit 7a9419ab42699fd3d4c857ef81ae097d8d8d5899.

Reverting because this change probably causes some issues and CI is
blocked.

Signed-off-by: Sunil Khatri 
Acked-by: Alex Deucher 
Signed-off-by: Alex Deucher

Merge tag 'amd-drm-next-7.1-2026-03-12' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

2026-03-16T06:50:53+00:00

amd-drm-next-7.1-2026-03-12:

amdgpu:
- SMU13 fix
- SMU14 fix
- Fixes for bring up hw testing
- Kerneldoc fix
- GC12 idle power fix for compute workloads
- DCCG fixes
- UserQ fixes
- Move test for fbdev object to a generic helper
- GC 12.1 updates
- Use struct drm_edid in non-DC code
- Include IP discovery data in devcoredump
- SMU 13.x updates
- Misc cleanups
- DML 2.1 fixes
- Enable NV12/P010 support on primary planes
- Enable color encoding and color range on overlay planes
- DC underflow fixes
- HWSS fast path fixes
- Replay fixes
- DCN 4.2 updates
- Support newer IP discovery tables
- LSDMA 7.1 support
- IH 7.1 fixes
- SoC v1 updates
- GC12.1 updates
- PSP 15 updates
- XGMI fixes
- GPUVM locking fix

amdkfd:
- Fix missing BO unreserve in an error path

radeon:
- Move test for fbdev object to a generic helper

From: Alex Deucher 
Link: https://patch.msgid.link/20260312184425.3875669-1-alexander.deucher@amd.com
Signed-off-by: Dave Airlie

drm/amdgpu: revert to old status lock handling v4

2026-03-11T17:58:08+00:00

It turned out that protecting the status of each bo_va with a
spinlock was just hiding problems instead of solving them.

Revert the whole approach, add a separate stats_lock and lockdep
assertions that the correct reservation lock is held all over the place.

This not only allows for better checks that a state transition is properly
protected by a lock, but also allows switching back to using list macros
to iterate over the lists protected by the dma_resv lock of the
root PD.

v2: re-add missing check
v3: split into two patches
v4: re-apply by fixing holding the VM lock at the right places.

Signed-off-by: Christian König 
Reviewed-by: Alex Deucher 
Reviewed-by: Sunil Khatri 
Signed-off-by: Alex Deucher