linux-stable.git/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h, branch linux-6.13.y

drm/amdgpu: Unlocked unmap only clear page table leaves

2025-04-20T08:17:48+00:00

[ Upstream commit 23b645231eeffdaf44021debac881d2f26824150 ]

SVM migration unmaps pages from the GPU and then updates the GPU mapping
to recover the page fault. Currently, unmap clears the PDE entry for
ranges of length >= huge page and frees the PTB bo, and the mapping
update then allocates a new PT bo. There is a race: the freed entry bo
may still be on the pt_free list, get reused when updating the mapping
and then be freed, leaving an invalid PDE entry and causing a GPU page
fault.

Make the unlocked update clear only a single PDE entry or clear the PTB,
so that unmap never frees the PTE bo. This fixes the race and improves
unmap and map-to-GPU performance. Updating a mapping to a huge page
still frees the PTB bo.

With this change, the vm->pt_freed list and work are no longer needed.
Add WARN_ON(unlocked) in amdgpu_vm_pt_free_dfs to catch any unmap that
would free the PTB.
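The rule described above (unlocked unmap clears leaves or a single PDE, only a locked update may free a PTB) can be sketched as a toy two-level table in C. This is a minimal illustration under stated assumptions, not the amdgpu code: `struct pd`, `struct ptb`, `pt_free`, `unmap_unlocked`, `map_huge_locked` and the table sizes are all hypothetical, and the `assert(!unlocked)` only stands in for `WARN_ON(unlocked)` (it aborts instead of warning).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define PTES_PER_PTB 512u /* hypothetical leaf size */
#define NUM_PDES 4u       /* hypothetical directory size */

/* Leaf page-table block (PTB) holding PTEs. */
struct ptb {
	uint64_t pte[PTES_PER_PTB];
};

/* One directory level: each PDE either points at a PTB or maps a huge page. */
struct pd {
	struct ptb *child[NUM_PDES]; /* PTB behind each PDE, or NULL */
	uint64_t huge[NUM_PDES];     /* huge-page address, or 0 when unmapped */
};

/* Free the PTB behind PDE i. Freeing from an unlocked update is a bug:
 * the assert plays the role of WARN_ON(unlocked), but aborts instead. */
static void pt_free(struct pd *pd, unsigned int i, bool unlocked)
{
	assert(!unlocked);
	free(pd->child[i]);
	pd->child[i] = NULL;
}

/* Unlocked unmap of PDEs [first, last]: clear leaves, never free a PTB. */
static void unmap_unlocked(struct pd *pd, unsigned int first, unsigned int last)
{
	for (unsigned int i = first; i <= last && i < NUM_PDES; i++) {
		if (pd->child[i]) {
			for (unsigned int j = 0; j < PTES_PER_PTB; j++)
				pd->child[i]->pte[j] = 0;
		} else {
			pd->huge[i] = 0; /* clear only this one PDE */
		}
	}
}

/* Locked update mapping a huge page at PDE i: the PTB is now unused and
 * may be freed, which is safe because the VM is locked. */
static void map_huge_locked(struct pd *pd, unsigned int i, uint64_t addr)
{
	if (pd->child[i])
		pt_free(pd, i, false);
	pd->huge[i] = addr;
}
```

Because the unlocked path never calls `pt_free`, a PTB can no longer sit on a deferred free list while a concurrent map reuses it, which is the race the commit describes.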

Signed-off-by: Philip Yang 
Reviewed-by: Christian König 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin

drm/amdgpu: stop tracking visible memory stats

2024-11-04T16:32:40+00:00

Since all of VRAM can be made visible on modern systems anyway, drop
tracking of how much memory is visible for now, to simplify the new
implementation. If this is really needed, it can be added back on top of
the new implementation, or all BOs can simply be reported as visible.

Signed-off-by: Yunxiang Li 
Reviewed-by: Christian König 
Signed-off-by: Tvrtko Ursulin 
Signed-off-by: Alex Deucher

drm/amdgpu: Use drm_print_memory_stats helper from fdinfo

2024-10-08T13:43:25+00:00

Convert fdinfo memory stats to use the common drm_print_memory_stats
helper.

This aligns the driver with the common keys documented in
drm-usage-stats.rst, specifically adding the drm-total- key the driver
was missing until now.

Additionally, I made the code stop skipping the total size for objects
that currently do not have a backing store, and I added resident, active
and purgeable reporting.

Legacy keys have been preserved, with the outlook of potentially
removing only the drm-memory- keys when the time is right.

The example output now looks like this:

 pos:	0
 flags:	02100002
 mnt_id:	24
 ino:	1239
 drm-driver:	amdgpu
 drm-client-id:	4
 drm-pdev:	0000:04:00.0
 pasid:	32771
 drm-total-cpu:	0
 drm-shared-cpu:	0
 drm-active-cpu:	0
 drm-resident-cpu:	0
 drm-purgeable-cpu:	0
 drm-total-gtt:	2392 KiB
 drm-shared-gtt:	0
 drm-active-gtt:	0
 drm-resident-gtt:	2392 KiB
 drm-purgeable-gtt:	0
 drm-total-vram:	44564 KiB
 drm-shared-vram:	31952 KiB
 drm-active-vram:	0
 drm-resident-vram:	44564 KiB
 drm-purgeable-vram:	0
 drm-memory-vram:	44564 KiB
 drm-memory-gtt: 	2392 KiB
 drm-memory-cpu: 	0 KiB
 amd-memory-visible-vram:	44564 KiB
 amd-evicted-vram:	0 KiB
 amd-evicted-visible-vram:	0 KiB
 amd-requested-vram:	44564 KiB
 amd-requested-visible-vram:	11952 KiB
 amd-requested-gtt:	2392 KiB
 drm-engine-compute:	46464671 ns
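The example output mixes unit styles: the new keys print `0` with no unit and other sizes in KiB, while the legacy drm-memory- keys always say KiB. The sketch below is a hedged model of how the common helper appears to pick units (divide by 1024 while the value is nonzero and evenly divisible, stopping at MiB) and of drm-total- being shared plus private. `struct mem_stats`, `format_stat` and `stats_total` are hypothetical names; the kernel's `drm_print_memory_stats` uses its own types and a `drm_printer`.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Simplified per-region counters, in bytes (illustrative stand-in). */
struct mem_stats {
	uint64_t shared;
	uint64_t private;
	uint64_t resident;
	uint64_t purgeable;
	uint64_t active;
};

/* Format one "drm-<stat>-<region>:\t<value>[ unit]" line (no newline).
 * The value is divided by 1024 while it is nonzero and evenly divisible,
 * stopping at MiB, so zero prints with no unit and odd sizes stay in KiB. */
static int format_stat(char *buf, size_t len, const char *stat,
		       const char *region, uint64_t sz)
{
	static const char *const units[] = { "", " KiB", " MiB" };
	unsigned int u;

	assert(buf && len > 0);
	for (u = 0; u < 2; u++) {
		if (sz == 0 || sz % 1024 != 0)
			break;
		sz /= 1024;
	}
	return snprintf(buf, len, "drm-%s-%s:\t%" PRIu64 "%s",
			stat, region, sz, units[u]);
}

/* In this model, the total for a region covers shared and private objects. */
static uint64_t stats_total(const struct mem_stats *s)
{
	return s->shared + s->private;
}
```

With 31952 KiB shared and 12612 KiB private VRAM, this model prints `drm-total-vram:	44564 KiB`, matching the sample; 44564 is not a multiple of 1024, so it stays in KiB.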

v2:
 * Track purgeable via AMDGPU_GEM_CREATE_DISCARDABLE.

Acked-by: Daniel Vetter 
Reviewed-by: Alex Deucher 
Signed-off-by: Tvrtko Ursulin 
Cc: Alex Deucher 
Cc: Christian König 
Cc: Daniel Vetter 
Cc: Rob Clark 
Signed-off-by: Alex Deucher

drm/amdgpu: re-work VM syncing

2024-09-06T21:36:24+00:00

Rework how VM operations synchronize to submissions. Provide an
amdgpu_sync container to the backends instead of a reservation
object, and fill in the amdgpu_sync object in the higher layers
of the code.
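The split described above (higher layers collect dependencies, backends only consume a container) can be sketched with a toy container. This is a hedged model, not the kernel's `amdgpu_sync`: `struct fence`, `struct sync`, `sync_add` and `backend_update` are hypothetical, and keeping only the latest fence per context is modeled loosely on how `amdgpu_sync` is understood to behave, with a fixed-size array in place of a hash table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SYNC_FENCES 16u

/* Toy fence: a timeline (context) plus a position on it (seqno). */
struct fence {
	uint64_t context;
	uint64_t seqno;
};

/* Toy sync container: at most one fence per context, the latest one. */
struct sync {
	struct fence f[MAX_SYNC_FENCES];
	unsigned int n;
};

/* Higher layers call this to collect dependencies. A later fence on the
 * same context replaces an earlier one; seqno wraparound is ignored here. */
static bool sync_add(struct sync *s, struct fence f)
{
	for (unsigned int i = 0; i < s->n; i++) {
		if (s->f[i].context == f.context) {
			if (f.seqno > s->f[i].seqno)
				s->f[i] = f;
			return true;
		}
	}
	if (s->n == MAX_SYNC_FENCES)
		return false; /* container full */
	s->f[s->n++] = f;
	return true;
}

/* Backend: only sees the container, not where the fences came from.
 * Here it just reports how many fences the update would wait for. */
static unsigned int backend_update(const struct sync *deps)
{
	assert(deps);
	return deps->n;
}
```

The design point is that the backend no longer needs a reservation object at all; whatever the caller decided to wait for arrives already collected.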

No intended functional change, just prepares for upcoming changes.

Signed-off-by: Christian König 
Reviewed-by: Friedrich Vock 
Acked-by: Felix Kuehling 
Signed-off-by: Alex Deucher

drm/amdkfd: Change kfd/svm page fault drain handling

2024-08-23T14:55:13+00:00

When an app unmaps VM ranges (munmap), kfd/svm starts draining pending page
faults and does not handle any incoming page faults of this process until a
deferred work item is executed by the default system wq. The period during
which page faults are not handled can be long and is unpredictable, which
hurts kfd performance on page fault recovery.

This patch uses the timestamp of an incoming page fault to decide whether to
drop or recover it. When an app unmaps VM ranges, kfd records the current
timestamp of each GPU device's IH ring. These timestamps are then used in
the kfd page fault recovery routine.

Any page fault on an unmapped range that happens after the unmap event is an
application bug (it accesses the VM range after unmapping it); it is not the
driver's job to cover that.

Using page fault timestamps removes the need to drain page faults in deferred
work, so the period during which kfd does not handle page faults is reduced
and can be controlled.
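The timestamp scheme above can be sketched as a per-GPU checkpoint compared against each fault's timestamp. This is a minimal sketch under assumptions: the 48-bit timestamp width, `MAX_GPUS`, `struct svm_ts`, `ts_after`, `record_unmap` and `should_drop_fault` are hypothetical, and clearing the checkpoint on the first newer fault is this sketch's own design choice to keep later wraparound from comparing wrongly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_GPUS 8u   /* hypothetical */
#define TS_BITS 48u   /* assumed IH timestamp width */

/* True when b is strictly later than a on a clock that wraps at 2^TS_BITS.
 * Shifting the difference into the top bits makes the sign check wrap-safe. */
static bool ts_after(uint64_t a, uint64_t b)
{
	uint64_t d = (b - a) << (64u - TS_BITS);
	return d != 0 && (d >> 63) == 0;
}

/* Per-GPU checkpoint recorded at unmap time; 0 means "no checkpoint". */
struct svm_ts {
	uint64_t checkpoint_ts[MAX_GPUS];
};

/* On unmap: remember the IH ring's current timestamp for this GPU. */
static void record_unmap(struct svm_ts *s, unsigned int gpu, uint64_t ih_now)
{
	assert(gpu < MAX_GPUS);
	s->checkpoint_ts[gpu] = ih_now;
}

/* On fault: drop faults stamped at or before the checkpoint (raised before
 * the unmap); recover later ones. The first later fault clears the checkpoint. */
static bool should_drop_fault(struct svm_ts *s, unsigned int gpu, uint64_t fault_ts)
{
	assert(gpu < MAX_GPUS);
	if (s->checkpoint_ts[gpu] == 0)
		return false;
	if (!ts_after(s->checkpoint_ts[gpu], fault_ts))
		return true;
	s->checkpoint_ts[gpu] = 0;
	return false;
}
```

The decision is made inline per fault, so there is no window in which every fault of the process is ignored while a deferred drain runs.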

Signed-off-by: Xiaogang.Chen 
Reviewed-by: Philip Yang 
Signed-off-by: Alex Deucher

drm/amdgpu: add additional VM bits

2024-06-14T19:20:56+00:00

Add additional VM PTE bits.

Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher

drm/amdgpu: Update the implementation of AMDGPU_PTE_MTYPE_VG10

2024-06-05T15:25:13+00:00

This patch changes the implementation of AMDGPU_PTE_MTYPE_VG10 to
clear the MTYPE bits before setting the new value.
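The clear-before-set pattern described above looks roughly like this in C. The field position (2 bits at bit 57) and the names `MTYPE_SHIFT`, `MTYPE_MASK` and `pte_set_mtype` are assumptions for illustration; the real macro definitions live in amdgpu_vm.h.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout for illustration: a 2-bit MTYPE field starting at bit 57. */
#define MTYPE_SHIFT(m) ((uint64_t)(m) << 57)
#define MTYPE_MASK     MTYPE_SHIFT(3ULL)

/* Clear the whole MTYPE field, then write the new value; all other PTE
 * bits are left untouched. */
static inline uint64_t pte_set_mtype(uint64_t flags, unsigned int mtype)
{
	assert(mtype <= 3); /* must fit in the 2-bit field */
	return (flags & ~MTYPE_MASK) | MTYPE_SHIFT(mtype);
}
```

Deriving the mask from the same shift macro keeps the two from drifting apart if the field ever moves.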

Suggested-by: Alex Deucher 
Signed-off-by: longlyao 
Signed-off-by: Shane Xiao 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher

drm/amdgpu: Update the implementation of AMDGPU_PTE_MTYPE_NV10

2024-06-05T15:02:43+00:00

This patch changes the implementation of AMDGPU_PTE_MTYPE_NV10 to
clear the MTYPE bits before setting the new value.

Suggested-by: Alex Deucher 
Signed-off-by: longlyao 
Signed-off-by: Shane Xiao 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher

drm/amdgpu: Update the implementation of AMDGPU_PTE_MTYPE_GFX12

2024-06-05T14:57:24+00:00

This patch changes the implementation of AMDGPU_PTE_MTYPE_GFX12 to
clear the MTYPE bits before setting the new value.
This fixes a potential issue with GFX12 setting memory to NC.
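Why skipping the clear step matters can be shown by contrasting an OR-only setter with a clear-then-set one. This is a generic sketch: the field position (2 bits at bit 54) and the values are hypothetical, and it does not claim to reproduce the specific GFX12 NC case, only how stale bits corrupt a re-set field.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout for illustration: a 2-bit field starting at bit 54. */
#define FIELD_SHIFT 54
#define FIELD_MASK  (3ULL << FIELD_SHIFT)

/* OR-only pattern: merges the new value with any bits already present. */
static uint64_t set_or_only(uint64_t flags, uint64_t v)
{
	assert(v <= 3);
	return flags | (v << FIELD_SHIFT);
}

/* Clear-then-set pattern, as described in the commit message. */
static uint64_t set_clear_first(uint64_t flags, uint64_t v)
{
	assert(v <= 3);
	return (flags & ~FIELD_MASK) | (v << FIELD_SHIFT);
}

/* Read the field back out of the flags. */
static uint64_t get_field(uint64_t flags)
{
	return (flags & FIELD_MASK) >> FIELD_SHIFT;
}
```

Starting from a field value of 2 and setting 1, the OR-only setter yields 3 (a type nobody asked for), while clear-then-set yields 1.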

v2: Clear mtype field before setting the new one (Alex)
v3: Fix typo (Felix)

Suggested-by: Alex Deucher 
Signed-off-by: longlyao 
Signed-off-by: Shane Xiao 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher

drm/amdgpu: Add amdgpu_bo_is_vm_bo helper

2024-05-17T21:40:38+00:00

Improve code readability by replacing a number of instances of:

bo->tbo.base.resv == vm->root.bo->tbo.base.resv

With:

amdgpu_vm_is_bo_always_valid(vm, bo)

No functional changes.
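The helper's check can be modeled with heavily trimmed stand-in structures. Everything here except the comparison taken from the commit message is hypothetical: the struct layouts are minimal mocks, and the NULL check on `bo` is a defensive choice made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily trimmed stand-ins for the kernel structures. */
struct dma_resv { int unused; };
struct gem_object { struct dma_resv *resv; };
struct ttm_bo { struct gem_object base; };
struct bo { struct ttm_bo tbo; };
struct vm_root { struct bo *bo; };
struct vm { struct vm_root root; };

/* A BO is "always valid" in a VM when it shares the root PD's
 * reservation object. */
static bool vm_is_bo_always_valid(const struct vm *vm, const struct bo *bo)
{
	assert(vm && vm->root.bo);
	return bo && bo->tbo.base.resv == vm->root.bo->tbo.base.resv;
}
```

Naming the condition makes each call site state its intent ("is this a per-VM BO?") instead of repeating the pointer comparison.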

v2:
 * Rename helper and move to amdgpu_vm. (Christian)

v3:
 * Use Christian's kerneldoc.

v4:
 * Fixed logic inversion in amdgpu_vm_bo_get_memory.

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Reviewed-by: Christian König 
Signed-off-by: Alex Deucher