linux-stable.git/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c, branch linux-6.10.y

drm/amdgpu: revert "take runtime pm reference when we attach a buffer" v2

2024-06-19T18:17:25+00:00

This reverts commit b8c415e3bf98 ("drm/amdgpu: take runtime pm reference
when we attach a buffer") and commit 425285d39afd ("drm/amdgpu: add amdgpu
runpm usage trace for separate funcs").

Taking a runtime pm reference for DMA-buf is actually completely
unnecessary and even dangerous.

The problem is that calling pm_runtime_get_sync() from the DMA-buf
callbacks is illegal because the reservation lock is held there, and
that same lock is also taken during resume. So this would deadlock.

When the buffer is in GTT it is still accessible even when the GPU
is powered down and when it is in VRAM the buffer gets migrated to
GTT before powering down.

The only use case which would make it mandatory to keep the runtime
pm reference would be if we pin the buffer into VRAM, and that's not
something we currently do.

v2: improve the commit message

Signed-off-by: Christian König 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher 
CC: stable@vger.kernel.org

drm/amdgpu: Simplify the allocation of fence slab caches

2024-02-22T15:28:19+00:00

Use the new KMEM_CACHE() macro instead of calling kmem_cache_create()
directly to simplify the creation of SLAB caches.
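A minimal userspace sketch of what the macro saves, assuming the classic
form of KMEM_CACHE() that derives the cache name, size and alignment from
the struct type. The kmem_cache_create() below is a mock with the kernel's
argument order, and struct demo_fence is a hypothetical stand-in, not the
driver's fence structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-in for the kernel's struct kmem_cache (illustration only). */
struct kmem_cache {
	const char *name;
	size_t size;
	size_t align;
};

/* Mock that only records its arguments; flags and ctor are ignored here. */
struct kmem_cache *kmem_cache_create(const char *name, size_t size,
				     size_t align, unsigned long flags,
				     void (*ctor)(void *))
{
	struct kmem_cache *c = malloc(sizeof(*c));

	(void)flags;
	(void)ctor;
	if (!c)
		return NULL;
	c->name = name;
	c->size = size;
	c->align = align;
	return c;
}

/* Name, size and alignment all come from the struct type itself. */
#define KMEM_CACHE(__struct, __flags)					\
	kmem_cache_create(#__struct, sizeof(struct __struct),		\
			  __alignof__(struct __struct), (__flags), NULL)

/* Hypothetical object type used only for this sketch. */
struct demo_fence {
	unsigned long long seqno;
	void *ring;
};

/* Returns 1 when the macro filled in the expected name, size and alignment. */
int kmem_cache_demo(void)
{
	struct kmem_cache *c = KMEM_CACHE(demo_fence, 0);
	int ok;

	assert(c != NULL);
	ok = !strcmp(c->name, "demo_fence") &&
	     c->size == sizeof(struct demo_fence) &&
	     c->align == __alignof__(struct demo_fence);
	free(c);
	return ok;
}
```

The caller no longer repeats the type name as a string or spells out
sizeof() by hand, which is the simplification the commit refers to.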

Reviewed-by: Christian König 
Signed-off-by: Kunwu Chan 
Signed-off-by: Alex Deucher

drm/amdgpu: add amdgpu runpm usage trace for separate funcs

2023-11-17T14:30:51+00:00

Add tracing of the runpm usage taken by separate amdgpu functions. This
helps debugging cases where a runpm usage reference is taken but never
released. In the normal case, each kind of functionality takes and
releases its runpm usage in pairs, so the usage count should go from 1
back to 0; otherwise there is an issue in how the amdgpu runpm usage is
released.

Signed-off-by: Prike Liang 
Acked-by: Alex Deucher 
Signed-off-by: Alex Deucher

drm/amdgpu: Use function for IP version check

2023-09-20T16:23:28+00:00

Use an inline function for version check. Gives more flexibility to
handle any format changes.
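A minimal sketch of the idea, assuming the version is packed as
major/minor/revision into one integer. The packing in IP_VERSION() below
and all demo_* names are assumptions for illustration, not the driver's
definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed packing: major, minor and revision in successive bytes. */
#define IP_VERSION(mj, mn, rv) \
	(((uint32_t)(mj) << 16) | ((uint32_t)(mn) << 8) | (uint32_t)(rv))

enum demo_hw_ip { DEMO_GC_HWIP, DEMO_SDMA_HWIP, DEMO_MAX_HWIP };

struct demo_device {
	uint32_t ip_versions[DEMO_MAX_HWIP];
};

/*
 * Single accessor: callers never index the array themselves, so the
 * storage format can change without touching every comparison site.
 */
static inline uint32_t demo_ip_version(const struct demo_device *dev,
				       enum demo_hw_ip ip)
{
	return dev->ip_versions[ip];
}

int demo_is_gc_11_0_1(const struct demo_device *dev)
{
	return demo_ip_version(dev, DEMO_GC_HWIP) == IP_VERSION(11, 0, 1);
}

void demo_ip_version_check(void)
{
	struct demo_device dev = { .ip_versions = { IP_VERSION(11, 0, 1), 0 } };

	assert(demo_is_gc_11_0_1(&dev));
	/* Ordered comparisons work because major sits in the highest bits. */
	assert(demo_ip_version(&dev, DEMO_GC_HWIP) > IP_VERSION(10, 3, 0));
}
```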

Signed-off-by: Lijo Lazar 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher

drm/amdgpu: skip fence GFX interrupts disable/enable for S0ix

2023-08-16T15:34:57+00:00

GFX v11.0.1 reported a "fence fallback timer expired" issue on the
SDMA and GFX rings after S0ix resume. This happens because the EOP
interrupts are disabled on S0ix suspend but fail to be re-enabled on
resume, because the GFX is in GFXOFF.

[  203.349571] [drm] Fence fallback timer expired on ring sdma0
[  203.349572] [drm] Fence fallback timer expired on ring gfx_0.0.0
[  203.861635] [drm] Fence fallback timer expired on ring gfx_0.0.0

For S0ix, GFX is in the GFXOFF state, so avoid touching the GFX registers
to configure the fence driver interrupts for rings that belong to GFX.
The interrupt configuration will be restored on GFXOFF exit.
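The decision described above can be sketched as a small predicate. This
is a simplified illustration: the ring types, the sdma_in_gfx_domain flag
and every demo_* name are hypothetical, and which rings really belong to
the GFX power domain depends on the hardware.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ring types; a real driver has more. */
enum demo_ring_type {
	DEMO_RING_GFX,
	DEMO_RING_COMPUTE,
	DEMO_RING_SDMA,
	DEMO_RING_VIDEO,
};

struct demo_ring {
	enum demo_ring_type type;
	/* Assumption: whether this SDMA instance sits in the GFX power domain. */
	bool sdma_in_gfx_domain;
};

static bool demo_ring_in_gfx_domain(const struct demo_ring *ring)
{
	switch (ring->type) {
	case DEMO_RING_GFX:
	case DEMO_RING_COMPUTE:
		return true;
	case DEMO_RING_SDMA:
		return ring->sdma_in_gfx_domain;
	default:
		return false;
	}
}

/* True when the fence driver should touch the ring's interrupt state. */
bool demo_need_ring_irq_toggle(const struct demo_ring *ring, bool in_s0ix)
{
	return !(in_s0ix && demo_ring_in_gfx_domain(ring));
}

void demo_s0ix_check(void)
{
	struct demo_ring gfx = { .type = DEMO_RING_GFX };
	struct demo_ring video = { .type = DEMO_RING_VIDEO };

	assert(!demo_need_ring_irq_toggle(&gfx, true));  /* skipped in S0ix */
	assert(demo_need_ring_irq_toggle(&gfx, false));  /* regular suspend */
	assert(demo_need_ring_irq_toggle(&video, true)); /* not a GFX ring */
}
```

Both the suspend and the resume path would consult the same predicate, so
an interrupt that was never disabled is never re-enabled either.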

Signed-off-by: Tim Huang 
Reviewed-by: Mario Limonciello 
Signed-off-by: Alex Deucher

drm/amdgpu: mark force completed fences with -ECANCELED

2023-06-15T15:37:55+00:00

When we force complete fences we should mark them as canceled.
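A minimal sketch of the behaviour, assuming a simplified fence with only
a signaled flag and an error field. struct demo_fence and
demo_force_completion() are hypothetical names; the point is only that
fences which never completed on their own carry -ECANCELED afterwards.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified fence. */
struct demo_fence {
	bool signaled;
	int error; /* 0 on success, negative errno otherwise */
};

/* Signal every pending fence, recording that it was canceled. */
void demo_force_completion(struct demo_fence *fences, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (fences[i].signaled)
			continue; /* completed normally, keep its status */
		fences[i].error = -ECANCELED;
		fences[i].signaled = true;
	}
}

void demo_force_completion_check(void)
{
	struct demo_fence f[2] = {
		{ .signaled = true,  .error = 0 },
		{ .signaled = false, .error = 0 },
	};

	demo_force_completion(f, 2);
	assert(f[0].error == 0);          /* untouched */
	assert(f[1].signaled);
	assert(f[1].error == -ECANCELED); /* waiter can tell it was forced */
}
```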

Signed-off-by: Christian König 
Reviewed-by: Luben Tuikov 
Signed-off-by: Alex Deucher

drm/amdgpu: add amdgpu_error_* debugfs file

2023-06-15T15:37:54+00:00

This allows us to insert some error codes into the bottom of the pipeline
on an engine.

Signed-off-by: Christian König 
Reviewed-by: Luben Tuikov 
Signed-off-by: Alex Deucher

drm/amdgpu: remove unnecessary (void*) conversions

2023-06-09T14:40:12+00:00

No need to cast (void *) to (struct amdgpu_device *).

Signed-off-by: Su Hui 
Signed-off-by: Alex Deucher

drm/amdgpu: skip disabling fence driver src_irqs when device is unplugged

2023-06-09T13:39:09+00:00

When performing device unbind or halt, we have already disabled all irqs
at the very beginning, e.g. in amdgpu_pci_remove or amdgpu_device_halt.
So amdgpu_irq_put for irqs stored in the fence driver should not be
called any more; otherwise, the call trace below is triggered.

[  139.114088] WARNING: CPU: 2 PID: 1550 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:616 amdgpu_irq_put+0xf6/0x110 [amdgpu]
[  139.114655] Call Trace:
[  139.114655]  
[  139.114657]  amdgpu_fence_driver_hw_fini+0x93/0x130 [amdgpu]
[  139.114836]  amdgpu_device_fini_hw+0xb6/0x350 [amdgpu]
[  139.114955]  amdgpu_driver_unload_kms+0x51/0x70 [amdgpu]
[  139.115075]  amdgpu_pci_remove+0x63/0x160 [amdgpu]
[  139.115193]  ? __pm_runtime_resume+0x64/0x90
[  139.115195]  pci_device_remove+0x3a/0xb0
[  139.115197]  device_remove+0x43/0x70
[  139.115198]  device_release_driver_internal+0xbd/0x140
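The guard can be sketched as follows. This is an illustration under
simplifying assumptions: struct demo_device, its unplugged flag and the
reference counter are hypothetical, and an assert() stands in for the
kernel WARNING seen in the trace.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_device {
	bool unplugged;      /* set once the device was unbound or halted */
	int fence_irq_refs;  /* IRQ references held by the fence driver */
};

/* Stand-in for dropping an IRQ reference; asserts instead of WARNing. */
static void demo_irq_put(struct demo_device *dev)
{
	assert(dev->fence_irq_refs > 0);
	dev->fence_irq_refs--;
}

void demo_fence_driver_hw_fini(struct demo_device *dev)
{
	/* All IRQs were already disabled on unplug, nothing left to drop. */
	if (dev->unplugged)
		return;
	demo_irq_put(dev);
}

void demo_unplug_check(void)
{
	struct demo_device gone = { .unplugged = true, .fence_irq_refs = 0 };
	struct demo_device live = { .unplugged = false, .fence_irq_refs = 1 };

	demo_fence_driver_hw_fini(&gone); /* no put, so no underflow */
	demo_fence_driver_hw_fini(&live); /* regular teardown drops the ref */
	assert(gone.fence_irq_refs == 0);
	assert(live.fence_irq_refs == 0);
}
```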

Signed-off-by: Guchun Chen 
Acked-by: Alex Deucher 
Signed-off-by: Alex Deucher

drm/amdgpu: improve wait logic at fence polling

2023-06-09T13:39:06+00:00

Accomplish this by reading the seq number right away instead of sleeping
for 5us first. There are certain cases where the fence is ready almost
immediately. The sleep granularity was also reduced, as the majority
of KIQ TLB flushes take between 2us and 6us.
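A userspace sketch of the check-first polling loop. The 2us step is an
assumption chosen to fit the 2us to 6us range quoted above, and the
demo_* ring, read and delay helpers are hypothetical stand-ins for the
hardware sequence write-back and a busy-wait delay.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ring: 'seq' mimics the sequence number written back by HW. */
struct demo_ring {
	uint32_t seq;
	uint32_t reads;     /* number of polls so far */
	uint32_t ready_at;  /* poll count at which the fence completes */
	uint32_t final_seq; /* value visible once completed */
};

static uint32_t demo_fence_read(struct demo_ring *ring)
{
	if (++ring->reads >= ring->ready_at)
		ring->seq = ring->final_seq;
	return ring->seq;
}

/* Stand-in for a busy-wait delay; does nothing in this sketch. */
static void demo_udelay(unsigned int usec)
{
	(void)usec;
}

/* Returns the remaining budget in microseconds, or 0 on timeout. */
long demo_fence_wait_polling(struct demo_ring *ring, uint32_t wait_seq,
			     long timeout_us)
{
	/* Read first, delay second: a ready fence costs no sleep at all. */
	while ((int32_t)(wait_seq - demo_fence_read(ring)) > 0 &&
	       timeout_us > 0) {
		demo_udelay(2);
		timeout_us -= 2;
	}
	return timeout_us > 0 ? timeout_us : 0;
}

void demo_polling_check(void)
{
	struct demo_ring fast = { .ready_at = 1, .final_seq = 5 };

	/* Fence already complete on the first read: full budget remains. */
	assert(demo_fence_wait_polling(&fast, 5, 100) == 100);
}
```

The signed difference keeps the comparison valid when the 32-bit
sequence number wraps around.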

Signed-off-by: Alex Sierra 
Acked-by: Felix Kuehling 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher