<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c, branch linux-6.10.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>drm/amdgpu: revert "take runtime pm reference when we attach a buffer" v2</title>
<updated>2024-06-19T18:17:25+00:00</updated>
<author>
<name>Christian König</name>
<email>christian.koenig@amd.com</email>
</author>
<published>2024-06-05T11:27:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=8bd82363e2ee2eb3a9a8ea1fa94ebe1900d05a71'/>
<id>8bd82363e2ee2eb3a9a8ea1fa94ebe1900d05a71</id>
<content type='text'>
This reverts commit b8c415e3bf98 ("drm/amdgpu: take runtime pm reference
when we attach a buffer") and commit 425285d39afd ("drm/amdgpu: add amdgpu
runpm usage trace for separate funcs").

Taking a runtime pm reference for DMA-buf is actually completely
unnecessary and even dangerous.

The problem is that calling pm_runtime_get_sync() from the DMA-buf
callbacks is illegal because we have the reservation locked here
which is also taken during resume. So this would deadlock.

When the buffer is in GTT it is still accessible even when the GPU
is powered down and when it is in VRAM the buffer gets migrated to
GTT before powering down.

The only use case which would make it mandatory to keep the runtime
pm reference would be if we pin the buffer into VRAM, and that's not
something we currently do.

v2: improve the commit message

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
CC: stable@vger.kernel.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This reverts commit b8c415e3bf98 ("drm/amdgpu: take runtime pm reference
when we attach a buffer") and commit 425285d39afd ("drm/amdgpu: add amdgpu
runpm usage trace for separate funcs").

Taking a runtime pm reference for DMA-buf is actually completely
unnecessary and even dangerous.

The problem is that calling pm_runtime_get_sync() from the DMA-buf
callbacks is illegal because we have the reservation locked here
which is also taken during resume. So this would deadlock.

When the buffer is in GTT it is still accessible even when the GPU
is powered down and when it is in VRAM the buffer gets migrated to
GTT before powering down.

The only use case which would make it mandatory to keep the runtime
pm reference would be if we pin the buffer into VRAM, and that's not
something we currently do.

v2: improve the commit message

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
CC: stable@vger.kernel.org
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: Simplify the allocation of fence slab caches</title>
<updated>2024-02-22T15:28:19+00:00</updated>
<author>
<name>Kunwu Chan</name>
<email>chentao@kylinos.cn</email>
</author>
<published>2024-02-21T09:59:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=3d14cb026323f47dded416178c1cfa6125eae7d4'/>
<id>3d14cb026323f47dded416178c1cfa6125eae7d4</id>
<content type='text'>
Use the new KMEM_CACHE() macro instead of direct kmem_cache_create
to simplify the creation of SLAB caches.

Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Kunwu Chan &lt;chentao@kylinos.cn&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Use the new KMEM_CACHE() macro instead of direct kmem_cache_create
to simplify the creation of SLAB caches.

Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Kunwu Chan &lt;chentao@kylinos.cn&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: add amdgpu runpm usage trace for separate funcs</title>
<updated>2023-11-17T14:30:51+00:00</updated>
<author>
<name>Prike Liang</name>
<email>Prike.Liang@amd.com</email>
</author>
<published>2023-11-09T03:37:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=425285d39afddaf4a9dab36045b816af0cc3e400'/>
<id>425285d39afddaf4a9dab36045b816af0cc3e400</id>
<content type='text'>
Add trace for amdgpu runpm separate funcs usage and this will
help debugging on the case of runpm usage missed to dereference.
In the normal case the runpm usage count referred by one kind
of functionality pairwise and usage should be changed from 1 to 0,
otherwise there will be an issue in the amdgpu runpm usage
dereference.

Signed-off-by: Prike Liang &lt;Prike.Liang@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add trace for amdgpu runpm separate funcs usage and this will
help debugging on the case of runpm usage missed to dereference.
In the normal case the runpm usage count referred by one kind
of functionality pairwise and usage should be changed from 1 to 0,
otherwise there will be an issue in the amdgpu runpm usage
dereference.

Signed-off-by: Prike Liang &lt;Prike.Liang@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: Use function for IP version check</title>
<updated>2023-09-20T16:23:28+00:00</updated>
<author>
<name>Lijo Lazar</name>
<email>lijo.lazar@amd.com</email>
</author>
<published>2023-09-11T08:18:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=4e8303cf2c4dd27374a16a8881ec1a1cd5baf86f'/>
<id>4e8303cf2c4dd27374a16a8881ec1a1cd5baf86f</id>
<content type='text'>
Use an inline function for version check. Gives more flexibility to
handle any format changes.

Signed-off-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Use an inline function for version check. Gives more flexibility to
handle any format changes.

Signed-off-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: skip fence GFX interrupts disable/enable for S0ix</title>
<updated>2023-08-16T15:34:57+00:00</updated>
<author>
<name>Tim Huang</name>
<email>Tim.Huang@amd.com</email>
</author>
<published>2023-08-14T07:13:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=603b9a575d571557cf2de0d745d88b7c59b35be7'/>
<id>603b9a575d571557cf2de0d745d88b7c59b35be7</id>
<content type='text'>
GFX v11.0.1 reported fence fallback timer expired issue on
SDMA and GFX rings after S0ix resume. This is generated by
EOP interrupts are disabled when S0ix suspend but fails to
re-enable when resume because of the GFX is in GFXOFF.

[  203.349571] [drm] Fence fallback timer expired on ring sdma0
[  203.349572] [drm] Fence fallback timer expired on ring gfx_0.0.0
[  203.861635] [drm] Fence fallback timer expired on ring gfx_0.0.0

For S0ix, GFX is in GFXOFF state, avoid to touch the GFX registers
to configure the fence driver interrupts for rings that belong to GFX.
The interrupts configuration will be restored by GFXOFF exit.

Signed-off-by: Tim Huang &lt;Tim.Huang@amd.com&gt;
Reviewed-by: Mario Limonciello &lt;mario.limonciello@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
GFX v11.0.1 reported fence fallback timer expired issue on
SDMA and GFX rings after S0ix resume. This is generated by
EOP interrupts are disabled when S0ix suspend but fails to
re-enable when resume because of the GFX is in GFXOFF.

[  203.349571] [drm] Fence fallback timer expired on ring sdma0
[  203.349572] [drm] Fence fallback timer expired on ring gfx_0.0.0
[  203.861635] [drm] Fence fallback timer expired on ring gfx_0.0.0

For S0ix, GFX is in GFXOFF state, avoid to touch the GFX registers
to configure the fence driver interrupts for rings that belong to GFX.
The interrupts configuration will be restored by GFXOFF exit.

Signed-off-by: Tim Huang &lt;Tim.Huang@amd.com&gt;
Reviewed-by: Mario Limonciello &lt;mario.limonciello@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: mark force completed fences with -ECANCELED</title>
<updated>2023-06-15T15:37:55+00:00</updated>
<author>
<name>Christian König</name>
<email>christian.koenig@amd.com</email>
</author>
<published>2023-04-17T10:52:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=0a33b11d26c6b7e975b54d469a739ffac29f67ab'/>
<id>0a33b11d26c6b7e975b54d469a739ffac29f67ab</id>
<content type='text'>
When we force complete fences we should mark them as canceled.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When we force complete fences we should mark them as canceled.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: add amdgpu_error_* debugfs file</title>
<updated>2023-06-15T15:37:54+00:00</updated>
<author>
<name>Christian König</name>
<email>christian.koenig@amd.com</email>
</author>
<published>2023-04-19T10:51:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b13eb02ba8ba7617d41212121891756da31f1d8b'/>
<id>b13eb02ba8ba7617d41212121891756da31f1d8b</id>
<content type='text'>
This allows us to insert some error codes into the bottom of the pipeline
on an engine.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This allows us to insert some error codes into the bottom of the pipeline
on an engine.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: remove unnecessary (void*) conversions</title>
<updated>2023-06-09T14:40:12+00:00</updated>
<author>
<name>Su Hui</name>
<email>suhui@nfschina.com</email>
</author>
<published>2023-05-15T01:34:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=109b4d8cfe4279da1cbcbcd99ae54cb2b2aee521'/>
<id>109b4d8cfe4279da1cbcbcd99ae54cb2b2aee521</id>
<content type='text'>
No need cast (void*) to (struct amdgpu_device *).

Signed-off-by: Su Hui &lt;suhui@nfschina.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
No need cast (void*) to (struct amdgpu_device *).

Signed-off-by: Su Hui &lt;suhui@nfschina.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: skip disabling fence driver src_irqs when device is unplugged</title>
<updated>2023-06-09T13:39:09+00:00</updated>
<author>
<name>Guchun Chen</name>
<email>guchun.chen@amd.com</email>
</author>
<published>2023-05-09T08:15:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=3083b1007d4b8d377f8e2b5ce349a275a2fff725'/>
<id>3083b1007d4b8d377f8e2b5ce349a275a2fff725</id>
<content type='text'>
When performing device unbind or halt, we have disabled all irqs at the
very begining like amdgpu_pci_remove or amdgpu_device_halt. So
amdgpu_irq_put for irqs stored in fence driver should not be called
any more, otherwise, below calltrace will arrive.

[  139.114088] WARNING: CPU: 2 PID: 1550 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:616 amdgpu_irq_put+0xf6/0x110 [amdgpu]
[  139.114655] Call Trace:
[  139.114655]  &lt;TASK&gt;
[  139.114657]  amdgpu_fence_driver_hw_fini+0x93/0x130 [amdgpu]
[  139.114836]  amdgpu_device_fini_hw+0xb6/0x350 [amdgpu]
[  139.114955]  amdgpu_driver_unload_kms+0x51/0x70 [amdgpu]
[  139.115075]  amdgpu_pci_remove+0x63/0x160 [amdgpu]
[  139.115193]  ? __pm_runtime_resume+0x64/0x90
[  139.115195]  pci_device_remove+0x3a/0xb0
[  139.115197]  device_remove+0x43/0x70
[  139.115198]  device_release_driver_internal+0xbd/0x140

Signed-off-by: Guchun Chen &lt;guchun.chen@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When performing device unbind or halt, we have disabled all irqs at the
very begining like amdgpu_pci_remove or amdgpu_device_halt. So
amdgpu_irq_put for irqs stored in fence driver should not be called
any more, otherwise, below calltrace will arrive.

[  139.114088] WARNING: CPU: 2 PID: 1550 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:616 amdgpu_irq_put+0xf6/0x110 [amdgpu]
[  139.114655] Call Trace:
[  139.114655]  &lt;TASK&gt;
[  139.114657]  amdgpu_fence_driver_hw_fini+0x93/0x130 [amdgpu]
[  139.114836]  amdgpu_device_fini_hw+0xb6/0x350 [amdgpu]
[  139.114955]  amdgpu_driver_unload_kms+0x51/0x70 [amdgpu]
[  139.115075]  amdgpu_pci_remove+0x63/0x160 [amdgpu]
[  139.115193]  ? __pm_runtime_resume+0x64/0x90
[  139.115195]  pci_device_remove+0x3a/0xb0
[  139.115197]  device_remove+0x43/0x70
[  139.115198]  device_release_driver_internal+0xbd/0x140

Signed-off-by: Guchun Chen &lt;guchun.chen@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: improve wait logic at fence polling</title>
<updated>2023-06-09T13:39:06+00:00</updated>
<author>
<name>Alex Sierra</name>
<email>alex.sierra@amd.com</email>
</author>
<published>2023-04-24T19:27:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=6e87c4229513904295674b84b6e2d12951567191'/>
<id>6e87c4229513904295674b84b6e2d12951567191</id>
<content type='text'>
Accomplish this by reading the seq number right away instead of sleep
for 5us. There are certain cases where the fence is ready almost
immediately. Sleep number granularity was also reduced as the majority
of the kiq tlb flush takes between 2us to 6us.

Signed-off-by: Alex Sierra &lt;alex.sierra@amd.com&gt;
Acked-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Accomplish this by reading the seq number right away instead of sleep
for 5us. There are certain cases where the fence is ready almost
immediately. Sleep number granularity was also reduced as the majority
of the kiq tlb flush takes between 2us to 6us.

Signed-off-by: Alex Sierra &lt;alex.sierra@amd.com&gt;
Acked-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
