<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c, branch v7.1-rc4</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>drm/amdgpu: fix userq hang detection and reset</title>
<updated>2026-05-11T21:47:11+00:00</updated>
<author>
<name>Christian König</name>
<email>christian.koenig@amd.com</email>
</author>
<published>2026-04-20T14:08:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0071e01c61aa73f22edd5252f0a3e2c2eff744d6'/>
<id>0071e01c61aa73f22edd5252f0a3e2c2eff744d6</id>
<content type='text'>
Fix lock inversions pointed out by Prike and Sunil. The hang detection
timeout *CAN'T* grab locks under which we wait for fences, especially
not the userq_mutex lock.

Then instead of this completely broken handling with the
hang_detect_fence just cancel the work when fences are processed and
re-start if necessary.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 1b62077f045ac6ffde7c97005c6659569ac5c1ec)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix lock inversions pointed out by Prike and Sunil. The hang detection
timeout *CAN'T* grab locks under which we wait for fences, especially
not the userq_mutex lock.

Then instead of this completely broken handling with the
hang_detect_fence just cancel the work when fences are processed and
re-start if necessary.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 1b62077f045ac6ffde7c97005c6659569ac5c1ec)
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: remove almost all calls to amdgpu_userq_detect_and_reset_queues</title>
<updated>2026-05-11T21:47:04+00:00</updated>
<author>
<name>Christian König</name>
<email>christian.koenig@amd.com</email>
</author>
<published>2026-04-20T13:13:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d0053441ad7eaf7920b71e2263097ece53c5af34'/>
<id>d0053441ad7eaf7920b71e2263097ece53c5af34</id>
<content type='text'>
Well the reset handling seems broken on multiple levels.

As first step of fixing this remove most calls to the hang detection.
That function should only be called after we run into a timeout! And *NOT*
as random check spread over the code in multiple places.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 71bea36b54ccfb14cbc90f94267af6369af4e702)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Well the reset handling seems broken on multiple levels.

As first step of fixing this remove most calls to the hang detection.
That function should only be called after we run into a timeout! And *NOT*
as random check spread over the code in multiple places.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 71bea36b54ccfb14cbc90f94267af6369af4e702)
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: rework amdgpu_userq_signal_ioctl v3</title>
<updated>2026-05-11T21:46:43+00:00</updated>
<author>
<name>Christian König</name>
<email>christian.koenig@amd.com</email>
</author>
<published>2026-04-16T13:32:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=44e5bc73bdcbec115cfe1c4b4394d25eb3deaff2'/>
<id>44e5bc73bdcbec115cfe1c4b4394d25eb3deaff2</id>
<content type='text'>
This one was fortunately not looking so bad as the wait ioctl path, but
there were still a few things which could be fixed/improved:

1. Allocating with GFP_ATOMIC was quite unnecessary, we can do that
   before taking the userq_lock.
2. Use a new mutex as protection for the fence_drv_xa so that we can do
   memory allocations while holding it.
3. Starting the reset timer is unnecessary when the fence is already
   signaled when we create it.
4. Cleanup error handling, avoid trying to free the queue when we don't
   even got one.

v2: fix incorrect usage of xa_find, destroy the new mutex on error
v3: cleanup ref ordering

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 1609eb0f81a609d350169839128cecf298c84e7a)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This one was fortunately not looking so bad as the wait ioctl path, but
there were still a few things which could be fixed/improved:

1. Allocating with GFP_ATOMIC was quite unnecessary, we can do that
   before taking the userq_lock.
2. Use a new mutex as protection for the fence_drv_xa so that we can do
   memory allocations while holding it.
3. Starting the reset timer is unnecessary when the fence is already
   signaled when we create it.
4. Cleanup error handling, avoid trying to free the queue when we don't
   even got one.

v2: fix incorrect usage of xa_find, destroy the new mutex on error
v3: cleanup ref ordering

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 1609eb0f81a609d350169839128cecf298c84e7a)
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: remove deadlocks from amdgpu_userq_pre_reset</title>
<updated>2026-05-11T21:46:34+00:00</updated>
<author>
<name>Christian König</name>
<email>christian.koenig@amd.com</email>
</author>
<published>2026-04-20T18:18:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d5971c5c34303a00bf841a902ca00a703602c500'/>
<id>d5971c5c34303a00bf841a902ca00a703602c500</id>
<content type='text'>
The purpose of a GPU reset is to make sure that fence can be signaled
again and the signal and resume workers can make progress again.

So waiting for the resume worker or any fence in the GPU reset path is
just utterly nonsense.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Prike Liang &lt;Prike.Liang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit fcd5f065eab46993af43442fd77ee8d9eb9c5bdf)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The purpose of a GPU reset is to make sure that fence can be signaled
again and the signal and resume workers can make progress again.

So waiting for the resume worker or any fence in the GPU reset path is
just utterly nonsense.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Prike Liang &lt;Prike.Liang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit fcd5f065eab46993af43442fd77ee8d9eb9c5bdf)
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: clean up the userq unmap error handler</title>
<updated>2026-04-28T19:51:18+00:00</updated>
<author>
<name>Prike Liang</name>
<email>Prike.Liang@amd.com</email>
</author>
<published>2026-04-27T12:06:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8f935acbc18ff7ad09cb812528b28c59c78f10f9'/>
<id>8f935acbc18ff7ad09cb812528b28c59c78f10f9</id>
<content type='text'>
amdgpu_userq_unmap_helper() already handles the unmap error case.

Signed-off-by: Prike Liang &lt;Prike.Liang@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 66cb6579990b633ccc7300c27011d837b9a58da0)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
amdgpu_userq_unmap_helper() already handles the unmap error case.

Signed-off-by: Prike Liang &lt;Prike.Liang@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 66cb6579990b633ccc7300c27011d837b9a58da0)
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: rework userq fence signal processing</title>
<updated>2026-04-28T19:43:57+00:00</updated>
<author>
<name>Christian König</name>
<email>christian.koenig@amd.com</email>
</author>
<published>2026-04-20T14:08:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d2f272a36e1b4b857165021cfb2689a92efff2f5'/>
<id>d2f272a36e1b4b857165021cfb2689a92efff2f5</id>
<content type='text'>
Move more code into a common userq function.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 12f52fab11500d0dce7d23c71909eaf0cf9aa701)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Move more code into a common userq function.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 12f52fab11500d0dce7d23c71909eaf0cf9aa701)
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: avoid double drm_exec_fini() in userq validate</title>
<updated>2026-04-24T15:08:58+00:00</updated>
<author>
<name>Hongyan Xu</name>
<email>getshell@seu.edu.cn</email>
</author>
<published>2026-04-22T12:38:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=508babf310365f1107a2e8831c267c292a286818'/>
<id>508babf310365f1107a2e8831c267c292a286818</id>
<content type='text'>
When new_addition is true, amdgpu_userq_vm_validate() calls
drm_exec_fini(&amp;exec) before iterating over the collected HMM ranges and
calling amdgpu_ttm_tt_get_user_pages().

If amdgpu_ttm_tt_get_user_pages() fails in that path, the code jumps to
unlock_all and calls drm_exec_fini(&amp;exec) a second time on the same
exec object. drm_exec_fini() is not idempotent: it frees exec-&gt;objects
and may also drop exec-&gt;contended and finalize the ww acquire context.

Route that error path directly to the range cleanup once exec has
already been finalized.

Fixes: 42f148788469 ("drm/amdgpu/userqueue: validate userptrs for userqueues")
Issue found using a prototype static analysis tool
and confirmed by code review.

Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Hongyan Xu &lt;getshell@seu.edu.cn&gt;
Signed-off-by: Slavin Liu &lt;220245772@seu.edu.cn&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 2802952e4a07306da6ebe813ff1acacc5691851a)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When new_addition is true, amdgpu_userq_vm_validate() calls
drm_exec_fini(&amp;exec) before iterating over the collected HMM ranges and
calling amdgpu_ttm_tt_get_user_pages().

If amdgpu_ttm_tt_get_user_pages() fails in that path, the code jumps to
unlock_all and calls drm_exec_fini(&amp;exec) a second time on the same
exec object. drm_exec_fini() is not idempotent: it frees exec-&gt;objects
and may also drop exec-&gt;contended and finalize the ww acquire context.

Route that error path directly to the range cleanup once exec has
already been finalized.

Fixes: 42f148788469 ("drm/amdgpu/userqueue: validate userptrs for userqueues")
Issue found using a prototype static analysis tool
and confirmed by code review.

Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Hongyan Xu &lt;getshell@seu.edu.cn&gt;
Signed-off-by: Slavin Liu &lt;220245772@seu.edu.cn&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 2802952e4a07306da6ebe813ff1acacc5691851a)
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu/userq: unpin and unref doorbell and wptr outside mutex</title>
<updated>2026-04-17T19:41:12+00:00</updated>
<author>
<name>Sunil Khatri</name>
<email>sunil.khatri@amd.com</email>
</author>
<published>2026-04-13T12:53:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b250a43bf57e544071a834a7f4223dcc58270a6b'/>
<id>b250a43bf57e544071a834a7f4223dcc58270a6b</id>
<content type='text'>
In amdgpu_userq_destroy once unmap_helpder is called within mutex
there is no need to hold mutex.

This helps in avoiding a deadlock between doorbell and wptr ww mutex
and we could unpin and unref these bos outside mutex safely.

Signed-off-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In amdgpu_userq_destroy once unmap_helpder is called within mutex
there is no need to hold mutex.

This helps in avoiding a deadlock between doorbell and wptr ww mutex
and we could unpin and unref these bos outside mutex safely.

Signed-off-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu/userq: use pm_runtime_resume_and_get and fix err handling</title>
<updated>2026-04-17T19:41:12+00:00</updated>
<author>
<name>Sunil Khatri</name>
<email>sunil.khatri@amd.com</email>
</author>
<published>2026-04-11T08:11:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d3a9fe4584ffb4717e5362d8259794c6220fc465'/>
<id>d3a9fe4584ffb4717e5362d8259794c6220fc465</id>
<content type='text'>
Use pm_runtime_resume_and_get instead of pm_runtime_get_sync as it
return error but put the reference in the function itself.

In goto statements we need to drop the pm reference too.

Signed-off-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Acked-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Use pm_runtime_resume_and_get instead of pm_runtime_get_sync as it
return error but put the reference in the function itself.

In goto statements we need to drop the pm reference too.

Signed-off-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Acked-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu/userq: unmap_helper dont return the queue state</title>
<updated>2026-04-17T19:41:12+00:00</updated>
<author>
<name>Sunil Khatri</name>
<email>sunil.khatri@amd.com</email>
</author>
<published>2026-04-13T06:16:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1e8b7062d2a8f7cecdbf7ae3fd07efc49a300d0f'/>
<id>1e8b7062d2a8f7cecdbf7ae3fd07efc49a300d0f</id>
<content type='text'>
We check for return value of amdgpu_userq_unmap_helper and
compare it against the queue-&gt;state which is logically
wrong and we should just check for failure and do the needfull.

Signed-off-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We check for return value of amdgpu_userq_unmap_helper and
compare it against the queue-&gt;state which is logically
wrong and we should just check for failure and do the needfull.

Signed-off-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
