<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c, branch v7.1-rc5</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>drm/amdgpu: fix handling in amdgpu_userq_create</title>
<updated>2026-05-19T16:25:32+00:00</updated>
<author>
<name>Christian König</name>
<email>christian.koenig@amd.com</email>
</author>
<published>2026-04-27T14:31:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b6fe4ff340560ecf39e10733366f85832550699a'/>
<id>b6fe4ff340560ecf39e10733366f85832550699a</id>
<content type='text'>
Well mostly the same issues the other code had as well:

1. Memory allocation while holding the userq_mutex lock is forbidden!
2. Things were created/started/published in the wrong order.
3. The reset lock was taken in the wrong order and seems to be
   unecessary in the first place.
4. Error messages on invalid input parameters can spam the logs.
5. Error messages on memory allocation failures are usually superflous
   as well.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Reviewed-by: Prike Liang &lt;Prike.Liang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 89e50de5654dbe7a137e03d78629542e17ba7202)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Well mostly the same issues the other code had as well:

1. Memory allocation while holding the userq_mutex lock is forbidden!
2. Things were created/started/published in the wrong order.
3. The reset lock was taken in the wrong order and seems to be
   unecessary in the first place.
4. Error messages on invalid input parameters can spam the logs.
5. Error messages on memory allocation failures are usually superflous
   as well.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Reviewed-by: Prike Liang &lt;Prike.Liang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 89e50de5654dbe7a137e03d78629542e17ba7202)
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: userq_va_mapped should remain true once done</title>
<updated>2026-05-19T16:15:49+00:00</updated>
<author>
<name>Sunil Khatri</name>
<email>sunil.khatri@amd.com</email>
</author>
<published>2026-05-13T07:59:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b6a28b77b88e776a8cc7739005718e03c4c9be57'/>
<id>b6a28b77b88e776a8cc7739005718e03c4c9be57</id>
<content type='text'>
Multiple queues needs these bo_va objects belonging to
the same uq_mgr. So once they are mapped lets not unmap
them as at any point of time any of the queues might be
using it.

Also userq_va_mapped should be a boolean than atomic.

Signed-off-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 5c02889ea22575c3bcfdf212e65fac316cbc6c6a)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Multiple queues needs these bo_va objects belonging to
the same uq_mgr. So once they are mapped lets not unmap
them as at any point of time any of the queues might be
using it.

Also userq_va_mapped should be a boolean than atomic.

Signed-off-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 5c02889ea22575c3bcfdf212e65fac316cbc6c6a)
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: remove va cursors for all mappings</title>
<updated>2026-05-19T16:09:01+00:00</updated>
<author>
<name>Sunil Khatri</name>
<email>sunil.khatri@amd.com</email>
</author>
<published>2026-05-12T16:59:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d892a6eca7a847dd57ede6e55d7494a01b43fa5f'/>
<id>d892a6eca7a847dd57ede6e55d7494a01b43fa5f</id>
<content type='text'>
va_cursor struct needs to be cleaned even if the mapping
has been removed already.

Also simplify it by make it a void function as return value
check isn't needed as its called during tear down.

Signed-off-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 4d35a45c9b4c1ac5b6e3219f83c3db706b675fa2)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
va_cursor struct needs to be cleaned even if the mapping
has been removed already.

Also simplify it by make it a void function as return value
check isn't needed as its called during tear down.

Signed-off-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 4d35a45c9b4c1ac5b6e3219f83c3db706b675fa2)
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu/userq: update the vm task info during signal ioctl</title>
<updated>2026-05-19T16:08:02+00:00</updated>
<author>
<name>Sunil Khatri</name>
<email>sunil.khatri@amd.com</email>
</author>
<published>2026-05-12T10:30:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0be97436bf249503b2c5898e7459744962b70f15'/>
<id>0be97436bf249503b2c5898e7459744962b70f15</id>
<content type='text'>
Pagefaults does not have process information correctly populated
as vm-&gt;task is not set during vm_init but should be updated while
real submission. So setting that up during signal_ioctl to get
the correct submission process details.

Signed-off-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit a9b14d88b4d83e21ab965f23d1fb7b07b87e0517)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pagefaults does not have process information correctly populated
as vm-&gt;task is not set during vm_init but should be updated while
real submission. So setting that up during signal_ioctl to get
the correct submission process details.

Signed-off-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit a9b14d88b4d83e21ab965f23d1fb7b07b87e0517)
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu/userq: cancel reset work while tear down in progress</title>
<updated>2026-05-19T16:07:52+00:00</updated>
<author>
<name>Sunil Khatri</name>
<email>sunil.khatri@amd.com</email>
</author>
<published>2026-05-12T09:22:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=291df3dc7d10855c26841b8f42843bc996d097cf'/>
<id>291df3dc7d10855c26841b8f42843bc996d097cf</id>
<content type='text'>
While tear down of a userq_mgr is happening when all the queues
are free we should cancel any reset work if pending before exiting.

Signed-off-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 160164609f71f774c4f661227a9b7a370a86b112)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
While tear down of a userq_mgr is happening when all the queues
are free we should cancel any reset work if pending before exiting.

Signed-off-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 160164609f71f774c4f661227a9b7a370a86b112)
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: rework userq reset work handling</title>
<updated>2026-05-19T16:07:42+00:00</updated>
<author>
<name>Christian König</name>
<email>christian.koenig@amd.com</email>
</author>
<published>2026-04-21T10:39:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c8ed2de0f2ee842d108ef96c125e931ea82b453c'/>
<id>c8ed2de0f2ee842d108ef96c125e931ea82b453c</id>
<content type='text'>
It is illegal to schedule reset work from another reset work!

Fix this by scheduling the userq reset work directly on the work queue
of the reset domain.

Not fully tested, I leave that to the IGT test cases.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Prike Liang &lt;Prike.Liang@amd.com&gt;
Reviewed-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit fd9200ccefab94f27877d1943761d6b0ccbd89c8)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It is illegal to schedule reset work from another reset work!

Fix this by scheduling the userq reset work directly on the work queue
of the reset domain.

Not fully tested, I leave that to the IGT test cases.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Prike Liang &lt;Prike.Liang@amd.com&gt;
Reviewed-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit fd9200ccefab94f27877d1943761d6b0ccbd89c8)
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu/userq: pin mqd and fw object bo to avoid eviction</title>
<updated>2026-05-19T16:07:36+00:00</updated>
<author>
<name>Sunil Khatri</name>
<email>sunil.khatri@amd.com</email>
</author>
<published>2026-05-08T10:28:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=be045c5c8305fca1214ebdd4b8433ac80635339b'/>
<id>be045c5c8305fca1214ebdd4b8433ac80635339b</id>
<content type='text'>
mqd and fw objects are queue core objects which should remain
valid and never be unmapped and evicted for user queues to work
properly.

During eviction if these buffers are evicted the hw continue to
use the invalid addresses and caused page faults and system hung.

Signed-off-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit a3bbf32a336939a1d21b9561f8e53333b684b7ef)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
mqd and fw objects are queue core objects which should remain
valid and never be unmapped and evicted for user queues to work
properly.

During eviction if these buffers are evicted the hw continue to
use the invalid addresses and caused page faults and system hung.

Signed-off-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit a3bbf32a336939a1d21b9561f8e53333b684b7ef)
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: fix userq hang detection and reset</title>
<updated>2026-05-11T21:47:11+00:00</updated>
<author>
<name>Christian König</name>
<email>christian.koenig@amd.com</email>
</author>
<published>2026-04-20T14:08:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0071e01c61aa73f22edd5252f0a3e2c2eff744d6'/>
<id>0071e01c61aa73f22edd5252f0a3e2c2eff744d6</id>
<content type='text'>
Fix lock inversions pointed out by Prike and Sunil. The hang detection
timeout *CAN'T* grab locks under which we wait for fences, especially
not the userq_mutex lock.

Then instead of this completely broken handling with the
hang_detect_fence just cancel the work when fences are processed and
re-start if necessary.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 1b62077f045ac6ffde7c97005c6659569ac5c1ec)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix lock inversions pointed out by Prike and Sunil. The hang detection
timeout *CAN'T* grab locks under which we wait for fences, especially
not the userq_mutex lock.

Then instead of this completely broken handling with the
hang_detect_fence just cancel the work when fences are processed and
re-start if necessary.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 1b62077f045ac6ffde7c97005c6659569ac5c1ec)
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: remove almost all calls to amdgpu_userq_detect_and_reset_queues</title>
<updated>2026-05-11T21:47:04+00:00</updated>
<author>
<name>Christian König</name>
<email>christian.koenig@amd.com</email>
</author>
<published>2026-04-20T13:13:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d0053441ad7eaf7920b71e2263097ece53c5af34'/>
<id>d0053441ad7eaf7920b71e2263097ece53c5af34</id>
<content type='text'>
Well the reset handling seems broken on multiple levels.

As first step of fixing this remove most calls to the hang detection.
That function should only be called after we run into a timeout! And *NOT*
as random check spread over the code in multiple places.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 71bea36b54ccfb14cbc90f94267af6369af4e702)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Well the reset handling seems broken on multiple levels.

As first step of fixing this remove most calls to the hang detection.
That function should only be called after we run into a timeout! And *NOT*
as random check spread over the code in multiple places.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 71bea36b54ccfb14cbc90f94267af6369af4e702)
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: rework amdgpu_userq_signal_ioctl v3</title>
<updated>2026-05-11T21:46:43+00:00</updated>
<author>
<name>Christian König</name>
<email>christian.koenig@amd.com</email>
</author>
<published>2026-04-16T13:32:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=44e5bc73bdcbec115cfe1c4b4394d25eb3deaff2'/>
<id>44e5bc73bdcbec115cfe1c4b4394d25eb3deaff2</id>
<content type='text'>
This one was fortunately not looking so bad as the wait ioctl path, but
there were still a few things which could be fixed/improved:

1. Allocating with GFP_ATOMIC was quite unnecessary, we can do that
   before taking the userq_lock.
2. Use a new mutex as protection for the fence_drv_xa so that we can do
   memory allocations while holding it.
3. Starting the reset timer is unnecessary when the fence is already
   signaled when we create it.
4. Cleanup error handling, avoid trying to free the queue when we don't
   even got one.

v2: fix incorrect usage of xa_find, destroy the new mutex on error
v3: cleanup ref ordering

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 1609eb0f81a609d350169839128cecf298c84e7a)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This one was fortunately not looking so bad as the wait ioctl path, but
there were still a few things which could be fixed/improved:

1. Allocating with GFP_ATOMIC was quite unnecessary, we can do that
   before taking the userq_lock.
2. Use a new mutex as protection for the fence_drv_xa so that we can do
   memory allocations while holding it.
3. Starting the reset timer is unnecessary when the fence is already
   signaled when we create it.
4. Cleanup error handling, avoid trying to free the queue when we don't
   even got one.

v2: fix incorrect usage of xa_find, destroy the new mutex on error
v3: cleanup ref ordering

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 1609eb0f81a609d350169839128cecf298c84e7a)
</pre>
</div>
</content>
</entry>
</feed>
