<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c, branch v6.14</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>drm/amd/amdkfd: Evict all queues even HWS remove queue failed</title>
<updated>2025-03-12T18:59:21+00:00</updated>
<author>
<name>Yifan Zha</name>
<email>Yifan.Zha@amd.com</email>
</author>
<published>2025-03-05T05:14:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0882ca4eecfe8b0013f339144acf886a0a0de41f'/>
<id>0882ca4eecfe8b0013f339144acf886a0a0de41f</id>
<content type='text'>
[Why]
If reset is detected and kfd need to evict working queues, HWS moving queue will be failed.
Then remaining queues are not evicted and in active state.

After reset done, kfd uses HWS to termination remaining activated queues but HWS is resetted.
So remove queue will be failed again.

[How]
Keep removing all queues even if HWS returns failed.
It will not affect cpsch as it checks reset_domain-&gt;sem.

v2: If any queue failed, evict queue returns error.
v3: Declare err inside the if-block.

Reviewed-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Signed-off-by: Yifan Zha &lt;Yifan.Zha@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 42c854b8fb0cce512534aa2b7141948e80c6ebb0)
Cc: stable@vger.kernel.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[Why]
If reset is detected and kfd need to evict working queues, HWS moving queue will be failed.
Then remaining queues are not evicted and in active state.

After reset done, kfd uses HWS to termination remaining activated queues but HWS is resetted.
So remove queue will be failed again.

[How]
Keep removing all queues even if HWS returns failed.
It will not affect cpsch as it checks reset_domain-&gt;sem.

v2: If any queue failed, evict queue returns error.
v3: Declare err inside the if-block.

Reviewed-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Signed-off-by: Yifan Zha &lt;Yifan.Zha@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 42c854b8fb0cce512534aa2b7141948e80c6ebb0)
Cc: stable@vger.kernel.org
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdkfd: Block per-queue reset when halt_if_hws_hang=1</title>
<updated>2025-01-28T21:22:02+00:00</updated>
<author>
<name>Jay Cornwall</name>
<email>jay.cornwall@amd.com</email>
</author>
<published>2025-01-16T20:36:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f214b7beb00621b983e67ce97477afc3ab4b38f4'/>
<id>f214b7beb00621b983e67ce97477afc3ab4b38f4</id>
<content type='text'>
The purpose of halt_if_hws_hang is to preserve GPU state for driver
debugging when queue preemption fails. Issuing per-queue reset may
kill wavefronts which caused the preemption failure.

Signed-off-by: Jay Cornwall &lt;jay.cornwall@amd.com&gt;
Reviewed-by: Jonathan Kim &lt;Jonathan.Kim@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org # 6.12.x
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The purpose of halt_if_hws_hang is to preserve GPU state for driver
debugging when queue preemption fails. Issuing per-queue reset may
kill wavefronts which caused the preemption failure.

Signed-off-by: Jay Cornwall &lt;jay.cornwall@amd.com&gt;
Reviewed-by: Jonathan Kim &lt;Jonathan.Kim@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org # 6.12.x
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdkfd: Failed to check various return code</title>
<updated>2024-12-18T17:18:11+00:00</updated>
<author>
<name>Andrew Martin</name>
<email>Andrew.Martin@amd.com</email>
</author>
<published>2024-12-10T16:50:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=88a45aa6083be000dc18c38a339acb1fd2f9831c'/>
<id>88a45aa6083be000dc18c38a339acb1fd2f9831c</id>
<content type='text'>
This patch checks and warns if pdd is NULL.

Signed-off-by: Andrew Martin &lt;Andrew.Martin@amd.com&gt;
Reviewed-by: Harish Kasiviswanathan &lt;Harish.Kasiviswanathan@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch checks and warns if pdd is NULL.

Signed-off-by: Andrew Martin &lt;Andrew.Martin@amd.com&gt;
Reviewed-by: Harish Kasiviswanathan &lt;Harish.Kasiviswanathan@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdkfd: pause autosuspend when creating pdd</title>
<updated>2024-12-10T15:26:18+00:00</updated>
<author>
<name>Jesse.zhang@amd.com</name>
<email>Jesse.zhang@amd.com</email>
</author>
<published>2024-12-05T09:41:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=438b39ac74e2a9dc0a5c9d653b7d8066877e86b1'/>
<id>438b39ac74e2a9dc0a5c9d653b7d8066877e86b1</id>
<content type='text'>
When using MES creating a pdd will require talking to the GPU to
setup the relevant context. The code here forgot to wake up the GPU
in case it was in suspend, this causes KVM to EFAULT for passthrough
GPU for example. This issue can be masked if the GPU was woken up by
other things (e.g. opening the KMS node) first and have not yet gone to sleep.

v4: do the allocation of proc_ctx_bo in a lazy fashion
when the first queue is created in a process (Felix)

Signed-off-by: Jesse Zhang &lt;jesse.zhang@amd.com&gt;
Reviewed-by: Yunxiang Li &lt;Yunxiang.Li@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When using MES creating a pdd will require talking to the GPU to
setup the relevant context. The code here forgot to wake up the GPU
in case it was in suspend, this causes KVM to EFAULT for passthrough
GPU for example. This issue can be masked if the GPU was woken up by
other things (e.g. opening the KMS node) first and have not yet gone to sleep.

v4: do the allocation of proc_ctx_bo in a lazy fashion
when the first queue is created in a process (Felix)

Signed-off-by: Jesse Zhang &lt;jesse.zhang@amd.com&gt;
Reviewed-by: Yunxiang Li &lt;Yunxiang.Li@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amd/amdkfd: add/remove kfd queues on start/stop KFD scheduling</title>
<updated>2024-11-11T17:22:39+00:00</updated>
<author>
<name>Shaoyun Liu</name>
<email>shaoyun.liu@amd.com</email>
</author>
<published>2024-10-04T16:00:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=af5661c7c708b1923a1761fe12527c2b85ad47ba'/>
<id>af5661c7c708b1923a1761fe12527c2b85ad47ba</id>
<content type='text'>
Add back kfd queues in start scheduling that originally been
removed on stop scheduling.

Signed-off-by: Shaoyun Liu &lt;shaoyun.liu@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Reviewed-by: Philip Yang &lt;Philip.Yang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add back kfd queues in start scheduling that originally been
removed on stop scheduling.

Signed-off-by: Shaoyun Liu &lt;shaoyun.liu@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Reviewed-by: Philip Yang &lt;Philip.Yang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdkfd: remove extra use of volatile</title>
<updated>2024-10-24T22:04:49+00:00</updated>
<author>
<name>Victor Zhao</name>
<email>Victor.Zhao@amd.com</email>
</author>
<published>2024-10-22T13:48:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b6890efb597a19cc8bb45e0c2375292fd1f338de'/>
<id>b6890efb597a19cc8bb45e0c2375292fd1f338de</id>
<content type='text'>
as the adding of mb() should be sufficient in function unmap_queues_cpsch,
remove the add of volatile type as recommended

Signed-off-by: Victor Zhao &lt;Victor.Zhao@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
as the adding of mb() should be sufficient in function unmap_queues_cpsch,
remove the add of volatile type as recommended

Signed-off-by: Victor Zhao &lt;Victor.Zhao@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdkfd: fix the hang caused by the write reorder to fence_addr</title>
<updated>2024-10-22T21:50:39+00:00</updated>
<author>
<name>Victor Zhao</name>
<email>Victor.Zhao@amd.com</email>
</author>
<published>2024-10-17T08:20:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8834456163a1b372a85891751e51cafbf443a2d8'/>
<id>8834456163a1b372a85891751e51cafbf443a2d8</id>
<content type='text'>
make sure KFD_FENCE_INIT write to fence_addr before pm_send_query_status
called, to avoid qcm fence timeout caused by incorrect ordering.

Signed-off-by: Victor Zhao &lt;Victor.Zhao@amd.com&gt;
Reviewed-by: Philip Yang &lt;Philip.Yang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
make sure KFD_FENCE_INIT write to fence_addr before pm_send_query_status
called, to avoid qcm fence timeout caused by incorrect ordering.

Signed-off-by: Victor Zhao &lt;Victor.Zhao@amd.com&gt;
Reviewed-by: Philip Yang &lt;Philip.Yang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdkfd: Copy wave state only for compute queue</title>
<updated>2024-10-07T18:09:21+00:00</updated>
<author>
<name>Philip Yang</name>
<email>Philip.Yang@amd.com</email>
</author>
<published>2024-10-03T15:53:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=dafc87dcdc3bc50ac72c59156d64ed5267ad28e2'/>
<id>dafc87dcdc3bc50ac72c59156d64ed5267ad28e2</id>
<content type='text'>
get_wave_state is not defined for sdma queue, copy_context_work_handler
calls it for sdma queue will crash.

Signed-off-by: Philip Yang &lt;Philip.Yang@amd.com&gt;
Reviewed-by: Jonathan Kim &lt;jonathan.kim@amd.com&gt;
Tested-by: Jonathan Kim &lt;jonathan.kim@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
get_wave_state is not defined for sdma queue, copy_context_work_handler
calls it for sdma queue will crash.

Signed-off-by: Philip Yang &lt;Philip.Yang@amd.com&gt;
Reviewed-by: Jonathan Kim &lt;jonathan.kim@amd.com&gt;
Tested-by: Jonathan Kim &lt;jonathan.kim@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdkfd: Fix CU occupancy for GFX 9.4.3</title>
<updated>2024-09-25T16:56:07+00:00</updated>
<author>
<name>Mukul Joshi</name>
<email>mukul.joshi@amd.com</email>
</author>
<published>2024-09-20T18:59:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e45b011d2c4146442a388113657b70f0c7cad09b'/>
<id>e45b011d2c4146442a388113657b70f0c7cad09b</id>
<content type='text'>
Make CU occupancy calculations work on GFX 9.4.3 by
updating the logic to handle multiple XCCs correctly.

Signed-off-by: Mukul Joshi &lt;mukul.joshi@amd.com&gt;
Reviewed-by: Harish Kasiviswanathan &lt;Harish.Kasiviswanathan@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Make CU occupancy calculations work on GFX 9.4.3 by
updating the logic to handle multiple XCCs correctly.

Signed-off-by: Mukul Joshi &lt;mukul.joshi@amd.com&gt;
Reviewed-by: Harish Kasiviswanathan &lt;Harish.Kasiviswanathan@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdkfd: Update logic for CU occupancy calculations</title>
<updated>2024-09-25T16:56:00+00:00</updated>
<author>
<name>Mukul Joshi</name>
<email>mukul.joshi@amd.com</email>
</author>
<published>2024-09-16T18:33:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6ae9e1aba97e4cdaa31a0bfdc07497ad0e915c84'/>
<id>6ae9e1aba97e4cdaa31a0bfdc07497ad0e915c84</id>
<content type='text'>
Currently, the code uses the IH_VMID_X_LUT register to map
a queue's vmid to the corresponding PASID. This logic is racy
since CP can update the VMID-PASID mapping anytime especially
when there are more processes than number of vmids. Update the
logic to calculate CU occupancy by matching doorbell offset of
the queue with valid wave counts against the process's queues.

Signed-off-by: Mukul Joshi &lt;mukul.joshi@amd.com&gt;
Reviewed-by: Harish Kasiviswanathan &lt;Harish.Kasiviswanathan@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently, the code uses the IH_VMID_X_LUT register to map
a queue's vmid to the corresponding PASID. This logic is racy
since CP can update the VMID-PASID mapping anytime especially
when there are more processes than number of vmids. Update the
logic to calculate CU occupancy by matching doorbell offset of
the queue with valid wave counts against the process's queues.

Signed-off-by: Mukul Joshi &lt;mukul.joshi@amd.com&gt;
Reviewed-by: Harish Kasiviswanathan &lt;Harish.Kasiviswanathan@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
