<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h, branch v5.8</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>drm/amdkfd: Enable over-subscription with &gt;1 GWS queue</title>
<updated>2020-04-28T20:20:30+00:00</updated>
<author>
<name>Joseph Greathouse</name>
<email>Joseph.Greathouse@amd.com</email>
</author>
<published>2019-09-18T19:49:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b8020b0304c8f44e5e29f0b1a04d31e0bf68d26a'/>
<id>b8020b0304c8f44e5e29f0b1a04d31e0bf68d26a</id>
<content type='text'>
The current GWS usage model will only allows a single GWS-enabled
process to be active on the GPU at once. This ensures that a
barrier-using kernel gets a known amount of GPU hardware, to
prevent deadlock due to inability to go beyond the GWS barrier.

The HWS watches how many GWS entries are assigned to each process,
and goes into over-subscription mode when two processes need more
than the 64 that are available. The current KFD method for working
with this is to allocate all 64 GWS entries to each GWS-capable
process.

When more than one GWS-enabled process is in the runlist, we must
make sure the runlist is in over-subscription mode, so that the
HWS gets a chained RUN_LIST packet and continues scheduling
kernels.

Signed-off-by: Joseph Greathouse &lt;Joseph.Greathouse@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Signed-off-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The current GWS usage model will only allows a single GWS-enabled
process to be active on the GPU at once. This ensures that a
barrier-using kernel gets a known amount of GPU hardware, to
prevent deadlock due to inability to go beyond the GWS barrier.

The HWS watches how many GWS entries are assigned to each process,
and goes into over-subscription mode when two processes need more
than the 64 that are available. The current KFD method for working
with this is to allocate all 64 GWS entries to each GWS-capable
process.

When more than one GWS-enabled process is in the runlist, we must
make sure the runlist is in over-subscription mode, so that the
HWS gets a chained RUN_LIST packet and continues scheduling
kernels.

Signed-off-by: Joseph Greathouse &lt;Joseph.Greathouse@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Signed-off-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdkfd: Delete unnecessary unmap queue package submissions</title>
<updated>2020-02-26T19:20:33+00:00</updated>
<author>
<name>Yong Zhao</name>
<email>Yong.Zhao@amd.com</email>
</author>
<published>2020-02-05T21:53:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c7637c95abebc811d2479a5b924859025def544a'/>
<id>c7637c95abebc811d2479a5b924859025def544a</id>
<content type='text'>
The previous way of using SDMA queue count to infer whether we should unmap
SDMA engines has bugs. The reason it did not cause issues is because MEC
firmware unmaps all queues (CP + SDMA) when a unmap package for compute
engine is received. Becasue of that, only one unmap queue package
is needed, instead of one unmap queue package for CP and each SDMA engine,
which results in much simpler driver code.

Signed-off-by: Yong Zhao &lt;Yong.Zhao@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The previous way of using SDMA queue count to infer whether we should unmap
SDMA engines has bugs. The reason it did not cause issues is because MEC
firmware unmaps all queues (CP + SDMA) when a unmap package for compute
engine is received. Becasue of that, only one unmap queue package
is needed, instead of one unmap queue package for CP and each SDMA engine,
which results in much simpler driver code.

Signed-off-by: Yong Zhao &lt;Yong.Zhao@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdkfd: Count active CP queues directly</title>
<updated>2020-02-26T19:20:13+00:00</updated>
<author>
<name>Yong Zhao</name>
<email>Yong.Zhao@amd.com</email>
</author>
<published>2020-02-05T19:48:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b42902f4af8fecd2781f71ad166e8a807edb1053'/>
<id>b42902f4af8fecd2781f71ad166e8a807edb1053</id>
<content type='text'>
The previous code of calculating active CP queues is problematic if
some SDMA queues are inactive. Fix that by counting CP queues directly.

Signed-off-by: Yong Zhao &lt;Yong.Zhao@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The previous code of calculating active CP queues is problematic if
some SDMA queues are inactive. Fix that by counting CP queues directly.

Signed-off-by: Yong Zhao &lt;Yong.Zhao@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdkfd: Avoid ambiguity by indicating it's cp queue</title>
<updated>2020-02-26T19:20:05+00:00</updated>
<author>
<name>Yong Zhao</name>
<email>Yong.Zhao@amd.com</email>
</author>
<published>2020-01-30T23:35:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e6945304187deae0a28ebc65008ec11277f1c0f0'/>
<id>e6945304187deae0a28ebc65008ec11277f1c0f0</id>
<content type='text'>
The queues represented in queue_bitmap are only CP queues.

Signed-off-by: Yong Zhao &lt;Yong.Zhao@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The queues represented in queue_bitmap are only CP queues.

Signed-off-by: Yong Zhao &lt;Yong.Zhao@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdkfd: Rename queue_count to active_queue_count</title>
<updated>2020-02-26T19:19:38+00:00</updated>
<author>
<name>Yong Zhao</name>
<email>Yong.Zhao@amd.com</email>
</author>
<published>2020-01-30T23:25:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=81b820b304a07fa4de9d46f095475555881af8fe'/>
<id>81b820b304a07fa4de9d46f095475555881af8fe</id>
<content type='text'>
The name is easier to understand the code.

Signed-off-by: Yong Zhao &lt;Yong.Zhao@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The name is easier to understand the code.

Signed-off-by: Yong Zhao &lt;Yong.Zhao@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdkfd: Improve HWS hang detection and handling</title>
<updated>2020-01-07T16:54:56+00:00</updated>
<author>
<name>Felix Kuehling</name>
<email>Felix.Kuehling@amd.com</email>
</author>
<published>2019-12-20T07:07:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=09c34e8d7a631cf541c82f46ee73433c28b6ce0e'/>
<id>09c34e8d7a631cf541c82f46ee73433c28b6ce0e</id>
<content type='text'>
Move HWS hang detection into unmap_queues_cpsch to catch hangs in all
cases. If this happens during a reset, don't schedule another reset
because the reset already in progress is expected to take care of it.

Signed-off-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Tested-by: Emily Deng &lt;Emily.Deng@amd.com&gt;
Reviewed-by: shaoyunl  &lt;shaoyun.liu@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Move HWS hang detection into unmap_queues_cpsch to catch hangs in all
cases. If this happens during a reset, don't schedule another reset
because the reset already in progress is expected to take care of it.

Signed-off-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Tested-by: Emily Deng &lt;Emily.Deng@amd.com&gt;
Reviewed-by: shaoyunl  &lt;shaoyun.liu@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdkfd: Remove unused variable</title>
<updated>2020-01-07T16:54:50+00:00</updated>
<author>
<name>Felix Kuehling</name>
<email>Felix.Kuehling@amd.com</email>
</author>
<published>2019-12-20T07:01:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=63e088acfc330c7031c4c07e8503337d1f36639a'/>
<id>63e088acfc330c7031c4c07e8503337d1f36639a</id>
<content type='text'>
dqm-&gt;pipeline_mem wasn't used anywhere.

Signed-off-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Reviewed-by: shaoyunl  &lt;shaoyun.liu@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
dqm-&gt;pipeline_mem wasn't used anywhere.

Signed-off-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Reviewed-by: shaoyunl  &lt;shaoyun.liu@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdkfd: don't use dqm lock during device reset/suspend/resume</title>
<updated>2019-10-25T20:50:10+00:00</updated>
<author>
<name>Philip Yang</name>
<email>Philip.Yang@amd.com</email>
</author>
<published>2019-10-18T14:15:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2c99a547bcf9bb8532abd2953479949018449f93'/>
<id>2c99a547bcf9bb8532abd2953479949018449f93</id>
<content type='text'>
If device reset/suspend/resume failed for some reason, dqm lock is
hold forever and this causes deadlock. Below is a kernel backtrace when
application open kfd after suspend/resume failed.

Instead of holding dqm lock in pre_reset and releasing dqm lock in
post_reset, add dqm-&gt;sched_running flag which is modified in
dqm-&gt;ops.start and dqm-&gt;ops.stop. The flag doesn't need lock protection
because write/read are all inside dqm lock.

For HWS case, map_queues_cpsch and unmap_queues_cpsch checks
sched_running flag before sending the updated runlist.

v2: For no-HWS case, when device is stopped, don't call
load/destroy_mqd for eviction, restore and create queue, and avoid
debugfs dump hdqs.

Backtrace of dqm lock deadlock:

[Thu Oct 17 16:43:37 2019] INFO: task rocminfo:3024 blocked for more
than 120 seconds.
[Thu Oct 17 16:43:37 2019]       Not tainted
5.0.0-rc1-kfd-compute-rocm-dkms-no-npi-1131 #1
[Thu Oct 17 16:43:37 2019] "echo 0 &gt;
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Oct 17 16:43:37 2019] rocminfo        D    0  3024   2947
0x80000000
[Thu Oct 17 16:43:37 2019] Call Trace:
[Thu Oct 17 16:43:37 2019]  ? __schedule+0x3d9/0x8a0
[Thu Oct 17 16:43:37 2019]  schedule+0x32/0x70
[Thu Oct 17 16:43:37 2019]  schedule_preempt_disabled+0xa/0x10
[Thu Oct 17 16:43:37 2019]  __mutex_lock.isra.9+0x1e3/0x4e0
[Thu Oct 17 16:43:37 2019]  ? __call_srcu+0x264/0x3b0
[Thu Oct 17 16:43:37 2019]  ? process_termination_cpsch+0x24/0x2f0
[amdgpu]
[Thu Oct 17 16:43:37 2019]  process_termination_cpsch+0x24/0x2f0
[amdgpu]
[Thu Oct 17 16:43:37 2019]
kfd_process_dequeue_from_all_devices+0x42/0x60 [amdgpu]
[Thu Oct 17 16:43:37 2019]  kfd_process_notifier_release+0x1be/0x220
[amdgpu]
[Thu Oct 17 16:43:37 2019]  __mmu_notifier_release+0x3e/0xc0
[Thu Oct 17 16:43:37 2019]  exit_mmap+0x160/0x1a0
[Thu Oct 17 16:43:37 2019]  ? __handle_mm_fault+0xba3/0x1200
[Thu Oct 17 16:43:37 2019]  ? exit_robust_list+0x5a/0x110
[Thu Oct 17 16:43:37 2019]  mmput+0x4a/0x120
[Thu Oct 17 16:43:37 2019]  do_exit+0x284/0xb20
[Thu Oct 17 16:43:37 2019]  ? handle_mm_fault+0xfa/0x200
[Thu Oct 17 16:43:37 2019]  do_group_exit+0x3a/0xa0
[Thu Oct 17 16:43:37 2019]  __x64_sys_exit_group+0x14/0x20
[Thu Oct 17 16:43:37 2019]  do_syscall_64+0x4f/0x100
[Thu Oct 17 16:43:37 2019]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Suggested-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Signed-off-by: Philip Yang &lt;Philip.Yang@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If device reset/suspend/resume failed for some reason, dqm lock is
hold forever and this causes deadlock. Below is a kernel backtrace when
application open kfd after suspend/resume failed.

Instead of holding dqm lock in pre_reset and releasing dqm lock in
post_reset, add dqm-&gt;sched_running flag which is modified in
dqm-&gt;ops.start and dqm-&gt;ops.stop. The flag doesn't need lock protection
because write/read are all inside dqm lock.

For HWS case, map_queues_cpsch and unmap_queues_cpsch checks
sched_running flag before sending the updated runlist.

v2: For no-HWS case, when device is stopped, don't call
load/destroy_mqd for eviction, restore and create queue, and avoid
debugfs dump hdqs.

Backtrace of dqm lock deadlock:

[Thu Oct 17 16:43:37 2019] INFO: task rocminfo:3024 blocked for more
than 120 seconds.
[Thu Oct 17 16:43:37 2019]       Not tainted
5.0.0-rc1-kfd-compute-rocm-dkms-no-npi-1131 #1
[Thu Oct 17 16:43:37 2019] "echo 0 &gt;
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Oct 17 16:43:37 2019] rocminfo        D    0  3024   2947
0x80000000
[Thu Oct 17 16:43:37 2019] Call Trace:
[Thu Oct 17 16:43:37 2019]  ? __schedule+0x3d9/0x8a0
[Thu Oct 17 16:43:37 2019]  schedule+0x32/0x70
[Thu Oct 17 16:43:37 2019]  schedule_preempt_disabled+0xa/0x10
[Thu Oct 17 16:43:37 2019]  __mutex_lock.isra.9+0x1e3/0x4e0
[Thu Oct 17 16:43:37 2019]  ? __call_srcu+0x264/0x3b0
[Thu Oct 17 16:43:37 2019]  ? process_termination_cpsch+0x24/0x2f0
[amdgpu]
[Thu Oct 17 16:43:37 2019]  process_termination_cpsch+0x24/0x2f0
[amdgpu]
[Thu Oct 17 16:43:37 2019]
kfd_process_dequeue_from_all_devices+0x42/0x60 [amdgpu]
[Thu Oct 17 16:43:37 2019]  kfd_process_notifier_release+0x1be/0x220
[amdgpu]
[Thu Oct 17 16:43:37 2019]  __mmu_notifier_release+0x3e/0xc0
[Thu Oct 17 16:43:37 2019]  exit_mmap+0x160/0x1a0
[Thu Oct 17 16:43:37 2019]  ? __handle_mm_fault+0xba3/0x1200
[Thu Oct 17 16:43:37 2019]  ? exit_robust_list+0x5a/0x110
[Thu Oct 17 16:43:37 2019]  mmput+0x4a/0x120
[Thu Oct 17 16:43:37 2019]  do_exit+0x284/0xb20
[Thu Oct 17 16:43:37 2019]  ? handle_mm_fault+0xfa/0x200
[Thu Oct 17 16:43:37 2019]  do_group_exit+0x3a/0xa0
[Thu Oct 17 16:43:37 2019]  __x64_sys_exit_group+0x14/0x20
[Thu Oct 17 16:43:37 2019]  do_syscall_64+0x4f/0x100
[Thu Oct 17 16:43:37 2019]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Suggested-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Signed-off-by: Philip Yang &lt;Philip.Yang@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdkfd: Record vmid pasid mapping in the driver for non HWS mode</title>
<updated>2019-10-03T14:11:03+00:00</updated>
<author>
<name>Yong Zhao</name>
<email>Yong.Zhao@amd.com</email>
</author>
<published>2019-09-26T03:49:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d9d4623c87e9318549e0c85133c38909d9e8c47a'/>
<id>d9d4623c87e9318549e0c85133c38909d9e8c47a</id>
<content type='text'>
This makes possible the vmid pasid mapping query through software.

Signed-off-by: Yong Zhao &lt;Yong.Zhao@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This makes possible the vmid pasid mapping query through software.

Signed-off-by: Yong Zhao &lt;Yong.Zhao@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdkfd: Add navi10 support to amdkfd. (v3)</title>
<updated>2019-06-21T23:59:24+00:00</updated>
<author>
<name>Philip Cox</name>
<email>Philip.Cox@amd.com</email>
</author>
<published>2019-05-30T04:03:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=14328aa58ce523a59996c5a82681c43ec048cc33'/>
<id>14328aa58ce523a59996c5a82681c43ec048cc33</id>
<content type='text'>
KFD (kernel fusion driver) is the kernel driver
for the compute backend for usermode compute
stack.

v2: squash in updates (Alex)
v3: squash in rebase fixes (Alex)

Signed-off-by: Oak Zeng &lt;Oak.Zeng@amd.com&gt;
Signed-off-by: Philip Cox &lt;Philip.Cox@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
KFD (kernel fusion driver) is the kernel driver
for the compute backend for usermode compute
stack.

v2: squash in updates (Alex)
v3: squash in rebase fixes (Alex)

Signed-off-by: Oak Zeng &lt;Oak.Zeng@amd.com&gt;
Signed-off-by: Philip Cox &lt;Philip.Cox@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
