<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c, branch linux-5.0.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>drm/amdgpu: shadow in shadow_list without tbo.mem.start cause page fault in sriov TDR</title>
<updated>2019-05-16T17:40:17+00:00</updated>
<author>
<name>wentalou</name>
<email>Wentao.Lou@amd.com</email>
</author>
<published>2019-04-12T07:01:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=2d56b3f53ac8d9310a0ac20af663e1a44050f868'/>
<id>2d56b3f53ac8d9310a0ac20af663e1a44050f868</id>
<content type='text'>
[ Upstream commit b575f10dbd6f84c2c8744ff1f486bfae1e4f6f38 ]

shadow was added into shadow_list by amdgpu_bo_create_shadow.
meanwhile, shadow-&gt;tbo.mem was not fully configured.
tbo.mem would be fully configured by amdgpu_vm_sdma_map_table until calling amdgpu_vm_clear_bo.
If sriov TDR occurred between amdgpu_bo_create_shadow and amdgpu_vm_sdma_map_table,
amdgpu_device_recover_vram would deal with shadow without tbo.mem.start.

Signed-off-by: Wentao Lou &lt;Wentao.Lou@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit b575f10dbd6f84c2c8744ff1f486bfae1e4f6f38 ]

shadow was added into shadow_list by amdgpu_bo_create_shadow.
meanwhile, shadow-&gt;tbo.mem was not fully configured.
tbo.mem would be fully configured by amdgpu_vm_sdma_map_table until calling amdgpu_vm_clear_bo.
If sriov TDR occurred between amdgpu_bo_create_shadow and amdgpu_vm_sdma_map_table,
amdgpu_device_recover_vram would deal with shadow without tbo.mem.start.

Signed-off-by: Wentao Lou &lt;Wentao.Lou@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: amdgpu_device_recover_vram always failed if only one node in shadow_list</title>
<updated>2019-05-10T16:36:09+00:00</updated>
<author>
<name>wentalou</name>
<email>Wentao.Lou@amd.com</email>
</author>
<published>2019-04-02T09:13:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=4b5f2b0ce17cf960925f94fcfe9cd1cb452ee580'/>
<id>4b5f2b0ce17cf960925f94fcfe9cd1cb452ee580</id>
<content type='text'>
[ Upstream commit 1712fb1a2f6829150032ac76eb0e39b82a549cfb ]

amdgpu_bo_restore_shadow would assign zero to r if succeeded.
r would remain zero if there is only one node in shadow_list.
current code would always return failure when r &lt;= 0.
restart the timeout for each wait was a rather problematic bug as well.
The value of tmo SHOULD be changed, otherwise we wait tmo jiffies on each loop.

Signed-off-by: Wentao Lou &lt;Wentao.Lou@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 1712fb1a2f6829150032ac76eb0e39b82a549cfb ]

amdgpu_bo_restore_shadow would assign zero to r if succeeded.
r would remain zero if there is only one node in shadow_list.
current code would always return failure when r &lt;= 0.
restart the timeout for each wait was a rather problematic bug as well.
The value of tmo SHOULD be changed, otherwise we wait tmo jiffies on each loop.

Signed-off-by: Wentao Lou &lt;Wentao.Lou@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu/sriov:Correct pfvf exchange logic</title>
<updated>2019-01-02T20:24:48+00:00</updated>
<author>
<name>Emily Deng</name>
<email>Emily.Deng@amd.com</email>
</author>
<published>2018-12-29T09:46:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b8cf66182eddb22e9c7539821ed6eecdb4f86d1a'/>
<id>b8cf66182eddb22e9c7539821ed6eecdb4f86d1a</id>
<content type='text'>
The pfvf exchange need be in exclusive mode. And add pfvf exchange in gpu
reset.

Signed-off-by: Emily Deng &lt;Emily.Deng@amd.com&gt;
Reviewed-By: Xiangliang Yu &lt;Xiangliang.Yu@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The pfvf exchange need be in exclusive mode. And add pfvf exchange in gpu
reset.

Signed-off-by: Emily Deng &lt;Emily.Deng@amd.com&gt;
Reviewed-By: Xiangliang Yu &lt;Xiangliang.Yu@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu/virtual_dce: No need to pin the cursor bo</title>
<updated>2019-01-02T20:24:45+00:00</updated>
<author>
<name>Emily Deng</name>
<email>Emily.Deng@amd.com</email>
</author>
<published>2018-12-26T10:08:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=baf3c982dfbf7b0742039e6fef3f1fe1ba4079ab'/>
<id>baf3c982dfbf7b0742039e6fef3f1fe1ba4079ab</id>
<content type='text'>
For virtual display feature, no need to pin cursor bo.

Signed-off-by: Emily Deng &lt;Emily.Deng@amd.com&gt;
Reviewed-by: Huang Rui &lt;ray.huang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
For virtual display feature, no need to pin cursor bo.

Signed-off-by: Emily Deng &lt;Emily.Deng@amd.com&gt;
Reviewed-by: Huang Rui &lt;ray.huang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: kfd_pre_reset outside req_full_gpu cause sriov hang</title>
<updated>2018-12-14T17:04:38+00:00</updated>
<author>
<name>wentalou</name>
<email>Wentao.Lou@amd.com</email>
</author>
<published>2018-12-07T05:53:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=7b184b006185215daf4e911f8de212964c99a514'/>
<id>7b184b006185215daf4e911f8de212964c99a514</id>
<content type='text'>
XGMI hive put kfd_pre_reset into amdgpu_device_lock_adev,
but outside req_full_gpu of sriov.
It would make sriov hang during reset.

Signed-off-by: Wentao Lou &lt;Wentao.Lou@amd.com&gt;
Reviewed-by: Shaoyun Liu &lt;Shaoyun.Liu@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
XGMI hive put kfd_pre_reset into amdgpu_device_lock_adev,
but outside req_full_gpu of sriov.
It would make sriov hang during reset.

Signed-off-by: Wentao Lou &lt;Wentao.Lou@amd.com&gt;
Reviewed-by: Shaoyun Liu &lt;Shaoyun.Liu@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: Enable GPU recovery by default for CI</title>
<updated>2018-12-12T19:26:40+00:00</updated>
<author>
<name>Andrey Grodzovsky</name>
<email>andrey.grodzovsky@amd.com</email>
</author>
<published>2018-12-11T20:31:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=fc42d47ce0118e2f59a67ac0b0da56f9dc454bd9'/>
<id>fc42d47ce0118e2f59a67ac0b0da56f9dc454bd9</id>
<content type='text'>
I retested Bonaire (gfx7 dGPU) and it works fine.

Signed-off-by: Andrey Grodzovsky &lt;andrey.grodzovsky@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
I retested Bonaire (gfx7 dGPU) and it works fine.

Signed-off-by: Andrey Grodzovsky &lt;andrey.grodzovsky@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu/si: fix SI after doorbell rework</title>
<updated>2018-12-05T22:49:50+00:00</updated>
<author>
<name>Alex Deucher</name>
<email>alexander.deucher@amd.com</email>
</author>
<published>2018-12-03T02:47:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=223577753b54acf0033de9585340909a0ef05e68'/>
<id>223577753b54acf0033de9585340909a0ef05e68</id>
<content type='text'>
SI does not use doorbells, move asic doorbell init later
asic check.

Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=108920
Reviewed-by: Oak Zeng &lt;Oak.Zeng@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
SI does not use doorbells, move asic doorbell init later
asic check.

Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=108920
Reviewed-by: Oak Zeng &lt;Oak.Zeng@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: Implement concurrent asic reset for XGMI.</title>
<updated>2018-12-03T16:15:14+00:00</updated>
<author>
<name>Andrey Grodzovsky</name>
<email>andrey.grodzovsky@amd.com</email>
</author>
<published>2018-11-29T20:14:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d4535e2c018bba71b49edeb5e396183920f5d341'/>
<id>d4535e2c018bba71b49edeb5e396183920f5d341</id>
<content type='text'>
Use per hive wq to concurrently send reset commands to all nodes
in the hive.

v2:
Switch to system_highpri_wq after dropping dedicated queue.
Fix non XGMI code path KASAN error.
Stop  the hive reset for each node loop if there
is a reset failure on any of the nodes.

Signed-off-by: Andrey Grodzovsky &lt;andrey.grodzovsky@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Use per hive wq to concurrently send reset commands to all nodes
in the hive.

v2:
Switch to system_highpri_wq after dropping dedicated queue.
Fix non XGMI code path KASAN error.
Stop  the hive reset for each node loop if there
is a reset failure on any of the nodes.

Signed-off-by: Andrey Grodzovsky &lt;andrey.grodzovsky@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: Handle xgmi device removal.</title>
<updated>2018-12-03T16:15:08+00:00</updated>
<author>
<name>Andrey Grodzovsky</name>
<email>andrey.grodzovsky@amd.com</email>
</author>
<published>2018-11-29T17:21:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a82400b57abb6aff068bb3b21d1cccd63acbb863'/>
<id>a82400b57abb6aff068bb3b21d1cccd63acbb863</id>
<content type='text'>
XGMI hive has some resources allocted on device init which
needs to be deallocated when the device is unregistered.

v2: Remove creation of dedicated wq for XGMI hive reset.
v3: Use the gmc.xgmi.supported flag

Signed-off-by: Andrey Grodzovsky &lt;andrey.grodzovsky@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
XGMI hive has some resources allocted on device init which
needs to be deallocated when the device is unregistered.

v2: Remove creation of dedicated wq for XGMI hive reset.
v3: Use the gmc.xgmi.supported flag

Signed-off-by: Andrey Grodzovsky &lt;andrey.grodzovsky@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: Fix num_doorbell calculation issue</title>
<updated>2018-11-30T17:01:04+00:00</updated>
<author>
<name>Oak Zeng</name>
<email>ozeng@amd.com</email>
</author>
<published>2018-11-30T15:33:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=88dc26e46beb964d8c4d80f7eb33bef51fc70c9a'/>
<id>88dc26e46beb964d8c4d80f7eb33bef51fc70c9a</id>
<content type='text'>
When paging queue is enabled, it use the second page of doorbell.
The AMDGPU_DOORBELL64_MAX_ASSIGNMENT definition assumes all the
kernel doorbells are in the first page. So with paging queue enabled,
the total kernel doorbell range should be original num_doorbell plus
one page (0x400 in dword), not *2.

Signed-off-by: Oak Zeng &lt;ozeng@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When paging queue is enabled, it use the second page of doorbell.
The AMDGPU_DOORBELL64_MAX_ASSIGNMENT definition assumes all the
kernel doorbells are in the first page. So with paging queue enabled,
the total kernel doorbell range should be original num_doorbell plus
one page (0x400 in dword), not *2.

Signed-off-by: Oak Zeng &lt;ozeng@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
