<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c, branch linux-6.10.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>drm/amdgpu: Fix amdgpu_device_reset_sriov retry logic</title>
<updated>2024-09-12T09:13:05+00:00</updated>
<author>
<name>Yunxiang Li</name>
<email>Yunxiang.Li@amd.com</email>
</author>
<published>2024-04-22T19:04:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e30b013e24da0f23b5a102f5208bdfdaaa4fd05e'/>
<id>e30b013e24da0f23b5a102f5208bdfdaaa4fd05e</id>
<content type='text'>
[ Upstream commit 6e4aa08fa9c6c0c027fc86f242517c925d159393 ]

The retry loop for SRIOV reset have refcount and memory leak issue.
Depending on which function call fails it can potentially call
amdgpu_amdkfd_pre/post_reset different number of times and causes
kfd_locked count to be wrong. This will block all future attempts at
opening /dev/kfd. The retry loop also leakes resources by calling
amdgpu_virt_init_data_exchange multiple times without calling the
corresponding fini function.

Align with the bare-metal reset path which doesn't have these issues.
This means taking the amdgpu_amdkfd_pre/post_reset functions out of the
reset loop and calling amdgpu_device_pre_asic_reset each retry which
properly free the resources from previous try by calling
amdgpu_virt_fini_data_exchange.

Signed-off-by: Yunxiang Li &lt;Yunxiang.Li@amd.com&gt;
Reviewed-by: Emily Deng &lt;Emily.Deng@amd.com&gt;
Reviewed-by: Zhigang Luo &lt;zhigang.luo@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 6e4aa08fa9c6c0c027fc86f242517c925d159393 ]

The retry loop for SRIOV reset have refcount and memory leak issue.
Depending on which function call fails it can potentially call
amdgpu_amdkfd_pre/post_reset different number of times and causes
kfd_locked count to be wrong. This will block all future attempts at
opening /dev/kfd. The retry loop also leakes resources by calling
amdgpu_virt_init_data_exchange multiple times without calling the
corresponding fini function.

Align with the bare-metal reset path which doesn't have these issues.
This means taking the amdgpu_amdkfd_pre/post_reset functions out of the
reset loop and calling amdgpu_device_pre_asic_reset each retry which
properly free the resources from previous try by calling
amdgpu_virt_fini_data_exchange.

Signed-off-by: Yunxiang Li &lt;Yunxiang.Li@amd.com&gt;
Reviewed-by: Emily Deng &lt;Emily.Deng@amd.com&gt;
Reviewed-by: Zhigang Luo &lt;zhigang.luo@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: Add reset_context flag for host FLR</title>
<updated>2024-09-12T09:13:05+00:00</updated>
<author>
<name>Yunxiang Li</name>
<email>Yunxiang.Li@amd.com</email>
</author>
<published>2024-04-22T18:44:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=3adb4ae45e42a9f537071a03f9814f9c6c4f32ed'/>
<id>3adb4ae45e42a9f537071a03f9814f9c6c4f32ed</id>
<content type='text'>
[ Upstream commit 25c01191c2555351922e5515b6b6d31357975031 ]

There are other reset sources that pass NULL as the job pointer, such as
amdgpu_amdkfd_reset_work. Therefore, using the job pointer to check if
the FLR comes from the host does not work.

Add a flag in reset_context to explicitly mark host triggered reset, and
set this flag when we receive host reset notification.

Signed-off-by: Yunxiang Li &lt;Yunxiang.Li@amd.com&gt;
Reviewed-by: Emily Deng &lt;Emily.Deng@amd.com&gt;
Reviewed-by: Zhigang Luo &lt;zhigang.luo@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Stable-dep-of: 6e4aa08fa9c6 ("drm/amdgpu: Fix amdgpu_device_reset_sriov retry logic")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 25c01191c2555351922e5515b6b6d31357975031 ]

There are other reset sources that pass NULL as the job pointer, such as
amdgpu_amdkfd_reset_work. Therefore, using the job pointer to check if
the FLR comes from the host does not work.

Add a flag in reset_context to explicitly mark host triggered reset, and
set this flag when we receive host reset notification.

Signed-off-by: Yunxiang Li &lt;Yunxiang.Li@amd.com&gt;
Reviewed-by: Emily Deng &lt;Emily.Deng@amd.com&gt;
Reviewed-by: Zhigang Luo &lt;zhigang.luo@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Stable-dep-of: 6e4aa08fa9c6 ("drm/amdgpu: Fix amdgpu_device_reset_sriov retry logic")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: Fix two reset triggered in a row</title>
<updated>2024-09-12T09:13:05+00:00</updated>
<author>
<name>Yunxiang Li</name>
<email>Yunxiang.Li@amd.com</email>
</author>
<published>2024-04-22T18:59:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=1f490704c6169e95585c1cf47299dbf54d401469'/>
<id>1f490704c6169e95585c1cf47299dbf54d401469</id>
<content type='text'>
[ Upstream commit f4322b9f8ad5f9f62add288c785d2e10bb6a5efe ]

Some times a hang GPU causes multiple reset sources to schedule resets.
The second source will be able to trigger an unnecessary reset if they
schedule after we call amdgpu_device_stop_pending_resets.

Move amdgpu_device_stop_pending_resets to after the reset is done. Since
at this point the GPU is supposedly in a good state, any reset scheduled
after this point would be a legitimate reset.

Remove unnecessary and incorrect checks for amdgpu_in_reset that was
kinda serving this purpose.

Signed-off-by: Yunxiang Li &lt;Yunxiang.Li@amd.com&gt;
Reviewed-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Stable-dep-of: 6e4aa08fa9c6 ("drm/amdgpu: Fix amdgpu_device_reset_sriov retry logic")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit f4322b9f8ad5f9f62add288c785d2e10bb6a5efe ]

Some times a hang GPU causes multiple reset sources to schedule resets.
The second source will be able to trigger an unnecessary reset if they
schedule after we call amdgpu_device_stop_pending_resets.

Move amdgpu_device_stop_pending_resets to after the reset is done. Since
at this point the GPU is supposedly in a good state, any reset scheduled
after this point would be a legitimate reset.

Remove unnecessary and incorrect checks for amdgpu_in_reset that was
kinda serving this purpose.

Signed-off-by: Yunxiang Li &lt;Yunxiang.Li@amd.com&gt;
Reviewed-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Stable-dep-of: 6e4aa08fa9c6 ("drm/amdgpu: Fix amdgpu_device_reset_sriov retry logic")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: fix dereference after null check</title>
<updated>2024-09-08T05:56:29+00:00</updated>
<author>
<name>Jesse Zhang</name>
<email>jesse.zhang@amd.com</email>
</author>
<published>2024-05-08T06:51:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=00b9594d6310eb33e14d3f07b54866499efe0d50'/>
<id>00b9594d6310eb33e14d3f07b54866499efe0d50</id>
<content type='text'>
[ Upstream commit b1f7810b05d1950350ac2e06992982974343e441 ]

check the pointer hive before use.

Signed-off-by: Jesse Zhang &lt;Jesse.Zhang@amd.com&gt;
Reviewed-by: Tim Huang &lt;Tim.Huang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit b1f7810b05d1950350ac2e06992982974343e441 ]

check the pointer hive before use.

Signed-off-by: Jesse Zhang &lt;Jesse.Zhang@amd.com&gt;
Reviewed-by: Tim Huang &lt;Tim.Huang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amd/amdgpu: Check tbo resource pointer</title>
<updated>2024-09-08T05:56:25+00:00</updated>
<author>
<name>Asad Kamal</name>
<email>asad.kamal@amd.com</email>
</author>
<published>2024-04-25T18:26:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e8765364d4f3aaf88c7abe0a4fc99089d059ab49'/>
<id>e8765364d4f3aaf88c7abe0a4fc99089d059ab49</id>
<content type='text'>
[ Upstream commit 6cd2b872643bb29bba01a8ac739138db7bd79007 ]

Validate tbo resource pointer, skip if NULL

Signed-off-by: Asad Kamal &lt;asad.kamal@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 6cd2b872643bb29bba01a8ac739138db7bd79007 ]

Validate tbo resource pointer, skip if NULL

Signed-off-by: Asad Kamal &lt;asad.kamal@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: Add lock around VF RLCG interface</title>
<updated>2024-08-14T13:34:14+00:00</updated>
<author>
<name>Victor Skvortsov</name>
<email>victor.skvortsov@amd.com</email>
</author>
<published>2024-05-27T20:10:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e1ab38e99d1607f80a1670a399511a56464c0253'/>
<id>e1ab38e99d1607f80a1670a399511a56464c0253</id>
<content type='text'>
[ Upstream commit e864180ee49b4d30e640fd1e1d852b86411420c9 ]

flush_gpu_tlb may be called from another thread while
device_gpu_recover is running.

Both of these threads access registers through the VF
RLCG interface during VF Full Access. Add a lock around this interface
to prevent race conditions between these threads.

Signed-off-by: Victor Skvortsov &lt;victor.skvortsov@amd.com&gt;
Reviewed-by: Zhigang Luo &lt;zhigang.luo@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit e864180ee49b4d30e640fd1e1d852b86411420c9 ]

flush_gpu_tlb may be called from another thread while
device_gpu_recover is running.

Both of these threads access registers through the VF
RLCG interface during VF Full Access. Add a lock around this interface
to prevent race conditions between these threads.

Signed-off-by: Victor Skvortsov &lt;victor.skvortsov@amd.com&gt;
Reviewed-by: Zhigang Luo &lt;zhigang.luo@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: Check if NBIO funcs are NULL in amdgpu_device_baco_exit</title>
<updated>2024-08-03T06:59:50+00:00</updated>
<author>
<name>Friedrich Vock</name>
<email>friedrich.vock@gmx.de</email>
</author>
<published>2024-05-14T07:06:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=9084114e338597f8102335e427d08e6af0f3dae0'/>
<id>9084114e338597f8102335e427d08e6af0f3dae0</id>
<content type='text'>
[ Upstream commit 0cdb3f9740844b9d95ca413e3fcff11f81223ecf ]

The special case for VM passthrough doesn't check adev-&gt;nbio.funcs
before dereferencing it. If GPUs that don't have an NBIO block are
passed through, this leads to a NULL pointer dereference on startup.

Signed-off-by: Friedrich Vock &lt;friedrich.vock@gmx.de&gt;
Fixes: 1bece222eabe ("drm/amdgpu: Clear doorbell interrupt status for Sienna Cichlid")
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: Christian König &lt;christian.koenig@amd.com&gt;
Acked-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 0cdb3f9740844b9d95ca413e3fcff11f81223ecf ]

The special case for VM passthrough doesn't check adev-&gt;nbio.funcs
before dereferencing it. If GPUs that don't have an NBIO block are
passed through, this leads to a NULL pointer dereference on startup.

Signed-off-by: Friedrich Vock &lt;friedrich.vock@gmx.de&gt;
Fixes: 1bece222eabe ("drm/amdgpu: Clear doorbell interrupt status for Sienna Cichlid")
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: Christian König &lt;christian.koenig@amd.com&gt;
Acked-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: Fix pci state save during mode-1 reset</title>
<updated>2024-06-25T18:13:12+00:00</updated>
<author>
<name>Lijo Lazar</name>
<email>lijo.lazar@amd.com</email>
</author>
<published>2024-06-18T08:34:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=74fa02c4a5ea1ade5156a6ce494d3ea83881c2d8'/>
<id>74fa02c4a5ea1ade5156a6ce494d3ea83881c2d8</id>
<content type='text'>
Cache the PCI state before bus master is disabled. The saved state is
later used for other cases like restoring config space after mode-2
reset.

Fixes: 5c03e5843e6b ("drm/amdgpu:add smu mode1/2 support for aldebaran")
Signed-off-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Reviewed-by: Feifei Xu &lt;Feifei.Xu@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Cache the PCI state before bus master is disabled. The saved state is
later used for other cases like restoring config space after mode-2
reset.

Fixes: 5c03e5843e6b ("drm/amdgpu:add smu mode1/2 support for aldebaran")
Signed-off-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Reviewed-by: Feifei Xu &lt;Feifei.Xu@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: Adjust logic in amdgpu_device_partner_bandwidth()</title>
<updated>2024-05-29T21:01:49+00:00</updated>
<author>
<name>Alex Deucher</name>
<email>alexander.deucher@amd.com</email>
</author>
<published>2024-05-15T15:25:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=ba46b3bda296c4f82b061ac40b90f49d2a00a380'/>
<id>ba46b3bda296c4f82b061ac40b90f49d2a00a380</id>
<content type='text'>
Use current speed/width on devices which don't support
dynamic PCIe switching.

Fixes: 466a7d115326 ("drm/amd: Use the first non-dGPU PCI device for BW limits")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3289
Acked-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Use current speed/width on devices which don't support
dynamic PCIe switching.

Fixes: 466a7d115326 ("drm/amd: Use the first non-dGPU PCI device for BW limits")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3289
Acked-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: skip ip dump if devcoredump flag is set</title>
<updated>2024-04-26T21:22:44+00:00</updated>
<author>
<name>Sunil Khatri</name>
<email>sunil.khatri@amd.com</email>
</author>
<published>2024-04-25T10:15:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=2a8f7464d33c759c3848737399d155a6c83c1ffa'/>
<id>2a8f7464d33c759c3848737399d155a6c83c1ffa</id>
<content type='text'>
Do not dump the ip registers during driver reload
in passthrough environment.

Signed-off-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Reviewed-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Do not dump the ip registers during driver reload
in passthrough environment.

Signed-off-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Reviewed-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
