<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c, branch v6.2-rc2</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>drm/amdgpu: skip MES for S0ix as well since it's part of GFX</title>
<updated>2022-12-20T18:08:12+00:00</updated>
<author>
<name>Alex Deucher</name>
<email>alexander.deucher@amd.com</email>
</author>
<published>2022-12-16T16:42:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=afa6646b1c5d3affd541f76bd7476e4b835a9174'/>
<id>afa6646b1c5d3affd541f76bd7476e4b835a9174</id>
<content type='text'>
It's also part of gfxoff.

Cc: stable@vger.kernel.org # 6.0, 6.1
Reviewed-by: Mario Limonciello &lt;mario.limonciello@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It's also part of gfxoff.

Cc: stable@vger.kernel.org # 6.0, 6.1
Reviewed-by: Mario Limonciello &lt;mario.limonciello@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: Add an extra evict_resource call during device_suspend.</title>
<updated>2022-12-13T22:10:32+00:00</updated>
<author>
<name>Shikang Fan</name>
<email>shikang.fan@amd.com</email>
</author>
<published>2022-12-08T11:53:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=47ea20762bb7875a62e10433a3cd5d34e9133f47'/>
<id>47ea20762bb7875a62e10433a3cd5d34e9133f47</id>
<content type='text'>
- evict_resource is taking too long causing sriov full access mode timeout.
  So, add an extra evict_resource in the beginning as an early evict.

Signed-off-by: Shikang Fan &lt;shikang.fan@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- evict_resource is taking too long causing sriov full access mode timeout.
  So, add an extra evict_resource in the beginning as an early evict.

Signed-off-by: Shikang Fan &lt;shikang.fan@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: Fix potential double free and null pointer dereference</title>
<updated>2022-11-29T16:03:37+00:00</updated>
<author>
<name>Liang He</name>
<email>windhl@126.com</email>
</author>
<published>2022-11-22T04:28:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=dfd0287bd3920e132a8dae2a0ec3d92eaff5f2dd'/>
<id>dfd0287bd3920e132a8dae2a0ec3d92eaff5f2dd</id>
<content type='text'>
In amdgpu_get_xgmi_hive(), we should not call kfree() after
kobject_put() as the PUT will call kfree().

In amdgpu_device_ip_init(), we need to check the returned *hive*
which can be NULL before we dereference it.

Signed-off-by: Liang He &lt;windhl@126.com&gt;
Reviewed-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In amdgpu_get_xgmi_hive(), we should not call kfree() after
kobject_put() as the PUT will call kfree().

In amdgpu_device_ip_init(), we need to check the returned *hive*
which can be NULL before we dereference it.

Signed-off-by: Liang He &lt;windhl@126.com&gt;
Reviewed-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: fix for suspend/resume kiq fence fallback under sriov</title>
<updated>2022-11-23T15:31:31+00:00</updated>
<author>
<name>Shikang Fan</name>
<email>shikang.fan@amd.com</email>
</author>
<published>2022-11-18T09:35:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3c22c1ead6b2e6a9c0f2eeef143948f5d701dd08'/>
<id>3c22c1ead6b2e6a9c0f2eeef143948f5d701dd08</id>
<content type='text'>
- in device_resume, sriov configure interrupt should be in full access,
  so release_full_gpu should be done after kfd_resume.
- remove the previous workaround solution for sriov.

Fixes: ec4927d463cb ("drm/amdgpu: fix for suspend/resume sequence under sriov")
Signed-off-by: Shikang Fan &lt;shikang.fan@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- in device_resume, sriov configure interrupt should be in full access,
  so release_full_gpu should be done after kfd_resume.
- remove the previous workaround solution for sriov.

Fixes: ec4927d463cb ("drm/amdgpu: fix for suspend/resume sequence under sriov")
Signed-off-by: Shikang Fan &lt;shikang.fan@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: fix pci device refcount leak</title>
<updated>2022-11-23T14:47:13+00:00</updated>
<author>
<name>Yang Yingliang</name>
<email>yangyingliang@huawei.com</email>
</author>
<published>2022-11-17T15:00:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b85e285e3d6352b02947fc1b72303673dfacb0aa'/>
<id>b85e285e3d6352b02947fc1b72303673dfacb0aa</id>
<content type='text'>
As comment of pci_get_domain_bus_and_slot() says, it returns
a pci device with refcount increment, when finish using it,
the caller must decrement the reference count by calling
pci_dev_put().

So before returning from amdgpu_device_resume|suspend_display_audio(),
pci_dev_put() is called to avoid refcount leak.

Fixes: 3f12acc8d6d4 ("drm/amdgpu: put the audio codec into suspend state before gpu reset V3")
Reviewed-by: Evan Quan &lt;evan.quan@amd.com&gt;
Signed-off-by: Yang Yingliang &lt;yangyingliang@huawei.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
As comment of pci_get_domain_bus_and_slot() says, it returns
a pci device with refcount increment, when finish using it,
the caller must decrement the reference count by calling
pci_dev_put().

So before returning from amdgpu_device_resume|suspend_display_audio(),
pci_dev_put() is called to avoid refcount leak.

Fixes: 3f12acc8d6d4 ("drm/amdgpu: put the audio codec into suspend state before gpu reset V3")
Reviewed-by: Evan Quan &lt;evan.quan@amd.com&gt;
Signed-off-by: Yang Yingliang &lt;yangyingliang@huawei.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'amd-drm-next-6.2-2022-11-18' of https://gitlab.freedesktop.org/agd5f/linux into drm-next</title>
<updated>2022-11-22T03:41:11+00:00</updated>
<author>
<name>Dave Airlie</name>
<email>airlied@redhat.com</email>
</author>
<published>2022-11-22T03:41:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=fc58764bbf602b65a6f63c53e5fd6feae76c510c'/>
<id>fc58764bbf602b65a6f63c53e5fd6feae76c510c</id>
<content type='text'>
amd-drm-next-6.2-2022-11-18:

amdgpu:
- SR-IOV fixes
- Clean up DC checks
- DCN 3.2.x fixes
- DCN 3.1.x fixes
- Don't enable degamma on asics which don't support it
- IP discovery fixes
- BACO fixes
- Fix vbios allocation handling when vkms is enabled
- Drop buggy tdr advanced mode GPU reset handling
- Fix the build when DCN is not set in kconfig
- MST DSC fixes
- Userptr fixes
- FRU and RAS EEPROM fixes
- VCN 4.x RAS support
- Aldrebaran CU occupancy reporting fix
- PSP ring cleanup

amdkfd:
- Memory limit fix
- Enable cooperative launch on gfx 10.3

amd-drm-next-6.2-2022-11-11:

amdgpu:
- SMU 13.x updates
- GPUVM TLB race fix
- DCN 3.1.4 updates
- DCN 3.2.x updates
- PSR fixes
- Kerneldoc fix
- Vega10 fan fix
- GPUVM locking fixes in error pathes
- BACO fix for Beige Goby
- EEPROM I2C address cleanup
- GFXOFF fix
- Fix DC memory leak in error pathes
- Flexible array updates
- Mtype fix for GPUVM PTEs
- Move Kconfig into amdgpu directory
- SR-IOV updates
- Fix possible memory leak in CS IOCTL error path

amdkfd:
- Fix possible memory overrun
- CRIU fixes

radeon:
- ACPI ref count fix
- HDA audio notifier support
- Move Kconfig into radeon directory

UAPI:
- Add new GEM_CREATE flags to help to transition more KFD functionality to the DRM UAPI.
  These are used internally in the driver to align location based memory coherency
  requirements from memory allocated in the KFD with how we manage GPUVM PTEs.  They
  are currently blocked in the GEM_CREATE IOCTL as we don't have a user right now.
  They are just used internally in the kernel driver for now for existing KFD memory
  allocations. So a change to the UAPI header, but no functional change in the UAPI.

From: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20221118170807.6505-1-alexander.deucher@amd.com
Signed-off-by: Dave Airlie &lt;airlied@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
amd-drm-next-6.2-2022-11-18:

amdgpu:
- SR-IOV fixes
- Clean up DC checks
- DCN 3.2.x fixes
- DCN 3.1.x fixes
- Don't enable degamma on asics which don't support it
- IP discovery fixes
- BACO fixes
- Fix vbios allocation handling when vkms is enabled
- Drop buggy tdr advanced mode GPU reset handling
- Fix the build when DCN is not set in kconfig
- MST DSC fixes
- Userptr fixes
- FRU and RAS EEPROM fixes
- VCN 4.x RAS support
- Aldrebaran CU occupancy reporting fix
- PSP ring cleanup

amdkfd:
- Memory limit fix
- Enable cooperative launch on gfx 10.3

amd-drm-next-6.2-2022-11-11:

amdgpu:
- SMU 13.x updates
- GPUVM TLB race fix
- DCN 3.1.4 updates
- DCN 3.2.x updates
- PSR fixes
- Kerneldoc fix
- Vega10 fan fix
- GPUVM locking fixes in error pathes
- BACO fix for Beige Goby
- EEPROM I2C address cleanup
- GFXOFF fix
- Fix DC memory leak in error pathes
- Flexible array updates
- Mtype fix for GPUVM PTEs
- Move Kconfig into amdgpu directory
- SR-IOV updates
- Fix possible memory leak in CS IOCTL error path

amdkfd:
- Fix possible memory overrun
- CRIU fixes

radeon:
- ACPI ref count fix
- HDA audio notifier support
- Move Kconfig into radeon directory

UAPI:
- Add new GEM_CREATE flags to help to transition more KFD functionality to the DRM UAPI.
  These are used internally in the driver to align location based memory coherency
  requirements from memory allocated in the KFD with how we manage GPUVM PTEs.  They
  are currently blocked in the GEM_CREATE IOCTL as we don't have a user right now.
  They are just used internally in the kernel driver for now for existing KFD memory
  allocations. So a change to the UAPI header, but no functional change in the UAPI.

From: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20221118170807.6505-1-alexander.deucher@amd.com
Signed-off-by: Dave Airlie &lt;airlied@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: Enable mode-1 reset for RAS recovery in fatal error mode</title>
<updated>2022-11-17T23:07:52+00:00</updated>
<author>
<name>YiPeng Chai</name>
<email>YiPeng.Chai@amd.com</email>
</author>
<published>2022-11-08T09:11:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1a11a65d5395ccdcd07f19a75da82a3d74c368dd'/>
<id>1a11a65d5395ccdcd07f19a75da82a3d74c368dd</id>
<content type='text'>
The patch is enabling mode-1 reset for RAS recovery in fatal error mode.

Signed-off-by: YiPeng Chai &lt;YiPeng.Chai@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The patch is enabling mode-1 reset for RAS recovery in fatal error mode.

Signed-off-by: YiPeng Chai &lt;YiPeng.Chai@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'drm-misc-next-2022-11-10-1' of git://anongit.freedesktop.org/drm/drm-misc into drm-next</title>
<updated>2022-11-15T21:17:32+00:00</updated>
<author>
<name>Dave Airlie</name>
<email>airlied@redhat.com</email>
</author>
<published>2022-11-15T07:29:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4e291f2f585313efa5200cce655e17c94906e50a'/>
<id>4e291f2f585313efa5200cce655e17c94906e50a</id>
<content type='text'>
drm-misc-next for 6.2:

UAPI Changes:

Cross-subsystem Changes:

Core Changes:
- atomic-helper: Add begin_fb_access and end_fb_access hooks
- fb-helper: Rework to move fb emulation into helpers
- scheduler: rework entity flush, kill and fini
- ttm: Optimize pool allocations

Driver Changes:
- amdgpu: scheduler rework
- hdlcd: Switch to DRM-managed resources
- ingenic: Fix registration error path
- lcdif: FIFO threshold tuning
- meson: Fix return type of cvbs' mode_valid
- ofdrm: multiple fixes (kconfig, types, endianness)
- sun4i: A100 and D1 support
- panel:
  - New Panel: Jadard JD9365DA-H3

Signed-off-by: Dave Airlie &lt;airlied@redhat.com&gt;

From: Maxime Ripard &lt;maxime@cerno.tech&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20221110083612.g63eaocoaa554soh@houat
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
drm-misc-next for 6.2:

UAPI Changes:

Cross-subsystem Changes:

Core Changes:
- atomic-helper: Add begin_fb_access and end_fb_access hooks
- fb-helper: Rework to move fb emulation into helpers
- scheduler: rework entity flush, kill and fini
- ttm: Optimize pool allocations

Driver Changes:
- amdgpu: scheduler rework
- hdlcd: Switch to DRM-managed resources
- ingenic: Fix registration error path
- lcdif: FIFO threshold tuning
- meson: Fix return type of cvbs' mode_valid
- ofdrm: multiple fixes (kconfig, types, endianness)
- sun4i: A100 and D1 support
- panel:
  - New Panel: Jadard JD9365DA-H3

Signed-off-by: Dave Airlie &lt;airlied@redhat.com&gt;

From: Maxime Ripard &lt;maxime@cerno.tech&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20221110083612.g63eaocoaa554soh@houat
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: stop resubmittting jobs in amdgpu_pci_resume</title>
<updated>2022-11-15T20:25:45+00:00</updated>
<author>
<name>Christian König</name>
<email>christian.koenig@amd.com</email>
</author>
<published>2022-10-26T10:48:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0788a47e7cec7ebdcb1ad8912754b8b8b06ee915'/>
<id>0788a47e7cec7ebdcb1ad8912754b8b8b06ee915</id>
<content type='text'>
The state of VRAM is unreliable due to a PCI event like AER, link reset
or DPC.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The state of VRAM is unreliable due to a PCI event like AER, link reset
or DPC.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: stop resubmitting jobs for GPU reset v2</title>
<updated>2022-11-15T20:25:37+00:00</updated>
<author>
<name>Christian König</name>
<email>christian.koenig@amd.com</email>
</author>
<published>2022-10-26T10:57:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6868a2c46560670efc0d1f2b446cc57edcaf960d'/>
<id>6868a2c46560670efc0d1f2b446cc57edcaf960d</id>
<content type='text'>
Re-submitting IBs by the kernel has many problems because pre-
requisite state is not automatically re-created as well. In
other words neither binary semaphores nor things like ring
buffer pointers are in the state they should be when the
hardware starts to work on the IBs again.

Additional to that even after more than 5 years of
developing this feature it is still not stable and we have
massively problems getting the reference counts right.

As discussed with user space developers this behavior is not
helpful in the first place. For graphics and multimedia
workloads it makes much more sense to either completely
re-create the context or at least re-submitting the IBs
from userspace.

For compute use cases re-submitting is also not very
helpful since userspace must rely on the accuracy of
the result.

Because of this we stop this practice and instead just
properly note that the fence submission was canceled. The
only use case we keep the re-submission for now is SRIOV
and function level resets.

v2: as suggested by Sshaoyun stop resubmitting jobs even for SRIOV

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Re-submitting IBs by the kernel has many problems because pre-
requisite state is not automatically re-created as well. In
other words neither binary semaphores nor things like ring
buffer pointers are in the state they should be when the
hardware starts to work on the IBs again.

Additional to that even after more than 5 years of
developing this feature it is still not stable and we have
massively problems getting the reference counts right.

As discussed with user space developers this behavior is not
helpful in the first place. For graphics and multimedia
workloads it makes much more sense to either completely
re-create the context or at least re-submitting the IBs
from userspace.

For compute use cases re-submitting is also not very
helpful since userspace must rely on the accuracy of
the result.

Because of this we stop this practice and instead just
properly note that the fence submission was canceled. The
only use case we keep the re-submission for now is SRIOV
and function level resets.

v2: as suggested by Sshaoyun stop resubmitting jobs even for SRIOV

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
