<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c, branch v6.13</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>Merge tag 'drm-misc-next-2024-09-20' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next</title>
<updated>2024-10-08T23:03:46+00:00</updated>
<author>
<name>Dave Airlie</name>
<email>airlied@redhat.com</email>
</author>
<published>2024-10-08T23:03:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7fefa1edc23b360457b8986dbbf879698eeb1400'/>
<id>7fefa1edc23b360457b8986dbbf879698eeb1400</id>
<content type='text'>
drm-misc-next for v6.12:

UAPI Changes:
- Add panthor/DEV_QUERY_TIMESTAMP_INFO query.

Cross-subsystem Changes:
- Updated dt bindings.
- Add documentation explaining default errnos for fences.
- Mark dma-buf heaps creation functions as __init.

Core Changes:
- Split DSC helpers from DP helpers.
- Clang build fixes for drm/mm test.
- Remove simple pipeline support for gem-vram,
  no longer any users left after converting bochs.
- Add erno to drm_sched_start to distinguish between GPU and queue
  reset.
- Add drm_framebuffer testcases.
- Fix uninitialized spinlock acquisition with CONFIG_DRM_PANIC=n.
- Use read_trylock instead of read_lock in dma_fence_begin_signalling to
  quiesce lockdep.

Driver Changes:
- Assorted small fixes and updates for tegra, host1x, imagination,
  nouveau, panfrost, panthor, panel/ili9341, mali, exynos,
  panel/samsung-s6e3fa7, ast, bridge/ti-sn65dsi86, panel/himax-hx83112a,
  bridge/tc358767, bridge/imx8mp-hdmi-tx, panel/khadas-ts050,
  panel/nt36523, panel/sony-acx565akm, kmb, accel/qaic, omap, v3d.
- Add bridge/TI TDP158.
- Assorted documentation updates.
- Convert bochs from simple drm to gem shmem, and check modes
  against available memory.
- Many VC4 fixes, most related to scaling and YUV support.
- Convert some drivers to use SYSTEM_SLEEP_PM_OPS and RUNTIME_PM_OPS.
- Rockchip 4k@60 support.

Signed-off-by: Dave Airlie &lt;airlied@redhat.com&gt;
From: Maarten Lankhorst &lt;maarten.lankhorst@linux.intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/445713a6-2427-4c53-8ec2-3a894ec62405@linux.intel.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
drm-misc-next for v6.12:

UAPI Changes:
- Add panthor/DEV_QUERY_TIMESTAMP_INFO query.

Cross-subsystem Changes:
- Updated dt bindings.
- Add documentation explaining default errnos for fences.
- Mark dma-buf heaps creation functions as __init.

Core Changes:
- Split DSC helpers from DP helpers.
- Clang build fixes for drm/mm test.
- Remove simple pipeline support for gem-vram,
  no longer any users left after converting bochs.
- Add erno to drm_sched_start to distinguish between GPU and queue
  reset.
- Add drm_framebuffer testcases.
- Fix uninitialized spinlock acquisition with CONFIG_DRM_PANIC=n.
- Use read_trylock instead of read_lock in dma_fence_begin_signalling to
  quiesce lockdep.

Driver Changes:
- Assorted small fixes and updates for tegra, host1x, imagination,
  nouveau, panfrost, panthor, panel/ili9341, mali, exynos,
  panel/samsung-s6e3fa7, ast, bridge/ti-sn65dsi86, panel/himax-hx83112a,
  bridge/tc358767, bridge/imx8mp-hdmi-tx, panel/khadas-ts050,
  panel/nt36523, panel/sony-acx565akm, kmb, accel/qaic, omap, v3d.
- Add bridge/TI TDP158.
- Assorted documentation updates.
- Convert bochs from simple drm to gem shmem, and check modes
  against available memory.
- Many VC4 fixes, most related to scaling and YUV support.
- Convert some drivers to use SYSTEM_SLEEP_PM_OPS and RUNTIME_PM_OPS.
- Rockchip 4k@60 support.

Signed-off-by: Dave Airlie &lt;airlied@redhat.com&gt;
From: Maarten Lankhorst &lt;maarten.lankhorst@linux.intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/445713a6-2427-4c53-8ec2-3a894ec62405@linux.intel.com
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: get rid of bogus includes of fdtable.h</title>
<updated>2024-09-10T17:44:30+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2024-06-04T01:49:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4c3140fea6277e1478ba206e8f45b9cf7a128cb9'/>
<id>4c3140fea6277e1478ba206e8f45b9cf7a128cb9</id>
<content type='text'>
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/sched: add optional errno to drm_sched_start()</title>
<updated>2024-09-06T16:05:52+00:00</updated>
<author>
<name>Christian König</name>
<email>ckoenig.leichtzumerken@gmail.com</email>
</author>
<published>2024-08-26T12:25:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b2ef808786d93df36585cee42cfb973fc41636eb'/>
<id>b2ef808786d93df36585cee42cfb973fc41636eb</id>
<content type='text'>
The current implementation of drm_sched_start uses a hardcoded
-ECANCELED to dispose of a job when the parent/hw fence is NULL.
This results in drm_sched_job_done being called with -ECANCELED for
each job with a NULL parent in the pending list, making it difficult
to distinguish between recovery methods, whether a queue reset or a
full GPU reset was used.

To improve this, we first try a soft recovery for timeout jobs and
use the error code -ENODATA. If soft recovery fails, we proceed with
a queue reset, where the error code remains -ENODATA for the job.
Finally, for a full GPU reset, we use error codes -ECANCELED or
-ETIME. This patch adds an error code parameter to drm_sched_start,
allowing us to differentiate between queue reset and GPU reset
failures. This enables user mode and test applications to validate
the expected correctness of the requested operation. After a
successful queue reset, the only way to continue normal operation is
to call drm_sched_job_done with the specific error code -ENODATA.

v1: Initial implementation by Jesse utilized amdgpu_device_lock_reset_domain
    and amdgpu_device_unlock_reset_domain to allow user mode to track
    the queue reset status and distinguish between queue reset and
    GPU reset.
v2: Christian suggested using the error codes -ENODATA for queue reset
    and -ECANCELED or -ETIME for GPU reset, returned to
    amdgpu_cs_wait_ioctl.
v3: To meet the requirements, we introduce a new function
    drm_sched_start_ex with an additional parameter to set
    dma_fence_set_error, allowing us to handle the specific error
    codes appropriately and dispose of bad jobs with the selected
    error code depending on whether it was a queue reset or GPU reset.
v4: Alex suggested using a new name, drm_sched_start_with_recovery_error,
    which more accurately describes the function's purpose.
    Additionally, it was recommended to add documentation details
    about the new method.
v5: Fixed declaration of new function drm_sched_start_with_recovery_error.(Alex)
v6 (chk): rebase on upstream changes, cleanup the commit message,
          drop the new function again and update all callers,
          apply the errno also to scheduler fences with hw fences
v7 (chk): rebased

Signed-off-by: Jesse Zhang &lt;Jesse.Zhang@amd.com&gt;
Signed-off-by: Vitaly Prosyak &lt;vitaly.prosyak@amd.com&gt;
Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Acked-by: Daniel Vetter &lt;daniel.vetter@ffwll.ch&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20240826122541.85663-1-christian.koenig@amd.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The current implementation of drm_sched_start uses a hardcoded
-ECANCELED to dispose of a job when the parent/hw fence is NULL.
This results in drm_sched_job_done being called with -ECANCELED for
each job with a NULL parent in the pending list, making it difficult
to distinguish between recovery methods, whether a queue reset or a
full GPU reset was used.

To improve this, we first try a soft recovery for timeout jobs and
use the error code -ENODATA. If soft recovery fails, we proceed with
a queue reset, where the error code remains -ENODATA for the job.
Finally, for a full GPU reset, we use error codes -ECANCELED or
-ETIME. This patch adds an error code parameter to drm_sched_start,
allowing us to differentiate between queue reset and GPU reset
failures. This enables user mode and test applications to validate
the expected correctness of the requested operation. After a
successful queue reset, the only way to continue normal operation is
to call drm_sched_job_done with the specific error code -ENODATA.

v1: Initial implementation by Jesse utilized amdgpu_device_lock_reset_domain
    and amdgpu_device_unlock_reset_domain to allow user mode to track
    the queue reset status and distinguish between queue reset and
    GPU reset.
v2: Christian suggested using the error codes -ENODATA for queue reset
    and -ECANCELED or -ETIME for GPU reset, returned to
    amdgpu_cs_wait_ioctl.
v3: To meet the requirements, we introduce a new function
    drm_sched_start_ex with an additional parameter to set
    dma_fence_set_error, allowing us to handle the specific error
    codes appropriately and dispose of bad jobs with the selected
    error code depending on whether it was a queue reset or GPU reset.
v4: Alex suggested using a new name, drm_sched_start_with_recovery_error,
    which more accurately describes the function's purpose.
    Additionally, it was recommended to add documentation details
    about the new method.
v5: Fixed declaration of new function drm_sched_start_with_recovery_error.(Alex)
v6 (chk): rebase on upstream changes, cleanup the commit message,
          drop the new function again and update all callers,
          apply the errno also to scheduler fences with hw fences
v7 (chk): rebased

Signed-off-by: Jesse Zhang &lt;Jesse.Zhang@amd.com&gt;
Signed-off-by: Vitaly Prosyak &lt;vitaly.prosyak@amd.com&gt;
Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Acked-by: Daniel Vetter &lt;daniel.vetter@ffwll.ch&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20240826122541.85663-1-christian.koenig@amd.com
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'amd-drm-next-6.12-2024-08-26' of https://gitlab.freedesktop.org/agd5f/linux into drm-next</title>
<updated>2024-08-27T12:33:12+00:00</updated>
<author>
<name>Daniel Vetter</name>
<email>daniel.vetter@ffwll.ch</email>
</author>
<published>2024-08-27T12:33:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e55ef65510a401862b902dc979441ea10ae25c61'/>
<id>e55ef65510a401862b902dc979441ea10ae25c61</id>
<content type='text'>
amd-drm-next-6.12-2024-08-26:

amdgpu:
- SDMA devcoredump support
- DCN 4.0.1 updates
- DC SUBVP fixes
- Refactor OPP in DC
- Refactor MMHUBBUB in DC
- DC DML 2.1 updates
- DC FAMS2 updates
- RAS updates
- GFX12 updates
- VCN 4.0.3 updates
- JPEG 4.0.3 updates
- Enable wave kill (soft recovery) for compute queues
- Clean up CP error interrupt handling
- Enable CP bad opcode interrupts
- VCN 4.x fixes
- VCN 5.x fixes
- GPU reset fixes
- Fix vbios embedded EDID size handling
- SMU 14.x updates
- Misc code cleanups and spelling fixes
- VCN devcoredump support
- ISP MFD i2c support
- DC vblank fixes
- GFX 12 fixes
- PSR fixes
- Convert vbios embedded EDID to drm_edid
- DCN 3.5 updates
- DMCUB updates
- Cursor fixes
- Overdrive support for SMU 14.x
- GFX CP padding optimizations
- DCC fixes
- DSC fixes
- Preliminary per queue reset infrastructure
- Initial per queue reset support for GFX 9
- Initial per queue reset support for GFX 7, 8
- DCN 3.2 fixes
- DP MST fixes
- SR-IOV fixes
- GFX 9.4.3/4 devcoredump support
- Add process isolation framework
- Enable process isolation support for GFX 9.4.3/4
- Take IOMMU remapping into account for P2P DMA checks

amdkfd:
- CRIU fixes
- Improved input validation for user queues
- HMM fix
- Enable process isolation support for GFX 9.4.3/4
- Initial per queue reset support for GFX 9
- Allow users to target recommended SDMA engines

radeon:
- remove .load and drm_dev_alloc
- Fix vbios embedded EDID size handling
- Convert vbios embedded EDID to drm_edid
- Use GEM references instead of TTM
- r100 cp init cleanup
- Fix potential overflows in evergreen CS offset tracking

UAPI:
- KFD support for targetting queues on recommended SDMA engines
  Proposed userspace:
  https://github.com/ROCm/ROCR-Runtime/commit/2f588a24065f41c208c3701945e20be746d8faf7
  https://github.com/ROCm/ROCR-Runtime/commit/eb30a5bbc7719c6ffcf2d2dd2878bc53a47b3f30

drm/buddy:
- Add start address support for trim function

From: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20240826201528.55307-1-alexander.deucher@amd.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
amd-drm-next-6.12-2024-08-26:

amdgpu:
- SDMA devcoredump support
- DCN 4.0.1 updates
- DC SUBVP fixes
- Refactor OPP in DC
- Refactor MMHUBBUB in DC
- DC DML 2.1 updates
- DC FAMS2 updates
- RAS updates
- GFX12 updates
- VCN 4.0.3 updates
- JPEG 4.0.3 updates
- Enable wave kill (soft recovery) for compute queues
- Clean up CP error interrupt handling
- Enable CP bad opcode interrupts
- VCN 4.x fixes
- VCN 5.x fixes
- GPU reset fixes
- Fix vbios embedded EDID size handling
- SMU 14.x updates
- Misc code cleanups and spelling fixes
- VCN devcoredump support
- ISP MFD i2c support
- DC vblank fixes
- GFX 12 fixes
- PSR fixes
- Convert vbios embedded EDID to drm_edid
- DCN 3.5 updates
- DMCUB updates
- Cursor fixes
- Overdrive support for SMU 14.x
- GFX CP padding optimizations
- DCC fixes
- DSC fixes
- Preliminary per queue reset infrastructure
- Initial per queue reset support for GFX 9
- Initial per queue reset support for GFX 7, 8
- DCN 3.2 fixes
- DP MST fixes
- SR-IOV fixes
- GFX 9.4.3/4 devcoredump support
- Add process isolation framework
- Enable process isolation support for GFX 9.4.3/4
- Take IOMMU remapping into account for P2P DMA checks

amdkfd:
- CRIU fixes
- Improved input validation for user queues
- HMM fix
- Enable process isolation support for GFX 9.4.3/4
- Initial per queue reset support for GFX 9
- Allow users to target recommended SDMA engines

radeon:
- remove .load and drm_dev_alloc
- Fix vbios embedded EDID size handling
- Convert vbios embedded EDID to drm_edid
- Use GEM references instead of TTM
- r100 cp init cleanup
- Fix potential overflows in evergreen CS offset tracking

UAPI:
- KFD support for targetting queues on recommended SDMA engines
  Proposed userspace:
  https://github.com/ROCm/ROCR-Runtime/commit/2f588a24065f41c208c3701945e20be746d8faf7
  https://github.com/ROCm/ROCR-Runtime/commit/eb30a5bbc7719c6ffcf2d2dd2878bc53a47b3f30

drm/buddy:
- Add start address support for trim function

From: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20240826201528.55307-1-alexander.deucher@amd.com
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdkfd: support per-queue reset on gfx9</title>
<updated>2024-08-06T14:43:18+00:00</updated>
<author>
<name>Jonathan Kim</name>
<email>Jonathan.Kim@amd.com</email>
</author>
<published>2024-06-25T15:22:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ee0a469cf9175aeb6131c0476c4a4a8eb5997dfa'/>
<id>ee0a469cf9175aeb6131c0476c4a4a8eb5997dfa</id>
<content type='text'>
Support per-queue reset for GFX9.  The recommendation is for the driver
to target reset the HW queue via a SPI MMIO register write.

Since this requires pipe and HW queue info and MEC FW is limited to
doorbell reports of hung queues after an unmap failure, scan the HW
queue slots defined by SET_RESOURCES first to identify the user queue
candidates to reset.

Only signal reset events to processes that have had a queue reset.

If queue reset fails, fall back to GPU reset.

Signed-off-by: Jonathan Kim &lt;jonathan.kim@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Support per-queue reset for GFX9.  The recommendation is for the driver
to target reset the HW queue via a SPI MMIO register write.

Since this requires pipe and HW queue info and MEC FW is limited to
doorbell reports of hung queues after an unmap failure, scan the HW
queue slots defined by SET_RESOURCES first to identify the user queue
candidates to reset.

Only signal reset events to processes that have had a queue reset.

If queue reset fails, fall back to GPU reset.

Signed-off-by: Jonathan Kim &lt;jonathan.kim@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/scheduler: remove full_recover from drm_sched_start</title>
<updated>2024-07-25T12:05:12+00:00</updated>
<author>
<name>Christian König</name>
<email>christian.koenig@amd.com</email>
</author>
<published>2024-07-19T15:24:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=83b501c1799a96a41e163973e88826253ffadfb3'/>
<id>83b501c1799a96a41e163973e88826253ffadfb3</id>
<content type='text'>
This was basically just another one of amdgpus hacks. The parameter
allowed to restart the scheduler without turning fence signaling on
again.

That this is absolutely not a good idea should be obvious by now since
the fences will then just sit there and never signal.

While at it cleanup the code a bit.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20240722083816.99685-1-christian.koenig@amd.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This was basically just another one of amdgpus hacks. The parameter
allowed to restart the scheduler without turning fence signaling on
again.

That this is absolutely not a good idea should be obvious by now since
the fences will then just sit there and never signal.

While at it cleanup the code a bit.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20240722083816.99685-1-christian.koenig@amd.com
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: Fix the warning info in mode1 reset</title>
<updated>2024-01-31T22:34:05+00:00</updated>
<author>
<name>Ma Jun</name>
<email>Jun.Ma2@amd.com</email>
</author>
<published>2024-01-05T06:05:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=bb34bc2cd3ee284d7992df24a3f7d24f61a59268'/>
<id>bb34bc2cd3ee284d7992df24a3f7d24f61a59268</id>
<content type='text'>
Fix the warning info below during mode1 reset.
[  +0.000004] Call Trace:
[  +0.000004]  &lt;TASK&gt;
[  +0.000006]  ? show_regs+0x6e/0x80
[  +0.000011]  ? __flush_work.isra.0+0x2e8/0x390
[  +0.000005]  ? __warn+0x91/0x150
[  +0.000009]  ? __flush_work.isra.0+0x2e8/0x390
[  +0.000006]  ? report_bug+0x19d/0x1b0
[  +0.000013]  ? handle_bug+0x46/0x80
[  +0.000012]  ? exc_invalid_op+0x1d/0x80
[  +0.000011]  ? asm_exc_invalid_op+0x1f/0x30
[  +0.000014]  ? __flush_work.isra.0+0x2e8/0x390
[  +0.000007]  ? __flush_work.isra.0+0x208/0x390
[  +0.000007]  ? _prb_read_valid+0x216/0x290
[  +0.000008]  __cancel_work_timer+0x11d/0x1a0
[  +0.000007]  ? try_to_grab_pending+0xe8/0x190
[  +0.000012]  cancel_work_sync+0x14/0x20
[  +0.000008]  amddrm_sched_stop+0x3c/0x1d0 [amd_sched]
[  +0.000032]  amdgpu_device_gpu_recover+0x29a/0xe90 [amdgpu]

This warning info was printed after applying the patch
"drm/sched: Convert drm scheduler to use a work queue rather than kthread".
The root cause is that amdgpu driver tries to use the uninitialized
work_struct in the struct drm_gpu_scheduler

v2:
 - Rename the function to amdgpu_ring_sched_ready and move it to
amdgpu_ring.c (Alex)
v3:
- Fix a few more checks based on Vitaly's patch (Alex)
v4:
- squash in fix noticed by Bert in
https://gitlab.freedesktop.org/drm/amd/-/issues/3139

Fixes: 11b3b9f461c5 ("drm/sched: Check scheduler ready before calling timeout handling")
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Vitaly Prosyak &lt;vitaly.prosyak@amd.com&gt;
Signed-off-by: Ma Jun &lt;Jun.Ma2@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix the warning info below during mode1 reset.
[  +0.000004] Call Trace:
[  +0.000004]  &lt;TASK&gt;
[  +0.000006]  ? show_regs+0x6e/0x80
[  +0.000011]  ? __flush_work.isra.0+0x2e8/0x390
[  +0.000005]  ? __warn+0x91/0x150
[  +0.000009]  ? __flush_work.isra.0+0x2e8/0x390
[  +0.000006]  ? report_bug+0x19d/0x1b0
[  +0.000013]  ? handle_bug+0x46/0x80
[  +0.000012]  ? exc_invalid_op+0x1d/0x80
[  +0.000011]  ? asm_exc_invalid_op+0x1f/0x30
[  +0.000014]  ? __flush_work.isra.0+0x2e8/0x390
[  +0.000007]  ? __flush_work.isra.0+0x208/0x390
[  +0.000007]  ? _prb_read_valid+0x216/0x290
[  +0.000008]  __cancel_work_timer+0x11d/0x1a0
[  +0.000007]  ? try_to_grab_pending+0xe8/0x190
[  +0.000012]  cancel_work_sync+0x14/0x20
[  +0.000008]  amddrm_sched_stop+0x3c/0x1d0 [amd_sched]
[  +0.000032]  amdgpu_device_gpu_recover+0x29a/0xe90 [amdgpu]

This warning info was printed after applying the patch
"drm/sched: Convert drm scheduler to use a work queue rather than kthread".
The root cause is that amdgpu driver tries to use the uninitialized
work_struct in the struct drm_gpu_scheduler

v2:
 - Rename the function to amdgpu_ring_sched_ready and move it to
amdgpu_ring.c (Alex)
v3:
- Fix a few more checks based on Vitaly's patch (Alex)
v4:
- squash in fix noticed by Bert in
https://gitlab.freedesktop.org/drm/amd/-/issues/3139

Fixes: 11b3b9f461c5 ("drm/sched: Check scheduler ready before calling timeout handling")
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Vitaly Prosyak &lt;vitaly.prosyak@amd.com&gt;
Signed-off-by: Ma Jun &lt;Jun.Ma2@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'amd-drm-next-6.8-2023-12-01' of https://gitlab.freedesktop.org/agd5f/linux into drm-next</title>
<updated>2023-12-05T02:11:41+00:00</updated>
<author>
<name>Dave Airlie</name>
<email>airlied@redhat.com</email>
</author>
<published>2023-12-05T02:11:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=5edfd7d94b0310b74136b666551f1d23711ed445'/>
<id>5edfd7d94b0310b74136b666551f1d23711ed445</id>
<content type='text'>
amd-drm-next-6.8-2023-12-01:

amdgpu:
- Add new 64 bit sequence number infrastructure.
  This will ultimately be used for user queue synchronization.
- GPUVM updates
- Misc code cleanups
- RAS updates
- DCN 3.5 updates
- Rework PCIe link speed handling
- Document GPU reset types
- DMUB fixes
- eDP fixes
- NBIO 7.9 updates
- NBIO 7.11 updates
- SubVP updates
- DCN 3.1.4 fixes
- ABM fixes
- AGP aperture fix
- DCN 3.1.5 fix
- Fix some potential error path memory leaks
- Enable PCIe PMEs
- Add XGMI, PCIe state dumping for aqua vanjaram
- GFX11 golden register updates
- Misc display fixes

amdkfd:
- Migrate TLB flushing logic to amdgpu
- Trap handler fixes
- Fix restore workers handling on suspend and reset
- Fix possible memory leak in pqm_uninit()

radeon:
- Fix some possible overflows in command buffer checking
- Check for errors in ring_lock

From: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20231201181743.5313-1-alexander.deucher@amd.com
Signed-off-by: Dave Airlie &lt;airlied@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
amd-drm-next-6.8-2023-12-01:

amdgpu:
- Add new 64 bit sequence number infrastructure.
  This will ultimately be used for user queue synchronization.
- GPUVM updates
- Misc code cleanups
- RAS updates
- DCN 3.5 updates
- Rework PCIe link speed handling
- Document GPU reset types
- DMUB fixes
- eDP fixes
- NBIO 7.9 updates
- NBIO 7.11 updates
- SubVP updates
- DCN 3.1.4 fixes
- ABM fixes
- AGP aperture fix
- DCN 3.1.5 fix
- Fix some potential error path memory leaks
- Enable PCIe PMEs
- Add XGMI, PCIe state dumping for aqua vanjaram
- GFX11 golden register updates
- Misc display fixes

amdkfd:
- Migrate TLB flushing logic to amdgpu
- Trap handler fixes
- Fix restore workers handling on suspend and reset
- Fix possible memory leak in pqm_uninit()

radeon:
- Fix some possible overflows in command buffer checking
- Check for errors in ring_lock

From: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20231201181743.5313-1-alexander.deucher@amd.com
Signed-off-by: Dave Airlie &lt;airlied@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>amdgpu: Adjust kmalloc_array calls for new -Walloc-size</title>
<updated>2023-11-17T14:29:54+00:00</updated>
<author>
<name>Sam James</name>
<email>sam@gentoo.org</email>
</author>
<published>2023-11-05T16:06:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b5a52d2afe1b75f9d51461bb235ca40735e99fe7'/>
<id>b5a52d2afe1b75f9d51461bb235ca40735e99fe7</id>
<content type='text'>
GCC 14 introduces a new -Walloc-size included in -Wextra which errors out
on various files in drivers/gpu/drm/amd/amdgpu like:
```
amdgpu_amdkfd_gfx_v8.c:241:15: error: allocation of insufficient size ‘4’ for type ‘uint32_t[2]’ {aka ‘unsigned int[2]'} with size ‘8’ [-Werror=alloc-size]
```

This is because each HQD_N_REGS is actually a uint32_t[2]. Move the * 2 to
the size argument so GCC sees we're allocating enough.

Originally did 'sizeof(uint32_t) * 2' for the size but a friend suggested
'sizeof(**dump)' better communicates the intent.

Link: https://lore.kernel.org/all/87wmuwo7i3.fsf@gentoo.org/
Signed-off-by: Sam James &lt;sam@gentoo.org&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
GCC 14 introduces a new -Walloc-size included in -Wextra which errors out
on various files in drivers/gpu/drm/amd/amdgpu like:
```
amdgpu_amdkfd_gfx_v8.c:241:15: error: allocation of insufficient size ‘4’ for type ‘uint32_t[2]’ {aka ‘unsigned int[2]'} with size ‘8’ [-Werror=alloc-size]
```

This is because each HQD_N_REGS is actually a uint32_t[2]. Move the * 2 to
the size argument so GCC sees we're allocating enough.

Originally did 'sizeof(uint32_t) * 2' for the size but a friend suggested
'sizeof(**dump)' better communicates the intent.

Link: https://lore.kernel.org/all/87wmuwo7i3.fsf@gentoo.org/
Signed-off-by: Sam James &lt;sam@gentoo.org&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/sched: Add drm_sched_wqueue_* helpers</title>
<updated>2023-11-01T21:29:20+00:00</updated>
<author>
<name>Matthew Brost</name>
<email>matthew.brost@intel.com</email>
</author>
<published>2023-10-31T03:24:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=35963cf2cd25eeea8bdb4d02853dac1e66fb13a0'/>
<id>35963cf2cd25eeea8bdb4d02853dac1e66fb13a0</id>
<content type='text'>
Add scheduler wqueue ready, stop, and start helpers to hide the
implementation details of the scheduler from the drivers.

v2:
  - s/sched_wqueue/sched_wqueue (Luben)
  - Remove the extra white line after the return-statement (Luben)
  - update drm_sched_wqueue_ready comment (Luben)

Cc: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Reviewed-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Link: https://lore.kernel.org/r/20231031032439.1558703-2-matthew.brost@intel.com
Signed-off-by: Luben Tuikov &lt;ltuikov89@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add scheduler wqueue ready, stop, and start helpers to hide the
implementation details of the scheduler from the drivers.

v2:
  - s/sched_wqueue/sched_wqueue (Luben)
  - Remove the extra white line after the return-statement (Luben)
  - update drm_sched_wqueue_ready comment (Luben)

Cc: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Reviewed-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Link: https://lore.kernel.org/r/20231031032439.1558703-2-matthew.brost@intel.com
Signed-off-by: Luben Tuikov &lt;ltuikov89@gmail.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
