linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c, branch v6.9

drm/amd/pm: fix the high voltage issue after unload

2024-04-10T03:26:32+00:00

fix the high voltage issue after unload on smu 13.0.10

Signed-off-by: Kenneth Feng 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher

drm/amd: Flush GFXOFF requests in prepare stage

2024-03-27T12:55:54+00:00

If the system hasn't entered GFXOFF when suspend starts it can cause
hangs accessing GC and RLC during the suspend stage.

Cc:  # 6.1.y: 5095d5418193 ("drm/amd: Evict resources during PM ops prepare() callback")
Cc:  # 6.1.y: cb11ca3233aa ("drm/amd: Add concept of running prepare_suspend() sequence for IP blocks")
Cc:  # 6.1.y: 2ceec37b0e3d ("drm/amd: Add missing kernel doc for prepare_suspend()")
Cc:  # 6.1.y: 3a9626c816db ("drm/amd: Stop evicting resources on APUs in suspend")
Cc:  # 6.6.y: 5095d5418193 ("drm/amd: Evict resources during PM ops prepare() callback")
Cc:  # 6.6.y: cb11ca3233aa ("drm/amd: Add concept of running prepare_suspend() sequence for IP blocks")
Cc:  # 6.6.y: 2ceec37b0e3d ("drm/amd: Add missing kernel doc for prepare_suspend()")
Cc:  # 6.6.y: 3a9626c816db ("drm/amd: Stop evicting resources on APUs in suspend")
Cc:  # 6.1+
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3132
Fixes: ab4750332dbe ("drm/amdgpu/sdma5.2: add begin/end_use ring callbacks")
Reviewed-by: Alex Deucher 
Signed-off-by: Mario Limonciello 
Signed-off-by: Alex Deucher

Revert "drm/amd/amdgpu: Fix potential ioremap() memory leaks in amdgpu_device_init()"

2024-03-20T17:12:59+00:00

This patch causes the following iounmap erorr and calltrace
iounmap: bad address 00000000d0b3631f

The original patch was unjustified because amdgpu_device_fini_sw() will
always cleanup the rmmio mapping.

This reverts commit eb4f139888f636614dab3bcce97ff61cefc4b3a7.

Signed-off-by: Ma Jun 
Suggested-by: Christian König 
Reviewed-by: Christian König 
Signed-off-by: Alex Deucher

drm/amdgpu: disable ring_muxer if mcbp is off

2024-03-06T20:24:49+00:00

Using the ring_muxer without preemption adds overhead for no
reason since mcbp cannot be triggered.

Moving back to a single queue in this case also helps when
high priority app are used: in this case the gpu_scheduler
priority handling will work as expected - much better than
ring_muxer with its 2 independant schedulers competing for
the same hardware queue.

This change requires moving amdgpu_device_set_mcbp above
amdgpu_device_ip_early_init because we use adev->gfx.mcbp.

Signed-off-by: Pierre-Eric Pelloux-Prayer 
Reviewed-by: Alex Deucher 
Acked-by: Christian König 
Acked-by: Jiadong Zhu 
Signed-off-by: Alex Deucher

drm/amd/amdgpu: Fix potential ioremap() memory leaks in amdgpu_device_init()

2024-02-27T16:06:58+00:00

This ensures that the memory mapped by ioremap for adev->rmmio, is
properly handled in amdgpu_device_init(). If the function exits early
due to an error, the memory is unmapped. If the function completes
successfully, the memory remains mapped.

Reported by smatch:
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:4337 amdgpu_device_init() warn: 'adev->rmmio' from ioremap() not released on lines: 4035,4045,4051,4058,4068,4337

Cc: Christian König 
Cc: Alex Deucher 
Signed-off-by: Srinivasan Shanmugam 
Reviewed-by: Christian König 
Signed-off-by: Alex Deucher

drm/amdgpu: Add fatal error detected flag

2024-02-26T16:14:24+00:00

For a RAS error that needs a full reset to recover, set the fatal error
status. Clear the status once the device is reset.

Signed-off-by: Lijo Lazar 
Reviewed-by: Asad Kamal 
Signed-off-by: Alex Deucher

Revert "drm/amd: flush any delayed gfxoff on suspend entry"

2024-02-13T13:59:50+00:00

commit ab4750332dbe ("drm/amdgpu/sdma5.2: add begin/end_use ring
callbacks") caused GFXOFF control to be used more heavily and the
codepath that was removed from commit 0dee72639533 ("drm/amd: flush any
delayed gfxoff on suspend entry") now can be exercised at suspend again.

Users report that by using GNOME to suspend the lockscreen trigger will
cause SDMA traffic and the system can deadlock.

This reverts commit 0dee726395333fea833eaaf838bc80962df886c8.

Acked-by: Alex Deucher 
Fixes: ab4750332dbe ("drm/amdgpu/sdma5.2: add begin/end_use ring callbacks")
Signed-off-by: Mario Limonciello 
Signed-off-by: Alex Deucher

drm/amd: Stop evicting resources on APUs in suspend

2024-02-13T13:59:49+00:00

commit 5095d5418193 ("drm/amd: Evict resources during PM ops prepare()
callback") intentionally moved the eviction of resources to earlier in
the suspend process, but this introduced a subtle change that it occurs
before adev->in_s0ix or adev->in_s3 are set. This meant that APUs
actually started to evict resources at suspend time as well.

Explicitly set s0ix or s3 in the prepare() stage, and unset them if the
prepare() stage failed.

v2: squash in warning fix from Stephen Rothwell

Reported-by: Jürg Billeter 
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3132#note_2271038
Fixes: 5095d5418193 ("drm/amd: Evict resources during PM ops prepare() callback")
Acked-by: Alex Deucher 
Signed-off-by: Mario Limonciello 
Signed-off-by: Alex Deucher

drm/amdgpu: Need to resume ras during gpu reset for gfx v9_4_3 sriov

2024-01-31T19:05:18+00:00

Need to resume ras during gpu reset for
gfx v9_4_3 sriov

Signed-off-by: YiPeng Chai 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher

drm/amdgpu: Fix the warning info in mode1 reset

2024-01-31T14:40:42+00:00

Fix the warning info below during mode1 reset.
[  +0.000004] Call Trace:
[  +0.000004]  
[  +0.000006]  ? show_regs+0x6e/0x80
[  +0.000011]  ? __flush_work.isra.0+0x2e8/0x390
[  +0.000005]  ? __warn+0x91/0x150
[  +0.000009]  ? __flush_work.isra.0+0x2e8/0x390
[  +0.000006]  ? report_bug+0x19d/0x1b0
[  +0.000013]  ? handle_bug+0x46/0x80
[  +0.000012]  ? exc_invalid_op+0x1d/0x80
[  +0.000011]  ? asm_exc_invalid_op+0x1f/0x30
[  +0.000014]  ? __flush_work.isra.0+0x2e8/0x390
[  +0.000007]  ? __flush_work.isra.0+0x208/0x390
[  +0.000007]  ? _prb_read_valid+0x216/0x290
[  +0.000008]  __cancel_work_timer+0x11d/0x1a0
[  +0.000007]  ? try_to_grab_pending+0xe8/0x190
[  +0.000012]  cancel_work_sync+0x14/0x20
[  +0.000008]  amddrm_sched_stop+0x3c/0x1d0 [amd_sched]
[  +0.000032]  amdgpu_device_gpu_recover+0x29a/0xe90 [amdgpu]

This warning info was printed after applying the patch
"drm/sched: Convert drm scheduler to use a work queue rather than kthread".
The root cause is that amdgpu driver tries to use the uninitialized
work_struct in the struct drm_gpu_scheduler

v2:
 - Rename the function to amdgpu_ring_sched_ready and move it to
amdgpu_ring.c (Alex)
v3:
- Fix a few more checks based on Vitaly's patch (Alex)
v4:
- squash in fix noticed by Bert in
https://gitlab.freedesktop.org/drm/amd/-/issues/3139

Fixes: 11b3b9f461c5 ("drm/sched: Check scheduler ready before calling timeout handling")
Reviewed-by: Alex Deucher 
Signed-off-by: Vitaly Prosyak 
Signed-off-by: Ma Jun 
Signed-off-by: Alex Deucher