linux.git/drivers/gpu/drm/amd/amdgpu, branch v6.5

drm/amd: flush any delayed gfxoff on suspend entry

2023-08-16T19:46:39+00:00

DCN 3.1.4 is reported to hang on s2idle entry if graphics activity
is happening during entry.  This is because GFXOFF was scheduled as
delayed but RLC gets disabled in s2idle entry sequence which will
hang GFX IP if not already in GFXOFF.

To avoid this, flush any delayed work for GFXOFF early in the
s2idle entry sequence to ensure that GFX is off when RLC is changed.
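
The ordering described above can be illustrated with a small userspace
model. This is only a sketch: the struct, its fields, and the function
names are hypothetical and are not the amdgpu data structures. The one
kernel behaviour it assumes is that flush_delayed_work() makes pending
delayed work run right away instead of waiting for its timer.

```c
#include <stdbool.h>

/* Hypothetical userspace model, not the amdgpu structures. */
struct gfx_model {
	bool gfxoff_work_pending; /* a delayed GFXOFF request is queued */
	bool in_gfxoff;           /* GFX IP has actually entered GFXOFF */
	bool rlc_enabled;
};

/* Stand-in for flush_delayed_work(): run the queued work now. */
static void model_flush_gfxoff_work(struct gfx_model *g)
{
	if (g->gfxoff_work_pending) {
		g->gfxoff_work_pending = false;
		g->in_gfxoff = true;
	}
}

/*
 * Model of s2idle entry. Returns true if RLC was disabled while GFX
 * was already in GFXOFF, false for the case the message reports as a hang.
 */
static bool model_s2idle_entry(struct gfx_model *g, bool flush_first)
{
	if (flush_first)
		model_flush_gfxoff_work(g);
	g->rlc_enabled = false;
	return g->in_gfxoff;
}
```

With flush_first set, the pending request completes before RLC is
touched; without it, the model reaches the RLC step with GFX still on.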

Commit 4b31b92b143f ("drm/amdgpu: complete gfxoff allow signal during
suspend without delay") modified the power gating flow so that, when
called in s0ix, GFXOFF was not put on the work queue but was instead
processed immediately.

This is dead code due to commit 10cb67eb8a1b ("drm/amdgpu: skip
CG/PG for gfx during S0ix") because GFXOFF will now not be explicitly
called as part of the suspend entry code.  Remove that dead code.

Signed-off-by: Mario Limonciello 
Signed-off-by: Tim Huang 
Reviewed-by: Lijo Lazar 
Signed-off-by: Alex Deucher 
Cc: stable@vger.kernel.org

drm/amdgpu: skip fence GFX interrupts disable/enable for S0ix

2023-08-16T19:46:39+00:00

GFX v11.0.1 reported fence fallback timer expirations on the
SDMA and GFX rings after S0ix resume. This happens because the
EOP interrupts are disabled on S0ix suspend but fail to be
re-enabled on resume because GFX is in GFXOFF.

[  203.349571] [drm] Fence fallback timer expired on ring sdma0
[  203.349572] [drm] Fence fallback timer expired on ring gfx_0.0.0
[  203.861635] [drm] Fence fallback timer expired on ring gfx_0.0.0

For S0ix, GFX is in the GFXOFF state, so avoid touching the GFX registers
to configure the fence driver interrupts for rings that belong to GFX.
The interrupt configuration will be restored on GFXOFF exit.

Signed-off-by: Tim Huang 
Reviewed-by: Mario Limonciello 
Signed-off-by: Alex Deucher 
Cc: stable@vger.kernel.org

drm/amdgpu: skip xcp drm device allocation when out of drm resource

2023-08-16T19:46:39+00:00

Return 0 when drm device allocation fails with -ENOSPC in
order to allow the amdgpu driver to load. An xcp without a
drm device node assigned won't be visible in user space.
This helps the amdgpu driver load on systems which have more
than 64 nodes, the current limit.
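
A minimal sketch of this error policy, using a userspace model. All
names here are hypothetical, and the node limit is shrunk to 4 to keep
the example small. Only -ENOSPC is treated as non-fatal; any other
error is still propagated.

```c
#include <errno.h>
#include <stddef.h>

#define MODEL_MAX_NODES 4 /* hypothetical; the message cites a limit of 64 */

struct xcp_model {
	int has_drm_node; /* 1 if a device node was assigned */
};

/* Hypothetical allocator: fails with -ENOSPC once the nodes run out. */
static int model_alloc_node(int *used)
{
	if (*used >= MODEL_MAX_NODES)
		return -ENOSPC;
	(*used)++;
	return 0;
}

static int model_xcp_dev_alloc(struct xcp_model *xcp, size_t count, int *used)
{
	for (size_t i = 0; i < count; i++) {
		int ret = model_alloc_node(used);

		if (ret == -ENOSPC) {
			/* Out of nodes: the rest stay invisible, load continues. */
			for (; i < count; i++)
				xcp[i].has_drm_node = 0;
			return 0;
		}
		if (ret)
			return ret; /* any other error is still fatal */
		xcp[i].has_drm_node = 1;
	}
	return 0;
}
```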

The proposal to add more drm nodes is being discussed in public,
and would support up to 2^20 nodes in total.
kernel drm:
https://lore.kernel.org/lkml/20230724211428.3831636-1-michal.winiarski@intel.com/T/
libdrm:
https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/305

Signed-off-by: James Zhu 
Acked-by: Christian König 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher

drm/amdgpu: disable mcbp if parameter zero is set

2023-08-16T19:46:33+00:00

The parameter amdgpu_mcbp shall take priority over the default value
calculated from the chip version.
Users can disable mcbp by setting the mcbp parameter to zero.

v2: do not trigger preemption in sw ring muxer when mcbp is disabled.
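
The intended precedence can be sketched as below. This is an
illustration only: the function name is hypothetical, and the mapping
of any value other than 0 or 1 to "use the chip default" is an
assumption made for the sketch.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: an explicit parameter wins over the default
 * derived from the chip version.
 */
static bool model_mcbp_enabled(int param, bool chip_default)
{
	if (param == 0)
		return false;    /* user explicitly disabled mcbp */
	if (param == 1)
		return true;     /* user explicitly enabled mcbp */
	return chip_default;     /* assumed: anything else means "auto" */
}
```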

Signed-off-by: Jiadong Zhu 
Acked-by: Alex Deucher 
Signed-off-by: Alex Deucher

drm/amdgpu/gfx11: only enable CP GFX shadowing on SR-IOV

2023-08-09T14:56:14+00:00

This is only required for SR-IOV world switches, but it
adds additional latency leading to reduced performance in
some benchmarks.  Disable for now on bare metal.

Reviewed-by: Christian König 
Signed-off-by: Alex Deucher

drm/amdgpu: fix possible UAF in amdgpu_cs_pass1()

2023-08-09T14:55:14+00:00

Since the gang_size check is outside of the chunk parsing
loop, we need to reset i before we free the chunk data.
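
The pattern can be sketched in userspace as follows. The names are
hypothetical and malloc()/free() stand in for the kernel allocators.
After the parsing loop completes, i equals nchunks, so an unwind loop
that starts from i would touch one entry past the end unless i is
reset first.

```c
#include <stdlib.h>

struct chunk_model {
	void *kdata;
};

/* Returns 0 on success, -1 on failure; on failure all kdata is freed. */
static int model_parse(struct chunk_model *chunks, int nchunks, int gang_size)
{
	int i;

	for (i = 0; i < nchunks; i++) {
		chunks[i].kdata = malloc(16);
		if (!chunks[i].kdata) {
			i--; /* entry i holds nothing to free */
			goto free_partial;
		}
	}

	/* This check sits outside the loop, where i == nchunks. */
	if (gang_size == 0)
		goto free_all;

	return 0;

free_all:
	i = nchunks - 1; /* reset i so the unwind starts at the last valid entry */
free_partial:
	for (; i >= 0; i--) {
		free(chunks[i].kdata);
		chunks[i].kdata = NULL;
	}
	return -1;
}
```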

Suggested by Ye Zhang (@VAR10CK) of Baidu Security.

Reviewed-by: Guchun Chen 
Reviewed-by: Christian König 
Signed-off-by: Alex Deucher 
Cc: stable@vger.kernel.org

drm/amdgpu: Match against exact bootloader status

2023-08-09T14:36:34+00:00

On PSP v13.x ASICs, the boot loader sets only the MSB to 1 and clears the
least significant bits for any command submission. Hence match against
the exact register value; otherwise a register value of all 0xFFs could
also falsely indicate that the boot loader is ready. Also, from PSP v13.0.6
and newer, bits[7:0] will be used to indicate command error status.
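
The difference between the two checks can be shown with plain integer
comparisons. The helper names and the MODEL_BL_READY constant are
hypothetical; the constant simply encodes "MSB set, all other bits
clear" as described above.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_BL_READY 0x80000000u /* MSB set, every other bit clear */

/* Masked check: looks only at the MSB, so 0xFFFFFFFF also passes. */
static bool model_ready_msb_only(uint32_t reg)
{
	return (reg & MODEL_BL_READY) == MODEL_BL_READY;
}

/* Exact check: the whole register must equal the ready value. */
static bool model_ready_exact(uint32_t reg)
{
	return reg == MODEL_BL_READY;
}
```

A read of all ones passes the masked check but not the exact one, and
a value with a non-zero low byte is likewise rejected by the exact check.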

Signed-off-by: Lijo Lazar 
Acked-by: Alex Deucher 
Signed-off-by: Alex Deucher

drm/amd: Disable S/G for APUs when 64GB or more host memory

2023-08-09T14:34:01+00:00

Users report a white flickering screen on multiple systems that
is tied to having 64GB or more memory.  When S/G is enabled pages
will get pinned to both VRAM carve out and system RAM leading to
this.

Until it can be fixed properly, disable S/G when 64GB of memory or
more is detected.  This will force pages to be pinned into VRAM.
This should fix white screen flickers but may lead to black screens
if VRAM pressure is encountered.  It's a trade-off for now.
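
As a rough sketch, the policy amounts to a threshold check. The
function name, the KiB unit, and the exact threshold constant are
assumptions for illustration; the message does not state how memory is
measured or what value the driver compares against.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed threshold: 64 GiB expressed in KiB. */
#define MODEL_SG_LIMIT_KIB (64ull * 1024 * 1024)

/* Hypothetical policy helper: should S/G display be allowed? */
static bool model_sg_display_allowed(bool is_apu, uint64_t host_mem_kib)
{
	if (is_apu && host_mem_kib >= MODEL_SG_LIMIT_KIB)
		return false; /* fall back to pinning pages in VRAM */
	return true;
}
```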

Fixes: 81d0bcf99009 ("drm/amdgpu: make display pinning more flexible (v2)")
Cc: Hamza Mahfooz 
Cc: Roman Li 
Cc:  # 6.1.y: bf0207e172703 ("drm/amdgpu: add S/G display parameter")
Cc:  # 6.4.y
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2735
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2354
Signed-off-by: Mario Limonciello 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher

drm/amdgpu: Restore HQD persistent state register

2023-07-25T20:26:38+00:00

On GFX v9.4.3, the compute queue MQD is populated using the values in the
HQD persistent state register. Hence don't clear the values on module
unload; instead restore the register to its default reset value so that
the MQD is initialized correctly during the next module load. In
particular, the preload flag needs to be set on the compute queue MQD,
otherwise uninitialized values could be used in the device reset state,
resulting in EDC.

Signed-off-by: Lijo Lazar 
Reviewed-by: Hawking Zhang 
Reviewed-by: Asad Kamal 
Signed-off-by: Alex Deucher

drm/amd: Fix an error handling mistake in psp_sw_init()

2023-07-25T20:16:57+00:00

If the second call to amdgpu_bo_create_kernel() fails, the memory
allocated from the first call should be freed.  If the third call
fails, the memory from the second call should be freed.
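
The general unwind pattern can be sketched in userspace as below.
This is not psp_sw_init(): the names are hypothetical, malloc()/free()
stand in for amdgpu_bo_create_kernel() and its free counterpart, and
the fail_at argument exists only to inject failures. Each failure
jumps to a label that releases everything allocated before it.

```c
#include <errno.h>
#include <stdlib.h>

struct bufs_model {
	void *first, *second, *third;
};

/* fail_at: 0 = no injected failure, 1..3 = make that allocation fail. */
static void *model_alloc(int step, int fail_at)
{
	return step == fail_at ? NULL : malloc(32);
}

static int model_sw_init(struct bufs_model *b, int fail_at)
{
	b->first = b->second = b->third = NULL;

	b->first = model_alloc(1, fail_at);
	if (!b->first)
		return -ENOMEM;

	b->second = model_alloc(2, fail_at);
	if (!b->second)
		goto free_first;

	b->third = model_alloc(3, fail_at);
	if (!b->third)
		goto free_second;

	return 0;

free_second:
	free(b->second);
	b->second = NULL;
free_first:
	free(b->first);
	b->first = NULL;
	return -ENOMEM;
}
```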

Fixes: b95b5391684b ("drm/amdgpu/psp: move PSP memory alloc from hw_init to sw_init")
Signed-off-by: Mario Limonciello 
Reviewed-by: Lijo Lazar 
Signed-off-by: Alex Deucher