linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu.h, branch v6.5

drm/amd: Disable S/G for APUs when 64GB or more host memory

2023-08-09T14:34:01+00:00

Users report a flickering white screen on multiple systems with
64GB or more of memory.  When S/G is enabled, pages get pinned to
both the VRAM carve-out and system RAM, which leads to this.

Until it can be fixed properly, disable S/G when 64GB of memory or
more is detected.  This forces pages to be pinned into VRAM.
This should fix the white screen flickering, but it may lead to
black screens under VRAM pressure.  It's a trade-off for now.
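
A minimal sketch of the threshold decision described above, assuming the
host RAM size is known as a page count and page shift. The helper name
sg_display_allowed and the SG_HOST_RAM_LIMIT constant are hypothetical
and not the driver's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed threshold: 64 GiB, per the "64GB or more" rule above. */
#define SG_HOST_RAM_LIMIT (64ULL << 30)

/*
 * Hypothetical helper: returns true if S/G display may stay enabled.
 * num_pages and page_shift are illustrative inputs standing in for the
 * host's total RAM page count and page size.
 */
static bool sg_display_allowed(uint64_t num_pages, unsigned int page_shift)
{
	uint64_t host_bytes = num_pages << page_shift;

	/* At or above the limit, fall back to pinning in the VRAM carve-out. */
	return host_bytes < SG_HOST_RAM_LIMIT;
}
```

With 4 KiB pages, 2^22 pages (16 GiB) keeps S/G enabled, while 2^24
pages (64 GiB) disables it.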

Fixes: 81d0bcf99009 ("drm/amdgpu: make display pinning more flexible (v2)")
Cc: Hamza Mahfooz 
Cc: Roman Li 
Cc:  # 6.1.y: bf0207e172703 ("drm/amdgpu: add S/G display parameter")
Cc:  # 6.4.y
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2735
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2354
Signed-off-by: Mario Limonciello 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher

drm/amd: Move helper for dynamic speed switch check out of smu13

2023-07-12T16:09:54+00:00

This helper checks whether the connected host supports the feature,
so it can be moved into generic code to be used by other SMU
implementations as well.

Signed-off-by: Mario Limonciello 
Reviewed-by: Evan Quan 
Signed-off-by: Alex Deucher 
Cc: stable@vger.kernel.org # 6.1.x

drm/amd/pm: avoid unintentional shutdown due to temperature momentary fluctuation

2023-06-30T17:12:16+00:00

An intentional delay is added when a soft CTF is triggered, followed
by a second check of the GPU temperature before taking further
action. This avoids an unintended shutdown caused by a momentary
temperature fluctuation.
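
The delay-then-recheck idea can be sketched as a small state machine,
assuming times in milliseconds and temperatures in millidegrees Celsius.
The struct, enum and function names, the delay value and the limit are
all hypothetical; the real driver schedules this work differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state for a deferred soft CTF check. */
struct soft_ctf_state {
	bool pending;        /* a soft CTF interrupt was seen */
	uint64_t recheck_at; /* time (ms) at which to re-read the temperature */
};

enum ctf_action { CTF_WAIT, CTF_IGNORE, CTF_SHUTDOWN };

/* Record the interrupt and schedule the second check delay_ms later. */
static void soft_ctf_triggered(struct soft_ctf_state *s, uint64_t now_ms,
			       uint64_t delay_ms)
{
	s->pending = true;
	s->recheck_at = now_ms + delay_ms;
}

/*
 * Called with a fresh temperature reading. Requests a shutdown only if
 * the delay has elapsed and the temperature is still at or above limit_mc.
 */
static enum ctf_action soft_ctf_recheck(struct soft_ctf_state *s,
					uint64_t now_ms, int temp_mc,
					int limit_mc)
{
	if (!s->pending)
		return CTF_IGNORE;
	if (now_ms < s->recheck_at)
		return CTF_WAIT;
	s->pending = false;
	return temp_mc >= limit_mc ? CTF_SHUTDOWN : CTF_IGNORE;
}
```

A spike that has cooled by the time of the second reading results in
CTF_IGNORE rather than a shutdown.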

Signed-off-by: Evan Quan 
Reviewed-by: Lijo Lazar 
Signed-off-by: Alex Deucher

drm/amdgpu: Modify for_each_inst macro

2023-06-30T17:11:35+00:00

Modify it such that it doesn't change the instance mask parameter.
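
One way to iterate over the set bits of a mask without consuming it is
sketched below. The next_inst helper and this macro body are
illustrative only and are not the driver's actual definition of
for_each_inst.

```c
#include <stdint.h>

/* Return the lowest set bit of mask at position >= start, or -1 if none. */
static int next_inst(uint32_t mask, int start)
{
	for (int b = start; b < 32; b++)
		if (mask & (UINT32_C(1) << b))
			return b;
	return -1;
}

/*
 * Visit each set bit of inst_mask in ascending order. inst_mask is only
 * read, never written, so it still holds its value after the loop.
 * Note: inst_mask is evaluated more than once, so avoid side effects.
 */
#define for_each_inst(i, inst_mask)                     \
	for ((i) = next_inst((inst_mask), 0); (i) >= 0; \
	     (i) = next_inst((inst_mask), (i) + 1))
```

For a mask of 0x16 the loop visits instances 1, 2 and 4, and the mask is
unchanged afterwards.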

Signed-off-by: Lijo Lazar 
Acked-by: Victor Skvortsov 
Signed-off-by: Alex Deucher

drm/amdgpu: add option params to enforce process isolation between graphics and compute

2023-06-09T16:49:48+00:00

Enforce process isolation between graphics and compute by using the same reserved VMID.

v2: remove params "struct amdgpu_vm *vm" from
    amdgpu_vmid_alloc_reserved and amdgpu_vmid_free_reserved.
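
The sketch below illustrates only the bookkeeping implied by sharing one
reserved VMID: graphics and compute obtain the same ID, and it is
released when the last user drops it. The reference-counting scheme and
all names here are assumptions; like the v2 helpers, the functions take
no per-VM argument.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping for one reserved VMID shared by all users. */
struct reserved_vmid {
	int id;             /* the VMID set aside for isolated jobs */
	unsigned int users; /* number of holders (gfx and/or compute) */
};

/* Take a reference; every caller receives the same reserved VMID. */
static int reserved_vmid_get(struct reserved_vmid *r)
{
	r->users++;
	return r->id;
}

/* Drop a reference; returns true once no user holds the VMID. */
static bool reserved_vmid_put(struct reserved_vmid *r)
{
	if (r->users == 0)
		return true;
	return --r->users == 0;
}
```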

Signed-off-by: Chong Li 
Reviewed-by: Christian Koenig 
Signed-off-by: Alex Deucher

drm/amdgpu: add gfx9.4.1 hw debug mode enable and disable calls

2023-06-09T16:35:15+00:00

On GFX9.4.1, the implicit wait count instruction on s_barrier is
disabled by default in the driver during normal operation for
performance requirements.

There is a hardware bug in GFX9.4.1 where if the implicit wait count
instruction after an s_barrier instruction is disabled, any wave that
hits an exception may step over the s_barrier when returning from the
trap handler with the barrier logic having no ability to be
aware of this, thereby causing other waves to wait at the barrier
indefinitely resulting in a shader hang.  This bug has been corrected
for GFX9.4.2 and onward.

Since the debugger subscribes to hardware exceptions, in order to avoid
this bug, the debugger must enable implicit wait count on s_barrier
for a debug session and disable it on detach.

In order to change this setting in the device-global SQ_CONFIG
register, the GFX pipeline must be idle.  GFX9.4.1 as a compute device
will either dispatch work through the compute ring buffers used for
image post processing or through the hardware scheduler by the KFD.

Have the KGD suspend and drain the compute ring buffer, then suspend the
hardware scheduler and block any future KFD process job requests before
changing the implicit wait count setting.  Once set, resume all work.
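
The ordering described above (quiesce, reprogram SQ_CONFIG, resume) can
be sketched as follows. Each step is a hypothetical stub that only
records that it ran, and the implicit_wait flag stands in for the
SQ_CONFIG setting; none of these names come from the driver.

```c
#include <stdbool.h>

/* Hypothetical steps, declared in the order they must execute. */
enum dbg_step {
	STEP_SUSPEND_COMPUTE_RINGS,
	STEP_DRAIN_COMPUTE_RINGS,
	STEP_SUSPEND_HWS,
	STEP_BLOCK_KFD_JOBS,
	STEP_WRITE_SQ_CONFIG,
	STEP_RESUME_ALL, /* resume rings and HWS, unblock KFD */
	STEP_COUNT
};

struct dbg_ctx {
	enum dbg_step log[STEP_COUNT];
	int n;
	bool implicit_wait; /* models the s_barrier implicit wait setting */
};

/* Stub: record the step instead of touching hardware. */
static void run_step(struct dbg_ctx *c, enum dbg_step s)
{
	c->log[c->n++] = s;
}

/* enable=true on debugger attach, enable=false on detach. */
static void set_barrier_implicit_wait(struct dbg_ctx *c, bool enable)
{
	c->n = 0;
	run_step(c, STEP_SUSPEND_COMPUTE_RINGS);
	run_step(c, STEP_DRAIN_COMPUTE_RINGS);
	run_step(c, STEP_SUSPEND_HWS);
	run_step(c, STEP_BLOCK_KFD_JOBS);
	/* Only now is the GFX pipeline idle, so SQ_CONFIG may change. */
	run_step(c, STEP_WRITE_SQ_CONFIG);
	c->implicit_wait = enable;
	run_step(c, STEP_RESUME_ALL);
}
```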

Signed-off-by: Jonathan Kim 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher

drm/amdgpu: fix vga_set_state NULL pointer issue

2023-06-09T14:41:40+00:00

Fix a NULL pointer issue in the vga_set_state function,
as not all ASICs need this operation.
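
A common pattern for an optional per-ASIC hook is to check the function
pointer before calling it. The sketch assumes a hypothetical ops table
in which vga_set_state may be left NULL; the struct and wrapper names
are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-ASIC ops table; vga_set_state may be NULL. */
struct asic_ops {
	void (*vga_set_state)(void *dev, bool enable);
};

/* Call the hook only if this ASIC provides one; returns whether it ran. */
static bool asic_vga_set_state(const struct asic_ops *ops, void *dev,
			       bool enable)
{
	if (!ops || !ops->vga_set_state)
		return false;
	ops->vga_set_state(dev, enable);
	return true;
}
```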

Signed-off-by: Likun Gao 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher

drm/amdgpu: convert logical instance mask to physical one

2023-06-09T14:37:08+00:00

Convert the logical instance mask to a physical one for the convenience of the RAS TA.
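
A minimal sketch of such a conversion, assuming a lookup table phys_of
that maps each logical instance to its physical instance number (each
entry below 32). The table and function name are hypothetical; how the
driver actually derives the mapping is not shown here.

```c
#include <stdint.h>

/* Build a physical instance mask from a logical one via phys_of[]. */
static uint32_t logical_to_physical_mask(uint32_t logical_mask,
					 const uint8_t *phys_of, int num_inst)
{
	uint32_t phys_mask = 0;

	for (int l = 0; l < num_inst && l < 32; l++)
		if (logical_mask & (UINT32_C(1) << l))
			phys_mask |= UINT32_C(1) << phys_of[l];
	return phys_mask;
}
```

With phys_of = {2, 3, 6, 7}, logical mask 0x5 (instances 0 and 2)
becomes physical mask 0x44 (instances 2 and 6).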

Signed-off-by: Tao Zhou 
Reviewed-by: Hawking Zhang 
Reviewed-by: Stanley.Yang 
Signed-off-by: Alex Deucher

drm/amdgpu: find partition ID when open device

2023-06-09T13:59:27+00:00

Find the partition ID from the render device minor when the device is opened.
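
As an illustration only, the lookup could be a search over the render
minors registered for each partition. The part_minor table, the
function name and the example minor numbers are hypothetical.

```c
/*
 * Hypothetical: part_minor[p] holds the render minor registered for
 * partition p. Returns the partition index, or -1 if the minor is unknown.
 */
static int partition_from_minor(const int *part_minor, int num_parts,
				int minor)
{
	for (int p = 0; p < num_parts; p++)
		if (part_minor[p] == minor)
			return p;
	return -1;
}
```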

Signed-off-by: Christian König 
Signed-off-by: James Zhu 
Reviewed-and-tested-by: Philip Yang
Signed-off-by: Alex Deucher

drm/amdgpu: support partition drm devices

2023-06-09T13:59:20+00:00

Support partition drm devices on GC_HWIP IP_VERSION(9, 4, 3).

This is a temporary solution and will be superseded.

Signed-off-by: Christian König 
Signed-off-by: James Zhu 
Reviewed-and-tested-by: Philip Yang
Signed-off-by: Alex Deucher