linux-stable.git/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c, branch v5.18

drm/amd: Don't reset dGPUs if the system is going to s2idle

2022-05-18T19:24:35+00:00

An A+A configuration on ASUS ROG Strix G513QY proves that the ASIC
reset for handling aborted suspend can't work with s2idle.

This functionality was introduced in commit daf8de0874ab5b ("drm/amdgpu:
always reset the asic in suspend (v2)").  A few other commits have
gone on top of the ASIC reset, but this still doesn't work on the A+A
configuration in s2idle.

Avoid doing the reset on dGPUs specifically when using s2idle.

Fixes: daf8de0874ab5b ("drm/amdgpu: always reset the asic in suspend (v2)")
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2008
Reviewed-by: Alex Deucher 
Signed-off-by: Mario Limonciello 
Signed-off-by: Alex Deucher 
Cc: stable@vger.kernel.org

drm/amdgpu: don't runtime suspend if there are displays attached (v3)

2022-04-27T21:18:53+00:00

We normally runtime suspend when there are displays attached if they
are in the DPMS off state, however, if something wakes the GPU
we send a hotplug event on resume (in case any displays were connected
while the GPU was in suspend) which can cause userspace to light
up the displays again soon after they were turned off.

Prior to
commit 087451f372bf76 ("drm/amdgpu: use generic fb helpers instead of setting up AMD own's."),
the driver took a runtime pm reference when the fbdev emulation was
enabled because we didn't implement proper shadowing support for
vram access when the device was off so the device never runtime
suspended when there was a console bound.  Once that commit landed,
we now utilize the core fb helper implementation which properly
handles the emulation, so runtime pm now suspends in cases where it did
not before.  Ultimately, we need to sort out why runtime suspend in not
working in this case for some users, but this should restore similar
behavior to before.

v2: move check into runtime_suspend
v3: wake ups -> wakeups in comment, retain pm_runtime behavior in
    runtime_idle callback

Fixes: 087451f372bf76 ("drm/amdgpu: use generic fb helpers instead of setting up AMD own's.")
Link: https://lore.kernel.org/r/20220403132322.51c90903@darkstar.example.org/
Tested-by: Michele Ballabio 
Reviewed-by: Evan Quan 
Signed-off-by: Alex Deucher 
Cc: stable@vger.kernel.org

drm/amdgpu: Ensure HDA function is suspended before ASIC reset

2022-04-14T02:21:58+00:00

DP/HDMI audio on AMD PRO VII stops working after S3:
[  149.450391] amdgpu 0000:63:00.0: amdgpu: MODE1 reset
[  149.450395] amdgpu 0000:63:00.0: amdgpu: GPU mode1 reset
[  149.450494] amdgpu 0000:63:00.0: amdgpu: GPU psp mode1 reset
[  149.983693] snd_hda_intel 0000:63:00.1: refused to change power state from D0 to D3hot
[  150.003439] amdgpu 0000:63:00.0: refused to change power state from D0 to D3hot
...
[  155.432975] snd_hda_intel 0000:63:00.1: CORB reset timeout#2, CORBRP = 65535

The offending commit is daf8de0874ab5b ("drm/amdgpu: always reset the asic in
suspend (v2)"). Commit 34452ac3038a7 ("drm/amdgpu: don't use BACO for
reset in S3 ") doesn't help, so the issue is something different.

Assuming that to make HDA resume to D0 fully realized, it needs to be
successfully put to D3 first. And this guesswork proves working, by
moving amdgpu_asic_reset() to noirq callback, so it's called after HDA
function is in D3.

Fixes: daf8de0874ab5b ("drm/amdgpu: always reset the asic in suspend (v2)")
Signed-off-by: Kai-Heng Feng 
Signed-off-by: Alex Deucher 
Cc: stable@vger.kernel.org

drm/amdkfd: Fix Incorrect VMIDs passed to HWS

2022-03-25T16:40:25+00:00

Compute-only GPUs have more than 8 VMIDs allocated to KFD. Fix
this by passing correct number of VMIDs to HWS

v2: squash in warning fix (Alex)

Signed-off-by: Tushar Patel 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher

drm/amdgpu/vcn: Add vcn firmware log

2022-03-04T18:03:30+00:00

vcn fwlog is for debugging purpose only,
by default, it is disabled.

Signed-off-by: Ruijing Dong 
Reviewed-by: Leo Liu 
Signed-off-by: Alex Deucher

drm/amdgpu: Bump minor version for hot plug tests enabling.

2022-03-02T23:40:06+00:00

This will allow to enable the tests only after latest fix
after which the tests passed on my system.

I tested on NV21 standalone and Vega 10 and Polaris as
pair with DRI_PRIME.

It's possible there might be still issues on ASICs i don't
have at my posession but that that the point of enbling
the tests finally - if other people during testing will
encounter errors they will report and I will be able to fix.

The releated merge request for enabling libdrm tests suite  is in
https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/227

Signed-off-by: Andrey Grodzovsky 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher

Merge tag 'amd-drm-next-5.18-2022-02-25' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

2022-03-01T06:19:02+00:00

amd-drm-next-5.18-2022-02-25:

amdgpu:
- Raven2 suspend/resume fix
- SDMA 5.2.6 updates
- VCN 3.1.2 updates
- SMU 13.0.5 updates
- DCN 3.1.5 updates
- Virtual display fixes
- SMU code cleanup
- Harvest fixes
- Expose benchmark tests via debugfs
- Drop no longer relevant gart aperture tests
- More RAS restructuring
- W=1 fixes
- PSR rework
- DP/VGA adapter fixes
- DP MST fixes
- GPUVM eviction fix
- GPU reset debugfs register dumping support
- Misc display fixes
- SR-IOV fix
- Aldebaran mGPU fix
- Add module parameter to disable XGMI for testing

amdkfd:
- IH ring overflow logging fixes
- CRIU fixes
- Misc fixes

Signed-off-by: Dave Airlie 
From: Alex Deucher 
Link: https://patchwork.freedesktop.org/patch/msgid/20220225183535.5907-1-alexander.deucher@amd.com

Backmerge tag 'v5.17-rc6' into drm-next

2022-02-28T04:57:14+00:00

This backmerges v5.17-rc6 so I can merge some amdgpu and some tegra changes on top.

Signed-off-by: Dave Airlie

drm/amdgpu: Exclude PCI reset method for now.

2022-02-24T22:25:20+00:00

According to my investigation of the state of PCI
reset recently it's not working. The reason is
due to the fact the kernel PCI code rejects SBR
when there are more then one PF under same bridge
which we always have (at least AUDIO PF but usually
more) and that because SBR will reset all the PFS
and devices under the same bridge as you and you
cannot assume they support SBR.
Once we anble FLR support we can reenable this option as
FLR is doable on single PF and doens't have this
restriction.

Reviewed-by: Alex Deucher 
Signed-off-by: Andrey Grodzovsky 
Signed-off-by: Alex Deucher

drm/amdgpu: Add use_xgmi_p2p module parameter

2022-02-24T22:25:12+00:00

This parameter controls xGMI p2p communication, which is enabled by
default. However, it can be disabled by setting it to 0. In case xGMI
p2p is disabled in a dGPU, PCIe p2p interface will be used instead.
This parameter is ignored in GPUs that do not support xGMI
p2p configuration.

Signed-off-by: Alex Sierra 
Acked-by: Luben Tuikov 
Acked-by: Harish Kasiviswanathan 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher