linux-stable.git/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c, branch v6.0.8

drm/amdgpu: disallow gfxoff until GC IP blocks complete s2idle resume

2022-11-03T15:00:21+00:00

commit d61e1d1d5225a9baeb995bcbdb904f66f70ed87e upstream.

During the s2idle suspend/resume phase gfxoff remains functional, so
some IP blocks are likely to reinitialize at gfxoff entry, which
results in failures to program GC registers. Therefore, disallow
gfxoff until the AMDGPU IP blocks have been completely reinitialized.
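
The ordering described above can be sketched in user space as follows.
This is only an illustration under stated assumptions: the names
gfx_mock_dev, mock_gfx_off_ctrl, mock_program_gc and mock_resume are
hypothetical stand-ins, not the driver's code, and the rule "a GC write
fails while gfxoff is allowed" is a deliberate simplification.

```c
#include <stdbool.h>

/* Hypothetical stand-in for device state; not the real struct amdgpu_device. */
struct gfx_mock_dev {
    int gfx_off_req_count;  /* > 0 means gfxoff is currently disallowed */
    bool in_s0ix;           /* true during an s2idle suspend/resume cycle */
    int gc_writes_ok;       /* GC writes issued while gfxoff was disallowed */
    int gc_writes_failed;   /* GC writes issued while gfxoff was allowed */
};

/* enable == false adds a disallow request; enable == true drops one. */
static void mock_gfx_off_ctrl(struct gfx_mock_dev *dev, bool enable)
{
    if (!enable)
        dev->gfx_off_req_count++;
    else if (dev->gfx_off_req_count > 0)
        dev->gfx_off_req_count--;
}

static bool mock_gfx_off_allowed(const struct gfx_mock_dev *dev)
{
    return dev->gfx_off_req_count == 0;
}

/* Simplified model: a GC register write only succeeds with gfxoff disallowed. */
static void mock_program_gc(struct gfx_mock_dev *dev)
{
    if (mock_gfx_off_allowed(dev))
        dev->gc_writes_failed++;
    else
        dev->gc_writes_ok++;
}

/* Hold gfxoff off for the whole IP reinitialization, then release it. */
static void mock_resume(struct gfx_mock_dev *dev, int num_ip_blocks)
{
    if (dev->in_s0ix)
        mock_gfx_off_ctrl(dev, false);

    for (int i = 0; i < num_ip_blocks; i++)
        mock_program_gc(dev);

    if (dev->in_s0ix)
        mock_gfx_off_ctrl(dev, true);
}
```

The disallow and the matching allow bracket the entire loop, so no IP
block in this model can observe gfxoff being entered part-way through
reinitialization.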

Signed-off-by: Prike Liang 
Acked-by: Alex Deucher 
Signed-off-by: Alex Deucher 
Cc: stable@vger.kernel.org # 5.15.x
Signed-off-by: Greg Kroah-Hartman

drm/amd/pm: disable cstate feature for gpu reset scenario

2022-10-26T10:22:56+00:00

commit 3059cd8c5f797ad83d2b194ae66339f5c007ca43 upstream.

Suggested by the PMFW team; this is the same as what was done for the gfxoff feature.
This can address some Mode1Reset failures observed on SMU13.0.0.

Signed-off-by: Evan Quan 
Reviewed-by: Hawking Zhang 
Reviewed-by: Lijo Lazar 
Signed-off-by: Alex Deucher 
Cc: stable@vger.kernel.org # 6.0.x
Signed-off-by: Alex Deucher 
Signed-off-by: Greg Kroah-Hartman

drm/amdgpu: Add amdgpu suspend-resume code path under SRIOV

2022-09-27T22:03:36+00:00

- Under SRIOV, we need to send REQ_GPU_FINI to the hypervisor
  at suspend time. Furthermore, a VF cannot request a mode 1
  reset under SRIOV. Therefore, we skip the reset, which is
  otherwise called in the suspend_noirq() function.

- In the resume code path, we need to send REQ_GPU_INIT to the
  hypervisor and also resume PSP IP block under SRIOV.
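
The two bullet points above can be sketched as a small user-space model.
All names here (vf_mock_dev, mock_suspend, mock_suspend_noirq,
mock_resume_vf) are hypothetical, and the bare-metal branch is reduced
to an unconditional counter increment; this is not the driver's code.

```c
#include <stdbool.h>

/* Hypothetical counters standing in for hypervisor requests and resets. */
struct vf_mock_dev {
    bool is_vf;        /* true when running as an SRIOV virtual function */
    int fini_reqs;     /* REQ_GPU_FINI messages sent to the hypervisor */
    int init_reqs;     /* REQ_GPU_INIT messages sent to the hypervisor */
    int mode1_resets;  /* mode 1 resets requested */
    int psp_resumes;   /* extra PSP resumes done for the VF case */
};

/* Suspend: a VF hands the GPU back to the hypervisor. */
static void mock_suspend(struct vf_mock_dev *dev)
{
    if (dev->is_vf)
        dev->fini_reqs++;
}

/* suspend_noirq: a VF cannot request a mode 1 reset, so skip it. */
static void mock_suspend_noirq(struct vf_mock_dev *dev)
{
    if (dev->is_vf)
        return;
    dev->mode1_resets++;
}

/* Resume: a VF asks the hypervisor for the GPU again and resumes PSP. */
static void mock_resume_vf(struct vf_mock_dev *dev)
{
    if (dev->is_vf) {
        dev->init_reqs++;
        dev->psp_resumes++;
    }
}
```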

Signed-off-by: Bokun Zhang 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher 
Cc: stable@vger.kernel.org

drm/amdgpu: make sure to init common IP before gmc

2022-09-14T18:21:49+00:00

Move common IP init before GMC init so that HDP gets
remapped before GMC init which uses it.

This fixes the Unsupported Request error reported through
AER during driver load. The error occurs because a write is issued
to the remap offset before the actual remapping is done.
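
A minimal sketch of why the order matters, assuming a simplified model
in which common IP init performs the HDP remap and GMC init writes to
the remap offset. The enum, struct and function names are hypothetical
and do not mirror the driver's init loop.

```c
#include <stdbool.h>
#include <stddef.h>

enum mock_ip_type { MOCK_IP_GMC, MOCK_IP_COMMON, MOCK_IP_GFX };

struct mock_init_state {
    bool hdp_remapped;         /* set once common IP init has run */
    int unsupported_requests;  /* writes made before the remap was done */
};

static void mock_hw_init(struct mock_init_state *st, enum mock_ip_type type)
{
    switch (type) {
    case MOCK_IP_COMMON:
        st->hdp_remapped = true;
        break;
    case MOCK_IP_GMC:
        /* GMC touches the remap offset; too early means an error. */
        if (!st->hdp_remapped)
            st->unsupported_requests++;
        break;
    default:
        break;
    }
}

/* Init every block of one type, in the order it appears in the table. */
static void mock_init_type(struct mock_init_state *st,
                           const enum mock_ip_type *blocks, size_t n,
                           enum mock_ip_type want)
{
    for (size_t i = 0; i < n; i++)
        if (blocks[i] == want)
            mock_hw_init(st, blocks[i]);
}

/* Returns the number of modeled Unsupported Request errors. */
static int mock_ip_init(const enum mock_ip_type *blocks, size_t n,
                        bool common_first)
{
    struct mock_init_state st = { false, 0 };

    if (common_first) {
        mock_init_type(&st, blocks, n, MOCK_IP_COMMON);
        mock_init_type(&st, blocks, n, MOCK_IP_GMC);
    } else {
        mock_init_type(&st, blocks, n, MOCK_IP_GMC);
        mock_init_type(&st, blocks, n, MOCK_IP_COMMON);
    }
    return st.unsupported_requests;
}
```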

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216373

The error went unnoticed before and became visible because of the commit
referenced below. This change does not fix anything in that commit; rather,
it fixes the issue in amdgpu that the commit exposed. The reference is only
there to associate this commit with the one below so that both go together.

Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")

Acked-by: Christian König 
Reviewed-by: Lijo Lazar 
Signed-off-by: Alex Deucher 
Cc: stable@vger.kernel.org

drm/amdgpu: ensure no PCIe peer access for CPU XGMI iolinks

2022-08-30T21:07:43+00:00

[Why] Devices with CPU XGMI iolink do not support PCIe peer access.

Signed-off-by: Alex Sierra 
Acked-by: Alex Deucher 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher

drm/amdgpu: Remove the additional kfd pre reset call for sriov

2022-08-19T21:06:38+00:00

The additional call was introduced by a merge conflict.

Reviewed-by: Felix Kuehling 
Signed-off-by: shaoyunl 
Signed-off-by: Alex Deucher

drm/amdgpu: fix hive reference leak when adding xgmi device

2022-08-19T21:06:21+00:00

The code calls amdgpu_get_xgmi_hive but never amdgpu_put_xgmi_hive,
which leaks the hive reference.
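
The balanced get/put pattern can be sketched as below. This is a
user-space illustration only: mock_hive, mock_get_hive, mock_put_hive
and mock_add_device are hypothetical names, and the plain integer
counter stands in for whatever reference counting the driver uses.

```c
#include <stdbool.h>

/* Hypothetical hive object with a plain integer reference count. */
struct mock_hive {
    int refcount;
};

static struct mock_hive *mock_get_hive(struct mock_hive *hive)
{
    hive->refcount++;
    return hive;
}

static void mock_put_hive(struct mock_hive *hive)
{
    hive->refcount--;
}

/*
 * Take a reference for the duration of the call and drop it on every
 * exit path, including the error path, so that nothing leaks.
 */
static int mock_add_device(struct mock_hive *hive, bool fail)
{
    struct mock_hive *h = mock_get_hive(hive);
    int ret = 0;

    if (fail) {
        ret = -1;
        goto out;
    }

    /* ... work that needs the hive would go here ... */

out:
    mock_put_hive(h);
    return ret;
}
```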

Signed-off-by: YiPeng Chai 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher

drm/amdgpu: Avoid another list of reset devices

2022-08-10T19:07:14+00:00

A list of devices to be reset is already created in the
amdgpu_device_gpu_recover function. Creating another list with the
same nodes is incorrect and not supported by list_head. Instead, pass
the device list as part of the reset context.
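
A rough user-space sketch of the idea: each device embeds a single list
link, so it can sit on only one list at a time, and the reset handler
borrows the caller's list through the context instead of building a
second one. mock_list is a tiny stand-in for list_head, and all other
names are hypothetical.

```c
#include <stddef.h>

/* Tiny circular doubly linked list, standing in for the kernel's list_head. */
struct mock_list {
    struct mock_list *next, *prev;
};

static void mock_list_init(struct mock_list *head)
{
    head->next = head;
    head->prev = head;
}

static void mock_list_add_tail(struct mock_list *node, struct mock_list *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* One embedded link per device: adding it to a second list corrupts the first. */
struct mock_reset_dev {
    int reset_done;
    struct mock_list reset_node;
};

/* The context only points at the caller's list; it owns no nodes. */
struct mock_reset_ctx {
    struct mock_list *reset_device_list;
};

#define mock_entry(ptr) \
    ((struct mock_reset_dev *)((char *)(ptr) - \
                               offsetof(struct mock_reset_dev, reset_node)))

/* Walk the borrowed list and mark each device; returns devices visited. */
static int mock_do_reset(struct mock_reset_ctx *ctx)
{
    struct mock_list *head = ctx->reset_device_list;
    int count = 0;

    for (struct mock_list *p = head->next; p != head; p = p->next) {
        mock_entry(p)->reset_done = 1;
        count++;
    }
    return count;
}
```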

Fixes: 9e08564727fc (drm/amdgpu: Refactor mode2 reset logic for v13.0.2)
Signed-off-by: Lijo Lazar 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher

drm/amdgpu: move mes self test after drm sched re-started

2022-07-28T20:28:54+00:00

The MES self test relies on VM mapping; move it after
drm sched is restarted so that VM mapping can work
during GPU reset.

Signed-off-by: Jack Xiao 
Acked-and-tested-by: Evan Quan 
Signed-off-by: Alex Deucher

drm/amdgpu: drop non-necessary call trace dump

2022-07-28T20:28:54+00:00

This extra call trace dump is printed on every GPU reset.
It gives people the wrong impression that something
went wrong when actually nothing did.

Signed-off-by: Evan Quan 
Acked-by: Christian König 
Signed-off-by: Alex Deucher