linux-stable.git/drivers/gpu/drm/amd/amdgpu, branch v5.15.2

drm/amdgpu: revert "Add autodump debugfs node for gpu reset v8"

2021-11-06T13:13:31+00:00

commit c8365dbda056578eebe164bf110816b1a39b4b7f upstream.

This reverts commit 728e7e0cd61899208e924472b9e641dbeb0775c4.

Further discussion reveals that this feature is severely broken
and needs to be reverted ASAP.

GPU reset can never be delayed by userspace, even for debugging;
otherwise we can run into in-kernel deadlocks.

Signed-off-by: Christian König 
Acked-by: Alex Deucher 
Acked-by: Nirmoy Das 
Signed-off-by: Alex Deucher 
Signed-off-by: Greg Kroah-Hartman

drm/amdkfd: fix boot failure when iommu is disabled in Picasso.

2021-11-06T13:13:30+00:00

commit afd18180c07026f94a80ff024acef5f4159084a4 upstream.

When IOMMU is disabled in the SBIOS and KFD takes the IOMMUv2 path, IOMMUv2
init will fail. But this failure should not block amdgpu driver init.
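
The intended behaviour can be sketched roughly as below. This is a
standalone model, not the driver code: fake_adev, fake_iommuv2_init and
fake_driver_init are hypothetical names, and the only point illustrated is
that the IOMMUv2 error is recorded and logged rather than propagated.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the device state; not the real amdgpu_device. */
struct fake_adev {
	bool iommu_enabled_in_sbios;
	bool kfd_iommu_ready;
};

/* Models IOMMUv2 init: fails when the IOMMU is disabled in the SBIOS. */
static int fake_iommuv2_init(const struct fake_adev *adev)
{
	return adev->iommu_enabled_in_sbios ? 0 : -ENODEV;
}

/* Models driver init: an IOMMUv2 failure is noted but does not abort init. */
static int fake_driver_init(struct fake_adev *adev)
{
	int r = fake_iommuv2_init(adev);

	adev->kfd_iommu_ready = (r == 0);
	if (r)
		fprintf(stderr, "iommuv2 init failed (%d), continuing\n", r);

	return 0;
}
```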

Reported-by: youling 
Tested-by: youling 
Signed-off-by: Yifan Zhang 
Reviewed-by: James Zhu 
Signed-off-by: Alex Deucher 
Signed-off-by: Greg Kroah-Hartman

drm/amdgpu: support B0&B1 external revision id for yellow carp

2021-10-20T19:27:31+00:00

B0 internal rev_id is 0x01, B1 internal rev_id is 0x02.
The external rev_id for B0 and B1 is 0x20.
The original expression is not suitable for B1.

v2: squash in fix for display code (Alex)

Signed-off-by: Aaron Liu 
Reviewed-by: Huang Rui 
Signed-off-by: Alex Deucher 
Cc: stable@vger.kernel.org

drm/amdgpu: handle the case of pci_channel_io_frozen only in amdgpu_pci_resume

2021-10-05T17:02:31+00:00

In the current code, when the PCI error state pci_channel_io_normal is
detected, PCI_ERS_RESULT_CAN_RECOVER is reported to the PCI driver. The PCI
driver then continues with the PCI resume callback report_resume via
pci_walk_bridge, and that callback finally reaches amdgpu_pci_resume,
where the write lock is released unconditionally without having been
acquired first. In this case, a deadlock will happen when other threads
start to acquire the read lock.

To fix this, add a member to the amdgpu_device structure to cache
pci_channel_state, and only continue the execution in amdgpu_pci_resume
when it's pci_channel_io_frozen.
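
A minimal standalone model of the fix, for illustration only. The types and
functions here (chan_state, fake_dev, fake_error_detected, fake_resume) are
hypothetical stand-ins, not kernel APIs, and the sketch assumes the write
lock is taken only in the frozen case, as the description above implies.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the PCI channel states named in the text. */
enum chan_state {
	CHAN_IO_NORMAL = 1,
	CHAN_IO_FROZEN = 2,
	CHAN_IO_PERM_FAILURE = 3,
};

struct fake_dev {
	enum chan_state cached_state; /* cached by the error-detected callback */
	bool write_locked;            /* models the reset write lock */
	int bad_unlocks;              /* unlocks attempted without holding the lock */
};

/* Models error detection: cache the state; lock only when frozen (assumed). */
static void fake_error_detected(struct fake_dev *d, enum chan_state s)
{
	d->cached_state = s;
	if (s == CHAN_IO_FROZEN)
		d->write_locked = true;
}

/* Models the resume callback after the fix: act only on the frozen state. */
static void fake_resume(struct fake_dev *d)
{
	if (d->cached_state != CHAN_IO_FROZEN)
		return;

	if (!d->write_locked)
		d->bad_unlocks++;
	d->write_locked = false;
}
```

Without the early return in fake_resume, the pci_channel_io_normal path
would reach the unlock without a matching lock, which is the unbalanced
release described above.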

Fixes: c9a6b82f45e2 ("drm/amdgpu: Implement DPC recovery")
Suggested-by: Andrey Grodzovsky 
Signed-off-by: Guchun Chen 
Reviewed-by: Andrey Grodzovsky 
Signed-off-by: Alex Deucher

drm/amdgpu: init iommu after amdkfd device init

2021-10-05T17:02:20+00:00

This patch is to fix clinfo failure in Raven/Picasso:

Number of platforms: 1
  Platform Profile: FULL_PROFILE
  Platform Version: OpenCL 2.2 AMD-APP (3364.0)
  Platform Name: AMD Accelerated Parallel Processing
  Platform Vendor: Advanced Micro Devices, Inc.
  Platform Extensions: cl_khr_icd cl_amd_event_callback

  Platform Name: AMD Accelerated Parallel Processing Number of devices: 0

Signed-off-by: Yifan Zhang 
Reviewed-by: James Zhu 
Tested-by: James Zhu 
Acked-by: Felix Kuehling 
Signed-off-by: Alex Deucher

drm/amdgpu: During s0ix don't wait to signal GFXOFF

2021-10-05T14:55:07+00:00

In the rare event when GFX IP suspend coincides with a s0ix entry, don't
schedule a delayed work, instead signal PMFW immediately to allow GFXOFF
entry. GFXOFF is a prerequisite for s0ix entry. PMFW needs to be
signaled about GFXOFF status before amd-pmc module passes OS HINT
to PMFW telling that everything is ready for a safe s0ix entry.
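
The decision described above might be modelled as follows. This is only a
sketch: gfxoff_action and fake_gfxoff_action are hypothetical names, and the
real driver schedules work and signals firmware rather than returning an
enum.

```c
#include <stdbool.h>

/* Hypothetical outcome of a request to allow GFXOFF. */
enum gfxoff_action {
	GFXOFF_SCHEDULE_DELAYED, /* normal path: defer via delayed work */
	GFXOFF_SIGNAL_NOW,       /* s0ix path: signal PMFW immediately */
};

/* During s0ix entry there is no time to wait, so signal right away. */
static enum gfxoff_action fake_gfxoff_action(bool in_s0ix)
{
	return in_s0ix ? GFXOFF_SIGNAL_NOW : GFXOFF_SCHEDULE_DELAYED;
}
```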

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1712

Signed-off-by: Lijo Lazar 
Reviewed-by: Alex Deucher 
Reviewed-by: Mario Limonciello 
Signed-off-by: Alex Deucher 
Cc: stable@vger.kernel.org

drm/amdkfd: fix a potential ttm->sg memory leak

2021-10-05T14:53:45+00:00

Memory is allocated for ttm->sg by kmalloc in kfd_mem_dmamap_userptr,
but isn't freed by kfree in kfd_mem_dmaunmap_userptr. Free it!
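
The pairing can be illustrated with a userspace sketch, using malloc/free
in place of kmalloc/kfree. The fake_* names are hypothetical and the real
functions do considerably more (DMA mapping and unmapping); only the
allocate-in-map, free-in-unmap symmetry is shown.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the scatter-gather table and TTM object. */
struct fake_sg_table { int nents; };
struct fake_ttm { struct fake_sg_table *sg; };

/* Models the map side: allocates ttm->sg. */
static int fake_dmamap_userptr(struct fake_ttm *ttm)
{
	ttm->sg = malloc(sizeof(*ttm->sg));
	if (!ttm->sg)
		return -1;
	ttm->sg->nents = 0;
	return 0;
}

/* Models the unmap side after the fix: frees ttm->sg and clears the pointer. */
static void fake_dmaunmap_userptr(struct fake_ttm *ttm)
{
	if (!ttm->sg)
		return;
	free(ttm->sg);
	ttm->sg = NULL;
}
```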

Fixes: 264fb4d332f5 ("drm/amdgpu: Add multi-GPU DMA mapping helpers")

Signed-off-by: Lang Yu 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher

drm/amdgpu: force exit gfxoff on sdma resume for rmb s0ix

2021-09-28T18:40:27+00:00

In the s2idle stress test, SDMA resume fails occasionally; in the
failing case the GPU is in the gfxoff state. This issue may be caused
by the firmware mishandling doorbell S/R. For now, work around the issue
by forcing an exit from gfxoff on SDMA resume.

Signed-off-by: Prike Liang 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher 
Cc: stable@vger.kernel.org

drm/amdgpu: check tiling flags when creating FB on GFX8-

2021-09-28T18:40:19+00:00

On GFX9+, format modifiers are always enabled and ensure the
frame-buffers can be scanned out at ADDFB2 time.

On GFX8-, format modifiers are not supported and no other check
is performed. This means ADDFB2 IOCTLs will succeed even if the
tiling isn't supported for scan-out, and will result in garbage
displayed on screen [1].

Fix this by adding a check for tiling flags for GFX8 and older.
The check is taken from radeonsi in Mesa (see how is_displayable
is populated in gfx6_compute_surface).

Changes in v2: use drm_WARN_ONCE instead of drm_WARN (Michel)

[1]: https://github.com/swaywm/wlroots/issues/3185

Signed-off-by: Simon Ser 
Acked-by: Michel Dänzer 
Cc: Alex Deucher 
Cc: Harry Wentland 
Cc: Nicholas Kazlauskas 
Cc: Bas Nieuwenhuizen 
Signed-off-by: Alex Deucher 
Cc: stable@vger.kernel.org

drm/amdgpu: correct initial cp_hqd_quantum for gfx9

2021-09-28T18:39:29+00:00

The value of mmCP_HQD_QUANTUM was not read from the correct
register offset.

Signed-off-by: Hawking Zhang 
Reviewed-by: Le Ma 
Signed-off-by: Alex Deucher 
Cc: stable@vger.kernel.org