linux-stable.git/drivers/gpu/drm/amd/amdgpu/amdgpu.h, branch linux-5.10.y

drm/amdgpu: handle the case of pci_channel_io_frozen only in amdgpu_pci_resume

2025-08-28T14:22:55+00:00

[ Upstream commit 248b061689a40f4fed05252ee2c89f87cf26d7d8 ]

In the current code, when the PCI error state pci_channel_io_normal is
detected, amdgpu reports PCI_ERS_RESULT_CAN_RECOVER to the PCI core. The
PCI core then continues to the resume stage, invoking the report_resume
callback via pci_walk_bridge, which eventually calls amdgpu_pci_resume.
There, the write lock is released unconditionally even though it was
never acquired on this path. As a result, a deadlock occurs when other
threads try to acquire the read lock.

To fix this, add a member to the amdgpu_device structure that caches the
pci_channel_state, and only proceed in amdgpu_pci_resume when the cached
state is pci_channel_io_frozen.

Fixes: c9a6b82f45e2 ("drm/amdgpu: Implement DPC recovery")
Suggested-by: Andrey Grodzovsky 
Signed-off-by: Guchun Chen 
Reviewed-by: Andrey Grodzovsky 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
[Shivani: Modified to apply on 5.10.y]
Signed-off-by: Shivani Agarwal 
Signed-off-by: Greg Kroah-Hartman
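The fix can be illustrated with a minimal user-space C sketch. The enum names echo the kernel's pci_channel_state and pci_ers_result values, but struct fake_adev, the reset_locked flag standing in for the write lock, and all function names are hypothetical; this is not the amdgpu code. What it shows: only the frozen path takes the lock in error_detected, so resume must check the cached state before releasing it.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified echoes of the kernel's pci_channel_state values. */
enum channel_state {
	CHANNEL_IO_NORMAL,
	CHANNEL_IO_FROZEN,
	CHANNEL_IO_PERM_FAILURE,
};

/* Simplified echoes of a few pci_ers_result values. */
enum ers_result {
	ERS_CAN_RECOVER,
	ERS_NEED_RESET,
	ERS_DISCONNECT,
};

/* Hypothetical stand-in for the relevant amdgpu_device fields. */
struct fake_adev {
	enum channel_state pci_channel_state; /* cached in error_detected */
	bool reset_locked;                    /* models the write lock */
	int unbalanced_unlocks;               /* counts unlock-without-lock */
};

static void reset_lock(struct fake_adev *adev)
{
	adev->reset_locked = true;
}

static void reset_unlock(struct fake_adev *adev)
{
	if (!adev->reset_locked)
		adev->unbalanced_unlocks++;   /* the bug the fix prevents */
	adev->reset_locked = false;
}

static enum ers_result fake_error_detected(struct fake_adev *adev,
					    enum channel_state state)
{
	adev->pci_channel_state = state;      /* cache it for resume */
	switch (state) {
	case CHANNEL_IO_NORMAL:
		return ERS_CAN_RECOVER;       /* no lock taken on this path */
	case CHANNEL_IO_FROZEN:
		reset_lock(adev);             /* held across the reset */
		return ERS_NEED_RESET;
	default:
		return ERS_DISCONNECT;
	}
}

static void fake_resume(struct fake_adev *adev)
{
	/* Only the frozen path took the lock, so only it may release it. */
	if (adev->pci_channel_state != CHANNEL_IO_FROZEN)
		return;
	/* ... restart schedulers, resubmit work, etc. ... */
	reset_unlock(adev);
}
```

Calling fake_resume after a pci_channel_io_normal report now returns without touching the lock, which is the unbalanced unlock the commit removes.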

drm/amdgpu: Increase tlb flush timeout for sriov

2022-09-05T08:28:58+00:00

[ Upstream commit 373008bfc9cdb0f050258947fa5a095f0657e1bc ]

[Why]
KIQ timeout errors were observed while running a benchmark (Luxmark) on
multiple VFs. They happen because all VFs perform TLB invalidation at the
same time. Although each VF has its own set of invalidation registers,
the hardware queues the invalidation requests and executes them in turn.

[How]
With 12 VFs, increase the timeout to 12 * 100 ms.

Signed-off-by: Dusica Milinkovic 
Acked-by: Shaoyun Liu 
Acked-by: Alex Deucher 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin
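The timeout arithmetic from the [How] note, sketched in C. It assumes a 100 ms per-flush budget outside SR-IOV and a worst case of 12 VFs whose invalidation requests are serviced one after another; the constants and the function name are hypothetical, not the driver's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-flush budget for a single requester: 100 ms. */
#define PER_VF_TIMEOUT_USEC 100000u
/* Hypothetical worst case the commit message sizes for: 12 VFs. */
#define MAX_VFS 12u

/*
 * Invalidation requests from all VFs are queued and executed in turn,
 * so in the worst case one VF waits behind every other VF's request.
 */
static uint32_t tlb_flush_timeout_usec(bool is_sriov_vf)
{
	if (!is_sriov_vf)
		return PER_VF_TIMEOUT_USEC;        /* 100 ms */
	return MAX_VFS * PER_VF_TIMEOUT_USEC;      /* 12 * 100 ms = 1.2 s */
}
```

Scaling by the VF count rather than retrying keeps a single wait long enough for the whole hardware queue to drain.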

drm/amd/amdgpu: Increase HWIP_MAX_INSTANCE to 10

2021-09-22T10:27:55+00:00

commit 67a44e659888569a133a8f858c8230e9d7aad1d5 upstream.

It seems newer cards can have even more instances now.
Found by UBSAN: array-index-out-of-bounds in
drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:318:29
index 8 is out of range for type 'uint32_t *[8]'

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1697
Cc: stable@vger.kernel.org
Signed-off-by: Ernst Sjöstrand 
Signed-off-by: Alex Deucher 
Signed-off-by: Greg Kroah-Hartman
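A hypothetical sketch of why the constant matters: a per-IP table sized by HWIP_MAX_INSTANCE overflows as soon as discovery reports a ninth instance (index 8) while the size is 8. The struct and function names are illustrative, and the explicit bounds check is an addition for clarity; the commit itself only raises the constant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Value after this commit; it was 8 before, which UBSAN caught. */
#define HWIP_MAX_INSTANCE 10

/* Hypothetical table of per-instance register base pointers for one IP. */
struct ip_bases {
	uint32_t *reg_offset[HWIP_MAX_INSTANCE];
};

/*
 * Hypothetical discovery step: record the base for instance @inst.
 * Returns false instead of writing past the array when firmware reports
 * more instances than the table can hold.
 */
static bool record_ip_instance(struct ip_bases *b, size_t inst,
			       uint32_t *base)
{
	size_t cap = sizeof(b->reg_offset) / sizeof(b->reg_offset[0]);

	if (inst >= cap)
		return false;   /* the out-of-bounds case UBSAN reported */
	b->reg_offset[inst] = base;
	return true;
}
```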

drm/amd/display: Add a backlight module option

2021-03-17T16:06:18+00:00

commit 7a46f05e5e163c00e41892e671294286e53fe15c upstream.

There seem to be devices that don't work with the aux channel backlight
control.  To allow such users to test the other backlight control
method, provide a new module option, aux_backlight, to explicitly enable
or disable aux backlight support.  By default, aux support is detected
from the hardware capability.

v2: make the backlight option generic in case we add future
backlight types (Alex)

BugLink: https://bugzilla.opensuse.org/show_bug.cgi?id=1180749
BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1438
Reviewed-by: Nicholas Kazlauskas 
Signed-off-by: Takashi Iwai 
Signed-off-by: Alex Deucher 
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman
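A sketch of how such an option could override the detected capability, assuming a tri-state convention of -1 for auto-detect (the default), 0 to force PWM control and 1 to force AUX control. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tri-state option: -1 = auto (default), 0 = PWM, 1 = AUX. */
enum backlight_mode {
	BACKLIGHT_AUTO = -1,
	BACKLIGHT_PWM = 0,
	BACKLIGHT_AUX = 1,
};

/*
 * Decide whether to drive the backlight over the AUX channel.
 * @param:          value of the module option
 * @hw_aux_capable: what the hardware capability detection reported
 */
static bool use_aux_backlight(int param, bool hw_aux_capable)
{
	switch (param) {
	case BACKLIGHT_PWM:
		return false;           /* user forces the other method */
	case BACKLIGHT_AUX:
		return true;            /* user forces AUX */
	default:
		return hw_aux_capable;  /* auto: trust the detected capability */
	}
}
```

Keeping auto as the default means only users with problematic panels need to set the option; everyone else keeps the hardware-detected behavior.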

drm/amdgpu: support indirect access reg outside of mmio bar (v2)

2020-10-01T14:42:55+00:00

Support both direct and indirect accessors in unified
helper functions.

v2: Retire indirect mmio access via mm_index/data

Signed-off-by: Hawking Zhang 
Reviewed-by: Christian König 
Reviewed-by: Alex Deucher 
Reviewed-by: Kevin Wang 
Reviewed-by: Guchun Chen 
Signed-off-by: Alex Deucher
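A user-space sketch of the unified accessor idea: one read function that uses a direct MMIO access when the register lies inside the BAR and falls back to the index/data path otherwise. The window sizes, struct fake_dev and the counters are hypothetical, chosen only to make the dispatch observable.

```c
#include <assert.h>
#include <stdint.h>

#define MMIO_BAR_BYTES  0x100u  /* hypothetical, tiny MMIO window */
#define REG_SPACE_BYTES 0x400u  /* hypothetical, full register space */

/* Simulated device: backing store for every register plus an index latch. */
struct fake_dev {
	uint32_t regs[REG_SPACE_BYTES / 4];
	uint32_t pcie_index;          /* models the PCIE_INDEX latch */
	unsigned int direct_reads;
	unsigned int indirect_reads;
};

/* Indirect path: latch the byte address, then read through "data". */
static uint32_t indirect_rreg(struct fake_dev *d, uint32_t byte_off)
{
	d->pcie_index = byte_off;             /* write PCIE_INDEX */
	d->indirect_reads++;
	return d->regs[d->pcie_index / 4];    /* read PCIE_DATA */
}

/* One entry point: callers no longer care which path is taken. */
static uint32_t unified_rreg(struct fake_dev *d, uint32_t dw_reg)
{
	uint32_t byte_off = dw_reg * 4;       /* registers are dword indices */

	assert(byte_off < REG_SPACE_BYTES);
	if (byte_off < MMIO_BAR_BYTES) {
		d->direct_reads++;
		return d->regs[dw_reg];       /* direct MMIO read */
	}
	return indirect_rreg(d, byte_off);    /* outside the BAR */
}
```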

drm/amdgpu: add helper function for indirect reg access (v3)

2020-10-01T14:42:13+00:00

Add helper functions in order to remove RREG32/WREG32
from the current pcie_rreg/wreg functions for soc15 and
later adapters.
PCIE_INDEX/DATA pairs are used to access registers
outside of the MMIO BAR in the helper functions.
The new helper functions remove the recursion
of amdgpu_mm_rreg/wreg from pcie_rreg/wreg and
provide the opportunity to centralize direct and
indirect access in a single function.

v2: Fixed typo and refine the comments

v3: Remove unnecessary volatile local variable

Signed-off-by: Hawking Zhang 
Reviewed-by: Alex Deucher 
Reviewed-by: Kevin Wang 
Reviewed-by: Christian König 
Reviewed-by: Guchun Chen 
Signed-off-by: Alex Deucher
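A sketch of an index/data helper over a simulated window: the helper programs the index register, reads it back to flush the posted write, and then accesses the data register. Because it uses raw accessors for the index/data pair instead of the generic register path, it cannot recurse into itself. All names and sizes are hypothetical, and a real driver would also serialize the pair with a lock, which is omitted here.

```c
#include <assert.h>
#include <stdint.h>

#define BACKING_DWORDS 1024u  /* hypothetical size of the indirect space */

/* Simulated index/data pair in front of registers reachable only indirectly. */
struct index_data_window {
	uint32_t index;                    /* PCIE_INDEX: byte address */
	uint32_t backing[BACKING_DWORDS];  /* registers behind PCIE_DATA */
};

/* Raw accessors: these never go through the generic register path. */
static void write_index(struct index_data_window *w, uint32_t reg_addr)
{
	w->index = reg_addr;
}

static uint32_t read_index(const struct index_data_window *w)
{
	return w->index;
}

static uint32_t read_data(const struct index_data_window *w)
{
	assert(w->index % 4 == 0 && w->index / 4 < BACKING_DWORDS);
	return w->backing[w->index / 4];
}

static void write_data(struct index_data_window *w, uint32_t val)
{
	assert(w->index % 4 == 0 && w->index / 4 < BACKING_DWORDS);
	w->backing[w->index / 4] = val;
}

/* Hypothetical indirect read: index, read back to flush, then data. */
static uint32_t indirect_rreg(struct index_data_window *w, uint32_t reg_addr)
{
	write_index(w, reg_addr);
	(void)read_index(w);          /* flush the posted index write */
	return read_data(w);
}

/* Hypothetical indirect write: same pattern, plus a data read-back. */
static void indirect_wreg(struct index_data_window *w, uint32_t reg_addr,
			  uint32_t val)
{
	write_index(w, reg_addr);
	(void)read_index(w);
	write_data(w, val);
	(void)read_data(w);           /* flush the posted data write */
}
```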

drm/amdgpu: use function pointer for gfxhub functions

2020-09-30T17:50:13+00:00

gfxhub functions are now called from function pointers,
instead of from asic-specific functions.

Signed-off-by: Oak Zeng 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher
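A minimal sketch of the function-pointer pattern, assuming a hypothetical two-entry ops table: the implementation is chosen once from the IP version, and common code then calls through the table instead of branching per ASIC. The struct, version numbers and callbacks are illustrative, not the driver's actual interface.

```c
#include <assert.h>

struct fake_gfxhub;

/* Hypothetical subset of a gfxhub ops table. */
struct gfxhub_funcs {
	int (*gart_enable)(struct fake_gfxhub *hub);
	void (*gart_disable)(struct fake_gfxhub *hub);
};

struct fake_gfxhub {
	const struct gfxhub_funcs *funcs;
	int gart_enabled_by;   /* which implementation ran (0 = none) */
};

/* Two hypothetical ASIC-specific implementations. */
static int v1_gart_enable(struct fake_gfxhub *hub) { hub->gart_enabled_by = 1; return 0; }
static void v1_gart_disable(struct fake_gfxhub *hub) { hub->gart_enabled_by = 0; }
static int v2_gart_enable(struct fake_gfxhub *hub) { hub->gart_enabled_by = 2; return 0; }
static void v2_gart_disable(struct fake_gfxhub *hub) { hub->gart_enabled_by = 0; }

static const struct gfxhub_funcs gfxhub_v1_funcs = {
	.gart_enable = v1_gart_enable,
	.gart_disable = v1_gart_disable,
};

static const struct gfxhub_funcs gfxhub_v2_funcs = {
	.gart_enable = v2_gart_enable,
	.gart_disable = v2_gart_disable,
};

/* Pick the implementation once, at init, from the IP version. */
static void gfxhub_init(struct fake_gfxhub *hub, int ip_major)
{
	hub->funcs = (ip_major >= 2) ? &gfxhub_v2_funcs : &gfxhub_v1_funcs;
	hub->gart_enabled_by = 0;
}

/* Common code: one indirect call instead of a per-ASIC switch. */
static int gmc_hw_init(struct fake_gfxhub *hub)
{
	return hub->funcs->gart_enable(hub);
}
```

Adding support for a new ASIC then means adding one more ops table, with no edits to the common callers.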

drm/amdgpu: Fix consecutive DPC recovery failures.

2020-09-15T21:25:04+00:00

Cache the PCI state on boot and before each case where we might
lose it.

v2: Add pci_restore_state while caching the PCI state to avoid
breaking PCI core logic for stuff like suspend/resume.

v3: Extract pci_restore_state from amdgpu_device_cache_pci_state
to avoid superfluous restores during GPU resets and suspend/resumes.

v4: Style fixes.

Signed-off-by: Andrey Grodzovsky 
Signed-off-by: Alex Deucher
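The failure mode this addresses can be modeled in user space, under the assumption that the PCI core keeps a single saved copy that is consumed by each restore. The sketch keeps a driver-owned copy that survives restores, and, following v3, caching never restores on its own. struct fake_pci_dev, struct fake_adev and all function names are hypothetical stand-ins for the kernel's saved-state handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CFG_DWORDS 16  /* hypothetical, tiny config space */

/* Hypothetical device: live config space plus a one-shot saved slot. */
struct fake_pci_dev {
	uint32_t cfg[CFG_DWORDS];
	uint32_t core_saved[CFG_DWORDS];
	bool core_state_saved;            /* cleared by each restore */
};

/* Hypothetical driver state: a private copy that survives restores. */
struct fake_adev {
	uint32_t cached[CFG_DWORDS];
	bool have_cache;
};

static void core_save(struct fake_pci_dev *p)
{
	memcpy(p->core_saved, p->cfg, sizeof(p->cfg));
	p->core_state_saved = true;
}

static void core_restore(struct fake_pci_dev *p)
{
	if (!p->core_state_saved)
		return;                   /* nothing left to restore */
	memcpy(p->cfg, p->core_saved, sizeof(p->cfg));
	p->core_state_saved = false;      /* one-shot */
}

/* Cache: save, then keep a private copy. No restore here (v3). */
static void cache_pci_state(struct fake_adev *a, struct fake_pci_dev *p)
{
	core_save(p);
	memcpy(a->cached, p->core_saved, sizeof(a->cached));
	a->have_cache = true;
}

/* Load: put the private copy back into the saved slot, then restore. */
static bool load_pci_state(const struct fake_adev *a, struct fake_pci_dev *p)
{
	if (!a->have_cache)
		return false;
	memcpy(p->core_saved, a->cached, sizeof(p->core_saved));
	p->core_state_saved = true;
	core_restore(p);
	return true;
}
```

Without the private copy, the second consecutive recovery would find the one-shot slot already consumed and leave the device with a wiped config space.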

drm/amdgpu: Avoid accessing HW when suspending SW state

2020-09-15T21:24:39+00:00

At this point the ASIC has already been reset by the HW/PSP,
so the HW is not in a proper state to be configured for suspension.
Some blocks might even be gated, so it is best to avoid touching it.

v2: Rename in_dpc to more meaningful name

Signed-off-by: Andrey Grodzovsky 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher
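A sketch of the gating idea, assuming a per-device flag that the register accessors check: while the flag is set, writes are dropped and reads return a placeholder, but the software teardown in the suspend path still runs. The flag name, register numbers and functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_REGS 64u  /* hypothetical register file size */

/* Hypothetical device with a "don't touch the HW" recovery flag. */
struct fake_adev {
	bool in_pci_err_recovery;   /* illustrative name */
	uint32_t mmio[NUM_REGS];
	unsigned int hw_writes;
};

static void wreg(struct fake_adev *a, uint32_t reg, uint32_t v)
{
	assert(reg < NUM_REGS);
	if (a->in_pci_err_recovery)
		return;             /* ASIC already reset: skip HW access */
	a->mmio[reg] = v;
	a->hw_writes++;
}

static uint32_t rreg(const struct fake_adev *a, uint32_t reg)
{
	assert(reg < NUM_REGS);
	if (a->in_pci_err_recovery)
		return 0;           /* placeholder; HW is not consulted */
	return a->mmio[reg];
}

/* Hypothetical IP suspend: HW programming is gated, SW teardown is not. */
static void ip_suspend(struct fake_adev *a, int *sw_state)
{
	wreg(a, 3, 0x1);            /* e.g. a "halt engine" register write */
	*sw_state = 0;              /* software bookkeeping always torn down */
}
```

Putting the check in the accessors keeps each IP block's suspend code unchanged while still guaranteeing no register traffic during recovery.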

drm/amdgpu: Implement DPC recovery

2020-09-15T21:24:32+00:00

Add PCI Downstream Port Containment (DPC) with
basic recovery functionality

v2: remove pci_save_state to avoid breaking suspend/resume
v3: Fix style comments
v4: Improve description.

Signed-off-by: Andrey Grodzovsky 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher
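A simplified model of the callback sequence a DPC-capable driver plugs into: error_detected decides between continuing, requesting a reset, or disconnecting; slot_reset re-initializes after the reset; resume restarts normal operation. The handler struct only loosely follows the shape of struct pci_error_handlers, and do_recovery is a heavily simplified, hypothetical stand-in for the PCI core's recovery logic (for example, the mmio_enabled step is omitted).

```c
#include <assert.h>
#include <stdbool.h>

enum channel_state { IO_NORMAL, IO_FROZEN, IO_PERM_FAILURE };
enum ers_result { ERS_CAN_RECOVER, ERS_NEED_RESET, ERS_DISCONNECT, ERS_RECOVERED };

/* Loosely modeled on struct pci_error_handlers (subset, simplified). */
struct err_handlers {
	enum ers_result (*error_detected)(enum channel_state state);
	enum ers_result (*slot_reset)(void);
	void (*resume)(void);
};

/* Counters so callers can observe which callbacks ran. */
static int resets, resumes;

static enum ers_result drv_error_detected(enum channel_state state)
{
	switch (state) {
	case IO_NORMAL:
		return ERS_CAN_RECOVER;
	case IO_FROZEN:
		return ERS_NEED_RESET;   /* stop using the HW, ask for a reset */
	default:
		return ERS_DISCONNECT;
	}
}

static enum ers_result drv_slot_reset(void)
{
	resets++;                        /* restore PCI state, re-init the ASIC */
	return ERS_RECOVERED;
}

static void drv_resume(void)
{
	resumes++;                       /* restart schedulers, resume work */
}

static const struct err_handlers drv = {
	.error_detected = drv_error_detected,
	.slot_reset = drv_slot_reset,
	.resume = drv_resume,
};

/* Hypothetical, heavily simplified model of the core's sequence. */
static bool do_recovery(const struct err_handlers *h, enum channel_state state)
{
	enum ers_result r = h->error_detected(state);

	if (r == ERS_DISCONNECT)
		return false;
	if (r == ERS_CAN_RECOVER)
		r = ERS_RECOVERED;       /* mmio_enabled step omitted */
	if (r == ERS_NEED_RESET)
		r = h->slot_reset();     /* after the link/slot was reset */
	if (r != ERS_RECOVERED)
		return false;
	h->resume();
	return true;
}
```

Note that resume also runs for the pci_channel_io_normal case, which is the path the pci_channel_io_frozen fix at the top of this log addresses.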