linux-stable.git/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c, branch linux-5.17.y

drm/amdgpu: fix off by one in amdgpu_gfx_kiq_acquire()

2022-04-13T17:27:27+00:00

[ Upstream commit 1647b54ed55d4d48c7199d439f8834626576cbe9 ]

This post-op should be a pre-op so that we do not pass -1 as the bit
number to test_bit().  The current code will loop downwards from 63 to
-1.  After changing to a pre-op, it loops from 63 to 0.
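
The difference between the two loop forms can be reproduced in a small userspace sketch. This is not the driver code: the function names are hypothetical, and the loop body only records the index it would have passed to test_bit(). The starting value of 64 is an assumption inferred from the "63" mentioned above.

```c
#include <limits.h>

/* Post-decrement: the condition is evaluated before the decrement,
 * so the body runs one extra time with bit == -1. */
static int lowest_bit_post_op(int nbits)
{
	int bit = nbits;
	int lowest = INT_MAX;

	while (bit-- >= 0) {
		if (bit < lowest)
			lowest = bit;	/* stands in for test_bit(bit, ...) */
	}
	return lowest;
}

/* Pre-decrement: the decrement happens first, so the body only
 * ever sees nbits - 1 down to 0. */
static int lowest_bit_pre_op(int nbits)
{
	int bit = nbits;
	int lowest = INT_MAX;

	while (--bit >= 0) {
		if (bit < lowest)
			lowest = bit;	/* stands in for test_bit(bit, ...) */
	}
	return lowest;
}
```

Called with 64, the post-op variant reaches index -1 while the pre-op variant stops at 0.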

Fixes: 71c37505e7ea ("drm/amdgpu/gfx: move more common KIQ code to amdgpu_gfx.c")
Signed-off-by: Dan Carpenter 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin

drm/amdgpu: During s0ix don't wait to signal GFXOFF

2021-10-05T14:55:07+00:00

In the rare event that GFX IP suspend coincides with s0ix entry, don't
schedule delayed work; instead, signal PMFW immediately to allow GFXOFF
entry. GFXOFF is a prerequisite for s0ix entry. PMFW needs to be
signaled about GFXOFF status before the amd-pmc module passes the OS HINT
to PMFW, which tells it that everything is ready for a safe s0ix entry.
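
The decision described above can be modelled roughly as follows. This is a userspace sketch under stated assumptions, not the driver implementation: the struct, its fields, and the function name are hypothetical, and the two booleans merely stand in for "PMFW was signaled directly" and "the delayed work was scheduled".

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant driver state. */
struct gfxoff_model {
	bool in_s0ix;		/* s0ix entry is in progress */
	bool pmfw_signaled;	/* PMFW has been told GFXOFF is allowed */
	bool work_scheduled;	/* delayed work is pending */
};

/* Called when the last request blocking GFXOFF goes away. */
static void gfxoff_allow(struct gfxoff_model *m)
{
	if (m->in_s0ix)
		m->pmfw_signaled = true;	/* signal immediately, no delay */
	else
		m->work_scheduled = true;	/* normal path: defer via delayed work */
}
```

The point of the split is ordering: on the s0ix path nothing is left pending by the time the OS HINT is sent.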

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1712

Signed-off-by: Lijo Lazar 
Reviewed-by: Alex Deucher 
Reviewed-by: Mario Limonciello 
Signed-off-by: Alex Deucher 
Cc: stable@vger.kernel.org

drm/amdgpu: Cancel delayed work when GFXOFF is disabled

2021-08-20T16:09:44+00:00

schedule_delayed_work does not push back the work if it was already
scheduled before, so amdgpu_device_delay_enable_gfx_off ran ~100 ms
after the first time GFXOFF was disabled and re-enabled, even if GFXOFF
was disabled and re-enabled again during those 100 ms.

This resulted in frame drops / stutter with the upcoming mutter 41
release on Navi 14, due to constantly enabling GFXOFF in the HW and
disabling it again (for getting the GPU clock counter).

To fix this, call cancel_delayed_work_sync when the disable count
transitions from 0 to 1, and only schedule the delayed work on the
reverse transition, not if the disable count was already 0. This makes
sure the delayed work doesn't run at unexpected times, and allows it to
be lock-free.
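
The counting scheme described above can be sketched as a small userspace model. The struct and function names are hypothetical, and the `work_pending` flag with its two counters only stands in for the real calls to cancel_delayed_work_sync and schedule_delayed_work; locking is omitted.

```c
#include <stdbool.h>

/* Hypothetical model of the GFXOFF disable count. */
struct gfx_off_counter {
	unsigned int req_count;	/* outstanding "disable GFXOFF" requests */
	bool work_pending;	/* stands in for the delayed work item */
	int cancels;		/* how often the work was cancelled */
	int schedules;		/* how often the work was scheduled */
};

static void gfx_off_request(struct gfx_off_counter *c, bool enable)
{
	if (enable) {
		/* Count already 0: unbalanced call, do not schedule. */
		if (c->req_count == 0)
			return;
		/* Schedule only on the 1 -> 0 transition. */
		if (--c->req_count == 0) {
			c->work_pending = true;
			c->schedules++;
		}
	} else {
		/* Cancel only on the 0 -> 1 transition. */
		if (c->req_count == 0) {
			c->work_pending = false;
			c->cancels++;
		}
		c->req_count++;
	}
}
```

Because the work is cancelled whenever the count leaves 0, the work function in this model can never run while a disable request is outstanding.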

v2:
* Use cancel_delayed_work_sync & mutex_trylock instead of
  mod_delayed_work.
v3:
* Make amdgpu_device_delay_enable_gfx_off lock-free (Christian König)
v4:
* Fix race condition between amdgpu_gfx_off_ctrl incrementing
  adev->gfx.gfx_off_req_count and amdgpu_device_delay_enable_gfx_off
  checking for it to be 0 (Evan Quan)

Cc: stable@vger.kernel.org
Reviewed-by: Evan Quan 
Reviewed-by: Lijo Lazar  # v3
Acked-by: Christian König  # v3
Signed-off-by: Michel Dänzer 
Signed-off-by: Alex Deucher

drm/amd/amdgpu: remove unnecessary RAS context field

2021-08-16T19:35:55+00:00

Delete ras_if->name in the RAS ctx structure and remove related lines.

Signed-off-by: Candice Li 
Reviewed-by: John Clements 
Signed-off-by: Alex Deucher

drm/amdgpu: Conditionally reset RAS counters on boot

2021-05-20T02:38:11+00:00

Only clear RAS error counters if persistent EDC harvesting is not supported.

Reviewed-by: Hawking Zhang 
Signed-off-by: John Clements 
Signed-off-by: Alex Deucher

drm/amdgpu: split gfx callbacks into ras and non-ras ones

2021-04-09T20:51:22+00:00

GFX RAS is only available in certain IP generations.

Signed-off-by: Hawking Zhang 
Reviewed-by: Dennis Li 
Reviewed-by: John Clements 
Signed-off-by: Alex Deucher

drm/amd/pm: unify the interface for gfx state setting

2021-04-09T20:46:51+00:00

No need to have special handling for swSMU supported ASICs.

Signed-off-by: Evan Quan 
Reviewed-by: Lijo Lazar 
Signed-off-by: Alex Deucher

drm/amdgpu: add the sched_score to amdgpu_ring_init

2021-04-09T20:44:56+00:00

Allow separate rings to share the same scheduler score.

No functional change.

Signed-off-by: Christian König 
Reviewed-and-Tested-by: Leo Liu 
Signed-off-by: Alex Deucher

drm/amdgpu: wrap kiq ring ops with kiq spinlock

2021-04-09T20:35:31+00:00

The KIQ ring is operated by KFD as well as amdgpu.
KFD takes the KIQ lock, so we should do the same on the amdgpu side
as well.

Signed-off-by: Nirmoy Das 
Acked-by: Christian König 
Signed-off-by: Alex Deucher

drm/amdgpu: add codes to capture invalid hardware access when recovery

2021-04-09T20:34:53+00:00

Once the recovery thread has begun a GPU reset, no other threads should
access the hardware; otherwise the system may hang randomly.

v2 (chk): rewritten from scratch, use trylock and lockdep instead of
hand wiring the logic.

v3: add in_irq check

v4: change to check in_task

Signed-off-by: Dennis Li 
Signed-off-by: Christian König 
Reviewed-by: Christian König 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher