linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c, branch v5.14

drm/amdgpu: Cancel delayed work when GFXOFF is disabled

2021-08-20T17:35:42+00:00

schedule_delayed_work does not push back the work if it was already
scheduled before, so amdgpu_device_delay_enable_gfx_off ran ~100 ms
after the first time GFXOFF was disabled and re-enabled, even if GFXOFF
was disabled and re-enabled again during those 100 ms.

This resulted in frame drops / stutter with the upcoming mutter 41
release on Navi 14, due to constantly enabling GFXOFF in the HW and
disabling it again (for getting the GPU clock counter).

To fix this, call cancel_delayed_work_sync when the disable count
transitions from 0 to 1, and only schedule the delayed work on the
reverse transition, not if the disable count was already 0. This makes
sure the delayed work doesn't run at unexpected times, and allows it to
be lock-free.
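
The transition rule described above can be sketched as a small userspace
model. This is only an illustration, not the driver code: every name below
(struct gfx_off_model, model_gfx_off_ctrl, model_delay_elapsed) is
hypothetical, the work_pending flag stands in for the queued delayed work,
and the initial state is an assumption chosen to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names come from the driver. */
struct gfx_off_model {
    unsigned int req_count;  /* outstanding "disable GFXOFF" requests */
    bool work_pending;       /* stands in for the delayed work being queued */
    bool gfx_off_state;      /* true while GFXOFF is enabled in the "HW" */
    unsigned int hw_toggles; /* counts simulated hardware state changes */
};

static void model_init(struct gfx_off_model *m)
{
    m->req_count = 0;
    m->work_pending = false;
    m->gfx_off_state = true; /* assumption: start with GFXOFF enabled */
    m->hw_toggles = 0;
}

/* enable == false means a caller needs the GPU awake (disable GFXOFF). */
static void model_gfx_off_ctrl(struct gfx_off_model *m, bool enable)
{
    if (enable) {
        if (m->req_count == 0)
            return; /* count already 0: do not schedule anything */
        /* Schedule only on the 1 -> 0 transition. */
        if (--m->req_count == 0 && !m->gfx_off_state)
            m->work_pending = true; /* models schedule_delayed_work() */
    } else {
        /* Cancel only on the 0 -> 1 transition. */
        if (m->req_count++ == 0) {
            m->work_pending = false; /* models cancel_delayed_work_sync() */
            if (m->gfx_off_state) {
                m->gfx_off_state = false;
                m->hw_toggles++;
            }
        }
    }
}

/* Models the delayed work running once the delay has elapsed. */
static void model_delay_elapsed(struct gfx_off_model *m)
{
    if (!m->work_pending)
        return;
    m->work_pending = false;
    m->gfx_off_state = true;
    m->hw_toggles++;
}
```

In this model, a disable/enable/disable sequence inside the delay window
leaves no pending work, so the simulated hardware is not toggled back and
forth, which is the behaviour the fix is after.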

v2:
* Use cancel_delayed_work_sync & mutex_trylock instead of
  mod_delayed_work.
v3:
* Make amdgpu_device_delay_enable_gfx_off lock-free (Christian König)
v4:
* Fix race condition between amdgpu_gfx_off_ctrl incrementing
  adev->gfx.gfx_off_req_count and amdgpu_device_delay_enable_gfx_off
  checking for it to be 0 (Evan Quan)

Cc: stable@vger.kernel.org
Reviewed-by: Evan Quan 
Reviewed-by: Lijo Lazar  # v3
Acked-by: Christian König  # v3
Signed-off-by: Michel Dänzer 
Signed-off-by: Alex Deucher

drm/amdgpu: Conditionally reset RAS counters on boot

2021-05-20T02:38:11+00:00

Only clear RAS error counters if persistent EDC harvesting is not supported.

Reviewed-by: Hawking Zhang 
Signed-off-by: John Clements 
Signed-off-by: Alex Deucher

drm/amdgpu: split gfx callbacks into ras and non-ras ones

2021-04-09T20:51:22+00:00

GFX RAS is only available in certain IP generations.

Signed-off-by: Hawking Zhang 
Reviewed-by: Dennis Li 
Reviewed-by: John Clements 
Signed-off-by: Alex Deucher

drm/amd/pm: unify the interface for gfx state setting

2021-04-09T20:46:51+00:00

No need to have special handling for swSMU supported ASICs.

Signed-off-by: Evan Quan 
Reviewed-by: Lijo Lazar 
Signed-off-by: Alex Deucher

drm/amdgpu: add the sched_score to amdgpu_ring_init

2021-04-09T20:44:56+00:00

Allow separate rings to share the same scheduler score.

No functional change.

Signed-off-by: Christian König 
Reviewed-and-Tested-by: Leo Liu 
Signed-off-by: Alex Deucher

drm/amdgpu: wrap kiq ring ops with kiq spinlock

2021-04-09T20:35:31+00:00

The KIQ ring is operated by KFD as well as amdgpu.
KFD uses the KIQ lock, so we should do the same on the amdgpu side
as well.
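
A minimal sketch of the idea, assuming a toy ring and a C11 atomic_flag
standing in for the kernel spinlock. All names here (struct model_kiq,
model_kiq_submit and friends) are hypothetical and only illustrate that
every ring operation happens between taking and releasing the same lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_RING_SIZE 16

/* Hypothetical toy ring; not the driver's data structure. */
struct model_kiq {
    atomic_flag ring_lock;             /* stands in for the KIQ spinlock */
    unsigned int wptr;                 /* toy write pointer */
    unsigned int buf[MODEL_RING_SIZE]; /* toy ring buffer */
};

static void model_kiq_init(struct model_kiq *kiq)
{
    atomic_flag_clear(&kiq->ring_lock);
    kiq->wptr = 0;
}

static void model_kiq_lock(struct model_kiq *kiq)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&kiq->ring_lock,
                                             memory_order_acquire))
        ;
}

static void model_kiq_unlock(struct model_kiq *kiq)
{
    atomic_flag_clear_explicit(&kiq->ring_lock, memory_order_release);
}

/* Every access to wptr and buf is wrapped by the lock. */
static bool model_kiq_submit(struct model_kiq *kiq, unsigned int packet)
{
    bool ok = false;

    model_kiq_lock(kiq);
    if (kiq->wptr < MODEL_RING_SIZE) {
        kiq->buf[kiq->wptr++] = packet;
        ok = true;
    }
    model_kiq_unlock(kiq);
    return ok;
}
```

Because both users would go through the same lock, their writes to the
shared ring cannot interleave in this model.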

Signed-off-by: Nirmoy Das 
Acked-by: Christian König 
Signed-off-by: Alex Deucher

drm/amdgpu: add codes to capture invalid hardware access when recovery

2021-04-09T20:34:53+00:00

Once the recovery thread has begun a GPU reset, no other thread should
access the hardware; otherwise the system may hang randomly.

v2 (chk): rewritten from scratch, use trylock and lockdep instead of
hand wiring the logic.

v3: add in_irq check

v4: change to check in_task

Signed-off-by: Dennis Li 
Signed-off-by: Christian König 
Reviewed-by: Christian König 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher

drm/amdgpu: harvest edc status when connected to host via xGMI

2021-03-24T03:00:41+00:00

When connected to a host via xGMI, system fatal errors may trigger a
warm reset, and the driver has no chance to query the EDC status before
the reset. Therefore, in this case, the driver should harvest the previous
error logging registers during boot instead of only resetting them.

v2:
1. IP's ras_manager object is created when its ras feature is enabled,
so change to query edc status after amdgpu_ras_late_init called

2. change to enable watchdog timer after finishing gfx edc init

Signed-off-by: Dennis Li 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher

drm/amdgpu: enable only one high prio compute queue

2021-02-09T20:26:56+00:00

For high priority compute to work properly we need to enable
wave limiting on the gfx pipe. Wave limiting is done by writing
to the mmSPI_WCL_PIPE_PERCENT_GFX register. Enable only one high
priority compute queue to avoid a race condition between multiple
high priority compute queues writing that register simultaneously.
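
One way to picture the selection rule is the sketch below. It is an
assumption-laden illustration rather than the driver code: the helper
names are hypothetical, and the choice of ring index 0 as the single high
priority queue (and only when more than one compute ring exists) is an
assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: treat exactly one compute ring as high priority.
 * Assumption: it is ring index 0, and only if other rings exist to
 * serve normal priority work.
 */
static bool model_is_high_prio_compute_ring(unsigned int num_compute_rings,
                                            unsigned int ring_index)
{
    return num_compute_rings > 1 && ring_index == 0;
}

/* Counts how many rings the rule above marks as high priority. */
static unsigned int model_count_high_prio(unsigned int num_compute_rings)
{
    unsigned int i, count = 0;

    for (i = 0; i < num_compute_rings; i++)
        if (model_is_high_prio_compute_ring(num_compute_rings, i))
            count++;
    return count;
}
```

With at most one high priority queue, there is only one writer for the
wave limiting register in this model, so the race described above cannot
occur.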

Signed-off-by: Nirmoy Das 
Acked-by: Christian König 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher

drm/amd/pm: add gfx_state_change_set() for rn gfx power switch (v2)

2020-11-13T22:29:45+00:00

The gfx_state_change_set() function can set the GFX power
change status to D0/D3.

v2: make sure to register callback (Alex)

Signed-off-by: Prike Liang 
Reviewed-by: Alex Deucher 
Reviewed-by: Huang Rui 
Signed-off-by: Alex Deucher