linux-stable.git/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h, branch linux-5.15.y

drm/amdgpu: split gfx callbacks into ras and non-ras ones

2021-04-09T20:51:22+00:00

gfx ras is only available in certain ip generations.

Signed-off-by: Hawking Zhang 
Reviewed-by: Dennis Li 
Reviewed-by: John Clements 
Signed-off-by: Alex Deucher

drm/amdgpu: harvest edc status when connected to host via xGMI

2021-03-24T03:00:41+00:00

When connected to a host via xGMI, system fatal errors may trigger a
warm reset, leaving the driver no chance to query edc status before the
reset. Therefore in this case, the driver should harvest the previous
error logging registers during boot, instead of only resetting them.

v2:
1. IP's ras_manager object is created when its ras feature is enabled,
so change to query edc status after amdgpu_ras_late_init called

2. change to enable watchdog timer after finishing gfx edc init

Signed-off-by: Dennis Li 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher

drm/amdgpu: enable watchdog feature for SQ of aldebaran

2021-03-24T02:59:52+00:00

SQ's watchdog timer monitors forward progress. A mask of the waves that
caused the watchdog timeout is recorded in the ras status registers, and
a system fatal error event is then triggered.

v2:
1. change *query_timeout_status to *query_sq_timeout_status.
2. move query_sq_timeout_status into amdgpu_ras_do_recovery.
3. add module parameters to enable/disable fatal error event and modify
the watchdog timer.

v3:
1. remove unused parameters of *enable_watchdog_timer

Signed-off-by: Dennis Li 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher

drm/amdgpu: add ras support for gfx of aldebaran

2021-03-24T02:59:48+00:00

add edc counter/status reset and query functions for gfx block of
aldebaran.

v2: change to clear edc counter explicitly
aldebaran hardware does not clear the edc counters after the driver
reads them, so the driver should clear them explicitly.

Signed-off-by: Dennis Li 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher
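
The read-then-clear behaviour described in the aldebaran gfx ras commit
above can be illustrated with a small standalone model. This is only a
sketch: the register bank, its size, and every fake_edc_* name are
hypothetical and are not taken from the driver.

```c
#include <stddef.h>
#include <stdint.h>

#define FAKE_EDC_NUM_COUNTERS 4	/* hypothetical bank size */

/* Hypothetical stand-in for a bank of EDC counter registers. */
struct fake_edc_regs {
	uint32_t counter[FAKE_EDC_NUM_COUNTERS];
};

/* Models hardware that does NOT clear a counter when it is read. */
static uint32_t fake_edc_read(const struct fake_edc_regs *regs, size_t idx)
{
	return regs->counter[idx];
}

/* Explicit clear, needed because reads have no side effect. */
static void fake_edc_clear(struct fake_edc_regs *regs, size_t idx)
{
	regs->counter[idx] = 0;
}

/*
 * Sum every counter and clear each one right after reading it, so a
 * later query only reports errors logged since this call.
 */
static uint64_t fake_edc_query_and_clear(struct fake_edc_regs *regs)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < FAKE_EDC_NUM_COUNTERS; i++) {
		total += fake_edc_read(regs, i);
		fake_edc_clear(regs, i);
	}
	return total;
}
```

Without the explicit clear, a second query would report the same errors
again and the totals would be double counted.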

drm/amdgpu: enable only one high prio compute queue

2021-02-09T20:26:56+00:00

For high priority compute to work properly we need to enable
wave limiting on gfx pipe. Wave limiting is done through writing
into mmSPI_WCL_PIPE_PERCENT_GFX register. Enable only one high
priority compute queue to avoid race condition between multiple
high priority compute queues writing that register simultaneously.

Signed-off-by: Nirmoy Das 
Acked-by: Christian König 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher
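
A minimal sketch of a "single high priority compute queue" policy as
described in the commit above. The commit text does not say which queue
is chosen; picking the first ring, and only when more than one compute
ring exists, is an assumption made for illustration, and the function
name and plain integer arguments are hypothetical.

```c
#include <stdbool.h>

/*
 * Hypothetical policy helper: at most one compute ring is treated as
 * high priority, so only one queue ever programs the shared wave-limit
 * register. Assumption: ring 0 is the chosen ring, and a single-ring
 * setup keeps that ring at normal priority.
 */
static bool fake_is_high_priority_compute_queue(int num_compute_rings,
						int ring_idx)
{
	return num_compute_rings > 1 && ring_idx == 0;
}
```

With eight rings, only index 0 reports high priority; with one ring,
none does, so a normal priority queue always remains available.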

drm/amdgpu: add amdgpu_gfx_state_change_set() set gfx power change entry (v2)

2020-11-13T22:29:45+00:00

The new amdgpu_gfx_state_change_set() function supports setting the GFX
power change status to D0/D3.

v2: squash in warning fix (Alex)

Signed-off-by: Prike Liang 
Acked-by: Huang Rui 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher

drm/amdgpu: fix compute queue priority if num_kcq is less than 4

2020-11-13T05:13:59+00:00

Compute queues are configurable with module param, num_kcq.
amdgpu_gfx_is_high_priority_compute_queue was setting 1st 4 queues to
high priority queue leaving a null drm scheduler in
adev->gpu_sched[hw_ip]["normal_prio"].sched if num_kcq < 5.

This patch tries to fix it by alternating compute queue priority between
normal and high priority.

Fixes: 33abcb1f5a1719b1c (drm/amdgpu: set compute queue priority at mqd_init)
Signed-off-by: Nirmoy Das 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher
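
The difference between the old "first 4 queues are high priority" policy
and the alternating policy from the commit above can be modelled as
follows. This is a simplified sketch keyed on a flat queue index; the
function names are hypothetical, and treating odd indices as the high
priority ones is an assumption.

```c
#include <stdbool.h>

/* Old policy as described in the commit: the first 4 queues are high. */
static bool fake_first4_is_high(int queue_idx)
{
	return queue_idx < 4;
}

/* Alternating policy; odd index == high priority is an assumption. */
static bool fake_alternating_is_high(int queue_idx)
{
	return (queue_idx % 2) != 0;
}

/* Count the normal priority queues produced by a policy for num_kcq. */
static int fake_count_normal(bool (*is_high)(int), int num_kcq)
{
	int i, normal = 0;

	for (i = 0; i < num_kcq; i++)
		if (!is_high(i))
			normal++;
	return normal;
}
```

With num_kcq set to 4, the old policy yields zero normal priority
queues, which is the case that left the normal priority scheduler slot
empty. The alternating policy yields at least one normal priority queue
for any num_kcq of 1 or more.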

drm/amdgpu: move amdgpu_num_kcq handling to a helper

2020-10-16T19:11:17+00:00

Add a helper so we can set per asic default values. Also,
the module parameter is currently clamped to 8, but clamp it
per asic just in case some asics have different limits in the
future. Enable the option on gfx6,7 as well for consistency.

Acked-by: Nirmoy Das 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher
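
A rough sketch of what a per-asic clamping helper for num_kcq could look
like. The function name, the asic_max parameter, and the use of -1 as a
"use the default" sentinel are assumptions for illustration; only the
limit of 8 comes from the commit text above.

```c
/*
 * Hypothetical helper: return the number of kernel compute queues to
 * use, given the user-requested value and a per-asic maximum.
 *
 * Assumptions: -1 means "not set, use the asic default", and any value
 * outside 0..asic_max falls back to asic_max.
 */
static int fake_get_num_kcq(int requested, int asic_max)
{
	if (requested == -1)
		return asic_max;
	if (requested < 0 || requested > asic_max)
		return asic_max;
	return requested;
}
```

Passing the limit as a parameter, rather than hard-coding 8, is what
lets different asics carry different limits later on.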

drm/amdgpu: add interface for setting MGCG perfmon

2020-10-15T16:21:00+00:00

Enable Navi1X MGCG perfmon setting.

Signed-off-by: Evan Quan 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher

drm/amdgpu: update athub interrupt harvesting handle

2020-09-22T21:37:38+00:00

A GCEA/MMHUB EA error should not result in a DF freeze. This is fixed
in the next generation, but for some reason a GCEA/MMHUB EA error still
results in a DF freeze in the previous generation. The driver should
avoid reporting a GCEA/MMHUB EA error as a hw fatal error in the kernel
message by reading the GCEA/MMHUB err status registers.

Changed from V1:
    make query_ras_error_status function more general
    make reading the mmhub err status register more friendly

Changed from V2:
    move ras error status query function into do_recovery workqueue

Changed from V3:
    remove useless code from V2, print GCEA error status
    instance number

Signed-off-by: Stanley.Yang 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher