linux-stable.git/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c, branch linux-6.3.y

drm/amdgpu: drop gfx_v11_0_cp_ecc_error_irq_funcs

2023-05-17T12:02:03+00:00

commit 720b47229a5b24061d1c2e29ddb6043a59178d79 upstream.

The gfx.cp_ecc_error_irq is retired in gfx11. However, gfx_v11_0_hw_fini
still calls amdgpu_irq_put to disable this interrupt, which triggers the
call trace below.

[  102.873958] Call Trace:
[  102.873959]  
[  102.873961]  gfx_v11_0_hw_fini+0x23/0x1e0 [amdgpu]
[  102.874019]  gfx_v11_0_suspend+0xe/0x20 [amdgpu]
[  102.874072]  amdgpu_device_ip_suspend_phase2+0x240/0x460 [amdgpu]
[  102.874122]  amdgpu_device_ip_suspend+0x3d/0x80 [amdgpu]
[  102.874172]  amdgpu_device_pre_asic_reset+0xd9/0x490 [amdgpu]
[  102.874223]  amdgpu_device_gpu_recover.cold+0x548/0xce6 [amdgpu]
[  102.874321]  amdgpu_debugfs_reset_work+0x4c/0x70 [amdgpu]
[  102.874375]  process_one_work+0x21f/0x3f0
[  102.874377]  worker_thread+0x200/0x3e0
[  102.874378]  ? process_one_work+0x3f0/0x3f0
[  102.874379]  kthread+0xfd/0x130
[  102.874380]  ? kthread_complete_and_exit+0x20/0x20
[  102.874381]  ret_from_fork+0x22/0x30

v2:
- Handle umc and gfx ras cases in a separate patch
- Retired the gfx_v11_0_cp_ecc_error_irq_funcs in gfx11

v3:
- Improve the subject and code comments
- Add a gfx11 check in amdgpu_gfx_ras_late_init

v4:
- Drop the definitions of CP_ME1_PIPE_INST_ADDR_INTERVAL and
SET_ECC_ME_PIPE_STATE, which were used in gfx_v11_0_set_cp_ecc_error_state
- Check cp_ecc_error_irq.funcs rather than the ip version, so the check
stays valid for future ASICs

v5:
- Simplify the check conditions
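
The v4 idea of checking the funcs pointer instead of the ip version can be
sketched as below. This is a minimal illustration, not the driver code: the
types irq_src and irq_src_funcs and both helper names are hypothetical
stand-ins, and the refcount handling is an assumption made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's IRQ source types. */
struct irq_src_funcs {
	int (*set)(bool enable);
};

struct irq_src {
	const struct irq_src_funcs *funcs; /* NULL when the interrupt is retired */
	int refcount;
};

/* Take a reference only if the source has callbacks registered. */
static int irq_get_if_present(struct irq_src *src)
{
	if (!src->funcs)
		return 0; /* nothing to enable on this ASIC */
	src->refcount++;
	return 0;
}

/* Drop a reference only if the source has callbacks registered. */
static int irq_put_if_present(struct irq_src *src)
{
	if (!src->funcs)
		return 0; /* retired interrupt: skip instead of warning */
	if (src->refcount == 0)
		return -1; /* unbalanced put */
	src->refcount--;
	return 0;
}
```

Keying the check on the pointer means a retired interrupt is skipped on any
ASIC that leaves funcs unset, without listing ip versions.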

Signed-off-by: Horatio Zhang 
Reviewed-by: Hawking Zhang 
Acked-by: Christian König 
Reviewed-by: Guchun Chen 
Reviewed-by: Feifei Xu 
Signed-off-by: Alex Deucher 
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman

drm/amdgpu: allow multipipe policy on ASICs with one MEC

2023-01-19T03:48:49+00:00

Always enable multipipe policy on ASICs with GC VERSION > 9.0.0
instead of MEC number > 1.

This will allow multipipe policy on ASICs with one MEC,
e.g., gfx11 APUs.
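
A minimal sketch of the policy described above, under stated assumptions:
struct gfx_info and multipipe_capable are hypothetical names, the IP_VERSION
packing is assumed for illustration, and keeping the MEC-count test as a
fallback for older ASICs is also an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed packing of major/minor/revision into one comparable value. */
#define IP_VERSION(mj, mn, rv) (((mj) << 16) | ((mn) << 8) | (rv))

/* Hypothetical summary of the fields the policy looks at. */
struct gfx_info {
	uint32_t gc_version;  /* packed GC ip version */
	unsigned int num_mec; /* number of MEC engines */
};

static bool multipipe_capable(const struct gfx_info *gfx)
{
	/* Newer GC versions always spread queues across pipes. */
	if (gfx->gc_version > IP_VERSION(9, 0, 0))
		return true;

	/* Assumed fallback for older ASICs: require more than one MEC. */
	return gfx->num_mec > 1;
}
```

With this check, a gfx11 APU that reports a single MEC still gets the
multipipe policy.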

Signed-off-by: Lang Yu 
Reviewed-by: Aaron Liu 
Reviewed-by: Yifan Zhang 
Signed-off-by: Alex Deucher

drm/amdgpu: Perform gpu reset after gfx finishes processing ras poison consumption on gfx_v11_0_3

2023-01-17T21:11:51+00:00

Perform gpu reset after gfx finishes processing
ras poison consumption on gfx_v11_0_3.

V2:
 Move gfx poison consumption handler from hw_ops to ip
 function level.

V3:
 Adjust the calling position of amdgpu_gfx_poison_consumption_handler.

V4:
 Since gfx v11_0_3 does not have a .hw_ops instance, the .hw_ops null
 pointer check in amdgpu_ras_interrupt_poison_consumption_handler
 needs to be adjusted.

Signed-off-by: YiPeng Chai 
Reviewed-by: Hawking Zhang 
Reviewed-by: Tao Zhou 
Signed-off-by: Alex Deucher

drm/amdgpu: Add gfx ras function on gfx v11_0_3

2023-01-17T21:11:50+00:00

Add gfx ras function on gfx v11_0_3.

V2:
 1. Add separate source files for gfx v11_0_3.
 2. Create a common function to initialize gfx ras block.

V3:
 1. Rename amdgpu_gfx_ras_block_init to amdgpu_gfx_ras_sw_init.
 2. Adjust the calling position of amdgpu_gfx_ras_sw_init.
 3. Remove gfx_v11_0_3_ras_ops.

V4:
 Revert changes in amdgpu_ras_interrupt_poison_consumption_handler.

V5:
 1. Remove invalid include file in gfx_v11_0_3.c.
 2. Reduce the number of parameters of amdgpu_gfx_ras_sw_init.

Signed-off-by: YiPeng Chai 
Reviewed-by: Hawking Zhang 
Reviewed-by: Tao Zhou 
Signed-off-by: Alex Deucher

drm/amdgpu: use VRAM|GTT for a bunch of kernel allocations

2023-01-03T21:49:54+00:00

Technically all of those can use GTT as well, no need to force things
into VRAM.
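
To illustrate what a VRAM|GTT domain mask allows, here is a small sketch.
The DOMAIN_* names and values, pick_domain, and the vram_full flag are all
hypothetical; the real placement decision is made by the memory manager and
is more involved than this.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical domain bits, for illustration only. */
#define DOMAIN_GTT  0x2u
#define DOMAIN_VRAM 0x4u

/*
 * Pick a placement from the allowed mask: prefer VRAM while it has room,
 * otherwise fall back to GTT if the mask permits it. Returns 0 when no
 * allowed domain can hold the buffer.
 */
static uint32_t pick_domain(uint32_t allowed, bool vram_full)
{
	if ((allowed & DOMAIN_VRAM) && !vram_full)
		return DOMAIN_VRAM;
	if (allowed & DOMAIN_GTT)
		return DOMAIN_GTT;
	return 0;
}
```

In this sketch a VRAM-only mask leaves nowhere to place the buffer when VRAM
is full, while VRAM|GTT still succeeds by using GTT.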

Signed-off-by: Christian König 
Signed-off-by: Luben Tuikov 
Acked-by: Felix Kuehling 
Signed-off-by: Alex Deucher

drm/amdgpu: complete gfxoff allow signal during suspend without delay

2022-11-09T22:41:42+00:00

This change guarantees that gfxoff is allowed before moving further in
the s2idle sequence, making gfxoff more reliable in the amdgpu IP
suspend flow.

Signed-off-by: Harsh Jain 
Reviewed-by: Evan Quan 
Signed-off-by: Alex Deucher

drm/amdgpu: fix compiler warning for amdgpu_gfx_cp_init_microcode

2022-09-29T13:41:46+00:00

Change the parameter type of amdgpu_gfx_cp_init_microcode to fix a
compiler warning.

Signed-off-by: Likun Gao 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher

drm/amdgpu: add function to init CP microcode

2022-09-29T13:41:43+00:00

Add a common function to init CP-related microcode.

Signed-off-by: Likun Gao 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher

drm/amd: Add detailed GFXOFF stats to debugfs

2022-08-16T22:17:31+00:00

Add debugfs interface to log GFXOFF statistics:

- Read amdgpu_gfxoff_count to get the total GFXOFF entry count at the
  time of query since system power-up

- Write 1 to amdgpu_gfxoff_residency to start logging, and 0 to stop.
  Read it to get average GFXOFF residency % multiplied by 100
  during the last logging interval.

Both features are designed to keep their values persistent across
suspends.
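
Because the residency value is reported as a percentage multiplied by 100,
a reader of the file has to scale it back. A minimal sketch of that
arithmetic follows; the helper name gfxoff_residency_percent is
hypothetical and reading the debugfs file itself is left out.

```c
/*
 * Convert the raw residency value (percent * 100) into a percentage.
 * For example, a raw value of 10000 corresponds to 100%.
 */
static double gfxoff_residency_percent(unsigned int raw)
{
	return raw / 100.0;
}
```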

Signed-off-by: André Almeida 
Signed-off-by: Alex Deucher

drm/amdgpu: reduce reset time

2022-08-16T22:14:31+00:00

In the multi-container use case, reset time is important, so skip ring
tests and the cp halt wait while suspending IPs for reset, as they are
going to fail and only add time to the reset.

v2: add a hang flag to indicate the reset comes from a job timeout,
skip ring test and cp halt wait in this case

v3: move hang flag to adev
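
The flag-based skip from v2 and v3 can be sketched as follows. This is an
illustration under assumptions: struct dev_state, its fields, and both
function names are hypothetical, and the stub ring test simply fails when
the device is hung.

```c
#include <stdbool.h>

/* Hypothetical per-device state; the commit keeps the flag on adev. */
struct dev_state {
	bool job_hang;      /* set when the reset comes from a job timeout */
	int ring_tests_run; /* counts how often the ring test was attempted */
};

/* Stub ring test: on a hung device it can only fail, after costing time. */
static int ring_test(struct dev_state *dev)
{
	dev->ring_tests_run++;
	return dev->job_hang ? -1 : 0;
}

/* Suspend step used on the reset path: skip the ring test on a hang. */
static int suspend_for_reset(struct dev_state *dev)
{
	if (dev->job_hang)
		return 0; /* the test would fail anyway; save the time */
	return ring_test(dev);
}
```

Keeping the flag on the device rather than passing it through each call
lets every IP block's suspend path consult the same state.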

Signed-off-by: Victor Zhao 
Acked-by: Andrey Grodzovsky 
Signed-off-by: Alex Deucher