linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c, branch v5.8-rc2

drm/amdgpu: fix documentation around busy_percentage

2020-06-17T21:42:43+00:00

Add rename the gpu busy percentage for consistency and
add the mem busy percentage documentation.

Reviewed-by: Evan Quan 
Reviewed-by: Nirmoy Das 
Signed-off-by: Alex Deucher

drm/amdgpu/pm: update comment to clarify Overdrive interfaces

2020-06-17T21:42:14+00:00

Vega10 and previous asics use one interface, vega20 and newer
use another.

Reviewed-by: Evan Quan 
Acked-by: Nirmoy Das 
Signed-off-by: Alex Deucher

drm/amdgpu/pm: return an error during GPU reset or suspend (v2)

2020-05-29T17:52:16+00:00

Return an error for sysfs and debugfs power interfaces during
gpu reset and suspend.  Prevents access to the hw while it may
be in an unusable state.

v2: squash in fix to drop suspend check

Acked-by: Nirmoy Das 
Signed-off-by: Alex Deucher

drm/amdgpu: fix device attribute node create failed with multi gpu

2020-05-26T19:51:45+00:00

the origin design will use varible of "attr->states" to save node
supported states on current gpu device, but for multi gpu device, when
probe second gpu device, the driver will check attribute node states
from previous gpu device wthether to create attribute node.
it will cause other gpu device create attribute node faild.

1. add member attr_list into amdgpu_device to link supported device attribute node.
2. add new structure "struct amdgpu_device_attr_entry{}" to track device attribute state.
3. drop member "states" from amdgpu_device_attr.

v2:
1. move "attr_list" into amdgpu_pm and rename to "pm_attr_list".
2. refine create & remove device node functions parameter.

fix:
drm/amdgpu: optimize amdgpu device attribute code

Signed-off-by: Kevin Wang 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher

drm/amdgpu: add apu flags (v2)

2020-05-22T17:41:53+00:00

Add some APU flags to simplify handling of different APU
variants.  It's easier to understand the special cases
if we use names flags rather than checking device ids and
silicon revisions.

v2: rebase on latest code

Acked-by: Evan Quan 
Acked-by: Christian König 
Signed-off-by: Alex Deucher

drm/amd/powerpay: Disable gfxoff when setting manual mode on picasso and raven

2020-05-22T17:41:43+00:00

[Problem description]
1. Boot up picasso platform, launches desktop, Don't do anything (APU enter into "gfxoff" state)
2. Remote login to platform using SSH, then type the command line:
sudo su -c "echo manual > /sys/class/drm/card0/device/power_dpm_force_performance_level"
sudo su -c "echo 2 > /sys/class/drm/card0/device/pp_dpm_sclk" (fix SCLK to 1400MHz)
3. Move the mouse around in Window
4. Phenomenon : The screen frozen

Tester will switch sclk level during glmark2 run time.
APU will enter "gfxoff" state intermittently during glmark2 run time.
The system got hanged if fix GFXCLK to 1400MHz when APU is in "gfxoff"
state.

[Debug]
1. Fix SCLK to X MHz
1400: screen frozen, screen black, then OS will reboot.
1300: screen frozen.
1200: screen frozen, screen black.
1100: screen frozen, screen black, then OS will reboot.
1000: screen frozen, screen black.
900: screen frozen, screen black, then OS will reboot.
800: Situation Nomal, issue disappear.
700: Situation Nomal, issue disappear.
2. SBIOS setting: AMD CBS --> SMU Debug Options -->SMU Debug --> "GFX DLDO Psm Margin Control":
50 : Situation Nomal, issue disappear.
45 : Situation Nomal, issue disappear.
40 : Situation Nomal, issue disappear.
35 : Situation Nomal, issue disappear.
30 : screen black.
25 : screen frozen, then blurred screen.
20 : screen frozen.
15 : screen black.
10 : screen frozen.
5 : screen frozen, then blurred screen.
3. Disable GFXOFF feature
Situation Nomal, issue disappear.

[Why]
Through a period of time debugging with Sys Eng team and SMU team, Sys
Eng team said this is voltage/frequency marginal issue not a F/W or H/W
bug. This experiment proves that default targetPsm [for f=1400MHz] is
not sufficient when GFXOFF is enabled on Picasso.

SMU team think it is an odd test conditions to force sclk="1400MHz" when
GPU is in "gfxoff" state，then wake up the GFX. SCLK should be in the
"lowest frequency" when gfxoff.

[How]
Disable gfxoff when setting manual mode.
Enable gfxoff when setting other mode(exiting manual mode) again.

By the way, from the user point of view, now that user switch to manual
mode and force SCLK Frequency, he don't want SCLK be controlled by
workload.It becomes meaningless to "switch to manual mode" if APU enter "gfxoff"
due to lack of workload at this point.

Tips: Same issue observed on Raven.

Signed-off-by: chen gong
Reviewed-by: Alex Deucher
Signed-off-by: Alex Deucher

drm/amdgpu: fix pm sysfs node handling (v2)

2020-05-22T17:41:37+00:00

Fix typos that prevented them from showing up.

v2: switch other files in addition to pp_clk_voltage

Fixes: 4e01847c38f7a5 ("drm/amdgpu: optimize amdgpu device attribute code")
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1150
Signed-off-by: Alex Deucher 
Acked-by: Evan Quan

drm/amdgpu: improve error handling in pcie_bw

2020-05-21T22:00:00+00:00

1. Initialize the counters to 0 in case the callback
   fails to initialize them.
2. The counters don't exist on APUs so return an error
   for them.
3. Return an error if the callback doesn't exist.

Reviewed-by: Yong Zhao 
Reviewed-By: Kent Russell 
Signed-off-by: Alex Deucher

drm/amdgpu: off by one in amdgpu_device_attr_create_groups() error handling

2020-05-21T16:48:43+00:00

This loop in the error handling code should start a "i - 1" and end at
"i == 0".  Currently it starts a "i" and ends at "i == 1".  The result
is that it removes one attribute that wasn't created yet, and leaks the
zeroeth attribute.

Fixes: 4e01847c38f7 ("drm/amdgpu: optimize amdgpu device attribute code")
Acked-by: Michael J. Ruhl 
Reviewed-by: Christian König 
Reviewed-by: Kevin Wang 
Signed-off-by: Dan Carpenter 
Signed-off-by: Alex Deucher

drm/amdgpu: cleanup unnecessary virt sriov check in amdgpu attribute

2020-05-21T16:37:19+00:00

the amdgpu device attribute node will be created accordding to sriov vf
mode at runtime.
cleanup unnecessary sriov check in attribute operation function.

Signed-off-by: Kevin Wang 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher