linux.git/drivers/gpu/drm/amd/powerplay, branch v5.8

drm/amd/powerplay: fix a crash when overclocking Vega M

2020-07-21T19:59:32+00:00

Avoid kernel crash when vddci_control is SMU7_VOLTAGE_CONTROL_NONE and
vddci_voltage_table is empty. It has been tested on Intel Hades Canyon
(i7-8809G).

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=208489
Fixes: ac7822b0026f ("drm/amd/powerplay: add smumgr support for VEGAM (v2)")
Reviewed-by: Evan Quan 
Signed-off-by: Qiu Wenbo 
Signed-off-by: Alex Deucher 
Cc: stable@vger.kernel.org

drm/amdgpu/powerplay: Modify SMC message name for setting power profile mode

2020-07-14T19:41:51+00:00

I consulted Cai Land(Chuntian.Cai@amd.com), he told me corresponding smc
message name to fSMC_MSG_SetWorkloadMask() is
"PPSMC_MSG_ActiveProcessNotify" in firmware code of Renoir.

Strange though it may seem, but it's a fact.

Signed-off-by: chen gong 
Reviewed-by: Evan Quan 
Acked-by: Alex Deucher 
Signed-off-by: Alex Deucher 
Cc: stable@vger.kernel.org

drm/amd/powerplay: Fix NULL dereference in lock_bus() on Vega20 w/o RAS

2020-06-25T17:42:15+00:00

I updated my system with Radeon VII from kernel 5.6 to kernel 5.7, and
following started to happen on each boot:

	...
	BUG: kernel NULL pointer dereference, address: 0000000000000128
	...
	CPU: 9 PID: 1940 Comm: modprobe Tainted: G            E     5.7.2-200.im0.fc32.x86_64 #1
	Hardware name: System manufacturer System Product Name/PRIME X570-P, BIOS 1407 04/02/2020
	RIP: 0010:lock_bus+0x42/0x60 [amdgpu]
	...
	Call Trace:
	 i2c_smbus_xfer+0x3d/0xf0
	 i2c_default_probe+0xf3/0x130
	 i2c_detect.isra.0+0xfe/0x2b0
	 ? kfree+0xa3/0x200
	 ? kobject_uevent_env+0x11f/0x6a0
	 ? i2c_detect.isra.0+0x2b0/0x2b0
	 __process_new_driver+0x1b/0x20
	 bus_for_each_dev+0x64/0x90
	 ? 0xffffffffc0f34000
	 i2c_register_driver+0x73/0xc0
	 do_one_initcall+0x46/0x200
	 ? _cond_resched+0x16/0x40
	 ? kmem_cache_alloc_trace+0x167/0x220
	 ? do_init_module+0x23/0x260
	 do_init_module+0x5c/0x260
	 __do_sys_init_module+0x14f/0x170
	 do_syscall_64+0x5b/0xf0
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
	...

Error appears when some i2c device driver tries to probe for devices
using adapter registered by `smu_v11_0_i2c_eeprom_control_init()`.
Code supporting this adapter requires `adev->psp.ras.ras` to be not
NULL, which is true only when `amdgpu_ras_init()` detects HW support by
calling `amdgpu_ras_check_supported()`.

Before 9015d60c9ee1, adapter was registered by

	-> amdgpu_device_ip_init()
	  -> amdgpu_ras_recovery_init()
	    -> amdgpu_ras_eeprom_init()
	      -> smu_v11_0_i2c_eeprom_control_init()

after verifying that `adev->psp.ras.ras` is not NULL in
`amdgpu_ras_recovery_init()`. Currently it is registered
unconditionally by

	-> amdgpu_device_ip_init()
	  -> pp_sw_init()
	    -> hwmgr_sw_init()
	      -> vega20_smu_init()
	        -> smu_v11_0_i2c_eeprom_control_init()

Fix simply adds HW support check (ras == NULL => no support) before
calling `smu_v11_0_i2c_eeprom_control_{init,fini}()`.

Please note that there is a chance that similar fix is also required for
CHIP_ARCTURUS. I do not know whether any actual Arcturus hardware without
RAS exist, and whether calling `smu_i2c_eeprom_init()` makes any sense
when there is no HW support.

Cc: stable@vger.kernel.org
Fixes: 9015d60c9ee1 ("drm/amdgpu: Move EEPROM I2C adapter to amdgpu_device")
Signed-off-by: Ivan Mironov 
Tested-by: Bjorn Nostvold 
Signed-off-by: Alex Deucher

drm/amdgpu: Replace invalid device ID with a valid device ID

2020-06-11T20:05:09+00:00

Initializes Powertune data for a specific Hawaii card by fixing what
looks like a typo in the code. The device ID 66B1 is not a supported
device ID for this driver, and is not mentioned elsewhere. 67B1 is a
valid device ID, and is a Hawaii Pro GPU.

I have tested on my R9 390 which has device ID 67B1, and it works
fine without problems.

Signed-off-by: Sandeep Raghuraman 
Signed-off-by: Alex Deucher 
Cc: stable@vger.kernel.org

drm/amd/powerplay: ack the SMUToHost interrupt on receive V2

2020-05-29T17:56:30+00:00

There will be no further interrupt without proper ack
for current one.

V2: fix typo to really set ACK bit only

Signed-off-by: Evan Quan 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher

drm/amdgpu: add apu flags (v2)

2020-05-22T17:41:53+00:00

Add some APU flags to simplify handling of different APU
variants.  It's easier to understand the special cases
if we use names flags rather than checking device ids and
silicon revisions.

v2: rebase on latest code

Acked-by: Evan Quan 
Acked-by: Christian König 
Signed-off-by: Alex Deucher

drm/amdgpu/smu10: Replace one-element array and use struct_size() helper

2020-05-21T16:48:43+00:00

The current codebase makes use of one-element arrays in the following
form:

struct something {
    int length;
    u8 data[1];
};

struct something *instance;

instance = kmalloc(sizeof(*instance) + size, GFP_KERNEL);
instance->length = size;
memcpy(instance->data, source, size);

but the preferred mechanism to declare variable-length types such as
these ones is a flexible array member[1][2], introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on. So, replace
the one-element array with a flexible-array member.

Also, make use of the new struct_size() helper to properly calculate the
size of struct smu10_voltage_dependency_table.

This issue was found with the help of Coccinelle and, audited and fixed
_manually_.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Acked-by: Christian König 
Signed-off-by: Gustavo A. R. Silva 
Signed-off-by: Alex Deucher

drm/amd/powerplay: unify the prompts on thermal interrupts

2020-05-21T16:48:42+00:00

The prompts will contain pci address(segment/bus/port/function),
severity(warn or error) and some keywords(GPU, amdgpu). Also this
address the issue that pci bus retrieved by PCI_BUS_NUM(adev->pdev->devfn)
is wrong.

Signed-off-by: Evan Quan 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher

drm/amdgpu: resolve ras recovery vs smi race condition

2020-05-21T16:46:51+00:00

during ras recovery block smu access via smi

Reviewed-by: Hawking Zhang 
Signed-off-by: John Clements 
Signed-off-by: Alex Deucher

drm/amdgpu: Updated XGMI power down control support check

2020-05-14T21:43:59+00:00

Updated SMC FW version check to determine if XGMI power down control is supported

Reviewed-by: Hawking Zhang 
Signed-off-by: John Clements 
Signed-off-by: Alex Deucher