linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c, branch v7.1-rc4

drm/amdgpu: Avoid reset in AMDGPU unload path for APUs with GFX V11 and higher.

2026-04-24T15:10:44+00:00

GFX V11 has the GC block as a default-off IP. Every time the AMDGPU
driver sends a request to PMFW to unload MP1, PMFW puts GC in reset
and powers down its voltage. Hence, skip the reset for APUs with
GFX V11 or later to avoid reset-related failures.

Fixes: 34355e61835e ("drm/amdgpu: Fix GFX hang on SteamDeck when amdgpu is reloaded")
Reviewed-by: Alex Deucher 
Signed-off-by: Shubhankar Milind Sardeshpande 
Signed-off-by: Alex Deucher 
(cherry picked from commit d0a8cadffc818f51d05bc234d8da1af228bc59a3)
Cc: stable@vger.kernel.org

drm/amd: Adjust ASPM support quirk to cover more Intel hosts

2026-04-21T21:03:01+00:00

Some of the same issues identified in commit c770ef19673fb
("drm/amd/amdgpu: disable ASPM in some situations") also affect
Tiger Lake systems with GFX11 connected over USB4. Widen the net
to also match these hosts.

Fixes: d9b3a066dfcd ("drm/amd: Exclude dGPUs in eGPU enclosures from DPM quirks")
Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5145
Reviewed-by: Yang Wang 
Signed-off-by: Mario Limonciello 
Signed-off-by: Alex Deucher 
(cherry picked from commit 0a214d888485b9f35fe03882a92962e6d5697849)

drm/amdgpu: correct single device PCIe reset flow for DPC

2026-04-17T18:49:11+00:00

To trigger the DPC event with a single device, we still need to set
the in_link_reset flag and the DPC status.

Signed-off-by: Ce Sun 
Reviewed-by: Lijo Lazar 
Signed-off-by: Alex Deucher

drm/amdgpu: rework userq fence driver alloc/destroy

2026-04-03T17:59:28+00:00

The correct fix is to tie the global xa entry lifetime to the
queue lifetime: insert in amdgpu_userq_create() and erase in
amdgpu_userq_cleanup(), both at the well-defined doorbell_index key.
This makes the operation O(1) and resolves the fence driver UAF
problem by binding the userq fence driver to each queue.

v2: clean up the local variables initialization. (Christian)

Signed-off-by: Prike Liang 
Reviewed-by: Christian König 
Signed-off-by: Alex Deucher

drm/amdgpu: replace use of system_wq with system_dfl_wq

2026-04-03T17:52:29+00:00

This patch continues the effort to refactor workqueue APIs, which began
with the changes introducing new workqueues and a new alloc_workqueue flag:

   commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
   commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")

The point of the refactoring is to eventually alter the default behavior of
workqueues to become unbound by default so that their workload placement is
optimized by the scheduler.

Before that can happen (after a careful review and conversion of each
individual case), workqueue users must be converted to the better-named new
workqueues with no intended behaviour changes:

   system_wq -> system_percpu_wq
   system_unbound_wq -> system_dfl_wq

This way, the obsolete workqueues (system_wq, system_unbound_wq) can be
removed in the future.

Suggested-by: Tejun Heo 
Acked-by: Christian König 
Signed-off-by: Marco Crivellari 
Signed-off-by: Alex Deucher

drm/amdgpu: replace use of system_unbound_wq with system_dfl_wq

2026-04-03T17:52:12+00:00

This patch continues the effort to refactor workqueue APIs, which began
with the changes introducing new workqueues and a new alloc_workqueue flag:

   commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
   commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")

The point of the refactoring is to eventually alter the default behavior of
workqueues to become unbound by default so that their workload placement is
optimized by the scheduler.

Before that can happen (after a careful review and conversion of each
individual case), workqueue users must be converted to the better-named new
workqueues with no intended behaviour changes:

   system_wq -> system_percpu_wq
   system_unbound_wq -> system_dfl_wq

This way, the obsolete workqueues (system_wq, system_unbound_wq) can be
removed in the future.

Suggested-by: Tejun Heo 
Acked-by: Christian König 
Signed-off-by: Marco Crivellari 
Signed-off-by: Alex Deucher

drm/amdgpu: flush coredump work before HW teardown

2026-03-30T18:32:12+00:00

In amdgpu_device_fini_hw(), deferred coredump formatting work may still
be pending when hardware and IP components are being torn down. Since
the work may access device registers and memory that will be freed or
powered off, it must be completed before proceeding.

Add a flush_work() call for adev->coredump_work, guarded by
CONFIG_DEV_COREDUMP, to ensure any pending coredump work finishes
before the device enters the early IP fini stage.

This avoids a potential use-after-free or access to hardware resources
that are no longer available.

Reviewed-by: Lijo Lazar 
Suggested-by: Lijo Lazar 
Signed-off-by: Jesse Zhang 
Signed-off-by: Alex Deucher

drm/amdgpu: fix strsep() corrupting lockup_timeout on multi-GPU (v3)

2026-03-23T18:09:28+00:00

amdgpu_device_get_job_timeout_settings() passes a pointer directly
to the global amdgpu_lockup_timeout[] buffer into strsep().
strsep() destructively replaces delimiter characters with '\0'
in-place.

On multi-GPU systems, this function is called once per device.
When a multi-value setting like "0,0,0,-1" is used, the first
GPU's call transforms the global buffer into "0\00\00\0-1". The
second GPU then sees only "0" (terminated at the first '\0'),
parses a single value, hits the single-value fallthrough
(index == 1), and applies timeout=0 to all rings — causing
immediate false job timeouts.

Fix this by copying into a stack-local array before calling
strsep(), so the global module parameter buffer remains intact
across calls. The buffer is AMDGPU_MAX_TIMEOUT_PARAM_LENGTH
(256) bytes, which is safe for the stack.

v2: wrap commit message to 72 columns, add Assisted-by tag.
v3: use stack array with strscpy() instead of kstrdup()/kfree()
    to avoid unnecessary heap allocation (Christian).

This patch was developed with assistance from Claude (claude-opus-4-6).

Assisted-by: Claude:claude-opus-4-6
Reviewed-by: Christian König 
Reviewed-by: Alex Deucher 
Signed-off-by: Ruijing Dong 
Signed-off-by: Alex Deucher

drm/amdgpu: move devcoredump generation to a worker

2026-03-17T14:45:20+00:00

Update the way drm_coredump_printer is used based on its documentation
and Xe's code: the main idea is to generate the final version in one go
and then use memcpy to return the chunks requested by the caller of
amdgpu_devcoredump_read.

The generation is moved to a separate worker thread.

This cuts the time to copy the dump from 40s to ~0s on my machine.

---
v3:
- removed adev->coredump_in_progress and instead use work as
  the synchronisation mechanism
- use kvfree instead of kfree
---

Signed-off-by: Pierre-Eric Pelloux-Prayer 
Acked-by: Alex Deucher 
Acked-by: Christian König 
Signed-off-by: Alex Deucher

drm/amdgpu: Query xgmi info from mmhub if available

2026-03-17T14:32:38+00:00

Query xgmi info from mmhub if available

Signed-off-by: Hawking Zhang 
Reviewed-by: Le Ma 
Reviewed-by: Feifei Xu 
Signed-off-by: Alex Deucher