linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c, branch v7.1-rc7

drm/amdgpu: set noretry=1 as default for GFX 10.1.x (Navi10/12/14)

2026-06-03T18:54:05+00:00

Problem:
While developing the amd_close_race IGT test (which intentionally triggers
execute permission faults by removing VM_PAGE_EXECUTABLE from GPU page table
entries), we discovered that on Navi10 (GFX 10.1.x) these faults produce
zero diagnostic output. The GPU simply hangs silently for ~10s until the
scheduler timeout fires. There is no way to distinguish an execute
permission fault from any other type of GPU hang.

Root cause:
GFX 10.1.x defaults to noretry=0, which sets
RETRY_PERMISSION_OR_INVALID_PAGE_FAULT=1 in the GFXHUB UTCL2 registers
(gfxhub_v2_0.c line 313). With this bit set, permission faults (valid PTE,
wrong R/W/X bits) are handled entirely within the UTCL1/UTCL2 hardware
loop: UTCL2 returns an XNACK to UTCL1, and UTCL1 re-requests the
translation indefinitely, expecting software to eventually fix the
permission bits (as happens in SVM/HMM recovery). No interrupt of any kind
reaches the IH ring.

This is different from invalid-page faults (V=0) which DO generate a retry
fault interrupt that the driver can escalate to a no-retry fault. Permission
faults with valid PTEs loop silently forever in hardware.

GFX 10.3+ already defaults to noretry=1, which makes permission faults
generate immediate L2 protection fault interrupts. GFX 10.1.x was
inadvertently left out of this default.

Fix:
Change the noretry=1 threshold from IP_VERSION(10, 3, 0) to
IP_VERSION(10, 1, 0) in amdgpu_gmc_noretry_set(). This is a one-line
change that aligns GFX 10.1.x behavior with GFX 10.3+ and all newer
generations.

With noretry=1, the existing non-retry fault handler
(gmc_v10_0_process_interrupt) already decodes and prints the full
GCVM_L2_PROTECTION_FAULT_STATUS register including PERMISSION_FAULTS,
faulting address, VMID, PASID, and process name. No additional logging
code is needed — the fix is purely routing permission faults to the
existing, fully-capable non-retry interrupt handler.

v2: Dropped GFX10-specific logging from gmc_v10_0.c and
kfd_int_process_v10.c (Felix Kuehling). v1 added logging in the retry
fault handler, but with noretry=1 permission faults take the non-retry
path — the v1 retry handler code was dead and would never execute.

Tested on Navi10 (GFX 10.1.10):
- Execute permission faults now produce immediate, clear output:
    [gfxhub] page fault (src_id:0 ring:64 vmid:4 pasid:592)
     Process amd_close_race pid 13380 thread amd_close_race pid 13384
      in page at address 0x40001000 from client 0x1b (UTCL2)
    GCVM_L2_PROTECTION_FAULT_STATUS:0x00700881
         PERMISSION_FAULTS: 0x8
- No regressions with properly-mapped GPU workloads

Cc: Christian Koenig 
Cc: Alex Deucher 
Cc: Felix Kuehling 
Signed-off-by: Vitaly Prosyak 
Acked-by: Alex Deucher 
Signed-off-by: Alex Deucher 
(cherry picked from commit eb21edd24c40d81066753f8ac6f23bce15745395)
Cc: stable@vger.kernel.org

drm/amdgpu: Use asic specific pte_addr_mask

2026-06-03T18:52:37+00:00

For PTE creation use asic specific physical page base address mask

v2: Change variable name from pa_mask to pte_addr_mask

Signed-off-by: Harish Kasiviswanathan 
Reviewed-by: Lijo Lazar 
Signed-off-by: Alex Deucher 
(cherry picked from commit 2ea989885941a6e5607ef86dbe309e90b7191f21)

drm/amdgpu/gmc: Fix AMDGPU_GART_PLACEMENT_LOW to not overlap with VRAM

2026-04-24T15:09:11+00:00

When the GART placement is set to AMDGPU_GART_PLACEMENT_LOW:
Make sure that GART does not overlap with VRAM when
VRAM is configured to be in the low address space.

Solve this according to the following logic:
- When GART fits before VRAM, use zero address for GART
- Otherwise, put GART after the end of VRAM, aligned to 4 GiB

Previously, I had assumed this was not possible
so it was OK to not handle it, but now we got a report
from a user who has a board that is configured this way.

Fixes: 917f91d8d8e8 ("drm/amdgpu/gmc: add a way to force a particular placement for GART")
Signed-off-by: Timur Kristóf 
Reviewed-by: Christian König 
Signed-off-by: Alex Deucher 
(cherry picked from commit 3d9de5d86a1658cadb311461b001eb1df67263ad)

drm/amdgpu: Group filling reserve region details

2026-04-03T17:50:09+00:00

Add a function which groups filling of reserve region information. It
may not cover all as info on some regions are still filled outside like
those from atomfirmware tables.

Signed-off-by: Lijo Lazar 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher

drm/amdgpu: Add stolen_reserved reserve-region

2026-04-03T17:49:43+00:00

Use reserve region helpers for initializing/reserving stolen_reserved region.

Signed-off-by: Lijo Lazar 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher

drm/amdgpu: Add extended stolen vga reserve-region

2026-04-03T17:49:41+00:00

Use reserve region helpers for initializing/reserving extended stolen
vga region.

Signed-off-by: Lijo Lazar 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher

drm/amdgpu: Add stolen vga reserve-region

2026-04-03T17:49:36+00:00

Use reserve region helpers for initializing/reserving stolen vga region.

Signed-off-by: Lijo Lazar 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher

drm/amdgpu: add support to query vram info from firmware

2026-03-30T19:02:07+00:00

add support to query vram info from firmware

v2: change APU vram type, add multi-aid check
v3: seperate vram info query function into 3 parts and
    call them in a helper func when requirements
    are met.
v4: calculate vram_width for v9.x

Signed-off-by: Gangliang Xie 
Reviewed-by: Lijo Lazar 
Signed-off-by: Alex Deucher

drm/amdgpu: Use stack variable to fetch nps info

2026-03-23T18:21:34+00:00

Instead of a dynamic allocation, use stack variable and let the caller
pass the maximum ranges that can be held in the buffer.

Signed-off-by: Lijo Lazar 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher

drm/amdgpu: statically assign gart windows to ttm entities

2026-02-23T19:16:29+00:00

If multiple entities share the same window we must make sure
that jobs using them are executed sequentially.

This commit gives separate windows to each entity, so jobs
from multiple entities could execute in parallel if needed.
(for now they all use the first sdma engine, so it makes no
difference yet).
The entity stores the gart window offsets to centralize the
"window id" to "window offset" in a single place.

default_entity doesn't get any windows reserved since there is
no use for them.

---
v3:
- renamed gart_window_lock -> lock (Christian)
- added amdgpu_ttm_buffer_entity_init (Christian)
- fixed gart_addr in svm_migrate_gart_map (Felix)
- renamed gart_window_idX -> gart_window_offs[]
- added amdgpu_compute_gart_address
v4:
- u32 -> u64
- added kerneldoc
v5:
- removed gtt_window_lock
- simplified gart window creation and use: entities using a
  single window now uses window #0 instead of #1
- fix dst_addr calculation in kfd_migrate.c
---

Signed-off-by: Pierre-Eric Pelloux-Prayer 
Acked-by: Felix Kuehling 
Reviewed-by: Christian König 
Signed-off-by: Alex Deucher