path: root/drivers/gpu/drm/amd/amdgpu
Age, Commit message, Author
2026-03-24drm/amdgpu: fix syncobj leak for amdgpu_gem_va_ioctl()Prike Liang
The syncobj and chain allocation resources need to be freed. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-24drm/amdgpu/vcn4.0.3: gate per-queue reset by PSP SOS program versionJesse Zhang
Add a PSP SOS firmware compatibility check before enabling VCN per-queue reset on vcn_v4_0_3. Per review, program check is sufficient: when PSP SOS program is 0x01, require fw version >= 0x0036015f; otherwise allow per-queue reset. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jesse Zhang <Jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
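The gating rule described above can be sketched as a small predicate. This is only an illustration: the function name and the idea of passing the program and firmware version as two separate arguments are assumptions, since the commit message does not say how the driver derives them. The two constants are the values quoted in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values quoted in the commit message. */
#define PSP_SOS_PROGRAM_GATED   0x01u
#define PSP_SOS_MIN_FW_VERSION  0x0036015fu

/*
 * Hypothetical helper: sos_program and sos_fw_version are separate
 * parameters here because the commit message does not describe how
 * the driver obtains them.
 */
static bool vcn_per_queue_reset_allowed(uint32_t sos_program,
					uint32_t sos_fw_version)
{
	/* Only program 0x01 carries a minimum firmware requirement. */
	if (sos_program == PSP_SOS_PROGRAM_GATED)
		return sos_fw_version >= PSP_SOS_MIN_FW_VERSION;

	/* Any other program: per-queue reset is allowed. */
	return true;
}
```

With this shape, a caller would consult the predicate once at init time and fall back to a coarser reset path when it returns false.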
2026-03-24drm/amdgpu: use DISCOVERY_TMR_SIZE in ACPI TMR fallbackJesse.Zhang
amdgpu_acpi_get_tmr_info() returns the full TMR region size, not the IP discovery table size. Using tmr_size as discovery.size can lead to oversized allocations and probe failure. In the ACPI fallback path, keep discovery.size as DISCOVERY_TMR_SIZE and only use ACPI data for offset calculation. Fixes: 01bdc7e219c4 ("drm/amdgpu: New interface to get IP discovery binary v3") Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-24drm/amdgpu: Fix fence put before wait in amdgpu_amdkfd_submit_ibSrinivasan Shanmugam
amdgpu_amdkfd_submit_ib() submits a GPU job and gets a fence from amdgpu_ib_schedule(). This fence is used to wait for job completion. Currently, the code drops the fence reference using dma_fence_put() before calling dma_fence_wait(). If dma_fence_put() releases the last reference, the fence may be freed before dma_fence_wait() is called. This can lead to a use-after-free. Fix this by waiting on the fence first and releasing the reference only after dma_fence_wait() completes. Fixes the below: drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c:697 amdgpu_amdkfd_submit_ib() warn: passing freed memory 'f' (line 696) Fixes: 9ae55f030dc5 ("drm/amdgpu: Follow up change to previous drm scheduler change.") Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
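The ordering problem can be shown with a toy reference-counted object in user space. Everything below (the toy_fence type and its helpers) is a hypothetical stand-in and not the kernel's dma_fence API; the sketch only demonstrates waiting while the reference is still held and dropping it afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for a reference-counted fence; not struct dma_fence. */
struct toy_fence {
	int refcount;
	bool signaled;
};

static int toy_fences_freed; /* counts frees so callers can observe them */

static struct toy_fence *toy_fence_create(void)
{
	struct toy_fence *f = calloc(1, sizeof(*f));

	if (f)
		f->refcount = 1;
	return f;
}

/* Drop one reference; the object is freed when the last one goes away. */
static void toy_fence_put(struct toy_fence *f)
{
	if (f && --f->refcount == 0) {
		toy_fences_freed++;
		free(f);
	}
}

/* "Wait" by marking the fence signaled; the fence must still be alive. */
static int toy_fence_wait(struct toy_fence *f)
{
	assert(f->refcount > 0);
	f->signaled = true;
	return 0;
}

/* Correct ordering: wait while holding the reference, then release it. */
static int toy_submit_and_wait(void)
{
	struct toy_fence *f = toy_fence_create();
	int ret;

	if (!f)
		return -1;

	ret = toy_fence_wait(f); /* the fence is still referenced here */
	toy_fence_put(f);        /* last use is over, now drop the reference */
	return ret;
}
```

Swapping the last two calls in toy_submit_and_wait() would hand a freed pointer to toy_fence_wait(), which is the pattern the static checker flagged.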
2026-03-23drm/amdgpu: prevent immediate PASID reuse caseEric Huang
PASID reuse can cause interrupt issues when a new process immediately runs into hardware state left behind by a previous, exited process that had the same PASID: page faults may still be pending in the IH ring buffer when that process exits and frees its PASID. To prevent this, use the IDR cyclic allocator, the same approach the kernel uses for PIDs. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 8f1de51f49be692de137c8525106e0fce2d1912d) Cc: stable@vger.kernel.org
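The effect of cyclic allocation can be illustrated with a toy ID pool. This is a sketch under stated assumptions: the pool type, its size, and the helper names are hypothetical, and the code is a plain user-space model rather than the kernel's IDR implementation. It shows only the property the commit relies on, namely that a freed ID is not handed out again right away.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_ID_MAX 8 /* tiny ID space, for illustration only */

struct toy_id_pool {
	bool used[TOY_ID_MAX];
	int next; /* index where the next search starts */
};

/*
 * Cyclic allocation: start searching just after the most recently
 * allocated ID and wrap around, so a freed ID is only handed out again
 * after the rest of the range has been tried.
 * Returns the ID, or -1 if the pool is exhausted.
 */
static int toy_id_alloc_cyclic(struct toy_id_pool *pool)
{
	for (int i = 0; i < TOY_ID_MAX; i++) {
		int id = (pool->next + i) % TOY_ID_MAX;

		if (!pool->used[id]) {
			pool->used[id] = true;
			pool->next = (id + 1) % TOY_ID_MAX;
			return id;
		}
	}
	return -1;
}

static void toy_id_free(struct toy_id_pool *pool, int id)
{
	if (id >= 0 && id < TOY_ID_MAX)
		pool->used[id] = false;
}
```

A lowest-free-first allocator would return the same ID immediately after it is freed; the cyclic variant delays that until the search wraps around, which gives stale events tagged with the old ID time to drain.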
2026-03-23drm/amdgpu: fix strsep() corrupting lockup_timeout on multi-GPU (v3)Ruijing Dong
amdgpu_device_get_job_timeout_settings() passes a pointer directly to the global amdgpu_lockup_timeout[] buffer into strsep(). strsep() destructively replaces delimiter characters with '\0' in-place. On multi-GPU systems, this function is called once per device. When a multi-value setting like "0,0,0,-1" is used, the first GPU's call transforms the global buffer into "0\00\00\0-1". The second GPU then sees only "0" (terminated at the first '\0'), parses a single value, hits the single-value fallthrough (index == 1), and applies timeout=0 to all rings — causing immediate false job timeouts. Fix this by copying into a stack-local array before calling strsep(), so the global module parameter buffer remains intact across calls. The buffer is AMDGPU_MAX_TIMEOUT_PARAM_LENGTH (256) bytes, which is safe for the stack. v2: wrap commit message to 72 columns, add Assisted-by tag. v3: use stack array with strscpy() instead of kstrdup()/kfree() to avoid unnecessary heap allocation (Christian). This patch was developed with assistance from Claude (claude-opus-4-6). Assisted-by: Claude:claude-opus-4-6 Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 94d79f51efecb74be1d88dde66bdc8bfcca17935) Cc: stable@vger.kernel.org
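The copy-before-strsep() pattern can be reproduced in user space. The sketch below is not the driver code: the buffer and function names are hypothetical, and snprintf() stands in for the kernel-only strscpy(). It assumes a libc that provides strsep() (glibc or a BSD libc).

```c
#define _DEFAULT_SOURCE /* for strsep() on glibc */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PARAM_LEN 256

/* Stand-in for a global module parameter buffer shared by all devices. */
static char toy_lockup_timeout[TOY_PARAM_LEN] = "0,0,0,-1";

/*
 * Parse up to max comma-separated values into out[] and return how many
 * were found. The global is copied to a stack buffer first, because
 * strsep() overwrites each delimiter with '\0' in the buffer it walks.
 */
static int toy_parse_timeouts(long *out, int max)
{
	char buf[TOY_PARAM_LEN];
	char *cursor = buf;
	char *token;
	int count = 0;

	snprintf(buf, sizeof(buf), "%s", toy_lockup_timeout);

	while (count < max && (token = strsep(&cursor, ",")) != NULL)
		out[count++] = strtol(token, NULL, 10);

	return count;
}
```

Because only the stack copy is modified, calling toy_parse_timeouts() once per device sees the full four-value string every time.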
2026-03-23drm/amdgpu: Use stack variable to fetch nps infoLijo Lazar
Instead of a dynamic allocation, use stack variable and let the caller pass the maximum ranges that can be held in the buffer. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-23drm/amdgpu: Add smu v15_0_8 ip blockHawking Zhang
Add smu v15_0_8 ip block Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-23drm/amd/pm: Setup driver pptable for smu 15.0.8Yang Wang
Setup driver pptable and initialize data from static metrics table for smu_v15_0_8 v2: Remove unrelated changes and update description (Lijo) v3: Use ARRAY_SIZE (Lijo) v4: Move structure to header file v5: squash in static metrics support (Asad) Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-23drm/amdgpu/userq: cleanup amdgpu_userq_get/put where not neededSunil Khatri
amdgpu_userq_put/get are not needed when we already hold the userq_mutex and the reference is already valid from queue create time or from the signal ioctl. These additional get/put calls could cause a deadlock if the ref count reaches zero and destroy is called, which then tries to take the userq_mutex again. With this change we avoid a deadlock between suspend/restore and destroy queues trying to take userq_mutex again. Cc: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-23drm/amdgpu: Add amdgpu_regs_pcie64 debugfs nodeStanley.Yang
Add amdgpu_regs_pcie64 debugfs node to read/write 64bit PCIE registers. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-23drm/amdgpu: fix some more bug in amdgpu_gem_va_ioctlChristian König
Some illegal combinations of input flags were not checked, and we need to take the PDEs into account when returning the fence as well. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-23drm/amdgpu: prevent immediate PASID reuse caseEric Huang
PASID reuse can cause interrupt issues when a new process immediately runs into hardware state left behind by a previous, exited process that had the same PASID: page faults may still be pending in the IH ring buffer when that process exits and frees its PASID. To prevent this, use the IDR cyclic allocator, the same approach the kernel uses for PIDs. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-23drm/amdgpu: fix strsep() corrupting lockup_timeout on multi-GPU (v3)Ruijing Dong
amdgpu_device_get_job_timeout_settings() passes a pointer directly to the global amdgpu_lockup_timeout[] buffer into strsep(). strsep() destructively replaces delimiter characters with '\0' in-place. On multi-GPU systems, this function is called once per device. When a multi-value setting like "0,0,0,-1" is used, the first GPU's call transforms the global buffer into "0\00\00\0-1". The second GPU then sees only "0" (terminated at the first '\0'), parses a single value, hits the single-value fallthrough (index == 1), and applies timeout=0 to all rings — causing immediate false job timeouts. Fix this by copying into a stack-local array before calling strsep(), so the global module parameter buffer remains intact across calls. The buffer is AMDGPU_MAX_TIMEOUT_PARAM_LENGTH (256) bytes, which is safe for the stack. v2: wrap commit message to 72 columns, add Assisted-by tag. v3: use stack array with strscpy() instead of kstrdup()/kfree() to avoid unnecessary heap allocation (Christian). This patch was developed with assistance from Claude (claude-opus-4-6). Assisted-by: Claude:claude-opus-4-6 Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-23drm/amdgpu/gfx11: look at the right prop for gfx queue priorityAlex Deucher
Look at hqd_queue_priority rather than hqd_pipe_priority. In practice, it didn't matter as both were always set for kernel queues, but that will change in the future. Fixes: 2e216b1e6ba2 ("drm/amdgpu/gfx11: handle priority setup for gfx pipe1") Reviewed-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-23drm/amdgpu/gfx10: look at the right prop for gfx queue priorityAlex Deucher
Look at hqd_queue_priority rather than hqd_pipe_priority. In practice, it didn't matter as both were always set for kernel queues, but that will change in the future. Fixes: b07d1d73b09e ("drm/amd/amdgpu: Enable high priority gfx queue") Reviewed-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-23drm/amdgpu: Skip discovery dump when topology is unavailableSrinivasan Shanmugam
When generating a devcoredump, amdgpu_discovery_dump() prints the IP discovery topology. The function already needs to handle the case where adev->discovery.ip_top is NULL to avoid a crash. Currently, the code prints a section header and an additional message when the topology is unavailable. However, for platforms where discovery is not used, this section is not expected to be present. Printing an extra message adds unnecessary output. Simplify this by skipping the entire section when ip_top is NULL. The NULL check is kept to avoid a crash, but no output is generated when the discovery topology is unavailable. Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-23drm/amdgpu: annotate eviction fence signaling pathChristian König
Make sure lockdep sees the dependencies here. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-23drm/amdgpu: Avoid NULL dereference in discovery topology coredump path v3Srinivasan Shanmugam
When a GPU fault or timeout happens, the driver creates a devcoredump to collect debug information. During this, amdgpu_devcoredump_format() calls amdgpu_discovery_dump() to print IP discovery data.
amdgpu_discovery_dump() uses adev->discovery.ip_top and then accesses ip_top->die_kset. However, ip_top may be NULL if the discovery topology was never initialized. The current code does not check for this before using ip_top. As a result, when ip_top is NULL, the coredump worker crashes while taking the spinlock for ip_top->die_kset.
Fix this by checking for a missing ip_top before walking the discovery topology. If it is unavailable, print a short message in the dump and return safely.
- If ip_top is NULL, print a message and skip the dump
- Also add the same check in the cleanup path
This makes the coredump and cleanup paths safe even when the discovery topology is not available.
KASAN trace:
[ 522.228252] [IGT] amd_deadlock: starting subtest amdgpu-deadlock-sdma
[ 522.240681] [IGT] amd_deadlock: starting dynamic subtest amdgpu-deadlock-sdma
...
[ 522.952317] Write of size 4 at addr 0000000000000050 by task kworker/u129:5/5434
[ 522.937526] BUG: KASAN: null-ptr-deref in _raw_spin_lock+0x66/0xc0
[ 522.967659] Workqueue: events_unbound amdgpu_devcoredump_deferred_work [amdgpu]
...
[ 522.969445] Call Trace:
[ 522.969508] _raw_spin_lock+0x66/0xc0
[ 522.969518] ? __pfx__raw_spin_lock+0x10/0x10
[ 522.969534] amdgpu_discovery_dump+0x61/0x530 [amdgpu]
[ 522.971346] ? pick_next_task_fair+0x3f6/0x1c60
[ 522.971363] amdgpu_devcoredump_format+0x84f/0x26f0 [amdgpu]
[ 522.973188] ? __pfx_amdgpu_devcoredump_format+0x10/0x10 [amdgpu]
[ 522.975012] ? psi_task_switch+0x2b5/0x9b0
[ 522.975027] ? __pfx___drm_printfn_coredump+0x10/0x10 [drm]
[ 522.975198] ? __pfx___drm_puts_coredump+0x10/0x10 [drm]
[ 522.975366] ? __schedule+0x113c/0x38d0
[ 522.975381] amdgpu_devcoredump_deferred_work+0x4c/0x1f0 [amdgpu]
v2: Updated commit message - Clarified that ip_top is not freed, it can just be NULL if discovery was not initialized. (Christian/Lijo)
v3: Removed the extra drm_warn() for sysfs init failure as sysfs already reports errors. (Christian)
Fixes: e81eff80aad6 ("drm/amdgpu: include ip discovery data in devcoredump") Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-23drm/amdgpu: make amdgpu_user_wait_ioctl more resilent v2Christian König
When the memory allocated by userspace isn't sufficient for all the fences then just wait on them instead of returning an error. v2: use correct variable as pointed out by Sunil Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-23drm/amdgpu: replace WARN with DRM_ERROR for invalid sched priorityJesse.Zhang
amdgpu_sched_ioctl() currently uses WARN(1, ...) when userspace passes an out-of-range context priority value. WARN(1, ...) is unconditional and produces a full stack trace, which is disproportionate for a simple input validation failure -- the invalid value is already rejected with -EINVAL on the next line. Replace WARN(1, ...) with DRM_ERROR() to log the invalid value at an appropriate level without generating a stack dump. The -EINVAL return to userspace is unchanged. No functional change for well-formed userspace callers. v2: - Reworked commit message to focus on appropriate log level for parameter validation - Clarified that -EINVAL behavior is preserved (Vitaly) v3: completely drop that warning. Invalid parameters should never clutter the system log. (Christian) Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17drm/amdgpu: rework how we handle TLB fencesAlex Deucher
Add a new VM flag to indicate whether or not we need a TLB fence. Userqs (KFD or KGD) require a TLB fence. A TLB fence is not strictly required for kernel queues, but it shouldn't hurt. That said, enabling this unconditionally should be fine, but it seems to tickle some issues in KIQ/MES. Only enable them for KFD, or when KGD userq queues are enabled (currently via module parameter). Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4798 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4749 Fixes: f3854e04b708 ("drm/amdgpu: attach tlb fence to the PTs update") Cc: Christian König <christian.koenig@amd.com> Cc: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 69c5fbd2b93b5ced77c6e79afe83371bca84c788) Cc: stable@vger.kernel.org
2026-03-17drm/amdgpu: Add client ids for gmcv9 mmhubsLijo Lazar
Initialize client ids for gmcv9 mmhubs Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17drm/amdgpu: Add client ids for mmhub v2.xLijo Lazar
Initialize client ids for mmhub v2.x Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17drm/amdgpu: Add client ids for mmhub v3.xLijo Lazar
Initialize client ids for mmhub v3.x Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17drm/amdgpu: Add client ids for mmhub v4.xLijo Lazar
Initialize client ids for mmhub v4.x Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17drm/amdgpu: Add client id helpers to mmhubLijo Lazar
Add data structure and helpers to get client id data of mmhub. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17drm/amdgpu: Remove dead negative offset check in amdgpu_virt_init_critical_region()Srinivasan Shanmugam
amdgpu_virt_init_critical_region() stores init_hdr_offset as u64. The subsequent check for init_hdr_offset < 0 is therefore always false. Drop the unreachable validation and rely on the existing check_add_overflow() and VRAM end bounds check for offset validation. This resolves the Smatch warning about comparing an unsigned value against zero. drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c:953 amdgpu_virt_init_critical_region() warn: unsigned 'init_hdr_offset' is never less than zero. Fixes: 07009df6494d ("drm/amdgpu: Introduce SRIOV critical regions v2 during VF init") Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Ellen Pan <yunru.pan@amd.com> Cc: Lijo Lazar <lijo.lazar@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Bokun Zhang <bokun.zhang@amd.com> Reviewed-by: Ellen Pan <yunru.pan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
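Why the negative check is dead, and which checks do the real work, can be sketched in user space. The function and parameter names are hypothetical, and the GCC/Clang builtin __builtin_add_overflow() is used here in place of the kernel's check_add_overflow() helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Validate that [offset, offset + size) lies inside a region of
 * region_size bytes. offset is unsigned, so an "offset < 0" test would
 * always be false; the overflow check and the end-of-region check are
 * what actually reject bad input.
 */
static bool toy_range_is_valid(uint64_t offset, uint64_t size,
			       uint64_t region_size)
{
	uint64_t end;

	if (__builtin_add_overflow(offset, size, &end))
		return false; /* offset + size wrapped around */

	return end <= region_size;
}
```

A value that a signed reading would call "negative" shows up here as a very large offset, which either overflows the addition or lands past the end of the region, so it is still rejected.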
2026-03-17drm/amdgpu: Move amdgpu_vm_is_bo_always_valid() before first useSrinivasan Shanmugam
Smatch reports that 'bo' could be NULL in amdgpu_vm_bo_update(), even though amdgpu_vm_is_bo_always_valid() already checks for a NULL BO. Move amdgpu_vm_is_bo_always_valid() earlier in the file so the helper definition appears before its first use. This allows static analysis tools to see the NULL check performed by the helper and avoids the warning. Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17drm/amdgpu: Drop redundant queue NULL check in hang detect workerSrinivasan Shanmugam
amdgpu_userq_hang_detect_work() retrieves the queue pointer using container_of() from the embedded work item. Since the work structure is part of struct amdgpu_usermode_queue, the returned queue pointer cannot be NULL in normal execution. Remove the redundant !queue check and keep the validation for queue->userq_mgr. Fixes the below: drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c:159 amdgpu_userq_hang_detect_work() warn: can 'queue' even be NULL? Fixes: 290f46cf5726 ("drm/amdgpu: Implement user queue reset functionality") Cc: Jesse Zhang <Jesse.Zhang@amd.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Acked-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
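The reasoning about container_of() can be checked with a simplified user-space version of the macro. The struct and member names below are hypothetical and are not the layout of struct amdgpu_usermode_queue.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space version of the kernel's container_of() macro. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_work {
	int pending;
};

/* Hypothetical queue type with an embedded work item. */
struct toy_queue {
	int id;
	struct toy_work hang_detect_work;
};

/*
 * The result is the work pointer minus a constant offset, so for a valid
 * embedded work item it cannot be NULL and a "!queue" test is dead code.
 */
static struct toy_queue *toy_queue_from_work(struct toy_work *work)
{
	return toy_container_of(work, struct toy_queue, hang_detect_work);
}
```

Fields reached through the recovered pointer, such as a manager back-pointer, can still be NULL and are worth checking, which matches the validation the commit keeps.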
2026-03-17drm/amdgpu : Update psp 13_0_15 ip block supportMangesh Gadre
Included psp_13_0_15 ip block for RAS Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17drm/amdgpu: rework amdgpu_userq_wait_ioctl v4Christian König
Lockdep was complaining about a number of issues here. Especially lock inversion between syncobj, dma_resv and copying things into userspace. Rework the functionality. Split it up into multiple functions, consistently use memdup_array_user(), fix the lock inversions and a few more bugs in error handling. v2: drop the dma_fence leak fix, turned out that was actually correct, just not well documented. Apply some more cleanup suggestion from Tvrtko. v3: rebase on already done cleanups v4: add missing dma_fence_put() in error path. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17drm/amdgpu: fix adding eviction fenceChristian König
We can't add the eviction fence without validating the BO. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17drm/amdgpu: fix eviction fence and userq manager shutdownChristian König
That is a really complicated dance and wasn't implemented fully correctly. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17drm/amdgpu: completely rework eviction fence handling v2Christian König
Well that was broken on multiple levels. First of all a lot of checks were placed at incorrect locations, especially if the resume worker should run or not. Then a bunch of code was just mid-layering because of incorrect assignment who should do what. And finally comments explaining what happens instead of why. Just re-write it from scratch, that should at least fix some of the hangs we are seeing. Use RCU for the eviction fence pointer in the manager, the spinlock usage was mostly incorrect as well. Then finally remove all the nonsense checks and actually add them in the correct locations. v2: some typo fixes and cleanups suggested by Sunil Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17drm/amdgpu: rework how we handle TLB fencesAlex Deucher
Add a new VM flag to indicate whether or not we need a TLB fence. Userqs (KFD or KGD) require a TLB fence. A TLB fence is not strictly required for kernel queues, but it shouldn't hurt. That said, enabling this unconditionally should be fine, but it seems to tickle some issues in KIQ/MES. Only enable them for KFD, or when KGD userq queues are enabled (currently via module parameter). Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4798 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4749 Fixes: f3854e04b708 ("drm/amdgpu: attach tlb fence to the PTs update") Cc: Christian König <christian.koenig@amd.com> Cc: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17drm/amdgpu: Add JPEG_v5_0_2 IP blockSonny Jiang
Add support for JPEG_5_0_2 v2: comment out RAS for now (Alex) v3: drop some bringup leftovers (Alex) Signed-off-by: Sonny Jiang <sonjiang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17drm/amdgpu: Set VCN_5_0_2 DPG modeSonny Jiang
Set DPG flag for VCN_5_0_2 Signed-off-by: Sonny Jiang <sonjiang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17drm/amdgpu: Add VCN_5_0_2 codecs capabilities supportSonny Jiang
Support VCN_5_0_2 codec query Signed-off-by: Sonny Jiang <sonjiang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17drm/amdgpu: Add VCN v5_0_2Sonny Jiang
Add support for VCN_5_0_2 v2: squash in RRMT enable bit fix from Sonny (Alex) v3: squash in doorbell enablement patch (Alex) v4: drop some bringup leftovers (Alex) Signed-off-by: Sonny Jiang <sonjiang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17drm/amd/pm: Add fru eeprom info supportAsad Kamal
Add fru eeprom info support for smu_v15_0_8 Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17drm/amdgpu: Fix ISP segfault issue in kernel v7.0Pratap Nirujogi
Add NULL pointer checks for dev->type before accessing dev->type->name in ISP genpd add/remove functions to prevent kernel crashes. This regression was introduced in v7.0 as the wakeup sources are registered using the physical device instead of the ACPI device. This led to the wakeup source device being added as the first child of the AMDGPU device without its dev->type being initialized, which resulted in a segfault when the amdgpu isp driver accessed it. Fixes: 057edc58aa59 ("ACPI: PM: Register wakeup sources under physical devices") Suggested-by: Bin Du <Bin.Du@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit c51632d1ed7ac5aed2d40dbc0718d75342c12c6a)
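The guarded access can be sketched with minimal stand-in types. These are not the kernel's struct device or struct device_type, and the helper name and the type name string used by callers are hypothetical; the point is only the order of the checks before the name is read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Minimal stand-ins; not the kernel's device structures. */
struct toy_device_type {
	const char *name;
};

struct toy_device {
	const struct toy_device_type *type;
};

/*
 * Return true only when the device has a type with the expected name.
 * A device without a type (type == NULL) is skipped, never dereferenced.
 */
static bool toy_device_type_is(const struct toy_device *dev, const char *name)
{
	if (!dev || !dev->type || !dev->type->name)
		return false;

	return strcmp(dev->type->name, name) == 0;
}
```

When iterating over child devices, a helper of this shape lets the loop pass over children that were registered without a type instead of crashing on them.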
2026-03-17drm/amdgpu/gmc9.0: add bounds checking for cidAlex Deucher
The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Cc: Benjamin Cheng <benjamin.cheng@amd.com> Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit e14d468304832bcc4a082d95849bc0a41b18ddea) Cc: stable@vger.kernel.org
2026-03-17drm/amdgpu/mmhub4.2.0: add bounds checking for cidAlex Deucher
The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit dea5f235baf3786bfd4fd920b03c19285fdc3d9f) Cc: stable@vger.kernel.org
2026-03-17drm/amdgpu/mmhub4.1.0: add bounds checking for cidAlex Deucher
The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 04f063d85090f5dd0c671010ce88ee49d9dcc8ed) Cc: stable@vger.kernel.org
2026-03-17drm/amdgpu/mmhub3.0: add bounds checking for cidAlex Deucher
The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit f14f27bbe2a3ed7af32d5f6eaf3f417139f45253) Cc: stable@vger.kernel.org
2026-03-17drm/amdgpu/mmhub3.0.2: add bounds checking for cidAlex Deucher
The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1441f52c7f6ae6553664aa9e3e4562f6fc2fe8ea) Cc: stable@vger.kernel.org
2026-03-17drm/amdgpu/mmhub3.0.1: add bounds checking for cidAlex Deucher
The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 5f76083183363c4528a4aaa593f5d38c28fe7d7b) Cc: stable@vger.kernel.org
2026-03-17drm/amdgpu/mmhub2.3: add bounds checking for cidAlex Deucher
The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 89cd90375c19fb45138990b70e9f4ba4806f05c4) Cc: stable@vger.kernel.org
2026-03-17drm/amdgpu/mmhub2.0: add bounds checking for cidAlex Deucher
The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit e064cef4b53552602bb6ac90399c18f662f3cacd) Cc: stable@vger.kernel.org
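The cid bounds checks in the commits above follow a common defensive pattern, which might look roughly like the sketch below. The table contents, the fallback string, and the helper name are hypothetical; the real client tables live in the driver.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical client-name table, for illustration only. */
static const char *const toy_client_names[] = {
	"client0",
	"client1",
	"client2",
};

/*
 * Look up a client name by ID. An out-of-range ID yields a fallback
 * string rather than a read past the end of the table.
 */
static const char *toy_client_name(unsigned int cid)
{
	if (cid >= TOY_ARRAY_SIZE(toy_client_names))
		return "unknown";

	return toy_client_names[cid];
}
```

Even when the hardware is only expected to report in-range values, the check costs a single comparison and keeps an unexpected value from turning into an out-of-bounds read.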