Age  Commit message  Author
4 daysdrm/amd/pm: fix smu13 power limit range calculationYang Wang
SMU13 reports SocketPowerLimitAc/Dc as the default power limit, but MsgLimits.Power may carry a different firmware bound for the same PPT throttler. Using only the socket limit for both min and max can therefore expose an incorrect power range. Keep the socket limit as the default, but derive the range from both values: use the lower value for the min base and the higher value for the max base before applying OD percentages. Keep the current limit query independent from the cap calculation. Fixes: 1eaf26db9590 ("drm/amd/pm: fix smu13 power limit default/cap calculation") Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5419 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit f45bbf0f62f266ed8422d84f347d75d5fca846a7) Cc: stable@vger.kernel.org
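The range derivation described above can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct power_range`, `derive_power_range()` and the parameter names are hypothetical, and the way the OD percentages are applied (scale the min base down, scale the max base up) is assumed rather than taken from the driver.

```c
#include <stdint.h>

/* Hypothetical result type for this sketch; not a structure from the driver. */
struct power_range {
	uint32_t def;	/* default limit: always the socket limit */
	uint32_t min;	/* lower bound after applying the OD percentage */
	uint32_t max;	/* upper bound after applying the OD percentage */
};

/*
 * socket_limit stands in for SocketPowerLimitAc/Dc and msg_limit for
 * MsgLimits.Power. The percentage arithmetic below is an assumption.
 */
static struct power_range derive_power_range(uint32_t socket_limit,
					     uint32_t msg_limit,
					     uint32_t od_percent_lower,
					     uint32_t od_percent_upper)
{
	struct power_range r;
	/* Lower of the two values is the base for min, higher for max. */
	uint32_t min_base = socket_limit < msg_limit ? socket_limit : msg_limit;
	uint32_t max_base = socket_limit > msg_limit ? socket_limit : msg_limit;

	/* Avoid unsigned underflow in (100 - od_percent_lower). */
	if (od_percent_lower > 100)
		od_percent_lower = 100;

	r.def = socket_limit;
	r.min = (uint32_t)((uint64_t)min_base * (100 - od_percent_lower) / 100);
	r.max = (uint32_t)((uint64_t)max_base * (100 + od_percent_upper) / 100);
	return r;
}
```

With a socket limit of 300 and a firmware bound of 330, the default stays at 300 while the range is computed from 300 (min base) and 330 (max base).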
4 daysdrm/amdgpu: flush pending RCU callbacks on module unloadPerry Yuan
Call rcu_barrier() in module exit to wait for outstanding call_rcu() callbacks before freeing module text, preventing late callback execution in freed memory. BUG: unable to handle page fault for address: ffffffffc1d59c40 PGD 6a12067 P4D 6a12067 PUD 6a14067 PMD 13698b067 PTE 0 Oops: 0010 [#1] SMP NOPTI RIP: 0010:0xffffffffc1d59c40 Code: Unable to access opcode bytes at RIP 0xffffffffc1d59c16. RSP: 0018:ffffc900198c0f28 EFLAGS: 00010286 RAX: ffffffffc1d59c40 RBX: ffff897c7d6b61c0 RCX: ffff88826aff4590 RDX: ffff8884d8b35490 RSI: ffffc900198c0f30 RDI: ffff88812af67290 RBP: 000000000000000a (DONE segment entries) R08: 0000000000000000 R09: 0000000000000100 R10: 0000000000000000 R11: ffffffff82a06100 R12: ffff88811a4e3700 R13: 0000000000000000 R14: ffff897c7d6b6270 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff897c7d680000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffc1d59c16 CR3: 00000104a980a001 CR4: 0000000002770ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <IRQ> ? rcu_do_batch+0x163/0x450 ? rcu_core+0x177/0x1c0 ? __do_softirq+0xc1/0x280 ? asm_call_irq_on_stack+0xf/0x20 </IRQ> ? do_softirq_own_stack+0x37/0x50 ? irq_exit_rcu+0xc4/0x100 ? sysvec_apic_timer_interrupt+0x36/0x80 ? asm_sysvec_apic_timer_interrupt+0x12/0x20 ? cpuidle_enter_state+0xd4/0x360 ? cpuidle_enter+0x29/0x40 ? cpuidle_idle_call+0x108/0x1a0 ? do_idle+0x77/0xf0 ? cpu_startup_entry+0x19/0x20 ? secondary_startup_64_no_verify+0xbf/0xcb Signed-off-by: Perry Yuan <perry.yuan@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit feaa5039f6c12acc9aa934c2d45dcd251a12c69f)
4 daysdrm/amdgpu: Fix AMDGPU_GTT_MAX_TRANSFER_SIZE for non-4K systemsDonet Tom
Running RCCL unit tests on a system with a 64K PAGE_SIZE triggers the following warning and causes the test to terminate on latest upstream kernel: WARNING: drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:1335 at amdgpu_bo_release_notify+0x1bc/0x280 [amdgpu], CPU#18: rccl-UnitTests/33151 Call trace: amdgpu_bo_release_notify ttm_bo_release amdgpu_gem_object_free drm_gem_object_free amdgpu_bo_unref amdgpu_bo_create amdgpu_bo_create_user amdgpu_gem_object_create amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu kfd_ioctl_alloc_memory_of_gpu kfd_ioctl sys_ioctl The warning is triggered because amdgpu_ttm_next_clear_entity() returns NULL when a clear buffer operation is requested. This happens because the GART window allocation for the default_entity, clear_entity and move_entity fails during initialization. Commit [1] introduced separate GART windows for the default_entity, clear_entity and move_entity of each SDMA instance. Their sizes are derived from AMDGPU_GTT_MAX_TRANSFER_SIZE, which is currently defined as 1024 pages. This implicitly assumes a 4K PAGE_SIZE, where 1024 pages correspond to a 4MB transfer. On a 64K PAGE_SIZE system, however, the same value expands to 64MB. The default_entity and clear_entity each allocate one AMDGPU_GTT_MAX_TRANSFER_SIZE GART window, while the move_entity allocates two such windows. This results in 16MB of GART space per SDMA instance on a 4K PAGE_SIZE system, but 256MB per SDMA instance on a 64K PAGE_SIZE system. On an MI210 system with five SDMA instances and a 512MB GART aperture, the total GART space required becomes 1.25GB, exceeding the available GART aperture. Consequently, GART window allocation fails, amdgpu_ttm_next_clear_entity() returns NULL, and the above warning is triggered. Redefine AMDGPU_GTT_MAX_TRANSFER_SIZE in bytes instead of page units. Where a page count is required, convert it using PAGE_SHIFT. 
This preserves the existing 4MB transfer size across all PAGE_SIZE configurations while keeping GART window allocations within the available GART aperture. [1] https://lore.kernel.org/all/20260408100327.1372-3-pierre-eric.pelloux-prayer@amd.com/#t Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5435 Fixes: 897ee11ec020 ("drm/amdgpu: create multiple clear/move ttm entities") Signed-off-by: Donet Tom <donettom@linux.ibm.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 27213b776a666d3030de5acc3cd75278197b0494) Cc: stable@vger.kernel.org
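The arithmetic in this commit message can be checked with a small userspace sketch. The helper names and the `WINDOWS_PER_SDMA` and `NEW_TRANSFER_BYTES` macros are hypothetical; only the numbers (1024 pages, four windows per SDMA instance, five instances, 512MB aperture) come from the text above.

```c
#include <stdint.h>

/* default_entity: 1 window, clear_entity: 1 window, move_entity: 2 windows */
#define WINDOWS_PER_SDMA	4u

/* New-style definition: a fixed byte count, independent of PAGE_SIZE. */
#define NEW_TRANSFER_BYTES	(4ull << 20)

/* Old-style definition: 1024 pages, so the byte size scales with PAGE_SIZE. */
static uint64_t old_transfer_bytes(unsigned int page_shift)
{
	return (uint64_t)1024 << page_shift;
}

/* Where a page count is needed, derive it from the byte size. */
static uint64_t new_transfer_pages(unsigned int page_shift)
{
	return NEW_TRANSFER_BYTES >> page_shift;
}

/* Total GART space consumed by the transfer windows of all SDMA instances. */
static uint64_t gart_bytes_needed(uint64_t transfer_bytes,
				  unsigned int sdma_instances)
{
	return transfer_bytes * WINDOWS_PER_SDMA * sdma_instances;
}
```

For a 64K page size (`page_shift` of 16) and five SDMA instances, the old definition needs 1280MB (1.25GB), which does not fit a 512MB aperture; the byte-based definition needs 80MB on any page size.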
4 daysdrm/amdkfd: Use kvcalloc to allocate arraysDavid Francis
There were a few instances in kfd_chardev.c of kvzalloc being used to allocate memory for an array. Switch those to kvcalloc, which - is the standard way of allocating a zero-initialized array - does a check for the mul overflowing Signed-off-by: David Francis <David.Francis@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 60b048c93f7a3add39757ad65fe2bb6e58eeae23) Cc: stable@vger.kernel.org
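The overflow check that an array allocator adds over a plain "n * size" allocation can be sketched in userspace as follows. `array_size_ok()` and `zalloc_array()` are hypothetical helpers written for this illustration; they are not the kernel's kvcalloc() implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Compute n * size, reporting failure instead of silently wrapping. */
static bool array_size_ok(size_t n, size_t size, size_t *bytes)
{
	if (size != 0 && n > SIZE_MAX / size)
		return false;	/* the multiplication would overflow */
	*bytes = n * size;
	return true;
}

/* Zero-initialized array allocation that refuses overflowing requests. */
static void *zalloc_array(size_t n, size_t size)
{
	size_t bytes;

	if (!array_size_ok(n, size, &bytes))
		return NULL;
	return calloc(1, bytes);
}
```

A caller that passed an unchecked `n * size` to a zeroing allocator could end up with a much smaller buffer than intended when the product wraps; the checked variant returns NULL instead.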
4 daysdrm/amdgpu: add support for GC IP version 11.7.1Granthali Vinodkumar Dhandar
Initialize GC IP 11_7_1 Signed-off-by: Granthali Vinodkumar Dhandar <granthali.vinodkumardhandar@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit a928d8d81ec5cdb5a8944d08136720811efad0f6)
4 daysdrm/amdgpu: add support for GC IP version 11.7.0Granthali Vinodkumar Dhandar
Initialize GC IP 11_7_0 Signed-off-by: Granthali Vinodkumar Dhandar <granthali.vinodkumardhandar@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit cf591e67c095542a16475df293ec7bc9a118e4ee)
4 daysdrm/amdgpu: add the doorbell index input for suspending userqPrike Liang
The MES firmware requires the doorbell offset as input in order to preempt the userq. Adding the doorbell offset also keeps the driver aligned with union MESAPI__SUSPEND in the MES firmware. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit bc434335ab3c096a33a9e88c7951b4ac574db458) Cc: stable@vger.kernel.org
4 daysdrm/amdgpu/mes12: set doorbell offset for suspending userqPrike Liang
Update union MESAPI__SUSPEND and union MESAPI__RESUME to add the doorbell offset for suspending userq. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 5b58a2c120063544869d0284d3b355527f9f04f5) Cc: stable@vger.kernel.org
4 daysdrm/amdgpu/mes11: set doorbell offset for suspending userqPrike Liang
Update union MESAPI__SUSPEND and union MESAPI__RESUME to add the doorbell offset for suspending userq. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 30af09db33696f7e0de5c0c505cbb0cb92b6e25b) Cc: stable@vger.kernel.org
4 daysdrm/amdgpu: fix check in amdgpu_hmm_invalidate_gfxChristian König
For a short moment during alloc/free the userptr BO is not part of its VM, so bo->vm_bo can be NULL. Keep a reference to the VM root PD as parent of the userptr BO so that we can always use that to wait for all submissions of the VM instead of only the one involving the userptr BO. Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: 91250893cbaa ("drm/amdgpu: fix waiting for all submissions for userptrs") Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5399 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 631849ff5d603841e74f19f4a5e30fe1f7d7cf30) Cc: stable@vger.kernel.org
4 daysdrm/amdgpu/jpeg: fix jpeg_v5_0_1_is_idle detectionBoyuan Zhang
jpeg_v5_0_1_is_idle() initializes ret to false and then accumulates ring idle status using &=. Since false & condition always remains false, the function can never report the JPEG block as idle. Initialize ret to true so the function returns true only when all JPEG rings report RB_JOB_DONE. Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 680adf5faeeabb4585f7aeb53681719e2d6c2f41) Cc: stable@vger.kernel.org
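The accumulation bug is easy to reproduce outside the kernel. The sketch below uses a hypothetical array of per-ring "job done" flags in place of the real register reads; only the initial value of `ret` differs between the two functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Buggy pattern: false & anything stays false, so this never reports idle. */
static bool all_idle_buggy(const bool *ring_done, size_t n)
{
	bool ret = false;
	size_t i;

	for (i = 0; i < n; i++)
		ret &= ring_done[i];
	return ret;
}

/* Fixed pattern: start from true, any busy ring clears the result. */
static bool all_idle_fixed(const bool *ring_done, size_t n)
{
	bool ret = true;
	size_t i;

	for (i = 0; i < n; i++)
		ret &= ring_done[i];
	return ret;
}
```

With every ring done, the buggy version still returns false while the fixed version returns true; with one busy ring, both return false.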
4 daysdrm/amdgpu: Rename moved state to needs_updateNatalie Vock
This state can be reached via other means than physical moves, like PRT bindings. Make the name match the actual purpose of the state. Signed-off-by: Natalie Vock <natalie.vock@gmx.de> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1f7a795fb9f8186bd81ca9c4a80f75482db53c9e)
4 daysdrm/amdgpu: Only set bo->moved when the BO was actually movedNatalie Vock
The "moved" VM state is a bit unfortunately named, because BOs can end up in this state without being physically moved. While we need to invalidate every mapping when BOs are physically moved, in some other cases like PRT binds/unbinds there is no need to refresh mappings except those affected by the bind. Full invalidation of all BO mappings manifested as severe regressions in PRT bind performance, which this patch fixes. The offending patch is 4cdbba5a16aa ("drm/amdgpu: restructure VM state machine v4") in the amd-staging-drm-next tree, although it has not yet propagated anywhere else. Fixes: 4cdbba5a16aa ("drm/amdgpu: restructure VM state machine v4") Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5437 Signed-off-by: Natalie Vock <natalie.vock@gmx.de> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 0b2fa33b4235991a100dd799c891cf5c242aaed1) Cc: stable@vger.kernel.org
4 daysdrm/amd/display: guard against overflow in HDCP message dumpHarry Wentland
[Why] mod_hdcp_dump_binary_message() computed target_size (a uint32_t) as roughly byte_size * msg_size and gated the whole write on buf_size >= target_size. A large msg_size can overflow target_size, wrapping it to a small value that passes the check while the loop still writes byte_size * msg_size bytes into buf. All current callers pass small constants so this is not reachable today, but the unchecked arithmetic should be hardened. [How] Drop the overflow-prone target_size precomputation and instead bounds-check the output position on every iteration, stopping once the next entry would not leave room for the trailing terminator. This cannot overflow and, for oversized messages, dumps as much as fits rather than printing nothing. Fixes: 4c283fdac08a ("drm/amd/display: Add HDCP module") Assisted-by: Copilot:claude-opus-4.8 Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: George Zhang <george.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit d0a775e5d70b376696245a14c09e3aa6dde0023a) Cc: stable@vger.kernel.org
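Both halves of this description can be sketched in userspace C. The output format (three characters per message byte) and the names `wrapped_target_size()` and `dump_hex_bounded()` are assumptions made for the illustration; this is not the code of mod_hdcp_dump_binary_message().

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed output format for this sketch: "xx " per message byte. */
#define BYTES_PER_ENTRY	3u

/* Overflow-prone precomputation: the 32-bit product can wrap to a small value. */
static uint32_t wrapped_target_size(uint32_t msg_size)
{
	return BYTES_PER_ENTRY * msg_size + 1u;
}

/*
 * Bounds-check the output position on every iteration instead. Stops once
 * the next entry would leave no room for the trailing terminator, so an
 * oversized message is dumped as far as it fits. Returns characters written.
 */
static size_t dump_hex_bounded(const uint8_t *msg, size_t msg_size,
			       char *buf, size_t buf_size)
{
	size_t pos = 0;
	size_t i;

	if (buf_size == 0)
		return 0;

	for (i = 0; i < msg_size; i++) {
		if (buf_size - pos < BYTES_PER_ENTRY + 1)
			break;
		snprintf(buf + pos, BYTES_PER_ENTRY + 1, "%02x ",
			 (unsigned int)msg[i]);
		pos += BYTES_PER_ENTRY;
	}
	buf[pos] = '\0';
	return pos;
}
```

A message size of 0x55555556 makes the precomputed size wrap to 3, which would pass a "buf_size >= target_size" gate for almost any buffer. The bounded loop has no multiplication to overflow.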
4 daysdrm/amd/display: use kvzalloc to allocate struct dcHonglei Huang
struct dc has grown large over time (most of it the two inlined dc_scratch_space copies) and now sits close to the page allocator's 4 MiB contiguous allocation limit. Its actual size is not fixed by the source alone, it also depends on the compiler and the .config, so it can easily cross 4 MiB, e.g. with a newer GCC or a config change. dc_create() allocates it with kzalloc(). Once struct dc exceeds 4 MiB the request is rounded up to order 11 (8 MiB), which is above MAX_PAGE_ORDER, so the page allocator warns and returns NULL. dc_create() then fails, DM init fails and amdgpu probe aborts with -EINVAL: WARNING: mm/page_alloc.c:5197 at __alloc_frozen_pages_noprof+0x2f9/0x380 dc_create+0x38/0x660 [amdgpu] amdgpu_dm_init+0x2d9/0x510 [amdgpu] dm_hw_init+0x1b/0x90 [amdgpu] amdgpu_device_init.cold+0x150d/0x1e13 [amdgpu] amdgpu_driver_load_kms+0x19/0x80 [amdgpu] amdgpu_pci_probe+0x1e2/0x4c0 [amdgpu] dc_create() then returns NULL and DM init fails, which aborts the whole GPU init and makes amdgpu probe fail with -EINVAL ("hw_init of IP block <dm> failed -22"), leaving the display unusable. The subsequent amdgpu_irq_put() warnings during teardown are just fallout of unwinding a half-initialized device. struct dc is a software-only bookkeeping structure that is never handed to hardware DMA and is only ever kept as an opaque pointer, so it does not require physically contiguous memory. Allocate it with kvzalloc() (and free it with kvfree()) so that the allocator can fall back to vmalloc() when a contiguous allocation of that size is not available, which also avoids the MAX_PAGE_ORDER warning entirely. v2: - Rebase to amd-staging-drm-next. Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5406 Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Honglei Huang <honghuan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 991e0516a8072f2292681c6ae98a924ab0e32575) Cc: stable@vger.kernel.org
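The order arithmetic behind the failure can be sketched as below. It assumes 4 KiB pages and a maximum page order of 10, which is what a 4 MiB contiguous limit implies; `alloc_order()` is a hypothetical userspace stand-in for the kernel's order calculation, not the kernel function itself.

```c
#include <stddef.h>

#define SKETCH_PAGE_SHIFT	12	/* assumed 4 KiB pages */
#define SKETCH_MAX_PAGE_ORDER	10	/* assumed: 2^10 pages = 4 MiB */

/* Smallest order such that (1 << order) pages cover size bytes. */
static unsigned int alloc_order(size_t size)
{
	size_t pages = (size + ((size_t)1 << SKETCH_PAGE_SHIFT) - 1)
			>> SKETCH_PAGE_SHIFT;
	unsigned int order = 0;

	while (((size_t)1 << order) < pages)
		order++;
	return order;
}
```

A structure of exactly 4 MiB still fits order 10, but a single extra byte rounds the request up to order 11 (8 MiB), which is above the assumed limit.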
4 daysdrm/amdgpu: invoke pm_genpd_remove() before freeing genpdCe Sun
Call pm_genpd_remove() to unregister from global list prior to releasing acp_genpd memory, and clear the pointer after free. Signed-off-by: Ce Sun <cesun102@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit cd8650d7a91ee8b768e202354672553faa5cc1f2) Cc: stable@vger.kernel.org
4 daysdrm/amdgpu: fix resource leak on ACP reset timeoutCe Sun
When ACP soft reset poll times out, original code returns early without cleanup, leaking MFD child devices, genpd links and all ACP heap allocations. Replace direct early return with goto out to force run all cleanup logic regardless of reset success, preserve timeout error code for caller. Signed-off-by: Ce Sun <cesun102@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 98073e4328d7a8d75d03696ab27f6de70ef1aeda) Cc: stable@vger.kernel.org
4 daysdrm/amdgpu: reject mapping a reserved doorbell to a new queueZhu Lingshan
When creating a user queue, user space provides a doorbell BO handle and an offset within the BO to obtain a doorbell. However, the current implementation uses xa_store_irq() to store the doorbell, which allows a later queue created with the same BO and offset parameters to overwrite an existing queue and doorbell mapping. This can cause problems such as misrouting fence IRQ processing to the wrong queue, or the cleanup of one queue erasing the mapping of another queue. Fix this by replacing xa_store_irq() with xa_insert_irq(), which rejects mapping a reserved doorbell to a newly created queue. Signed-off-by: Zhu Lingshan <lingshan.zhu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 6244eae22966350db52faf9c1369d3b2ffc5de4e) Cc: stable@vger.kernel.org
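The difference between "store" and "insert" semantics can be modelled with a small fixed-size table. This is a userspace analogy only: `struct doorbell_table`, `table_store()` and `table_insert()` are hypothetical, and the real code uses the kernel XArray, where an insert into an occupied index fails with -EBUSY.

```c
#include <errno.h>
#include <stddef.h>

#define NUM_SLOTS 16

/* Hypothetical stand-in for the doorbell index -> queue mapping. */
struct doorbell_table {
	void *slot[NUM_SLOTS];
};

/* Store semantics: silently replaces whatever was mapped at idx. */
static int table_store(struct doorbell_table *t, unsigned int idx, void *queue)
{
	if (idx >= NUM_SLOTS)
		return -EINVAL;
	t->slot[idx] = queue;
	return 0;
}

/* Insert semantics: refuses to touch an index that is already in use. */
static int table_insert(struct doorbell_table *t, unsigned int idx, void *queue)
{
	if (idx >= NUM_SLOTS)
		return -EINVAL;
	if (t->slot[idx] != NULL)
		return -EBUSY;
	t->slot[idx] = queue;
	return 0;
}
```

With insert semantics, a second queue asking for an already mapped doorbell gets an error and the first mapping stays intact.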
4 daysdrm/amd/display: Handle struct drm_plane_state.ignore_damage_clipsThomas Zimmermann
The mode-setting pipeline can disable damage clipping for a commit by setting ignore_damage_clips in struct drm_plane_state. The commit will then do a full display update. Test the flag in DCN code and do a full update if it has been set. Commit 35ed38d58257 ("drm: Allow drivers to indicate the damage helpers to ignore damage clips") introduced ignore_damage_clips to selectively ignore damage clipping in certain framebuffer changes. This driver does not do that, but DRM's damage iterator will soon rely on the flag. Supporting it here as well therefore makes sense for consistency. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Fixes: 35ed38d58257 ("drm: Allow drivers to indicate the damage helpers to ignore damage clips") Cc: Javier Martinez Canillas <javierm@redhat.com> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Zack Rusin <zackr@vmware.com> Cc: dri-devel@lists.freedesktop.org Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit a24019f6480fad5c077b5956eed942c8960323d6) Cc: <stable@vger.kernel.org> # v6.8+
4 daysdrm/amdgpu/gfx12: fix EOP interrupt routing for KQ and userqJesse Zhang
Try KQ by ring_id first (KCQ and UQ never share a HW slot); fall back to amdgpu_userq_process_fence_irq() on miss, since KCQ EOPs were misrouted into the userq fence path when enable_mes is true. Require a strict (me,pipe,queue) match in the gfx case, then userq gfx EOPs fall through to amdgpu_userq_process_fence_irq(). Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 6c1f4f7ff08448e0e18cd7fc4e59d6c96a36f25d) Cc: stable@vger.kernel.org
4 daysdrm/amdgpu/gfx11: fix EOP interrupt routing for KQ and userqJesse Zhang
Try KQ by ring_id first (KCQ and UQ never share a HW slot); fall back to amdgpu_userq_process_fence_irq() on miss, since KQ EOPs were misrouted into the userq fence path when enable_mes is true. Require a strict (me,pipe,queue) match in the gfx case, then userq gfx EOPs fall through to amdgpu_userq_process_fence_irq(). Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 88e589cc811ba907209a426c426c469bcb4bb894) Cc: stable@vger.kernel.org
4 daysdrm/amdkfd: clamp v9 CRIU control stack checkpoint copy to BO sizeYongqiang Sun
CRIU checkpoint copies the MQD control stack using cp_hqd_cntl_stack_size from hardware without bounding it to the allocated BO region. If the HW field is larger than the queue's control stack allocation, memcpy reads past the BO into adjacent GTT memory and can leak kernel data to userspace. Store the page-aligned control stack BO size in mqd_manager and clamp checkpoint copies and reported checkpoint sizes to min(cp_hqd_cntl_stack_size, mm->ctl_stack_size). Apply the same bound for multi-XCC v9.4.3 checkpoint layout. Signed-off-by: Yongqiang Sun <Yongqiang.Sun@amd.com> Reviewed-by: David Francis <David.Francis@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 6c2abd0ec09e86c6323010673766f76050e28aa3) Cc: stable@vger.kernel.org
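The clamping idea can be sketched as a bounded copy. `checkpoint_copy()` and its parameters are hypothetical names for this illustration; `hw_size` plays the role of cp_hqd_cntl_stack_size and `bo_size` the role of the stored control stack BO size.

```c
#include <stdint.h>
#include <string.h>

/*
 * Copy at most bo_size bytes, even if the size reported by hardware is
 * larger. Returns the number of bytes actually copied, which is also the
 * size that should be reported to the caller.
 */
static size_t checkpoint_copy(void *dst, const void *bo, size_t bo_size,
			      uint32_t hw_size)
{
	size_t n = hw_size < bo_size ? (size_t)hw_size : bo_size;

	memcpy(dst, bo, n);
	return n;
}
```

An oversized hardware value then yields a short copy rather than a read past the end of the buffer.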
4 daysdrm/amdgpu: fix aperture mapping leakAsad Kamal
amdgpu_pci_remove() calls drm_dev_unplug() before invoking the driver fini routines. This causes drm_dev_enter() in amdgpu_ttm_fini() to always return false, so iounmap(aper_base_kaddr) never runs on normal driver unload, leaving an orphaned entry in the x86 PAT interval tree. On connected_to_cpu hardware, the aperture is mapped write-back (WB) via ioremap_cache(). On reload, IP discovery calls memremap(..., MEMREMAP_WC) over the same range. The WC vs WB conflict causes: ioremap error for 0x..., requested 0x1, got 0x0 amdgpu: discovery failed: -2 Fix by switching to devres-managed mappings so cleanup is guaranteed regardless of drm_dev_enter() state: - connected_to_cpu path: devm_memremap(MEMREMAP_WB). For IORESOURCE_SYSTEM_RAM ranges this takes the try_ram_remap() shortcut, returning __va(offset) from the existing kernel direct map. No new ioremap VA or PAT entry is created, so there is nothing to orphan. - dGPU path: devm_ioremap_wc() registers iounmap() as a devres action, guaranteeing cleanup at device_del() time. Also remove iounmap(aper_base_kaddr) from amdgpu_device_unmap_mmio() since the mapping is now devres-owned. v2: Remove redundant x86_64 guard (Lijo) Fixes: 9d0af8b4def0 ("drm/amdgpu: pre-map device buffer as cached for A+A config") Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit d871e99879cb5fd1fa798b006b4888887e63a17a) Cc: stable@vger.kernel.org
4 daysdrm/amd/display: avoid large stack allocation in commit_planes_do_stream_update_sequence Arnd Bergmann
The function has two arrays on the stack to hold temporary dsc_optc_config and dsc_config objects. Together with the other local variables, they blow through common stack frame warning limits: drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc.c:4070:22: error: stack frame size (1352) exceeds limit (1280) in 'commit_planes_do_stream_update_sequence' [-Werror,-Wframe-larger-than] Since neither array is initialized or used outside of the add_link_update_dsc_config_sequence() function, there is no actual need to keep each element around. Replace the arrays with a single instance each to reduce the stack usage to less than half. Fixes: 9f49d3cd7e71 ("drm/amd/display: Implement block sequencing infrastructure for modular hardware operations.") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Acked-by: George Zhang <george.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 9e0896fa6f7dbe9ca3dbbd3b593fa91670f4820b) Cc: stable@vger.kernel.org
4 daysdrm/amd/display: Remove DCCG registers not needed in DCN42Matthew Stewart
[why] Some resources that exist in the DCN block are not needed and shouldn't be used. [how] Remove defines from register lists. Reviewed-by: Ovidiu (Ovi) Bunea <ovidiu.bunea@amd.com> Signed-off-by: Matthew Stewart <Matthew.Stewart2@amd.com> Signed-off-by: George Zhang <george.zhang@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit dac8aa629a45e34027444f74d3b86b6f104b024c)
4 daysdrm/amd/display: Fix DCN42 null registers & register masksMatthew Stewart
[why] The register lists used on DCN42 variants are different. Some reused codepaths are trying to access registers not used. [how] Add DISPCLK_FREQ_CHANGECNTL, HUBPREQ_DEBUG, and HDMISTREAMCLK_CNTL to the register lists. Reviewed-by: Ovidiu (Ovi) Bunea <ovidiu.bunea@amd.com> Signed-off-by: Matthew Stewart <Matthew.Stewart2@amd.com> Signed-off-by: George Zhang <george.zhang@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 64142f9d51aff32f4130d916cb8f044a072ad27d)
4 daysdrm/amdkfd: Guard m->cp_hqd_eop_control setting by q->eop_ring_buffer_sizeXiaogang Chen
To avoid wraparound if the value is 0. Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit c0cae35661868af207077a4306bc42c7c972947c) Cc: stable@vger.kernel.org
4 daysdrm/amdgpu/vce: fix integer overflow in image sizeBoyuan Zhang
Fix a security vulnerability where malicious VCE command streams with oversized dimensions (e.g. 65536×65536) cause 32-bit integer overflow, wrapping the calculated buffer size to 0. This bypasses validation and allows GPU firmware to perform out-of-bound memory access. The fix uses 64-bit arithmetic to detect overflow and rejects invalid dimensions before they reach the hardware. V2: remove redundant check V3: modify max height value V4: remove size64 Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit cbe408dba581755ad1279a487ec786d8927d778d) Cc: stable@vger.kernel.org
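The wrap-around named in the example (65536×65536) can be demonstrated directly. The real size calculation in the VCE parser is more involved; `size32()` and `image_size_ok()` below are hypothetical helpers that only show the 32-bit wrap and one way to detect it with a 64-bit intermediate.

```c
#include <stdbool.h>
#include <stdint.h>

/* 32-bit product: wraps modulo 2^32. */
static uint32_t size32(uint32_t width, uint32_t height)
{
	return width * height;
}

/* Widen before multiplying and reject anything that does not fit 32 bits. */
static bool image_size_ok(uint32_t width, uint32_t height, uint32_t *size)
{
	uint64_t s = (uint64_t)width * height;

	if (s > UINT32_MAX)
		return false;
	*size = (uint32_t)s;
	return true;
}
```

The 32-bit product of 65536 and 65536 is 0, so a later "is the buffer at least this large" check passes trivially; the widened check rejects the dimensions instead.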
4 daysdrm/amdgpu/vcn4: avoid rereading IB param lengthBoyuan Zhang
Reuse the parameter length returned by vcn_v4_0_enc_find_ib_param() instead of rereading it from the IB. This avoids a potential TOCTOU issue if the IB contents change between reads. Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Reviewed-by: David Rosca <david.rosca@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit dbb02b4755f8c1f3773263f2d779872c1c0c073a) Cc: stable@vger.kernel.org
4 daysdrm/amdgpu: fix division by zero with invalid uvd dimensionsBoyuan Zhang
When width or height is less than 16, width_in_mb or height_in_mb becomes 0, leading to fs_in_mb being 0. This causes a division by zero when calculating num_dpb_buffer in H264 and H264 Perf decode paths. Add validation to reject frames with width < 16 or height < 16 before performing any calculations that depend on these values. V2: Format change - move up all variable definitions. V3: Use warn_once to avoid spam. Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 3e41d26c70b0a459d041cc19482a226c4b7423cb) Cc: stable@vger.kernel.org
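A simplified sketch of the failure and the guard follows. It assumes 16-pixel macroblocks as the text implies; `dpb_buffers()` and its `budget_in_mb` parameter are hypothetical, and the driver's real calculation has more terms (alignment, codec-specific limits) that are left out here.

```c
#include <errno.h>

/*
 * Derive a buffer count from a macroblock budget. Without the guard,
 * width < 16 or height < 16 makes fs_in_mb zero and the division traps.
 */
static int dpb_buffers(unsigned int width, unsigned int height,
		       unsigned int budget_in_mb, unsigned int *num)
{
	unsigned int width_in_mb, height_in_mb, fs_in_mb;

	/* Reject undersized frames before any dependent calculation. */
	if (width < 16 || height < 16)
		return -EINVAL;

	width_in_mb = width / 16;
	height_in_mb = height / 16;
	fs_in_mb = width_in_mb * height_in_mb;

	*num = budget_in_mb / fs_in_mb;
	return 0;
}
```

A 1920x1088 frame has 120 x 68 = 8160 macroblocks; an 8x1080 frame is rejected up front instead of dividing by zero.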
4 daysdrm/amd/display: set MSA MISC1 bit 6 when using VSC SDP for DCE 11.xLeorize
When BT.2020 colorimetry is selected, the driver sends information using VSC SDP but does not set "ignore MSA colorimetry" bit on older GPUs with DCE-based IPs. This causes certain sinks to prefer colorimetry information in DP MSA, resulting in terrible color rendering ("dull" colors) when HDR is enabled. This commit wires up the MISC1 bit 6 for GPUs with DCE 11.x based IPs to correctly configure sinks to ignore colorimetry information in MSA, resolving the color rendering issue. Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/4849 Assisted-by: oh-my-pi:GPT-5.5 Signed-off-by: Leorize <leorize+oss@disroot.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 323a09e56c1d549ce47d4f110de77b0051b4a8bf) Cc: stable@vger.kernel.org
4 daysdrm/amd/pm: fix amdgpu_pm_info power display unitsYang Wang
amdgpu_pm_info displayed power sensor readings with the wrong fractional unit. It treated the low byte of the raw sensor value as the decimal part of watts, while that field represents milliwatts in the decoded value. As a result, debugfs could report misleading SoC power when the remainder was not already a two-digit centiwatt value.
Example with query = 0x00000354:
  raw field       value
  ---------------------
  query >> 8      3 W
  query & 0xff    84 mW
  decoded power   3084 mW

  output          value
  ---------------------
  before          3.84 W
  after           3.08 W
Fixes: f0b8f65b4825 ("drm/amd/amdgpu: fix the GPU power print error in pm info") Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 01992b121fb652c753d37e0c1427a2d1a557d2b1) Cc: stable@vger.kernel.org
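The worked example (query = 0x00000354) can be reproduced with two small formatting helpers. The function names and the exact output string are assumptions for this sketch; only the decoding (high bits are watts, low byte is milliwatts) comes from the description above.

```c
#include <stdint.h>
#include <stdio.h>

/* Old behavior: prints the low byte as if it were hundredths of a watt. */
static void format_power_before(uint32_t query, char *buf, size_t len)
{
	snprintf(buf, len, "%u.%02u W",
		 (unsigned int)(query >> 8), (unsigned int)(query & 0xff));
}

/* Decode to milliwatts first, then print watts with two decimals. */
static void format_power_after(uint32_t query, char *buf, size_t len)
{
	uint32_t mw = (query >> 8) * 1000 + (query & 0xff);

	snprintf(buf, len, "%u.%02u W",
		 (unsigned int)(mw / 1000), (unsigned int)((mw % 1000) / 10));
}
```

For 0x354 the decoded power is 3084 mW, so the first helper prints 3.84 W and the second prints 3.08 W.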
4 daysdrm/amd/pm: make pp_features read-only when scpm is enabledYang Wang
SCPM owns power feature control when enabled. Make pp_features read-only during sysfs setup by clearing its write bits and store callback. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 6a5786e191fdce36c5db170e5209cf609e8f0087) Cc: stable@vger.kernel.org
4 daysdrm/amdgpu/sdma7.1: replace BUG_ON() with WARN_ON()Alex Deucher
There's no need to crash the kernel for these cases. Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit c4f230b51cf2d3e7e8b1c800331f3dbed2a9e3f5) Cc: stable@vger.kernel.org
4 daysdrm/amdgpu/sdma7.0: replace BUG_ON() with WARN_ON()Alex Deucher
There's no need to crash the kernel for these cases. Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 9723a8bed3aa251a26bee4583bac9d8fb064dd44) Cc: stable@vger.kernel.org
4 daysdrm/amdgpu/sdma6.0: replace BUG_ON() with WARN_ON()Alex Deucher
There's no need to crash the kernel for these cases. Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit c17a508a7d652da3728f8bbc481bfffe96d65a87) Cc: stable@vger.kernel.org
4 daysdrm/amdgpu/sdma5.2: replace BUG_ON() with WARN_ON()Alex Deucher
There's no need to crash the kernel for these cases. Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ae658afc7f47f6147371ec42cc6b1a793dfdb5af) Cc: stable@vger.kernel.org
4 daysdrm/amdgpu/sdma5.0: replace BUG_ON() with WARN_ON()Alex Deucher
There's no need to crash the kernel for these cases. Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 8d144a0eb09537055841af48c9e7c2d4cd48e84d) Cc: stable@vger.kernel.org
4 daysdrm/amdgpu/sdma4.4.2: replace BUG_ON() with WARN_ON()Alex Deucher
There's no need to crash the kernel for these cases. Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit fa4f86a148271e325e95287630a3a15a9cd35fdc) Cc: stable@vger.kernel.org
4 daysdrm/amdgpu/gfx12.1: replace BUG_ON() with WARN_ON()Alex Deucher
There's no need to crash the kernel for these cases. Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit e4d99e04b2e9b13b97d3b17804c735f62689db23) Cc: stable@vger.kernel.org
4 daysdrm/amdgpu/gfx12: replace BUG_ON() with WARN_ON()Alex Deucher
There's no need to crash the kernel for these cases. Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit f952076f76d62f783e8ba4995a7c400d39354ccf) Cc: stable@vger.kernel.org
4 daysdrm/amdgpu/gfx11: replace BUG_ON() with WARN_ON()Alex Deucher
There's no need to crash the kernel for these cases. Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit daa62107452d2451787c4248ca38fa2d1a0cbefd) Cc: stable@vger.kernel.org
4 daysdrm/amdgpu/gfx10: replace BUG_ON() with WARN_ON()Alex Deucher
There's no need to crash the kernel for these cases. Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ac6f00beb658239bced4aaed9efbb04a35348d48) Cc: stable@vger.kernel.org
4 daysdrm/amdgpu/gfx9.4.3: replace BUG_ON() with WARN_ON()Alex Deucher
There's no need to crash the kernel for these cases. Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 5676593d08998d7a6d9e2d51d6b54b3820e3755c) Cc: stable@vger.kernel.org
4 daysdrm/amdgpu/gfx9: replace BUG_ON() with WARN_ON()Alex Deucher
There's no need to crash the kernel for these cases. Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit b71604f8685b0eba07866f4e8dc30f93e1931054) Cc: stable@vger.kernel.org
4 daysdrm/amdgpu/gfx8: drop unecessary BUG_ON()Alex Deucher
There's no need to crash the kernel for this case. Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 4d7c25208ca612b754f3bf39e9f16e725b828891) Cc: stable@vger.kernel.org
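The practical difference behind the BUG_ON() to WARN_ON() conversions above can be sketched in user space. This is a minimal illustration, not the driver code: the `WARN_ON` macro below is a simplified stand-in for the kernel macro (the real one also dumps a stack trace), and `emit_fence()`, its parameter, and the alignment check are hypothetical. Whether the actual patches also return early after warning is not stated in these commit messages; the early return here is an assumption.

```c
#include <assert.h>
#include <stdio.h>

/* Counts warnings so callers (and tests) can observe them. */
static int warn_count;

/* Simplified stand-in for the kernel's WARN_ON(): report the condition and
 * return its truth value, instead of halting the way BUG_ON() would. */
static int warn_on_impl(int cond, const char *expr, const char *file, int line)
{
    if (cond) {
        warn_count++;
        fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
    }
    return cond;
}
#define WARN_ON(cond) warn_on_impl(!!(cond), #cond, __FILE__, __LINE__)

/* Hypothetical helper: rejects a misaligned address with a warning and an
 * error code rather than crashing. Name and check are illustrative only. */
static int emit_fence(unsigned long long addr)
{
    if (WARN_ON(addr & 0x3))
        return -1; /* assumed recovery path: refuse the request */
    return 0;
}
```

With this shape, a bad argument produces a log line and an error return, and the rest of the system keeps running.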
4 daysdrm/amdgpu/soc24: reset dGPU if suspend got abortedJakob Linke
For SOC24 ASICs (RDNA4 / Navi 4x dGPUs) re-enabling PM features fails if an S3 suspend got aborted, the same issue already handled for SOC21 and SOC15:

  commit df3c7dc5c58b ("drm/amdgpu: Reset dGPU if suspend got aborted")
  commit 38e8ca3e4b6d ("amdgpu/soc15: enable asic reset for dGPU in case of suspend abort")

The aborted resume fails with:

  amdgpu: SMU: No response msg_reg: 6 resp_reg: 0
  amdgpu: Failed to enable requested dpm features!
  amdgpu: resume of IP block <smu> failed -62

Apply the same workaround for soc24: detect the aborted-suspend state at resume via the sign-of-life register and reset the device before re-init. This is a workaround till a proper solution is finalized.

Fixes: 98b912c50e44 ("drm/amdgpu: Add soc24 common ip block (v2)")
Signed-off-by: Jakob Linke <jakob@linke.cx>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit fed5bdbfe1d4a19a26c70f7fc58017dc88be1c18)
Cc: stable@vger.kernel.org
4 daysMerge tag 'nf-26-06-30' of ↵Paolo Abeni
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Florian Westphal says:

====================
netfilter: updates for net

The following patchset contains Netfilter fixes for *net*. Due to bug volume the plan is to make a second *net* pull request this Friday.

1) Zero nf_conntrack_expect at allocation to prevent uninitialized data leaks to userspace. Add missing exp->dir initialization.

2) Prevent out-of-bounds writes in nft_set_pipapo caused by inconsistent clones during allocation failures. Fail operations if the clone enters an error state. This was a day-0 bug.

3) Fix use-after-free race between ipset dump and array resizing. Protect array pointer access with rcu_read_lock(). From Xiang Mei. Bug existed since v4.20.

4) Validate that skb_dst() exists before access in nf_conntrack_sip. This prevents a crash when called from tc ingress or openvswitch. From Pablo Neira Ayuso. Bug added in 4.3 when ovs gained support for conntrack helpers.

5) Cap the maximum number of expectations to NF_CT_EXPECT_MAX_CNT during userspace helper policy updates. Also from Pablo.

6) Prevent NULL pointer dereference in nft_fib on netdev egress hooks. Add nft_fib_netdev_validate() to restrict fib expressions to appropriate netdev hooks. Restrict nft_fib_validate() to IPv4, IPv6, and INET protocols. From Theodor Arsenij Larionov-Trichkine. Bug was exposed in v5.16 when egress hooks got added.

7) Restrict nfnetlink_queue writes to network headers. Validate IP/IPv6 header length and disable extension headers or IP option modifications. Disable bridge modification for now; it's unlikely anyone is using this.

8) Restrict arbitrary writes to link-layer and network headers in nftables. Prevent link-layer modifications from spilling into network headers. Prevent writes to IP version and length fields.

9) Restrict L3 checksum update offset to IPv4. Otherwise the csum offset can be used to munge arbitrary header offsets, rendering the previous change moot.

These last three patches are follow-ups to a 7.1 change that disabled header rewrite ability in unprivileged network namespaces. Unprivileged netns support is not yet enabled again here.

netfilter pull request nf-26-06-30

* tag 'nf-26-06-30' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nftables: restrict checkum update offset
  netfilter: nftables: restrict linklayer and network header writes
  netfilter: nfnetlink_queue: restrict writes to network header
  netfilter: nft_fib: reject fib expression on the netdev egress hook
  netfilter: nfnetlink_cthelper: cap to maximum number of expectation per master
  netfilter: nf_conntrack_sip: validate skb_dst() before accessing it
  netfilter: ipset: fix race between dump and ip_set_list resize
  netfilter: nft_set_pipapo: don't leak bad clone into future transaction
  netfilter: nf_conntrack_expect: zero at allocation time
====================

Link: https://patch.msgid.link/20260630045243.2657-1-fw@strlen.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
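The idea of preventing writes to the IP version and length fields, mentioned in the pull request above, can be illustrated with a small range check. This is a sketch under stated assumptions and not the nftables implementation: `ipv4_write_allowed()` and `ranges_overlap()` are hypothetical helpers, and only the IPv4 layout from RFC 791 (byte 0 holds version and IHL, bytes 2-3 hold the total length) is taken as fact.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* IPv4 header layout (RFC 791): byte 0 = version + IHL, bytes 2-3 = total length. */
#define IPV4_VER_IHL_OFF 0u
#define IPV4_VER_IHL_LEN 1u
#define IPV4_TOT_LEN_OFF 2u
#define IPV4_TOT_LEN_LEN 2u

/* True if byte ranges [a_off, a_off + a_len) and [b_off, b_off + b_len) intersect. */
static bool ranges_overlap(size_t a_off, size_t a_len, size_t b_off, size_t b_len)
{
    return a_off < b_off + b_len && b_off < a_off + a_len;
}

/* Hypothetical policy check: refuse any write of len bytes at header offset
 * off that would touch the version/IHL byte or the total-length field. */
static bool ipv4_write_allowed(size_t off, size_t len)
{
    if (len == 0)
        return false; /* nothing to write; treated as invalid here */
    if (ranges_overlap(off, len, IPV4_VER_IHL_OFF, IPV4_VER_IHL_LEN))
        return false;
    if (ranges_overlap(off, len, IPV4_TOT_LEN_OFF, IPV4_TOT_LEN_LEN))
        return false;
    return true;
}
```

A one-byte write at offset 1 (the TOS byte) passes, while a two-byte write at the same offset is refused because it spills into the total-length field.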
4 daysdrm/xe/rtp: Add RING_FORCE_TO_NONPRIV_DENY to OA whitelistsAshutosh Dixit
Unconditionally whitelisting OA registers is a security violation. Set RING_FORCE_TO_NONPRIV_DENY bit in OA nonpriv slots, so that OA registers don't get whitelisted by default after probe, gt reset, resume and engine reset. Fixes: 828a8eaf37c3 ("drm/xe/oa: Add MMIO trigger support") Cc: stable@vger.kernel.org # v6.12+ Suggested-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patch.msgid.link/20260615224227.34880-2-ashutosh.dixit@intel.com (cherry picked from commit 90511bdcfda97211c01f1d945d4ea616578d8fca) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
4 daysdrm/xe: Remove redundant exec_queue_suspended() check in submit_exec_queue()Lu Yao
There is already a check for exec_queue_suspended(q) that returns early if the queue is suspended. Fixes: 65280af331aa ("drm/xe/multi_queue: skip submit when primary queue is suspended") Signed-off-by: Lu Yao <yaolu@kylinos.cn> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patch.msgid.link/20260617012516.19930-1-yaolu@kylinos.cn Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit 173202a5a3a9e6590194ce0f5880d1529a71ade7) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>