summaryrefslogtreecommitdiff
path: root/drivers/gpu
AgeCommit message (Collapse)Author
2025-03-05drm/xe: Rename gmdid_map to xe_ipGustavo Sousa
If we pay closer attention to struct gmdid_map, we will realize that it is actually fully describing an IP (graphics or media): it contains "release info" and "features info". The former is comprised of fields "ver" and "name"; and the latter is done via member "ip", which is a pointer to either struct xe_graphics_desc or xe_media_desc, and can be reused across releases. As such let's: * Rename struct gmdid_map to xe_ip. * Rename the field ver to verx100 to be consistent with the naming of members using that encoding of the version. * Rename the field "ip" to "desc" to make it clear that it is a pointer to a descriptor of features for the IP, since it will not contain *all* info (i.e. features + release info). We sill have release info mapped into struct xe_{graphics,media}_desc for pre-GMDID IPs. In an upcoming change we will handle that so that we make a clear separation between "release info" and "feature info". Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250221-xe-unify-ip-descriptors-v2-3-5bc0c6d0c13f@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-03-05drm/xe: Disambiguate GMDID-based IP namesGustavo Sousa
The name of an IP is a function of its version. As such, given an IP version, it should be clear to identify the name of that IP release. With the current code, we keep that mapping clear for pre-GMDID IPs, but ambiguous for GMDID-based ones. That causes two types of inconveniences: 1. The end user, who might not have all the necessary mapping at hand, might be confused when seeing different possible IP names in the dmesg log. 2. It makes a developer who is not familiar with the "IP version" to "Release name" need to resort to looking at the specs to understand see what version maps to what. While the specs should be the authority on the mapping, we should make our lives easier by reflecting that mapping in the source code. Thus, since the IP name is tied to the version, let's remove the ambiguity by using a "name" field in struct gmdid_map instead of accumulating names in the descriptor instances. This does result in the code having IP name being defined in different structs (gmdid_map, xe_graphics_desc, xe_media_desc), but that will be resolved in upcoming changes. A side-effect of this change is that media_xe2 exactly matches media_xelpmp now, so we just re-use the latter. v2: - Drop media_xe2 and re-use media_xelpmp. (Matt) Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250221-xe-unify-ip-descriptors-v2-2-5bc0c6d0c13f@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-03-05drm/xe: Set IP names in functions handling IP versionGustavo Sousa
In an upcoming change, we will handle setting graphics_name and media_name differently for GMDID-based IPs. As such, let's make both handle_pre_gmdid() and handle_gmdid() functions responsible for initializing those fields. While now we have both doing essentially the same thing with respect to those fields, handle_pre_gmdid() will diverge soon. Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250221-xe-unify-ip-descriptors-v2-1-5bc0c6d0c13f@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-03-05drm/xe/userptr: Unmap userptrs in the mmu notifierThomas Hellström
If userptr pages are freed after a call to the xe mmu notifier, the device will not be blocked out from theoretically accessing these pages unless they are also unmapped from the iommu, and this violates some aspects of the iommu-imposed security. Ensure that userptrs are unmapped in the mmu notifier to mitigate this. A naive attempt would try to free the sg table, but the sg table itself may be accessed by a concurrent bind operation, so settle for only unmapping. v3: - Update lockdep asserts. - Fix a typo (Matthew Auld) Fixes: 81e058a3e7fd ("drm/xe: Introduce helper to populate userptr") Cc: Oak Zeng <oak.zeng@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: <stable@vger.kernel.org> # v6.10+ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Acked-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250304173342.22009-4-thomas.hellstrom@linux.intel.com (cherry picked from commit ba767b9d01a2c552d76cf6f46b125d50ec4147a6) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-03-05drm/xe/hmm: Don't dereference struct page pointers without notifier lockThomas Hellström
The pnfs that we obtain from hmm_range_fault() point to pages that we don't have a reference on, and the guarantee that they are still in the cpu page-tables is that the notifier lock must be held and the notifier seqno is still valid. So while building the sg table and marking the pages accesses / dirty we need to hold this lock with a validated seqno. However, the lock is reclaim tainted which makes sg_alloc_table_from_pages_segment() unusable, since it internally allocates memory. Instead build the sg-table manually. For the non-iommu case this might lead to fewer coalesces, but if that's a problem it can be fixed up later in the resource cursor code. For the iommu case, the whole sg-table may still be coalesced to a single contigous device va region. This avoids marking pages that we don't own dirty and accessed, and it also avoid dereferencing struct pages that we don't own. v2: - Use assert to check whether hmm pfns are valid (Matthew Auld) - Take into account that large pages may cross range boundaries (Matthew Auld) v3: - Don't unnecessarily check for a non-freed sg-table. (Matthew Auld) - Add a missing up_read() in an error path. (Matthew Auld) Fixes: 81e058a3e7fd ("drm/xe: Introduce helper to populate userptr") Cc: Oak Zeng <oak.zeng@intel.com> Cc: <stable@vger.kernel.org> # v6.10+ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Acked-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250304173342.22009-3-thomas.hellstrom@linux.intel.com (cherry picked from commit ea3e66d280ce2576664a862693d1da8fd324c317) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-03-05drm/xe/hmm: Style- and include fixesThomas Hellström
Add proper #ifndef around the xe_hmm.h header, proper spacing and since the documentation mostly follows kerneldoc format, make it kerneldoc. Also prepare for upcoming -stable fixes. Fixes: 81e058a3e7fd ("drm/xe: Introduce helper to populate userptr") Cc: Oak Zeng <oak.zeng@intel.com> Cc: <stable@vger.kernel.org> # v6.10+ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Acked-by: Matthew Brost <Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250304173342.22009-2-thomas.hellstrom@linux.intel.com (cherry picked from commit bbe2b06b55bc061c8fcec034ed26e88287f39143) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-03-05drm/xe: Add staging tree for VM bindsMatthew Brost
Concurrent VM bind staging and zapping of PTEs from a userptr notifier do not work because the view of PTEs is not stable. VM binds cannot acquire the notifier lock during staging, as memory allocations are required. To resolve this race condition, use a staging tree for VM binds that is committed only under the userptr notifier lock during the final step of the bind. This ensures a consistent view of the PTEs in the userptr notifier. A follow up may only use staging for VM in fault mode as this is the only mode in which the above race exists. v3: - Drop zap PTE change (Thomas) - s/xe_pt_entry/xe_pt_entry_staging (Thomas) Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: <stable@vger.kernel.org> Fixes: e8babb280b5e ("drm/xe: Convert multiple bind ops into single job") Fixes: a708f6501c69 ("drm/xe: Update PT layer with better error handling") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250228073058.59510-5-thomas.hellstrom@linux.intel.com Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> (cherry picked from commit 6f39b0c5ef0385eae586760d10b9767168037aa5) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-03-05drm/xe: Fix fault mode invalidation with unbindThomas Hellström
Fix fault mode invalidation racing with unbind leading to the PTE zapping potentially traversing an invalid page-table tree. Do this by holding the notifier lock across PTE zapping. This might transfer any contention waiting on the notifier seqlock read side to the notifier lock read side, but that shouldn't be a major problem. At the same time get rid of the open-coded invalidation in the bind code by relying on the notifier even when the vma bind is not yet committed. Finally let userptr invalidation call a dedicated xe_vm function performing a full invalidation. Fixes: e8babb280b5e ("drm/xe: Convert multiple bind ops into single job") Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: <stable@vger.kernel.org> # v6.12+ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250228073058.59510-4-thomas.hellstrom@linux.intel.com (cherry picked from commit 100a5b8dadfca50d91d9a4c9fc01431b42a25cab) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-03-05drm/xe/vm: Fix a misplaced #endifThomas Hellström
Fix a (harmless) misplaced #endif leading to declarations appearing multiple times. Fixes: 0eb2a18a8fad ("drm/xe: Implement VM snapshot support for BO's and userptr") Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: José Roberto de Souza <jose.souza@intel.com> Cc: <stable@vger.kernel.org> # v6.12+ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250228073058.59510-3-thomas.hellstrom@linux.intel.com (cherry picked from commit fcc20a4c752214b3e25632021c57d7d1d71ee1dd) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-03-05drm/xe/vm: Validate userptr during gpu vma prefetchingThomas Hellström
If a userptr vma subject to prefetching was already invalidated or invalidated during the prefetch operation, the operation would repeatedly return -EAGAIN which would typically cause an infinite loop. Validate the userptr to ensure this doesn't happen. v2: - Don't fallthrough from UNMAP to PREFETCH (Matthew Brost) Fixes: 5bd24e78829a ("drm/xe/vm: Subclass userptr vmas") Fixes: 617eebb9c480 ("drm/xe: Fix array of binds") Cc: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> # v6.9+ Suggested-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250228073058.59510-2-thomas.hellstrom@linux.intel.com (cherry picked from commit 03c346d4d0d85d210d549d43c8cfb3dfb7f20e0a) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-03-05drm/panel: fix Visionox RM692E5 dependenciesArnd Bergmann
The newly added driver uses the DSC helpers, so the corresponding Kconfig option must be enabled: ERROR: modpost: "drm_dsc_pps_payload_pack" [drivers/gpu/drm/panel/panel-visionox-rm692e5.ko] undefined! Fixes: 7cb3274341bf ("drm/panel: Add Visionox RM692E5 panel driver") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250304142907.732196-1-arnd@kernel.org Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
2025-03-05drm/i915/display: convert intel_display.c to struct intel_displayJani Nikula
Going forward, struct intel_display is the main display device data pointer. Convert as much as possible of intel_display.c to struct intel_display. This exposes a couple of outside issues that need to be fixed as well, in a register macro and a DSI PLL stub. Reviewed-by: Nemesa Garg <nemesa.garg@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/1c0bafcb978d1cf4f4d54be2f497386f5302f7c8.1741084010.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2025-03-05drm/i915/display: remove dupe intel_update_watermarks() declarationJani Nikula
intel_wm.h already has intel_update_watermarks() declaration. Remove the dupe. Reviewed-by: Nemesa Garg <nemesa.garg@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/67eeebff3ec9459f7854fbc56cfd7f2aa8c1fdc6.1741084010.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2025-03-05drm/i915/display: convert intel_has_pending_fb_unpin() to struct intel_displayJani Nikula
Going forward, struct intel_display is the main display device data pointer. The intel_display.[ch] files are too big to convert in one go. Convert intel_has_pending_fb_unpin() to struct intel_display. Reviewed-by: Nemesa Garg <nemesa.garg@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/d70ad8f9cbba5ee32d985b76047b56996ad4b31e.1741084010.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2025-03-05drm/i915/display: convert some intel_display.[ch] functions to struct ↵Jani Nikula
intel_display Going forward, struct intel_display is the main display device data pointer. The intel_display.[ch] files are too big to convert in one go. Convert the interface towards intel_display_driver.c to struct intel_display. Reviewed-by: Nemesa Garg <nemesa.garg@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/ee8b108420763cbf47ee77fa35b782a7293f9cfe.1741084010.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2025-03-05drm/i915/display: convert various port/phy helpers to struct intel_displayJani Nikula
Going forward, struct intel_display is the main display device data pointer. The intel_display.[ch] files are too big to convert in one go. Convert the various port/phy helpers to struct intel_display. Reviewed-by: Nemesa Garg <nemesa.garg@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/e28e53bad5014ba3ef17431557b517f1b8530963.1741084010.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2025-03-05drm/amd/pm: always allow ih interrupt from fwKenneth Feng
always allow ih interrupt from fw on smu v14 based on the interface requirement Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit a3199eba46c54324193607d9114a1e321292d7a1) Cc: stable@vger.kernel.org # 6.12.x
2025-03-05drm/radeon: Fix rs400_gpu_init for ATI mobility radeon Xpress 200MRichard Thier
num_gb_pipes was set to a wrong value using r420_pipe_config This have lead to HyperZ glitches on fast Z clearing. Closes: https://bugs.freedesktop.org/show_bug.cgi?id=110897 Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Richard Thier <u9vata@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 044e59a85c4d84e3c8d004c486e5c479640563a6) Cc: stable@vger.kernel.org
2025-03-05drm/amdkfd: Fix NULL Pointer Dereference in KFD queueAndrew Martin
Through KFD IOCTL Fuzzing we encountered a NULL pointer derefrence when calling kfd_queue_acquire_buffers. Fixes: 629568d25fea ("drm/amdkfd: Validate queue cwsr area and eop buffer size") Signed-off-by: Andrew Martin <Andrew.Martin@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Andrew Martin <Andrew.Martin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 049e5bf3c8406f87c3d8e1958e0a16804fa1d530) Cc: stable@vger.kernel.org
2025-03-05drm/amd/display: Fix null check for pipe_ctx->plane_state in ↵Ma Ke
resource_build_scaling_params Null pointer dereference issue could occur when pipe_ctx->plane_state is null. The fix adds a check to ensure 'pipe_ctx->plane_state' is not null before accessing. This prevents a null pointer dereference. Found by code review. Fixes: 3be5262e353b ("drm/amd/display: Rename more dc_surface stuff to plane_state") Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Ma Ke <make24@iscas.ac.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 63e6a77ccf239337baa9b1e7787cde9fa0462092) Cc: stable@vger.kernel.org
2025-03-05drm/xe: Increase the XE_PL_TT watermarkThomas Hellström
The XE_PL_TT watermark was set to 50% of system memory. The idea behind that was unclear since the net effect is that TT memory will be evicted to TTM_PL_SYSTEM memory if that watermark is exceeded, requiring PPGTT rebinds and dma remapping. But there is no similar watermark for TTM_PL_1SYSTEM memory. The TTM functionality that tries to swap out system memory to shmem objects if a 50% limit of total system memory is reached is orthogonal to this, and with the shrinker added, it's no longer in effect. Replace the 50% TTM_PL_TT limit with a 100% limit, in effect allowing all graphics memory to be bound to the device unless it has been swapped out by the shrinker. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/intel-xe/20250305092220.123405-8-thomas.hellstrom@linux.intel.com
2025-03-05drm/xe: Add a shrinker for xe bosThomas Hellström
Rather than relying on the TTM watermark accounting add a shrinker for xe_bos in TT or system memory. Leverage the newly added TTM per-page shrinking and shmem backup support. Although xe doesn't fully support WONTNEED (purgeable) bos yet, introduce and add shrinker support for purgeable ttm_tts. v2: - Cleanups bugfixes and a KUNIT shrinker test. - Add writeback support, and activate if kswapd. v3: - Move the try_shrink() helper to core TTM. - Minor cleanups. v4: - Add runtime pm for the shrinker. Shrinking may require an active device for CCS metadata copying. v5: - Separately purge ghost- and zombie objects in the shrinker. - Fix a format specifier - type inconsistency. (Kernel test robot). v7: - s/long/s64/ (Christian König) - s/sofar/progress/ (Matt Brost) v8: - Rebase on Xe KUNIT update. - Add content verifying to the shrinker kunit test. - Split out TTM changes to a separate patch. - Get rid of multiple bool arguments for clarity (Matt Brost) - Avoid an error pointer dereference (Matt Brost) - Avoid an integer overflow (Matt Auld) - Address misc review comments by Matt Brost. v9: - Fix a compliation error. - Rebase. v10: - Update to new LRU walk interface. - Rework ghost-, zombie and purged object shrinking. - Rebase. v11: - Use additional TTM helpers. - Honor __GFP_FS and __GFP_IO - Rebase. v13: - Use ttm_tt_setup_backup(). v14: - Don't set up backup on imported bos. v15: - Rebase on backup interface changes. Cc: Christian König <christian.koenig@amd.com> Cc: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <dri-devel@lists.freedesktop.org> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/intel-xe/20250305092220.123405-7-thomas.hellstrom@linux.intel.com
2025-03-05drm/ttm: Add helpers for shrinkingThomas Hellström
Add a number of helpers for shrinking that access core TTM and core MM functionality in a way that make them unsuitable for driver open-coding. v11: - New patch (split off from previous) and additional helpers. v13: - Adapt to ttm_backup interface change. - Take resource off LRU when backed up. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Dave Airlie <airlied@redhat.com> Acked-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/intel-xe/20250305092220.123405-6-thomas.hellstrom@linux.intel.com
2025-03-05drm/ttm: Add a macro to perform LRU iterationThomas Hellström
Following the design direction communicated here: https://lore.kernel.org/linux-mm/b7491378-defd-4f1c-31e2-29e4c77e2d67@amd.com/T/#ma918844aa8a6efe8768fdcda0c6590d5c93850c9 Export a LRU walker for driver shrinker use. The walker initially supports only trylocking, since that's the method used by shrinkes. The walker makes use of scoped_guard() to allow exiting from the LRU walk loop without performing any explicit unlocking or cleanup. v8: - Split out from another patch. - Use a struct for bool arguments to increase readability (Matt Brost). - Unmap user-space cpu-mappings before shrinking pages. - Explain non-fatal error codes (Matt Brost) v10: - Instead of using the existing helper, Wrap the interface inside out and provide a loop to de-midlayer things the LRU iteration (Christian König). - Removing the R-B by Matt Brost since the patch was significantly changed. v11: - Split the patch up to include just the LRU walk helper. v12: - Indent after scoped_guard() (Matt Brost) v15: - Adapt to new definition of scoped_guard() Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Dave Airlie <airlied@redhat.com> Acked-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/intel-xe/20250305092220.123405-5-thomas.hellstrom@linux.intel.com
2025-03-05drm/ttm: Use fault-injection to test error pathsThomas Hellström
Use fault-injection to test partial TTM swapout and interrupted swapin. Return -EINTR for swapin to test the callers ability to handle and restart the swapin, and on swapout perform a partial swapout to test that the swapin and release_shrunken functionality. v8: - Use the core fault-injection system. v9: - Fix compliation failure for !CONFIG_FAULT_INJECTION Cc: Christian König <christian.koenig@amd.com> Cc: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <dri-devel@lists.freedesktop.org> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/intel-xe/20250305092220.123405-4-thomas.hellstrom@linux.intel.com
2025-03-05drm/ttm/pool, drm/ttm/tt: Provide a helper to shrink pagesThomas Hellström
Provide a helper to shrink ttm_tt page-vectors on a per-page basis. A ttm_backup backend could then in theory get away with allocating a single temporary page for each struct ttm_tt. This is accomplished by splitting larger pages before trying to back them up. In the future we could allow ttm_backup to handle backing up large pages as well, but currently there's no benefit in doing that, since the shmem backup backend would have to split those anyway to avoid allocating too much temporary memory, and if the backend instead inserts pages into the swap-cache, those are split on reclaim by the core. Due to potential backup- and recover errors, allow partially swapped out struct ttm_tt's, although mark them as swapped out stopping them from being swapped out a second time. More details in the ttm_pool.c DOC section. v2: - A couple of cleanups and error fixes in ttm_pool_back_up_tt. - s/back_up/backup/ - Add a writeback parameter to the exported interface. v8: - Use a struct for flags for readability (Matt Brost) - Address misc other review comments (Matt Brost) v9: - Update the kerneldoc for the ttm_tt::backup field. v10: - Rebase. v13: - Rebase on ttm_backup interface change. Update kerneldoc. - Rebase and adjust ttm_tt_is_swapped(). v15: - Rebase on ttm_backup return value change. - Rebase on previous restructuring of ttm_pool_alloc() - Rework the ttm_pool backup interface (Christian König) - Remove cond_resched() (Christian König) - Get rid of the need to allocate an intermediate page array when restoring a multi-order page (Christian König) - Update documentation. Cc: Christian König <christian.koenig@amd.com> Cc: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <dri-devel@lists.freedesktop.org> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Christian Koenig <christian.koenig@amd.com> Link: https://lore.kernel.org/intel-xe/20250305092220.123405-3-thomas.hellstrom@linux.intel.com
2025-03-05drm/ttm: Provide a shmem backup implementationThomas Hellström
Provide a standalone shmem backup implementation. Given the ttm_backup interface, this could later on be extended to providing other backup implementation than shmem, with one use-case being GPU swapout to a user-provided fd. v5: - Fix a UAF. (kernel test robot, Dan Carptenter) v6: - Rename ttm_backup_shmem_copy_page() function argument (Matthew Brost) - Add some missing documentation v8: - Use folio_file_page to get to the page we want to writeback instead of using the first page of the folio. v13: - Remove the base class abstraction (Christian König) - Include ttm_backup_bytes_avail(). v14: - Fix kerneldoc for ttm_backup_bytes_avail() (0-day) - Work around casting of __randomize_layout struct pointer (0-day) v15: - Return negative error code from ttm_backup_backup_page() (Christian König) - Doc fixes. (Christian König). Cc: Christian König <christian.koenig@amd.com> Cc: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <dri-devel@lists.freedesktop.org> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/intel-xe/20250305092220.123405-2-thomas.hellstrom@linux.intel.com
2025-03-05drm/amdgpu: Fix core reset sequence for JPEG5_0_1Sathishkumar S
For cores 1 through 9 repair the core reset sequence by adjusting offsets to access the expected registers. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amdkfd: flag per-sdma queue reset supported to user spaceJonathan Kim
Similar to compute queue reset, flag SDMA queue reset capabilities to user space for safe testing. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amdkfd: implement per queue sdma reset for gfx 9.4+Jonathan Kim
To reset hung SDMA queues on GFX 9.4+ for the GFX9 family, a soft reset must be issued through SMU. Since soft resets will reset an entire SDMA engine, use a common KGD call to do the reset as the KGD will handle avoiding a reset of in flight GFX and paging queues on that engine. In addition, create a common call for all reset types to simplify the handling of module parameter settings that block gpu resets. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amdgpu: Do not program AGP BAR regs under SRIOV in gfxhub_v1_0.cVictor Lu
SRIOV VF does not have write access to AGP BAR regs. Skip the writes to avoid a dmesg warning. Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amdgpu: Fix core reset sequence for JPEG4_0_3Sathishkumar S
For cores 1 through 7 repair the core reset sequence by adjusting offsets to access the expected registers. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amdgpu: Add support for CPERs on virtualizationTony Yi
Add support for CPERs on VFs. VFs do not receive PMFW messages directly; as such, they need to query them from the host. To avoid hitting host event guard, CPER queries need to be rate limited. CPER queries share the same RAS telemetry buffer as error count query, so a mutex protecting the shared buffer was added as well. For readability, the amdgpu_detect_virtualization was refactored into multiple individual functions. Signed-off-by: Tony Yi <Tony.Yi@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amdkfd: remove unnecessary cpu domain validationJames Zhu
before move to GTT domain. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amd/display: use drm_* instead of DRM_ in apply_edid_quirks()Aurabindo Pillai
drm_* macros are more helpful that DRM_* macros since the former indicates the associated DRM device that prints the error, which maybe helpful when debugging. Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amd/display: Add workaround for a panelAurabindo Pillai
Implement w/a for a panel which requires 10s delay after link detect. Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amdgpu: Update headers for CPER support on SRIOVTony Yi
Update amdgv_sriovmsg.h and mxgpu_nv.h to add new definitions for CPER support on VFs. PMFW ACA messages are not available on VFs, and VFs must query CPERs from host. Signed-off-by: Tony Yi <Tony.Yi@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amd/pm: always allow ih interrupt from fwKenneth Feng
always allow ih interrupt from fw on smu v14 based on the interface requirement Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amdgpu: Reinit FW shared flags on VCN v5.0.1Lijo Lazar
After a full device reset, shared memory region will clear out and it's not possible to reliably save the region in case of RAS errors. Reinitialize the flags if required. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amdgpu: Use the right struct for VCN v5.0.1Lijo Lazar
VCN IP versions >= 5.0 uses VCN5 fw shared struct. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amdkfd: Fix NULL Pointer Dereference in KFD queueAndrew Martin
Through KFD IOCTL Fuzzing we encountered a NULL pointer derefrence when calling kfd_queue_acquire_buffers. Fixes: 629568d25fea ("drm/amdkfd: Validate queue cwsr area and eop buffer size") Signed-off-by: Andrew Martin <Andrew.Martin@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Andrew Martin <Andrew.Martin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amdgpu: add dce_v6_0_soft_reset() to DCE6Alexandre Demers
DCE6 was missing soft reset, but it was easily identifiable under radeon. This should be it, pretty much as it is done under DCE8 and DCE10. Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amdgpu: fix style in DCE6Alexandre Demers
Whitespace cleanups. Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amdgpu: add some comments in DCE6Alexandre Demers
Add some comments. Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amd/pm: Fix indentation issueAsad Kamal
Fix indentation issue for smu_v_13_0_12 get_gpu_metrics Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202502272246.OISqUnC1-lkp@intel.com Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amdgpu: Set PG state to gating for vcn_v_5_0_1Asad Kamal
For vcn_v_5_0_1, set power state to gating during hw fini. Also there may be scenario where VCN engine hangs during a job execution, then it's not safe to assume that set_pg_state works fine during hw_fini to put the state to gated. After a reset, we can assume that it's in the default state, therefore reset the driver maintained state. Put the default state as gated during reset as per this assumption. Signed-off-by: Asad Kamal <asad.kamal@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amdgpu: Remove unused pqm_get_kernel_queueDr. David Alan Gilbert
pqm_get_kernel_queue() has been unused since 2022's commit 5bdd3eb25354 ("drm/amdkfd: Remove unused old debugger implementation") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amdgpu: Remove unused print__rq_dlg_params_stDr. David Alan Gilbert
print__rq_dlg_params_st() was added in 2017 by commit 061bfa06a42a ("drm/amdgpu/display: Add dml support for DCN") but has remained unused. Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amdgpu: Remove unused pre_surface_traceDr. David Alan Gilbert
pre_surface_trace() has been unused since 2017's commit 745cc746da42 ("drm/amd/display: remove dc_pre_update_surfaces_to_stream from dc use") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-05drm/amdgpu: Remove powerdown_uvd memberDr. David Alan Gilbert
With phm_powerdown_uvd() gone in the previous patch, there's now no longer anything that reads the powerdown_uvd member of the pp_hwmgr_func. Remove it. There are a few assignments to it; a boring NULL which can just go, and two functions, but those functions are called explicitly anyway so the assignments to the member go. One of those (smu7_powerdown_uvd) wasn't static previously; make it static. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>