path: root/drivers/gpu/drm/xe
Age  Commit message  Author
2026-03-23  drm/xe: Implement recent spec updates to Wa_16025250150  Matt Roper
The hardware teams noticed that the originally documented workaround steps for Wa_16025250150 may not be sufficient to fully avoid a hardware issue. The workaround documentation has been augmented to suggest programming one additional register; make the corresponding change in the driver. Fixes: 7654d51f1fd8 ("drm/xe/xe2hpg: Add Wa_16025250150") Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com> Link: https://patch.msgid.link/20260319-wa_16025250150_part2-v1-1-46b1de1a31b2@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2026-03-23  drm/xe/pf: Fix use-after-free in migration restore  Michał Winiarski
When an error is returned from xe_sriov_pf_migration_restore_produce(), the data pointer is not set to NULL, which can trigger use-after-free in subsequent .write() calls. Set the pointer to NULL upon error to fix the problem. Fixes: 1ed30397c0b92 ("drm/xe/pf: Add support for encap/decap of bitstream to/from packet") Reported-by: Sebastian Österlund <sebastian.osterlund@intel.com> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/7230 Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://patch.msgid.link/20260217154118.176902-1-michal.winiarski@intel.com Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> (cherry picked from commit 4f53d8c6d23527d734fe3531d08e15cb170a0819) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
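The pattern behind this fix can be sketched in userspace C. This is only an illustration under stated assumptions: `struct packet`, `struct restore_state`, `produce()` and `restore_write()` are hypothetical names, not the driver's code, and the stand-in producer is assumed to take ownership of the packet on both success and failure.

```c
#include <stdlib.h>

/* Hypothetical packet type; the real driver structures differ. */
struct packet {
    size_t len;
};

/* Hypothetical per-file restore state carried across write() calls. */
struct restore_state {
    struct packet *pending;
};

/*
 * Stand-in for the producer. Assumption: it takes ownership of pkt and
 * releases it whether it succeeds or fails.
 */
static int produce(struct packet *pkt, int fail)
{
    free(pkt);
    return fail ? -1 : 0;
}

static int restore_write(struct restore_state *st, int fail)
{
    int ret;

    if (!st->pending) {
        st->pending = calloc(1, sizeof(*st->pending));
        if (!st->pending)
            return -1;
    }

    ret = produce(st->pending, fail);

    /*
     * Ownership has moved to the producer, so drop our pointer on the
     * error path as well. Leaving it set would hand a stale pointer to
     * the next write() call.
     */
    st->pending = NULL;
    return ret;
}
```

With the pointer cleared unconditionally, a write() that follows a failed one allocates a fresh packet instead of touching freed memory.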
2026-03-23  drm/xe/xe3p: Skip TD flush  Tejas Upadhyay
Xe3p has HW ability to do transient display flush so the xe driver can enable this HW feature by default and skip the software TD flush. Bspec: 60002 Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Shekhar Chauhan <shekhar.chauhan@intel.com> Link: https://patch.msgid.link/20260305121902.1892593-10-tejas.upadhyay@intel.com Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
2026-03-23  drm/xe/xe3p_lpg: Restrict UAPI to enable L2 flush optimization  Tejas Upadhyay
Starting with xe3p_lpg, the L2 flush optimization feature, when set, controls whether L2 is in Persistent or Transient mode by monitoring media activity. To enable L2 flush optimization, include the new feature flag GUC_CTL_ENABLE_L2FLUSH_OPT for Novalake platforms when media type is detected. Tighten UAPI validation to restrict userptr, svm and dmabuf mappings to be either 2WAY or XA+1WAY. V5(Thomas): logic correction V4(MattA): Modify uapi doc and commit V3(MattA): check valid op and pat_index value V2(MattA): validate dma-buf bos and madvise pat-index Acked-by: José Roberto de Souza <jose.souza@intel.com> Acked-by: Michal Mrozek <michal.mrozek@intel.com> Acked-by: Carl Zhang <carl.zhang@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20260305121902.1892593-9-tejas.upadhyay@intel.com Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
2026-03-23  drm/xe/pat: define coh_mode 2way  Tejas Upadhyay
Defining 2way (two-way coherency) is critical for Xe3p_LPG (Nova Lake P) platforms to support L2 flush optimization safely. This mode allows the driver to skip certain manual cache flushes (L2 flush optimization) without risking memory corruption because the hardware ensures the most recent data is visible to both entities. Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20260305121902.1892593-8-tejas.upadhyay@intel.com Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
2026-03-23  drm/xe/xe3p_lpg: flush shrinker bo cachelines manually  Tejas Upadhyay
XA is a new pat_index introduced starting with xe3p_lpg. Memory tagged XA is shared between the CPU and GPU and is treated differently from other GPU memory when the Media engine is power-gated. XA is *always* flushed, for example at end-of-submission (and maybe other places). As an internal optimisation, the hardware does not need to make that a full flush (which would also include XA) when Media is off/power-gated: in that state it only has to maintain CPU vs GPU coherency, not GT caches vs Media coherency, so it can issue a targeted XA flush, because anything tagged XA is shared with the CPU. The main implication is that we now need to flush non-XA memory before freeing system memory pages; otherwise dirty cachelines could be flushed after the free (for example if Media suddenly turns on and does a full flush). V4: Add comments for L2 flush path V3(Thomas/MattA/MattR): Restrict userptr with non-xa, then no need to flush manually V2(MattA): Expand commit description Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patch.msgid.link/20260305121902.1892593-7-tejas.upadhyay@intel.com Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
2026-03-23  drm/xe/vf: Improve getting clean NULL context  Michal Wajdeczko
There is a small risk that when fetching a NULL context image the VF may get a tweaked context image prepared by another VF that was previously running on the engine before the GuC scheduler switched the VFs. To avoid that risk, without forcing the GuC scheduler to trigger a costly engine reset on every VF switch, use a watchdog mechanism that, when configured with an impossible condition, triggers an interrupt, which GuC will handle by doing an engine reset. Also adjust job size to account for additional dwords with watchdog setup. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patch.msgid.link/20260303201354.17948-4-michal.wajdeczko@intel.com
2026-03-23  drm/xe: Add MI_SEMAPHORE_WAIT command definition  Michal Wajdeczko
This command supports memory based Semaphore WAIT. Memory based semaphores will be used for synchronization between the Producer and the Consumer contexts. Producer and Consumer Contexts could be running on different engines or on the same engine inside GT. Bspec: 45749, 60244 Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patch.msgid.link/20260303201354.17948-3-michal.wajdeczko@intel.com
2026-03-23  drm/xe: Add PR_CTR_CTRL/THRSH register definitions  Michal Wajdeczko
The Watchdog Counter Control and Watchdog Counter Threshold registers are needed for watchdog programming. This watchdog will generate the "Media Hang Notify" interrupt. Bspec: 45999, 46000 Bspec: 60373, 60374 Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patch.msgid.link/20260303201354.17948-2-michal.wajdeczko@intel.com
2026-03-23  drm/xe/pf: Fix use-after-free in migration restore  Michał Winiarski
When an error is returned from xe_sriov_pf_migration_restore_produce(), the data pointer is not set to NULL, which can trigger use-after-free in subsequent .write() calls. Set the pointer to NULL upon error to fix the problem. Fixes: 1ed30397c0b92 ("drm/xe/pf: Add support for encap/decap of bitstream to/from packet") Reported-by: Sebastian Österlund <sebastian.osterlund@intel.com> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/7230 Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://patch.msgid.link/20260217154118.176902-1-michal.winiarski@intel.com Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
2026-03-20  drm/xe: Fix format specifier for printing pointer differences  Nathan Chancellor
GCC and clang warn (or error with CONFIG_WERROR=y / W=e) several times when targeting 32-bit platforms along the lines of

  drivers/gpu/drm/xe/xe_lrc.c: In function 'dump_mi_command':
  drivers/gpu/drm/xe/xe_lrc.c:1921:40: error: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'int' [-Werror=format=]
   1921 |   drm_printf(p, "LRC[%#5lx] = [%#010x] MI_NOOP (%d dwords)\n",
        |                      ~~~~^
        |                          |
        |                          long unsigned int
        |                      %#5x
   1922 |             dw - num_noop - start, inst_header, num_noop);
        |             ~~~~~~~~~~~~~~~~~~~~~
        |                       |
        |                       int
  drivers/gpu/drm/xe/xe_lrc.c:1922:7: error: format specifies type 'unsigned long' but the argument has type '__ptrdiff_t' (aka 'int') [-Werror,-Wformat]
   1921 |   drm_printf(p, "LRC[%#5lx] = [%#010x] MI_NOOP (%d dwords)\n",
        |                      ~~~~~
        |                      %#5tx
   1922 |             dw - num_noop - start, inst_header, num_noop);
        |             ^~~~~~~~~~~~~~~~~~~~~

Use the '%tx' specifier for printing pointer differences, which clears up the warnings for 32-bit platforms while introducing no regressions for 64-bit platforms. Fixes: 65fcf19cb36b ("drm/xe: Include running dword offset in default_lrc dumps") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patch.msgid.link/20260316-drm-xe-fix-32-bit-wformat-ptrdiff-v1-1-0108b10b2b6b@kernel.org Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
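A small userspace sketch of the same idea: the difference of two pointers has type ptrdiff_t, and the standard `t` length modifier matches that type on both 32-bit and 64-bit targets. The helper name `format_offset()` is made up for illustration, and `snprintf()` stands in for the kernel's `drm_printf()`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format the dword offset of dw relative to start. The subtraction yields
 * a ptrdiff_t, so the 't' length modifier is the portable match; '%lx'
 * is only correct where ptrdiff_t happens to be long.
 */
static int format_offset(char *buf, size_t len,
                         const uint32_t *start, const uint32_t *dw)
{
    ptrdiff_t off = dw - start;

    return snprintf(buf, len, "LRC[%#5tx]", off);
}
```

For a pointer 16 dwords past `start`, this writes `LRC[ 0x10]` into the buffer: `%#5tx` adds the `0x` prefix and pads to a minimum width of five characters.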
2026-03-20  drm/xe: Extend Wa_14026781792 for xe3lpg  Nitin Gote
Wa_14026781792 applies to all graphics versions from 30.00 through 35.10 (inclusive). Since there are no IPs between 30.05 and 35.10, consolidate the RTP rules into a single GRAPHICS_VERSION_RANGE(3000, 3510). v2: (Matt) - There are no IPs between 30.05 and 35.10 either, so consolidate this into a single GRAPHICS_VERSION_RANGE(3000, 3510) - Also move it up to the top part of the table Signed-off-by: Nitin Gote <nitin.r.gote@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patch.msgid.link/20260317080059.1275116-2-nitin.r.gote@intel.com Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
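As a rough illustration of what an inclusive version range check amounts to, here is a standalone sketch. It assumes the encoding implied by the commit text (30.00 is written 3000, 35.10 is written 3510); the helper names are hypothetical and this is not the driver's RTP rule machinery.

```c
#include <stdbool.h>

/* Encode "major.minor" the way the commit text does: 30.00 -> 3000. */
static unsigned int gfx_ver(unsigned int major, unsigned int minor)
{
    return major * 100 + minor;
}

/* Inclusive on both ends, matching "30.00 through 35.10 (inclusive)". */
static bool gfx_ver_in_range(unsigned int ver, unsigned int first,
                             unsigned int last)
{
    return ver >= first && ver <= last;
}

/* Hypothetical helper: does the workaround apply to this version? */
static bool needs_wa_14026781792(unsigned int ver)
{
    return gfx_ver_in_range(ver, 3000, 3510);
}
```

Because no IP versions exist in the gaps, a single range behaves the same as several narrower rules while keeping the table shorter.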
2026-03-20  drm/xe/xe3p_lpg: Add Wa_16029437861  Varun Gupta
Wa_16029437861 requires disabling COAMA atomics by setting bit 22 (SQ_DISABLE_COAMA) of L3SQCREG2 (0xb104) for Xe3p_LPG graphics version 35.10 stepping A0..B0. This bit is already set by the existing Wa_14026144927 entry, so add the new WA ID to the same implementation. Signed-off-by: Varun Gupta <varun.gupta@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://patch.msgid.link/20260317040447.1792687-1-varun.gupta@intel.com Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
2026-03-19  drm/xe: Fix missing runtime PM reference in ccs_mode_store  Sanjay Yadav
ccs_mode_store() calls xe_gt_reset() which internally invokes xe_pm_runtime_get_noresume(). That function requires the caller to already hold an outer runtime PM reference and warns if none is held: [46.891177] xe 0000:03:00.0: [drm] Missing outer runtime PM protection [46.891178] WARNING: drivers/gpu/drm/xe/xe_pm.c:885 at xe_pm_runtime_get_noresume+0x8b/0xc0 Fix this by protecting xe_gt_reset() with the scope-based guard(xe_pm_runtime)(xe), which is the preferred form when the reference lifetime matches a single scope. v2: - Use scope-based guard(xe_pm_runtime)(xe) (Shuicheng) - Update commit message accordingly Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/7593 Fixes: 480b358e7d8e ("drm/xe: Do not wake device during a GT reset") Cc: <stable@vger.kernel.org> # v6.19+ Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Shuicheng Lin <shuicheng.lin@intel.com> Suggested-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Sanjay Yadav <sanjay.kumar.yadav@intel.com> Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20260313071608.3459480-2-sanjay.kumar.yadav@intel.com (cherry picked from commit 7937ea733f79b3f25e802a0c8360bf7423856f36) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
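The scope-based guard can be approximated in userspace with the GCC/clang `cleanup` variable attribute, which is the compiler feature that kernel-style guards build on. Everything below is a simplified stand-in: `struct dev`, `pm_get()`, `pm_put()`, `PM_GUARD()` and the reduced `gt_reset()` / `ccs_mode_store()` are hypothetical and only model the reference counting.

```c
/* Requires GCC or clang: relies on the cleanup variable attribute. */

struct dev {
    int pm_refs; /* stand-in for the runtime PM usage count */
};

static struct dev *pm_get(struct dev *d)
{
    d->pm_refs++;
    return d;
}

/* Cleanup handlers receive a pointer to the guarded variable. */
static void pm_put(struct dev **d)
{
    (*d)->pm_refs--;
}

/* Take a reference now, drop it automatically when the scope ends. */
#define PM_GUARD(d) \
    struct dev *pm_guard_ __attribute__((cleanup(pm_put), unused)) = pm_get(d)

/* Stand-in for the reset path: fails without an outer reference. */
static int gt_reset(struct dev *d)
{
    return d->pm_refs > 0 ? 0 : -1;
}

static int ccs_mode_store(struct dev *d)
{
    PM_GUARD(d);

    /* The reference is held here and released on every return path. */
    return gt_reset(d);
}
```

The guard form suits this case because the reference is needed for exactly one scope, so no explicit put call has to be repeated on each exit path.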
2026-03-19  drm/xe: Open-code GGTT MMIO access protection  Matthew Brost
GGTT MMIO access is currently protected by hotplug (drm_dev_enter), which works correctly when the driver loads successfully and is later unbound or unloaded. However, if driver load fails, this protection is insufficient because drm_dev_unplug() is never called. Additionally, devm release functions cannot guarantee that all BOs with GGTT mappings are destroyed before the GGTT MMIO region is removed, as some BOs may be freed asynchronously by worker threads. To address this, introduce an open-coded flag, protected by the GGTT lock, that guards GGTT MMIO access. The flag is cleared during the dev_fini_ggtt devm release function to ensure MMIO access is disabled once teardown begins. Cc: stable@vger.kernel.org Fixes: 919bb54e989c ("drm/xe: Fix missing runtime outer protection for ggtt_remove_node") Reviewed-by: Zhanjun Dong <zhanjun.dong@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260310225039.1320161-8-zhanjun.dong@intel.com (cherry picked from commit 4f3a998a173b4325c2efd90bdadc6ccd3ad9a431) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
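A minimal sketch of a lock-protected "still online" flag, using a pthread mutex in place of the GGTT lock. The field name `mmio_online`, the 16-entry array standing in for the MMIO region, and both function names are assumptions for illustration only.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define GGTT_ENTRIES 16

struct ggtt {
    pthread_mutex_t lock;
    bool mmio_online;               /* hypothetical flag name */
    uint64_t entries[GGTT_ENTRIES]; /* stand-in for the MMIO PTE region */
};

/* Write a PTE only while the region is still valid; report if written. */
static bool ggtt_write_pte(struct ggtt *g, unsigned int idx, uint64_t pte)
{
    bool done = false;

    pthread_mutex_lock(&g->lock);
    if (g->mmio_online && idx < GGTT_ENTRIES) {
        g->entries[idx] = pte;
        done = true;
    }
    pthread_mutex_unlock(&g->lock);
    return done;
}

/* Teardown: clear the flag under the lock so later writers back off. */
static void ggtt_fini(struct ggtt *g)
{
    pthread_mutex_lock(&g->lock);
    g->mmio_online = false;
    pthread_mutex_unlock(&g->lock);
}
```

Checking the flag under the same lock that teardown takes means a late writer, such as an asynchronously freed BO, either completes before teardown or observes the cleared flag and skips the access.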
2026-03-19  drm/xe/lrc: Fix uninitialized new_ts when capturing context timestamp  Umesh Nerlige Ramappa
Getting engine specific CTX TIMESTAMP register can fail. In that case, if the context is active, new_ts is uninitialized. Fix that case by initializing new_ts to the last value that was sampled in SW - lrc->ctx_timestamp. Flagged by static analysis. v2: Fix new_ts initialization (Ashutosh) Fixes: bb63e7257e63 ("drm/xe: Avoid toggling schedule state to check LRC timestamp in TDR") Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patch.msgid.link/20260312125308.3126607-2-umesh.nerlige.ramappa@intel.com (cherry picked from commit 466e75d48038af252187855058a7a9312db9d2f8) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
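The shape of this fix, sketched outside the driver: seed the local with the last software-sampled value so that a failed register read leaves it well defined. `struct lrc` here is a one-field stand-in, and `read_engine_ts()` and `lrc_update_timestamp()` are hypothetical helpers.

```c
#include <stdint.h>

struct lrc {
    uint64_t ctx_timestamp; /* last value sampled in software */
};

/* Stand-in for the register read; leaves *out untouched on failure. */
static int read_engine_ts(uint64_t *out, int fail, uint64_t hw_ts)
{
    if (fail)
        return -1;
    *out = hw_ts;
    return 0;
}

static uint64_t lrc_update_timestamp(struct lrc *lrc, int active,
                                     int reg_fails, uint64_t hw_ts)
{
    /* Start from the last sampled value so a failed read is harmless. */
    uint64_t new_ts = lrc->ctx_timestamp;

    if (active)
        (void)read_engine_ts(&new_ts, reg_fails, hw_ts);

    lrc->ctx_timestamp = new_ts;
    return new_ts;
}
```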
2026-03-19  drm/xe/oa: Allow reading after disabling OA stream  Ashutosh Dixit
Some OA data might be present in the OA buffer when the OA stream is disabled. Allow UMDs to retrieve this data, so that all data up to the point when the OA stream is disabled can be retrieved. v2: Update tail pointer after disable (Umesh) Fixes: efb315d0a013 ("drm/xe/oa/uapi: Read file_operation") Cc: stable@vger.kernel.org Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patch.msgid.link/20260313053630.3176100-1-ashutosh.dixit@intel.com (cherry picked from commit 4ff57c5e8dbba23b5457be12f9709d5c016da16e) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2026-03-19  drm/xe: Skip over non leaf pte for PRL generation  Brian Nguyen
The check using xe_child->base.children was insufficient for determining whether a pte was a leaf node. Explicitly skip over every non-leaf pt and conditionally abort when a non-leaf pt is interleaved between leaf pts, which results in the page walker skipping over some leaf pts. Note that the behavior being targeted for abort is: PD[0] = 2M PTE, PD[1] = PT -> 512 4K PTEs, PD[2] = 2M PTE. This layout results in an abort because the page walker won't descend PD[1]. With the new abort in place, ensure the PRL is valid before handling a second abort. v2: - Revert to previous assert. - Revised non-leaf handling for interleaf child pt and leaf pte. - Update comments to specifications. (Stuart) - Remove unnecessary XE_PTE_PS64. (Matthew B) v3: - Modify secondary abort to only check non-leaf PTEs. (Matthew B) Fixes: b912138df299 ("drm/xe: Create page reclaim list on unbind") Signed-off-by: Brian Nguyen <brian3.nguyen@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Cc: Stuart Summers <stuart.summers@intel.com> Link: https://patch.msgid.link/20260305171546.67691-6-brian3.nguyen@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit 1d123587525db86cc8f0d2beb35d9e33ca3ade83) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2026-03-19  drm/xe/guc: Ensure CT state transitions via STOP before DISABLED  Zhanjun Dong
The GuC CT state transition requires moving to the STOP state before entering the DISABLED state. Update the driver teardown sequence to make the proper state machine transitions. Fixes: ee4b32220a6b ("drm/xe/guc: Add devm release action to safely tear down CT") Cc: stable@vger.kernel.org Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260310225039.1320161-6-zhanjun.dong@intel.com (cherry picked from commit dace8cb0032f57ea67c87b3b92ad73c89dd2db44) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
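The ordering rule can be modelled as a tiny state machine. The enum values and function names below are invented for illustration and are not the driver's own CT state definitions; the sketch only encodes the rule that DISABLED must be reached by way of STOPPED.

```c
/* Hypothetical state names; the driver's enum may differ. */
enum ct_state {
    CT_ENABLED,
    CT_STOPPED,
    CT_DISABLED,
};

/* Reject a direct ENABLED -> DISABLED jump; allow everything else. */
static int ct_change_state(enum ct_state *cur, enum ct_state next)
{
    if (next == CT_DISABLED && *cur == CT_ENABLED)
        return -1;

    *cur = next;
    return 0;
}

/* Teardown walks through STOPPED first, then DISABLED. */
static int ct_teardown(enum ct_state *cur)
{
    if (*cur == CT_ENABLED)
        ct_change_state(cur, CT_STOPPED);

    return ct_change_state(cur, CT_DISABLED);
}
```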
2026-03-19  drm/xe: Trigger queue cleanup if not in wedged mode 2  Matthew Brost
The intent of wedging a device is to allow queues to continue running only in wedged mode 2. In other modes, queues should initiate cleanup and signal all remaining fences. Fix xe_guc_submit_wedge to correctly clean up queues when wedge mode != 2. Fixes: 7dbe8af13c18 ("drm/xe: Wedge the entire device") Cc: stable@vger.kernel.org Reviewed-by: Zhanjun Dong <zhanjun.dong@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260310225039.1320161-4-zhanjun.dong@intel.com (cherry picked from commit e25ba41c8227c5393c16e4aab398076014bd345f) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2026-03-19  drm/xe: Forcefully tear down exec queues in GuC submit fini  Matthew Brost
In GuC submit fini, forcefully tear down any exec queues by disabling CTs, stopping the scheduler (which cleans up lost G2H), killing all remaining queues, and resuming scheduling to allow any remaining cleanup actions to complete and signal any remaining fences. Split guc_submit_fini into device related and software only part. Using device-managed and drm-managed action guarantees the correct ordering of cleanup. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: stable@vger.kernel.org Reviewed-by: Zhanjun Dong <zhanjun.dong@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260310225039.1320161-3-zhanjun.dong@intel.com (cherry picked from commit a6ab444a111a59924bd9d0c1e0613a75a0a40b89) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2026-03-19  drm/xe: Always kill exec queues in xe_guc_submit_pause_abort  Matthew Brost
xe_guc_submit_pause_abort is intended to be called after something disastrous occurs (e.g., VF migration fails, device wedging, or driver unload) and should immediately trigger the teardown of remaining submission state. With that, kill any remaining queues in this function. Fixes: 7c4b7e34c83b ("drm/xe/vf: Abort VF post migration recovery on failure") Cc: stable@vger.kernel.org Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260310225039.1320161-2-zhanjun.dong@intel.com (cherry picked from commit 78f3bf00be4f15daead02ba32d4737129419c902) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2026-03-19  drm/xe/guc: Fail immediately on GuC load error  Daniele Ceraolo Spurio
By using the same variable for both the return of poll_timeout_us and the return of the polled function guc_wait_ucode, the return value of the latter is overwritten and lost after exiting the polling loop. Since guc_wait_ucode returns -1 on GuC load failure, we lose that information and always continue as if the GuC had been loaded correctly. This is fixed by simply using 2 separate variables. Fixes: a4916b4da448 ("drm/xe/guc: Refactor GuC load to use poll_timeout_us()") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://patch.msgid.link/20260303001732.2540493-2-daniele.ceraolospurio@intel.com (cherry picked from commit c85ec5c5753a46b5c2aea1292536487be9470ffe) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
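The bug class is easy to reproduce in a plain loop. The sketch below replaces the kernel's `poll_timeout_us()` with a bounded `for` loop and assumes a stand-in `wait_ucode()` that returns 1 when loaded, 0 to keep polling, and -1 on load failure. Only the -1 comes from the commit text; the other return values and both function names are assumptions.

```c
#include <errno.h>

/* Stand-in for the polled function: 1 loaded, 0 keep polling, -1 failed. */
static int wait_ucode(int status)
{
    return status;
}

static int guc_wait_load(const int *samples, int n)
{
    int load_result = 0;  /* what the polled function reported */
    int ret = -ETIMEDOUT; /* what the polling loop itself reports */
    int i;

    for (i = 0; i < n; i++) {
        load_result = wait_ucode(samples[i]);
        if (load_result) {
            ret = 0; /* the loop ended because the condition fired */
            break;
        }
    }

    if (ret)
        return ret;

    /* Kept separate, so a load failure is not masked by ret == 0. */
    return load_result < 0 ? load_result : 0;
}
```

Had both results shared one variable, the assignment of the loop's own success code would overwrite the -1, and the caller would continue as if the load had succeeded.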
2026-03-19  Merge drm/drm-next into drm-xe-next  Thomas Hellström
Bring in series "drm/{i915,xe}: sort out step enums between the drivers" that was merged through i915. Link: https://lore.kernel.org/all/cover.1772635152.git.jani.nikula@intel.com Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2026-03-18  drm/xe: Fix missing runtime PM reference in ccs_mode_store  Sanjay Yadav
ccs_mode_store() calls xe_gt_reset() which internally invokes xe_pm_runtime_get_noresume(). That function requires the caller to already hold an outer runtime PM reference and warns if none is held: [46.891177] xe 0000:03:00.0: [drm] Missing outer runtime PM protection [46.891178] WARNING: drivers/gpu/drm/xe/xe_pm.c:885 at xe_pm_runtime_get_noresume+0x8b/0xc0 Fix this by protecting xe_gt_reset() with the scope-based guard(xe_pm_runtime)(xe), which is the preferred form when the reference lifetime matches a single scope. v2: - Use scope-based guard(xe_pm_runtime)(xe) (Shuicheng) - Update commit message accordingly Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/7593 Fixes: 480b358e7d8e ("drm/xe: Do not wake device during a GT reset") Cc: <stable@vger.kernel.org> # v6.19+ Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Shuicheng Lin <shuicheng.lin@intel.com> Suggested-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Sanjay Yadav <sanjay.kumar.yadav@intel.com> Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20260313071608.3459480-2-sanjay.kumar.yadav@intel.com
2026-03-17  drm/xe/lrc: Fix uninitialized new_ts when capturing context timestamp  Umesh Nerlige Ramappa
Getting engine specific CTX TIMESTAMP register can fail. In that case, if the context is active, new_ts is uninitialized. Fix that case by initializing new_ts to the last value that was sampled in SW - lrc->ctx_timestamp. Flagged by static analysis. v2: Fix new_ts initialization (Ashutosh) Fixes: bb63e7257e63 ("drm/xe: Avoid toggling schedule state to check LRC timestamp in TDR") Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patch.msgid.link/20260312125308.3126607-2-umesh.nerlige.ramappa@intel.com
2026-03-17  drm/xe/oa: Allow reading after disabling OA stream  Ashutosh Dixit
Some OA data might be present in the OA buffer when the OA stream is disabled. Allow UMDs to retrieve this data, so that all data up to the point when the OA stream is disabled can be retrieved. v2: Update tail pointer after disable (Umesh) Fixes: efb315d0a013 ("drm/xe/oa/uapi: Read file_operation") Cc: stable@vger.kernel.org Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patch.msgid.link/20260313053630.3176100-1-ashutosh.dixit@intel.com
2026-03-17  platform/x86/intel/vsec: Switch exported helpers from pci_dev to device  David E. Box
Preparatory refactor for ACPI-enumerated PMT endpoints. Several exported PMT/VSEC interfaces and structs carried struct pci_dev * even though callers only need a generic struct device. Move those to struct device * so the same APIs work for PCI and ACPI parents. Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: David E. Box <david.e.box@linux.intel.com> Link: https://patch.msgid.link/20260313015202.3660072-5-david.e.box@linux.intel.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2026-03-17  drm/xe/compat: remove intel_step_name macro  Jani Nikula
As there are no more compat users left for intel_step_name(), remove the macro and use the more direct include for the enumerations. Reviewed-by: Luca Coelho <luciano.coelho@intel.com> Link: https://patch.msgid.link/816e3f6dda0a112392e8f8ccff820a81aff63f32.1773663208.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2026-03-17  Merge tag 'drm-intel-next-2026-03-16' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next  Dave Airlie
[airlied: fixed conflict with xe tree]

drm/i915 feature pull for v7.1:

Features and functionality:
- C10/C20/LT PHY PLL divider verification (Mika)
- Use trans push mechanism to generate PSR frame change event on LNL+ (Jouni)
- Account for DSC bubble overhead for horizontal slices (Ankit, Chaitanya)

Refactoring and cleanups:
- Refactor DP DSC slice config computation (Imre)
- Use GVT versions of register helper macros for GVT MMIO table (Ankit)
- C10/C20/LT PHY PLL computation refactoring (Mika)
- VGA decode refactoring and related fixes/cleanups (Ville)
- Move DSB buffer implementation to display parent interface (Jani)
- Move error interrupt capture to display irq snapshot (Jani)
- Move pcode calls to display parent interface (Jani)
- Reduce GVT dependency on display headers (Jani)
- Compute config and mode valid refactoring for DSC (Ankit)
- Stop using i915 core register headers in display (Uma)
- Refactor DPT, move i915 parts to display parent interface (Jani)
- Refactor gen2-4 overlay, move to display parent interface (Ville)
- Refactor masked field register macro helpers, move to shared headers (Jani)
- Convert a number of workaround checks to the new workaround framework (Luca)
- Refactor and move frontbuffer calls to display parent interface (Jani)
- Add VMA calls to display parent interface (Jani)
- Refactor stolen memory allocation decisions (Vinod, Ville)
- Clean up and unify workqueue usage (Marco Crivellari)
- Preparation for UHBR DP tunnels (Imre)
- Allow DSC passthrough modes during DP MST mode validation (Imre)
- Move framebuffer bo interface to display parent interface (Jani)

Fixes:
- Plenty of DP SST HPD IRQ handling fixes (Imre)
- DP AUX backlight and luminance control fixes (Suraj)
- Respect VBT pipe joiner disable for eDP (Ankit)
- Do not use CASF with joiner (Nemesa)
- Clear C10/C20 PHY response read and error bit to avoid PHY hangs (Suraj)
- Xe3p_LPD DMG clock gating, CDCLK, port sync workarounds (Suraj, Gustavo, Mitul)
- Fix GVT error path (Michał)
- Handle errors on DP DSC receiver cap reads (Suraj)
- DSS clock gating workaround on MTL+ to avoid DSC corruption (Mika)
- Skip state verification for LT PHY in TBT mode (Suraj)
- Fix NULL pointer dereference on suspend when uc firmware not loaded (Rahul Bukte)
- Fix an unlikely DMC state related NULL pointer dereference at probe (Imre)
- Handle error returns from vga_get_uninterruptible() (Simon Richter)
- Increase C10/C20/LT PHY timeouts to include SOC/OS turnaround (Arun)
- Fix BIOS FB vs. stolen memory size check (Ville)
- Fix LOBF to use computed guardband and set context latency (Ankit)
- Handle modeset WW mutex lock failures due to contention properly (Imre)
- Fix pipe BPP clamping due to HDR (Imre)
- Fix stale state usage in DSC state computation (Imre)
- Take HDCP 1.4 vs 2.x into account during link check (Suraj)
- Fix forced link retrain handling in MST HPD IRQ handler (Imre)
- Remove redundant warning on vcpi < 0 (Jonathan)

Core changes:
- iopoll: fix function parameter names in read_poll_timeout_atomic() (Randy Dunlap)

Merges:
- Backmerge drm-next for v7.0-rc1 (Jani)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patch.msgid.link/b14bb0f297b1750816cf5f342bde608e435655fa@intel.com
2026-03-16  drm/xe: Skip adding PRL entry to NULL VMA  Brian Nguyen
NULL VMAs have no corresponding PTE, so skip adding a PRL entry to avoid an unnecessary PRL abort during unbind. Signed-off-by: Brian Nguyen <brian3.nguyen@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260305171546.67691-8-brian3.nguyen@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2026-03-16  drm/xe: Move page reclaim done_handler to own func  Brian Nguyen
Originally, page reclamation was handled by the same fence as tlb invalidation and used its seqno, so there was no reason to separate out the handlers. However, in hindsight, for readability and possible future changes, it seems more beneficial to move this all out to its own function. Signed-off-by: Brian Nguyen <brian3.nguyen@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://patch.msgid.link/20260305171546.67691-7-brian3.nguyen@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2026-03-16  drm/xe: Skip over non leaf pte for PRL generation  Brian Nguyen
The check using xe_child->base.children was insufficient for determining whether a pte was a leaf node. Explicitly skip over every non-leaf pt and conditionally abort when a non-leaf pt is interleaved between leaf pts, which results in the page walker skipping over some leaf pts. Note that the behavior being targeted for abort is: PD[0] = 2M PTE, PD[1] = PT -> 512 4K PTEs, PD[2] = 2M PTE. This layout results in an abort because the page walker won't descend PD[1]. With the new abort in place, ensure the PRL is valid before handling a second abort. v2: - Revert to previous assert. - Revised non-leaf handling for interleaf child pt and leaf pte. - Update comments to specifications. (Stuart) - Remove unnecessary XE_PTE_PS64. (Matthew B) v3: - Modify secondary abort to only check non-leaf PTEs. (Matthew B) Fixes: b912138df299 ("drm/xe: Create page reclaim list on unbind") Signed-off-by: Brian Nguyen <brian3.nguyen@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Cc: Stuart Summers <stuart.summers@intel.com> Link: https://patch.msgid.link/20260305171546.67691-6-brian3.nguyen@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2026-03-16  drm/xe: Include running dword offset in default_lrc dumps  Matt Roper
Printing a running dword offset in the default_lrc_* debugfs entries makes it easier for developers to find the right offsets to use in regs/xe_lrc_layout.h and/or compare the default LRC contents against the bspec-documented LRC layout. Reviewed-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com> Link: https://patch.msgid.link/20260311-default_lrc_offsets-v1-1-58d8ed3aa081@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2026-03-16  drm/xe/i2c: Assert/Deassert I2C IRQ  Raag Jadav
I2C IRQ is triggered using virtual wire. Assert/Deassert it in IRQ handler to allow subsequent interrupt generation. Signed-off-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Link: https://patch.msgid.link/20260313080438.4166251-1-raag.jadav@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2026-03-16  drm/{i915,xe}: move framebuffer bo to parent interface  Jani Nikula
Add .framebuffer_init, .framebuffer_fini and .framebuffer_lookup to the bo parent interface. While they're about framebuffers, they're specifically about framebuffer objects, so the bo interface is a good enough fit, and there's no need to add another interface struct. Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com> Link: https://patch.msgid.link/848d32a44bf844cba3d66e44ba9f20bea4a8352d.1773238670.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2026-03-16  drm/{i915, xe}/bo: move display bo calls to parent interface  Jani Nikula
Continue i915 and xe separation from display by moving the bo calls to the display parent interface. Instead of adding all these functions to intel_parent.[ch], reuse the now vacated intel_bo.[ch], and avoid mass renames to calls of these functions. This is similar to intel_display_rpm.[ch]. Make many of the hooks optional to avoid having to implement dummy functions in xe. Indeed now we can remove many of the existing dummy functions. Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com> Link: https://patch.msgid.link/7899eef2ccf0cd603df69099df065226a0df917b.1773238670.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2026-03-16drm/xe: rename intel_bo.c to xe_display_bo.cJani Nikula
Follow the xe_ prefixed file naming in xe. With xe_bo.[ch] already being a thing in xe core, use xe_display_bo.c. Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com> Link: https://patch.msgid.link/2f73eda5117462407f12113ce096496282ee3fcc.1773238670.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2026-03-13drm/xe: Open-code GGTT MMIO access protectionMatthew Brost
GGTT MMIO access is currently protected by hotplug (drm_dev_enter), which works correctly when the driver loads successfully and is later unbound or unloaded. However, if driver load fails, this protection is insufficient because drm_dev_unplug() is never called. Additionally, devm release functions cannot guarantee that all BOs with GGTT mappings are destroyed before the GGTT MMIO region is removed, as some BOs may be freed asynchronously by worker threads. To address this, introduce an open-coded flag, protected by the GGTT lock, that guards GGTT MMIO access. The flag is cleared during the dev_fini_ggtt devm release function to ensure MMIO access is disabled once teardown begins. Cc: stable@vger.kernel.org Fixes: 919bb54e989c ("drm/xe: Fix missing runtime outer protection for ggtt_remove_node") Reviewed-by: Zhanjun Dong <zhanjun.dong@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260310225039.1320161-8-zhanjun.dong@intel.com
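The pattern described above can be sketched as a small single-threaded model. This is not the xe implementation: the struct, the function names, and the boolean stand-in for the GGTT lock are all hypothetical, and a real driver would use a proper mutex. It only illustrates that the flag is checked and cleared under the same lock, so writers that arrive after teardown skip the MMIO access.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a GGTT with an open-coded "MMIO online" flag. */
struct ggtt_model {
	bool lock_held;		/* stand-in for the GGTT lock */
	bool mmio_online;	/* only read or written with the lock held */
	unsigned int mmio_writes; /* writes that reached the modelled MMIO */
};

static void model_lock(struct ggtt_model *g)
{
	assert(!g->lock_held);
	g->lock_held = true;
}

static void model_unlock(struct ggtt_model *g)
{
	assert(g->lock_held);
	g->lock_held = false;
}

/* Returns true if the write reached the modelled MMIO region. */
static bool model_write_pte(struct ggtt_model *g)
{
	bool done = false;

	model_lock(g);
	if (g->mmio_online) {
		g->mmio_writes++;
		done = true;
	}
	model_unlock(g);

	return done;
}

/* Teardown: clear the flag under the lock before the region goes away. */
static void model_fini(struct ggtt_model *g)
{
	model_lock(g);
	g->mmio_online = false;
	model_unlock(g);
}

/* Demo: one write before teardown, one late write after it. */
static unsigned int demo_late_write(void)
{
	struct ggtt_model g = { .mmio_online = true };

	model_write_pte(&g);	/* reaches MMIO */
	model_fini(&g);
	model_write_pte(&g);	/* skipped: teardown has begun */

	return g.mmio_writes;
}
```

Because the flag does not depend on drm_dev_unplug() having been called, the same check also covers the failed-load path in this model.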
2026-03-13drm/xe/uc: Drop xe_guc_sanitize in favor of managed cleanupZhanjun Dong
If the firmware fails to load during a GT reset, the device is wedged, which also initiates a GuC state cleanup. Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260310225039.1320161-7-zhanjun.dong@intel.com
2026-03-13drm/xe/guc: Ensure CT state transitions via STOP before DISABLEDZhanjun Dong
The GuC CT state transition requires moving to the STOP state before entering the DISABLED state. Update the driver teardown sequence to make the proper state machine transitions. Fixes: ee4b32220a6b ("drm/xe/guc: Add devm release action to safely tear down CT") Cc: stable@vger.kernel.org Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260310225039.1320161-6-zhanjun.dong@intel.com
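A minimal sketch of the ordering rule, under stated assumptions: the state names and helpers below are hypothetical and model only the one constraint the message describes, namely that DISABLED is entered from the stopped state. The real CT state machine has more states and rules than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced set of CT states. */
enum model_ct_state {
	MODEL_CT_ENABLED,
	MODEL_CT_STOPPED,
	MODEL_CT_DISABLED,
};

/* DISABLED may only be entered from STOPPED (or when already DISABLED). */
static bool model_ct_transition_ok(enum model_ct_state from,
				   enum model_ct_state to)
{
	if (to == MODEL_CT_DISABLED)
		return from == MODEL_CT_STOPPED || from == MODEL_CT_DISABLED;
	return true;
}

/* Apply a transition only if the rule above allows it. */
static bool model_ct_set_state(enum model_ct_state *state,
			       enum model_ct_state to)
{
	if (!model_ct_transition_ok(*state, to))
		return false;
	*state = to;
	return true;
}

/* Teardown always passes through STOPPED on its way to DISABLED. */
static enum model_ct_state model_ct_teardown(enum model_ct_state state)
{
	if (state == MODEL_CT_DISABLED)
		return state;

	model_ct_set_state(&state, MODEL_CT_STOPPED);
	model_ct_set_state(&state, MODEL_CT_DISABLED);
	return state;
}
```

In this model a direct ENABLED to DISABLED request is rejected, while the two-step teardown succeeds.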
2026-03-13drm/xe: Use XE_WEDGED_MODE_UPON_ANY_HANG_NO_RESET enum instead of magic numberZhanjun Dong
Replace the magic number 2 with the proper enum value XE_WEDGED_MODE_UPON_ANY_HANG_NO_RESET for better code readability and maintainability. Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260310225039.1320161-5-zhanjun.dong@intel.com
2026-03-13drm/xe: Trigger queue cleanup if not in wedged mode 2Matthew Brost
The intent of wedging a device is to allow queues to continue running only in wedged mode 2. In other modes, queues should initiate cleanup and signal all remaining fences. Fix xe_guc_submit_wedge to correctly clean up queues when wedge mode != 2. Fixes: 7dbe8af13c18 ("drm/xe: Wedge the entire device") Cc: stable@vger.kernel.org Reviewed-by: Zhanjun Dong <zhanjun.dong@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260310225039.1320161-4-zhanjun.dong@intel.com
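The rule stated above can be sketched as follows. This is a model, not the body of xe_guc_submit_wedge: the MODEL_ prefixed enum, the queue struct, and the helper names are hypothetical, and only the value 2 for the "upon any hang, no reset" mode comes from the log itself. The other two enum values are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the wedged modes; only value 2 is from the log. */
enum model_wedged_mode {
	MODEL_WEDGED_MODE_NEVER = 0,			/* assumed */
	MODEL_WEDGED_MODE_UPON_CRITICAL_ERROR = 1,	/* assumed */
	MODEL_WEDGED_MODE_UPON_ANY_HANG_NO_RESET = 2,
};

struct model_queue {
	bool killed;
	bool fences_signaled;
};

/* Queues are left running only in the "no reset" mode. */
static bool model_queues_keep_running(enum model_wedged_mode mode)
{
	return mode == MODEL_WEDGED_MODE_UPON_ANY_HANG_NO_RESET;
}

/* On wedge: clean up every queue unless the mode says to keep them. */
static void model_submit_wedge(struct model_queue *queues, size_t count,
			       enum model_wedged_mode mode)
{
	size_t i;

	if (model_queues_keep_running(mode))
		return;

	for (i = 0; i < count; i++) {
		queues[i].killed = true;
		queues[i].fences_signaled = true;
	}
}

/* Demo: number of queues killed when wedging three queues in 'mode'. */
static size_t demo_killed(enum model_wedged_mode mode)
{
	struct model_queue queues[3] = { { false, false } };
	size_t killed = 0;
	size_t i;

	model_submit_wedge(queues, 3, mode);
	for (i = 0; i < 3; i++)
		killed += queues[i].killed ? 1 : 0;

	return killed;
}
```

Comparing against a named enum value rather than a bare 2 keeps the condition readable, which is the point of the preceding rename commit.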
2026-03-13drm/xe: Forcefully tear down exec queues in GuC submit finiMatthew Brost
In GuC submit fini, forcefully tear down any exec queues by disabling CTs, stopping the scheduler (which cleans up lost G2H), killing all remaining queues, and resuming scheduling to allow any remaining cleanup actions to complete and signal any remaining fences. Split guc_submit_fini into device-related and software-only parts. Using device-managed and drm-managed actions guarantees the correct ordering of cleanup. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: stable@vger.kernel.org Reviewed-by: Zhanjun Dong <zhanjun.dong@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260310225039.1320161-3-zhanjun.dong@intel.com
2026-03-13drm/xe: Always kill exec queues in xe_guc_submit_pause_abortMatthew Brost
xe_guc_submit_pause_abort is intended to be called after something disastrous occurs (e.g., VF migration fails, device wedging, or driver unload) and should immediately trigger the teardown of remaining submission state. With that, kill any remaining queues in this function. Fixes: 7c4b7e34c83b ("drm/xe/vf: Abort VF post migration recovery on failure") Cc: stable@vger.kernel.org Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260310225039.1320161-2-zhanjun.dong@intel.com
2026-03-13drm/xe/guc: Fail immediately on GuC load errorDaniele Ceraolo Spurio
By using the same variable for both the return of poll_timeout_us and the return of the polled function guc_wait_ucode, the return value of the latter is overwritten and lost after exiting the polling loop. Since guc_wait_ucode returns -1 on GuC load failure, we lose that information and always continue as if the GuC had been loaded correctly. This is fixed by simply using 2 separate variables. Fixes: a4916b4da448 ("drm/xe/guc: Refactor GuC load to use poll_timeout_us()") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://patch.msgid.link/20260303001732.2540493-2-daniele.ceraolospurio@intel.com
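The bug class described above can be reproduced in a few lines of user-space C. This is a sketch under stated assumptions, not the driver code: the polling loop is a hand-written stand-in for poll_timeout_us(), the polled function and its arguments are hypothetical, and the choice of -EIO for a failed load is arbitrary.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical polled function: returns 1 when loaded, 0 while still
 * pending, and -1 on load failure (mirroring the -1 mentioned above).
 */
static int model_wait_ucode(int attempt, int fail_at, int done_at)
{
	if (attempt == fail_at)
		return -1;
	if (attempt >= done_at)
		return 1;
	return 0;
}

/* Buggy shape: one variable holds both the polled value and the poll result. */
static int wait_shared_variable(int fail_at, int done_at, int max_attempts)
{
	int ret = 0;
	int attempt;

	for (attempt = 0; attempt < max_attempts; attempt++) {
		ret = model_wait_ucode(attempt, fail_at, done_at);
		if (ret != 0)
			break;
	}
	/* The loop's own result overwrites the polled value, losing the -1. */
	ret = (attempt < max_attempts) ? 0 : -ETIMEDOUT;

	return ret;
}

/* Fixed shape: the polled value and the poll result are kept apart. */
static int wait_separate_variables(int fail_at, int done_at, int max_attempts)
{
	int status = 0;	/* return value of the polled function */
	int ret;	/* result of the polling loop itself */
	int attempt;

	for (attempt = 0; attempt < max_attempts; attempt++) {
		status = model_wait_ucode(attempt, fail_at, done_at);
		if (status != 0)
			break;
	}
	ret = (attempt < max_attempts) ? 0 : -ETIMEDOUT;
	if (ret)
		return ret;

	return status < 0 ? -EIO : 0;	/* -EIO is an arbitrary choice here */
}
```

With a failure injected at attempt 2, the shared-variable version reports success while the two-variable version reports an error.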
2026-03-12drm/xe/wa: Drop redundant entries for Wa_16021867713 & Wa_14019449301Matt Roper
The Xe2_HPM-specific RTP table entries for Wa_16021867713 and Wa_14019449301 were removed by commit 941f538b0af8 ("drm/xe: Consolidate workaround entries for Wa_16021867713") and commit aa0f0a678370 ("drm/xe: Consolidate workaround entries for Wa_14019449301") in favor of alternate entries earlier in the table that cover a wider range of IP versions. However these Xe2_HPM-specific entries were accidentally resurrected during a backmerge, which causes the Xe driver to complain on probe about two entries trying to program the same registers+bits:

  <3> [48.491155] xe 0000:03:00.0: [drm] *ERROR* Tile0: GT1: discarding save-restore reg 1c3f1c (clear: 00000008, set: 00000008, masked: no, mcr: no): ret=-22
  <3> [48.491211] xe 0000:03:00.0: [drm] *ERROR* Tile0: GT1: discarding save-restore reg 1d3f1c (clear: 00000008, set: 00000008, masked: no, mcr: no): ret=-22
  <3> [48.491225] xe 0000:03:00.0: [drm] *ERROR* Tile0: GT1: discarding save-restore reg 1c3f08 (clear: 00000020, set: 00000020, masked: no, mcr: no): ret=-22
  <3> [48.491238] xe 0000:03:00.0: [drm] *ERROR* Tile0: GT1: discarding save-restore reg 1d3f08 (clear: 00000020, set: 00000020, masked: no, mcr: no): ret=-22

Re-drop the redundant Xe2_HPM-specific entries to eliminate the dmesg errors.

Fixes: 58351f46de26 ("Merge v7.0-rc3 into drm-next") Cc: Simona Vetter <simona.vetter@ffwll.ch> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/7608 Reviewed-by: Simona Vetter <simona.vetter@ffwll.ch> Link: https://patch.msgid.link/20260312-wa_merge_fix-v1-1-2ec6607f1e0c@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2026-03-12Merge drm/drm-next into drm-xe-nextMatthew Brost
Backmerging to bring in 7.0-rc3. Important ahead of GPU SVM merging THP support. Signed-off-by: Matthew Brost <matthew.brost@intel.com>
2026-03-12drm/xe: Fix overflow in guc_ct_snapshot_captureMika Kuoppala
snapshot->ctb is u32*, so pointer arithmetic on it scales the byte offset from xe_bo_size() by 4, overshooting the intended start of the g2h portion and writing past the allocated buffer. Fix this by using void * to get the arithmetic right and prevent future mishaps. v2: s/u8/void for memcpy and iosys_map consistency (Matt) Fixes: af3de6cf06f9 ("drm/xe: Split H2G and G2H into separate buffer objects") Cc: Matthew Brost <matthew.brost@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: intel-xe@lists.freedesktop.org Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260304211728.249104-1-mika.kuoppala@linux.intel.com
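The scaling effect described above is easy to demonstrate in isolation. The sketch below is illustrative only: the helper names and the demo buffer are hypothetical, and portable C uses unsigned char * for byte arithmetic where the kernel relies on the GNU extension that permits arithmetic on void *.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 128-byte demo buffer, large enough that both offsets stay in bounds. */
static uint32_t demo_storage[32];

/* Adding a byte count to a u32 pointer advances by count * sizeof(u32). */
static size_t offset_via_u32(uint32_t *base, size_t nbytes)
{
	uint32_t *p = base + nbytes;	/* scaled by sizeof(uint32_t) */

	return (size_t)((unsigned char *)p - (unsigned char *)base);
}

/* Byte-granular arithmetic advances by exactly the byte count. */
static size_t offset_via_bytes(void *base, size_t nbytes)
{
	unsigned char *p = (unsigned char *)base + nbytes;

	return (size_t)(p - (unsigned char *)base);
}

/* Demo: intended offset is 16 bytes into the buffer. */
static size_t demo_u32_offset(void)
{
	return offset_via_u32(demo_storage, 16);
}

static size_t demo_byte_offset(void)
{
	return offset_via_bytes(demo_storage, 16);
}
```

A 16-byte offset applied through the u32 pointer lands 64 bytes in, four times further than intended, which is how a copy can run past the end of the allocation.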
2026-03-12drm/xe: implement VM_BIND decompression in vm_bind_ioctlNitin Gote
Implement handling of VM_BIND(..., DECOMPRESS) in xe_vm_bind_ioctl.

Key changes:
- Parse and record per-op intent (op->map.request_decompress) when the DECOMPRESS flag is present.
- Use xe_pat_index_get_comp_en() helper to check if a PAT index has compression enabled via the XE2_COMP_EN bit.
- Validate DECOMPRESS preconditions in the ioctl path:
  - Only valid for MAP ops.
  - The provided pat_index must select the device's "no-compression" PAT.
  - Only meaningful on XE2+ devices with flat CCS; otherwise return -EOPNOTSUPP.
- Use XE_IOCTL_DBG for uAPI sanity checks.
- Implement xe_bo_decompress(): For VRAM BOs run xe_bo_move_notify(), reserve one fence slot, schedule xe_migrate_resolve(), and attach the returned fence with DMA_RESV_USAGE_KERNEL. Non-VRAM cases are silent no-ops.
- Wire scheduling into vma_lock_and_validate() so VM_BIND will schedule decompression when request_decompress is set.
- Handle fault-mode VMs by performing decompression synchronously during the bind process, ensuring that the resolve is completed before the bind finishes.

This schedules an in-place GPU resolve (xe_migrate_resolve) for decompression.

Compute PR: https://github.com/intel/compute-runtime/pull/898
IGT PR: https://patchwork.freedesktop.org/series/157553/

v7: Rebase on latest drm-tip and add compute and igt pr info
v6: (Matt Auld)
- Rebase as xe_pat_index_get_comp_en() is added in separate patch
- Drop vm param from xe_bo_decompress(); extract the tile from the bo instead
- Reject decompression on igpu instead of silently skipping it, to avoid failures on Xe2+ igpu, as xe_device_has_flat_ccs() can sometimes be false on igpu due to a BIOS setting that turns off compression.
- Nits
v5: (Matt)
- Correct the condition check of xe_pat_index_get_comp_en
v4: (Matt)
- Introduce xe_pat_index_get_comp_en(), which checks XE2_COMP_EN for the pat_index
- .interruptible should be true, everything else false
v3: (Matt)
- s/xe_bo_schedule_decompress/xe_bo_decompress
- Skip the decompress step if the BO isn't in VRAM
- start/size not required in xe_bo_schedule_decompress
- Use xe_bo_move_notify instead of xe_vm_invalidate_vma with respect to invalidation.
- Nits
v2:
- Move decompression work out of vm_bind ioctl. (Matt)
- Put that work in a small helper at the BO/migrate layer and invoke it from vma_lock_and_validate, which already runs under drm_exec.
- Move lightweight checks to vm_bind_ioctl_check_args (Matthew Auld)

Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Acked-by: Michal Mrozek <michal.mrozek@intel.com> Signed-off-by: Nitin Gote <nitin.r.gote@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20260304123758.3050386-8-nitin.r.gote@intel.com
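The DECOMPRESS preconditions listed in the message can be sketched as a standalone argument check. Everything here is a model: the struct and enum names are hypothetical, the order of the checks is an assumption, and so is the use of -EINVAL for malformed requests. Only the -EOPNOTSUPP result for unsupported devices is taken from the message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, reduced view of a bind operation and a device. */
enum model_bind_op_type {
	MODEL_OP_MAP,
	MODEL_OP_UNMAP,
};

struct model_device {
	bool is_xe2_plus;
	bool has_flat_ccs;
};

struct model_bind_op {
	enum model_bind_op_type type;
	bool decompress;		/* DECOMPRESS flag was passed */
	bool pat_has_compression;	/* selected PAT index enables compression */
};

/* Returns 0 if the request is acceptable, a negative errno otherwise. */
static int model_check_decompress(const struct model_device *dev,
				  const struct model_bind_op *op)
{
	if (!op->decompress)
		return 0;		/* nothing to validate */

	if (op->type != MODEL_OP_MAP)
		return -EINVAL;		/* assumed errno: only MAP ops qualify */

	if (op->pat_has_compression)
		return -EINVAL;		/* assumed errno: need the no-compression PAT */

	if (!dev->is_xe2_plus || !dev->has_flat_ccs)
		return -EOPNOTSUPP;	/* from the message: unsupported device */

	return 0;
}

/* Demo wrapper so a single call covers one combination of inputs. */
static int demo_check(bool xe2_plus, bool flat_ccs,
		      enum model_bind_op_type type, bool pat_has_compression)
{
	struct model_device dev = { xe2_plus, flat_ccs };
	struct model_bind_op op = { type, true, pat_has_compression };

	return model_check_decompress(&dev, &op);
}
```

Rejecting the request on devices without flat CCS, rather than silently skipping it, matches the v6 note about integrated GPUs where a BIOS setting can turn compression off.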