| Age | Commit message (Collapse) | Author |
|
Introduce an internal __xe_migrate_copy(..., is_vram_resolve) path and
expose a small wrapper xe_migrate_resolve() that calls it with
is_vram_resolve=true.
For resolve/decompression operations we must ensure the copy code uses
the compression PAT index when appropriate; this change centralizes that
behavior and allows callers to schedule a resolve (decompress) operation
via the migrate API.
v3: Fix kernel-doc warnings
v2: (Matt)
- Simplify xe_migrate_resolve(), use single BO/resource;
remove copy_only_ccs argument as it's always false.
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Nitin Gote <nitin.r.gote@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patch.msgid.link/20260304123758.3050386-7-nitin.r.gote@intel.com
|
|
Avoid code duplication by extracting the logic for selection of the
correct PAT_PTA entry for a GT into function gt_pta_entry() and using
that function whenever necessary.
Reviewed-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com>
Link: https://patch.msgid.link/20260303-pat-gt_pta_entry-v1-1-0dee8e1e7bd9@intel.com
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
|
|
Requested by Maxime Ripard for drm-misc-next because renesas people need
fb797a70108f ("drm: renesas: rz-du: mipi_dsi: Set DSI divider").
Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
|
|
Move the get/put/ref/flush_for_display calls to the display parent
interface.
For i915, move the hooks next to the other i915 core frontbuffer code in
i915_gem_object_frontbuffer.c. For xe, add new file xe_frontbuffer.c for
the same.
Note: The intel_frontbuffer_flush() calls from
i915_gem_object_frontbuffer.c will partially route back to i915 core via
the parent interface. This is less than stellar.
Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
Link: https://patch.msgid.link/f69b967ed82bbcfd60ffa77ba197b26a1399f09f.1772475391.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
possible
Now that the two-pass notifier flow uses xe_vma_userptr_do_inval() for
the fence-wait + TLB-invalidate work, extend it to support a further
deferred TLB wait:
- xe_vma_userptr_do_inval(): when the embedded finish handle is free,
submit the TLB invalidation asynchronously (xe_vm_invalidate_vma_submit)
and return &userptr->finish so the mmu_notifier core schedules a third
pass. When the handle is occupied by a concurrent invalidation, fall
back to the synchronous xe_vm_invalidate_vma() path.
- xe_vma_userptr_complete_tlb_inval(): new helper called from
invalidate_finish when tlb_inval_submitted is set. Waits for the
previously submitted batch and unmaps the gpusvm pages.
xe_vma_userptr_invalidate_finish() dispatches between the two helpers
via tlb_inval_submitted, making the three possible flows explicit:
pass1 (fences pending) -> invalidate_finish -> do_inval (sync TLB)
pass1 (fences done) -> do_inval -> invalidate_finish
-> complete_tlb_inval (deferred TLB)
pass1 (finish occupied) -> do_inval (sync TLB, inline)
In multi-GPU scenarios this allows TLB flushes to be submitted on all
GPUs in one pass before any of them are waited on.
Also adds xe_vm_invalidate_vma_submit() which submits the TLB range
invalidation without blocking, populating a xe_tlb_inval_batch that
the caller waits on separately.
v3:
- Add locking asserts and notifier state asserts (Matt Brost)
- Update the locking documentation of the notifier
state members (Matt Brost)
- Remove unrelated code formatting changes (Matt Brost)
Assisted-by: GitHub Copilot:claude-sonnet-4.6
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260305093909.43623-5-thomas.hellstrom@linux.intel.com
|
|
xe_vm_range_tilemask_tlb_inval() submits TLB invalidation requests to
all GTs in a tile mask and then immediately waits for them to complete
before returning. This is fine for the existing callers, but a
subsequent patch will need to defer the wait in order to overlap TLB
invalidations across multiple VMAs.
Introduce xe_tlb_inval_range_tilemask_submit() and
xe_tlb_inval_batch_wait() in xe_tlb_inval.c as the submit and wait
halves respectively. The batch of fences is carried in the new
xe_tlb_inval_batch structure. Remove xe_vm_range_tilemask_tlb_inval()
and convert all three call sites to the new API.
v3:
- Don't wait on TLB invalidation batches if the corresponding batch
submit returns an error. (Matt Brost)
- s/_batch/batch/ (Matt Brost)
Assisted-by: GitHub Copilot:claude-sonnet-4.6
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260305093909.43623-4-thomas.hellstrom@linux.intel.com
|
|
In multi-GPU scenarios, asynchronous GPU job latency is a bottleneck if
each notifier waits for its own GPU before returning. The two-pass
mmu_interval_notifier infrastructure allows deferring the wait to a
second pass, so all GPUs can be signalled in the first pass before
any of them are waited on.
Convert the userptr invalidation to use the two-pass model:
Use invalidate_start as the first pass to mark the VMA for repin and
enable software signalling on the VM reservation fences to start any
gpu work needed for signaling. Fall back to completing the work
synchronously if all fences are already signalled, or if a concurrent
invalidation is already using the embedded finish structure.
Use invalidate_finish as the second pass to wait for the reservation
fences to complete, invalidate the GPU TLB in fault mode, and unmap
the gpusvm pages.
Embed a struct mmu_interval_notifier_finish in struct xe_userptr to
avoid dynamic allocation in the notifier callback. Use a finish_inuse
flag to prevent two concurrent invalidations from using it
simultaneously; fall back to the synchronous path for the second caller.
v3:
- Add locking asserts in notifier components (Matt Brost)
- Clean up newlines (Matt Brost)
- Update the userptr notifier state member locking documentation
(Matt Brost)
Assisted-by: GitHub Copilot:claude-sonnet-4.6
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260305093909.43623-3-thomas.hellstrom@linux.intel.com
|
|
There are higher level sleep states that will cause RC6 state readout to
come back with an "in-reset" value. That is the case with NVL-P. As
those states are only possible if the GT is already in C6, let's just
translate the "reset value" into C6 when doing the readout.
Bspec: 67651
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patch.msgid.link/20260309-extra-nvl-p-enabling-patches-v5-7-be9c902ee34e@intel.com
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
|
|
Wa_16028780921 involves writing to a register that is locked by firmware
prior to driver loading and doesn't have any effect if implemented by
the KMD. Since the implementation of the workaround actually belongs
the firmware, just drop the ineffective implementation by the KMD.
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patch.msgid.link/20260309-extra-nvl-p-enabling-patches-v5-6-be9c902ee34e@intel.com
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
|
|
Implement the KMD part of Wa_14026539277, which applies to NVL-P A0.
The KMD implementation is just one component of the workaround, which
also depends on Pcode to implement its part in order to be complete.
v2:
- Add FUNC(xe_rtp_match_not_sriov_vf) to skip applying the workaround
to SRIOV VFs. (Matt)
v3:
- Make Wa_14026539277 a device workaround instead of a GT workaround.
(Matt)
v4:
- Drop FUNC(xe_rtp_match_not_sriov_vf) and use a direct check with
IS_SRIOV_VF() in the workaround implementation. (Matt)
Reviewed-by: Matt Roper <matthew.d.roper@intel.com> # v3
Link: https://patch.msgid.link/20260309-extra-nvl-p-enabling-patches-v5-5-be9c902ee34e@intel.com
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
|
|
Add support for matching platform-level stepping, which will be used for
an upcoming NVL-P workaround.
As support for reading platform-level stepping information is added only
as needed in the driver, add a warning when the rule finds a STEP_NONE
value, which is an indication that the driver is missing such a support.
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patch.msgid.link/20260309-extra-nvl-p-enabling-patches-v5-4-be9c902ee34e@intel.com
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
|
|
There will be a NVL-P workaround for which we will need to know the
platform-level stepping information in order to decide whether to apply
it or not.
While NVL-P has a nice mapping between the PCI revid and our symbolic
stepping enumeration, not all platforms are like that: (i) Some
platforms will have a single PCI revid used for a set platform level
steppings (ii) and some might even require specific mappings.
To make things simpler, let's include stepping information in the device
info only on demand, for those platforms where it is needed for
workaround checks.
v2:
- Call xe_step_platform_get() very early, to allow device workarounds
to use it in early stages of device initialization. (Matt)
Bspec: 74201
Reviewed-by: Matt Roper <matthew.d.roper@intel.com> # v1
Link: https://patch.msgid.link/20260309-extra-nvl-p-enabling-patches-v5-3-be9c902ee34e@intel.com
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
|
|
The macros IS_PLATFORM_STEP() and IS_SUBPLATFORM_STEP() are unused since
commit 87c299fa3a97 ("drm/xe/guc: Port Wa_14014475959 to xe_wa and fix
it") and commit 63bbd800ff01 ("drm/xe/guc: Port
Wa_22012727170/Wa_22012727685 to xe_wa"), respectively, and we can drop
them now. Furthermore, in upcoming changes we will add logic to read
platform-level step information from PCI RevID and keeping those macros
around would potentially cause confusion.
v2:
- Cite commits that made the macros unused. (Matt)
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patch.msgid.link/20260309-extra-nvl-p-enabling-patches-v5-2-be9c902ee34e@intel.com
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
|
|
In an upcoming change, we will add a member to struct xe_step_info to
represent the platform-level stepping. As such, we should stop assigning
the value returned by functions xe_step_pre_gmdid_get() and
xe_step_gmdid_get() directly to xe->info.step.
Since there are no other users for those functions, let's simply update
them to modify xe->info.step directly.
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patch.msgid.link/20260309-extra-nvl-p-enabling-patches-v5-1-be9c902ee34e@intel.com
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
|
|
Similar to i915's commit cebc13de7e704b1355bea208a9f9cdb042c74588
("drm/i915: Whitelist COMMON_SLICE_CHICKEN3 for UMD access"), except
that instead of putting the register on the allowlist for UMD to
program, the KMD is doing the programming at context initialization
based on a queue creation flag.
This is a recommended tuning setting for both gen12 and Xe_HP
platforms.
If a render queue is created with
DRM_XE_EXEC_QUEUE_SET_STATE_CACHE_PERF_FIX, COMMON_SLICE_CHICKEN3 will
be programmed at initialization to enable the render color cache to
key with BTP+BTI (binding table pool + binding table entry) instead of
just BTI (binding table entry). This enables the UMD to avoid emitting
render-target-cache-flush + stall-at-pixel-scoreboard every time a
binding table entry pointing to a render target is changed.
v2: Use xe_lrc_write_ring()
v3: Update xe_query.c to report availability
v4: Rename defines to add DISABLE_
v5: update commit message
v6: rebase
Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39982
Bspec: 73993, 73994, 72161, 31870, 68331
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patch.msgid.link/20260306075504.1288676-1-lionel.g.landwerlin@intel.com
|
|
Add GT workaround Wa_14026578760 for graphics versions 35.10, 35.11
and media version 35.03.
Signed-off-by: Varun Gupta <varun.gupta@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patch.msgid.link/20260309063923.4031933-1-varun.gupta@intel.com
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
|
|
Convert existing loops with Coccinelle via the following semantic patch:
@@
identifier GT, XE, ID;
iterator name for_each_gt, for_each_gt_with_type;
@@
- for_each_gt(GT, XE, ID) {
+ for_each_gt_with_type(GT, XE, ID, BIT(XE_GT_TYPE_MAIN)) {
- if (xe_gt_is_media_type(GT))
- continue;
...
}
@@
identifier GT, XE, ID;
iterator name for_each_gt, for_each_gt_with_type;
@@
- for_each_gt(GT, XE, ID) {
+ for_each_gt_with_type(GT, XE, ID, BIT(XE_GT_TYPE_MEDIA)) {
- if (xe_gt_is_main_type(GT))
- continue;
...
}
@@
identifier GT, XE, ID;
iterator name for_each_gt, for_each_gt_with_type;
@@
- for_each_gt(GT, XE, ID) {
+ for_each_gt_with_type(GT, XE, ID, BIT(XE_GT_TYPE_MAIN)) {
- if (!xe_gt_is_main_type(GT))
- continue;
...
}
@@
identifier GT, XE, ID;
iterator name for_each_gt, for_each_gt_with_type;
@@
- for_each_gt(GT, XE, ID) {
+ for_each_gt_with_type(GT, XE, ID, BIT(XE_GT_TYPE_MEDIA)) {
- if (xe_gt_is_media_type(GT))
- continue;
...
}
No functional change expected.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/20260305-gt-type-loops-v1-2-aa42e9fc3f06@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
There are a couple places in the driver today that have GT loops that
only need to operate on a specific type of GT. E.g.,
for_each_gt(...) {
if (xe_gt_is_media_type(gt))
continue;
...
}
Some upcoming development is expected to utilize this pattern a bit more
widely, so add a dedicated iterator that allows looping over specific GT
type(s).
Note that this iterator uses a mask for the "type" parameter rather than
a direct value match. That's probably a bit overkill for now given that
there are only two possible types of GTs, but if additional types of GTs
ever show up in the future, this approach will fit more naturally and
allow cases where we might want to loop over a subset of the possible
types, or specifically mask off one single type.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/20260305-gt-type-loops-v1-1-aa42e9fc3f06@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
Currently xe_migrate_prepare_vm() does three things.
1. Allocates pt_bo for migrate context.
2. Initializes pt_bo with actual pte details.
3. Initializes sa_manager for migrate context.
Split these implementations in their own functions for better
maintainability.
Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260303101913.3576481-1-raag.jadav@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
GuCRC should not be disabled in xe_guc_stop_prepare() as C6 is a
prerequisite for s0ix and s2idle. This is a regression caused by
the patch below.
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/7510
Fixes: 40a684f91d26 ("drm/xe: Decouple GuC RC code from xe_guc_pc")
Cc: Riana Tauro <riana.tauro@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Riana Tauro <riana.tauro@intel.com>
Link: https://patch.msgid.link/20260303181416.3880937-1-vinay.belgaumkar@intel.com
|
|
Report the SoC nonfatal/fatal hardware error and update the counters.
$ sudo ynl --family drm_ras --do get-error-counter \
--json '{"node-id":0, "error-id":2}'
{'error-id': 2, 'error-name': 'soc-internal', 'error-value': 0}
Co-developed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://patch.msgid.link/20260304074412.464435-12-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
PVC supports GT error reporting via vector registers along with
error status register. Add support to report these errors and
update respective counters. Incase of Subslice error reported
by vector register, process the error status register
for applicable bits.
The counter is embedded in the xe drm ras structure and is
exposed to the userspace using the drm_ras generic netlink
interface.
$ sudo ynl --family drm_ras --do get-error-counter \
--json '{"node-id":0, "error-id":1}'
{'error-id': 1, 'error-name': 'core-compute', 'error-value': 0}
Co-developed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://patch.msgid.link/20260304074412.464435-11-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Initialize DRM RAS in hw error init. Map the UAPI error severities
with the hardware error severities and refactor file.
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://patch.msgid.link/20260304074412.464435-10-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Allocate correctable, uncorrectable nodes for every xe device. Each node
contains error component, counters and respective query counter functions.
Add basic functionality to create and register drm nodes.
Below operations can be performed using Generic netlink DRM RAS interface:
1) List Nodes:
$ sudo ynl --family drm_ras --dump list-nodes
[{'device-name': '0000:03:00.0',
'node-id': 0,
'node-name': 'correctable-errors',
'node-type': 'error-counter'},
{'device-name': '0000:03:00.0',
'node-id': 1,
'node-name': 'uncorrectable-errors',
'node-type': 'error-counter'}]
2) Get Error counters:
$ sudo ynl --family drm_ras --dump get-error-counter --json '{"node-id":0}'
[{'error-id': 1, 'error-name': 'core-compute', 'error-value': 0},
{'error-id': 2, 'error-name': 'soc-internal', 'error-value': 0}]
3) Get specific Error counter:
$ sudo ynl --family drm_ras --do get-error-counter --json '{"node-id":0, "error-id":1}'
{'error-id': 1, 'error-name': 'core-compute', 'error-value': 0}
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://patch.msgid.link/20260304074412.464435-9-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Add kernel doc to all exported functions that do not have
a kernel doc in xe_exec_queue.c.
v2: mention multi-lrc in comment (Matt Brost)
Assisted-by: Claude 4.5 Sonnet
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260303223040.140504-2-niranjana.vishwanathapura@intel.com
Link: https://patch.msgid.link/20260303223040.140504-2-niranjana.vishwanathapura@intel.com
|
|
When check_bo_args_are_sane() validation fails, jump to the new
free_vmas cleanup label to properly free the allocated resources.
This ensures proper cleanup in this error path.
Fixes: 293032eec4ba ("drm/xe/bo: Update atomic_access attribute on madvise")
Cc: stable@vger.kernel.org # v6.18+
Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com>
Signed-off-by: Varun Gupta <varun.gupta@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260223175145.1532801-1-varun.gupta@intel.com
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
(cherry picked from commit 29bd06faf727a4b76663e4be0f7d770e2d2a7965)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Free the newly allocated entry when xa_store() fails to avoid a memory
leak on the error path.
v2: use goto fail_free. (Bala)
Fixes: e5283bd4dfec ("drm/xe/reg_sr: Remove register pool")
Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patch.msgid.link/20260204172810.1486719-2-shuicheng.lin@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit 6bc6fec71ac45f52db609af4e62bdb96b9f5fadb)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Wa_16025250150 asks us to set five register fields of the register to
0x1 each. However we were just OR'ing this into the existing register
value (which has a default of 0x4 for each nibble-sized field) resulting
in final field values of 0x5 instead of the desired 0x1. Correct the
RTP programming (use FIELD_SET instead of SET) to ensure each field is
assigned to exactly the value we want.
Cc: Aradhya Bhatia <aradhya.bhatia@intel.com>
Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
Cc: stable@vger.kernel.org # v6.16+
Fixes: 7654d51f1fd8 ("drm/xe/xe2hpg: Add Wa_16025250150")
Reviewed-by: Ngai-Mint Kwan <ngai-mint.kwan@linux.intel.com>
Link: https://patch.msgid.link/20260227164341.3600098-2-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit d139209ef88e48af1f6731cd45440421c757b6b5)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
xe_gsc_proxy_remove undoes what is done in both xe_gsc_proxy_init and
xe_gsc_proxy_start; however, if we fail between those 2 calls, it is
possible that the HW forcewake access hasn't been initialized yet and so
we hit errors when the cleanup code tries to write GSC register. To
avoid that, split the cleanup in 2 functions so that the HW cleanup is
only called if the HW setup was completed successfully.
Since the HW cleanup (interrupt disabling) is now removed from
xe_gsc_proxy_remove, the cleanup on error paths in xe_gsc_proxy_start
must be updated to disable interrupts before returning.
Fixes: ff6cd29b690b ("drm/xe: Cleanup unwind of gt initialization")
Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patch.msgid.link/20260220225308.101469-1-zhanjun.dong@intel.com
(cherry picked from commit 2b37c401b265c07b46408b5cb36a4b757c9b5060)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Add a shared header that's used by i915, xe, and i915 display.
This allows us to drop the compat-i915-headers/i915_reg_defs.h include
from xe_reg_defs.h. All the register macro helpers were subtly pulled in
from i915 to all of xe through this.
Reviewed-by: Michał Grzelak <michal.grzelak@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/fcd70f3317755bf98a6e7ae88974aa8ba06efd1e.1772042022.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Using REG_MASKED_FIELD_ENABLE() and REG_MASKED_FIELD_DISABLE() is more
obvious to the reader than having the ternary expression inside
REG_MASKED_FIELD().
Reviewed-by: Michał Grzelak <michal.grzelak@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/a9b0151d82b1622daa0625fc8ea2c41d233e4318.1772042022.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
The underscore prefixed masked field helper names aren't great. Rename
them REG_MASKED_FIELD(), REG_MASKED_FIELD_ENABLE(), and
REG_MASKED_FIELD_DISABLE(). This is more in line with the existing
REG_FIELD_PREP() etc. helpers, and using "field" instead of "bit" is
more accurate for the functionality.
This is done with:
sed -i 's/_MASKED_FIELD/REG_MASKED_FIELD/g' $(git grep -wl _MASKED_FIELD)
sed -i 's/_MASKED_BIT_ENABLE/REG_MASKED_FIELD_ENABLE/g' $(git grep -wl _MASKED_BIT_ENABLE)
sed -i 's/_MASKED_BIT_DISABLE/REG_MASKED_FIELD_DISABLE/g' $(git grep -wl _MASKED_BIT_DISABLE)
with some manual indentation fixes on top.
Reviewed-by: Michał Grzelak <michal.grzelak@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/49dc20448a12f3e03f5f8347540d167a281b8987.1772042022.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
We've picked the value of TEST_VRAM to match real VRAM size as found
on the machines used by the CI, but that didn't work well on kernels
that have 32-bit resource_size_t. Use smaller value instead.
Reported-by: Nathan Chancellor <nathan@kernel.org>
Closes: https://lore.kernel.org/intel-xe/20260227011639.GA1683727@ax162/
Fixes: cbe29da6f7c0 ("drm/xe/tests: Add KUnit tests for new VRAM fair provisioning")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patch.msgid.link/20260227160010.12425-1-michal.wajdeczko@intel.com
|
|
Remove excess includes, group and sort include directives.
Reviewed-by: Michał Grzelak <michal.grzelak@intel.com>
Link: https://patch.msgid.link/637eab7df00a540df6b7ca1ca345302864b6342f.1772212579.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Move compat i915_vma.h to xe_display_vma.h, and remove all extra
cruft. Drop the i915_ggtt_offset() wrapper in favour of using
xe_ggtt_node_addr() directly.
The usefulness of the I915_TILING_X and I915_TILING_Y undef/define is
unclear, since uapi/drm/i915_drm.h is included in other paths as well.
The naming of struct i915_vma is a bit unfortunate in xe, but (at least
for now) a necessity for maintaining type safety on the opaque type.
Reviewed-by: Michał Grzelak <michal.grzelak@intel.com>
Link: https://patch.msgid.link/ecd5d75981b4b21c3da3b1831faceccfe385d898.1772212579.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
It's unclear what the direction of the VMA abstraction in the parent
interface should be, but convert i915_vma_fence_id() to parent interface
for starters. This paves the way for making struct i915_vma opaque
towards display.
Reviewed-by: Michał Grzelak <michal.grzelak@intel.com>
Link: https://patch.msgid.link/036f4b2d20cc1b0a7ab814beb5bb914c53b6eb53.1772212579.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
When check_bo_args_are_sane() validation fails, jump to the new
free_vmas cleanup label to properly free the allocated resources.
This ensures proper cleanup in this error path.
Fixes: 293032eec4ba ("drm/xe/bo: Update atomic_access attribute on madvise")
Cc: stable@vger.kernel.org # v6.18+
Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com>
Signed-off-by: Varun Gupta <varun.gupta@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260223175145.1532801-1-varun.gupta@intel.com
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
|
|
Add a has_access_counter feature flag to the graphics IP descriptor
and skip writing parameters for the access counter queue in
guc_um_init_params(), leaving queue_params[2] zero-initialized
to signal unavailability to the GuC.
The queue_params[] array layout is fixed by firmware ABI, so we
maintain the structure with queues 0 and 1 (page fault request/response)
always configured, and queue 2 conditionally skipped based on the
has_access_counter flag.
Bspec: 59323
Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Varun Gupta <varun.gupta1@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patch.msgid.link/20260225164748.2302380-1-varun.gupta@intel.com
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
|
|
https://gitlab.freedesktop.org/drm/xe/kernel into drm-next
UAPI Changes:
- restrict multi-lrc to VCS/VECS engines (Xin Wang)
- Introduce a flag to disallow vm overcommit in fault mode (Thomas)
- update used tracking kernel-doc (Auld, Fixes)
- Some bind queue fixes (Auld, Fixes)
Cross-subsystem Changes:
- Split drm_suballoc_new() into SA alloc and init helpers (Satya, Fixes)
- pass pagemap_addr by reference (Arnd, Fixes)
- Revert "drm/pagemap: Disable device-to-device migration" (Thomas)
- Fix unbalanced unlock in drm_gpusvm_scan_mm (Maciej, Fixes)
- Small GPUSVM fixes (Brost, Fixes)
- Fix xe SVM configs (Thomas, Fixes)
Core Changes:
- Fix a hmm_range_fault() livelock / starvation problem (Thomas, Fixes)
Driver Changes:
- Fix leak on xa_store failure (Shuicheng, Fixes)
- Correct implementation of Wa_16025250150 (Roper, Fixes)
- Refactor context init into xe_lrc_ctx_init (Raag)
- Fix GSC proxy cleanup on early initialization failure (Zhanjun)
- Fix exec queue creation during post-migration recovery (Tomasz, Fixes)
- Apply windower hardware filtering setting on Xe3 and Xe3p (Roper)
- Free ctx_restore_mid_bb in release (Shuicheng, Fixes)
- Drop stale MCR steering TODO comment (Roper)
- dGPU memory optimizations (Brost)
- Do not preempt fence signaling CS instructions (Brost, Fixes)
- Revert "drm/xe/compat: Remove unused i915_reg.h from compat header" (Uma)
- Don't expose display modparam if no display support (Wajdeczko)
- Some VRAM flag improvements (Wajdeczko)
- Misc fix for xe_guc_ct.c (Shuicheng, Fixes)
- Remove unused i915_reg.h from compat header (Uma)
- Workaround cleanup & simplification (Roper)
- Add prefetch pagefault support for Xe3p (Varun)
- Fix fs_reclaim deadlock caused by CCS save/restore (Satya, Fixes)
- Cleanup partially initialized sync on parse failure (Shuicheng, Fixes)
- Allow to change VFs VRAM quota using sysfs (Michal)
- Increase GuC log sizes in debug builds (Tomasz)
- Wa_18041344222 changes (Harish)
- Add Wa_14026781792 (Niton)
- Add debugfs facility to catch RTP mistakes (Roper)
- Convert GT stats to per-cpu counters (Brost)
- Prevent unintended VRAM channel creation (Karthik)
- Privatize struct xe_ggtt (Maarten)
- remove unnecessary struct dram_info forward declaration (Jani)
- pagefault refactors (Brost)
- Apply Wa_14024997852 (Arvind)
- Redirect faults to dummy page for wedged device (Raag, Fixes)
- Force EXEC_QUEUE_FLAG_KERNEL for kernel internal VMs (Piotr)
- Stop applying Wa_16018737384 from Xe3 onward (Roper)
- Add new XeCore fuse registers to VF runtime regs (Roper)
- Update xe_device_declare_wedged() error log (Raag)
- Make xe_modparam.force_vram_bar_size signed (Shuicheng, Fixes)
- Avoid reading media version when media GT is disabled (Piotr, Fixes)
- Fix handling of Wa_14019988906 & Wa_14019877138 (Roper, Fixes)
- Basic enabling patches for Xe3p_LPG and NVL-P (Gustavo, Roper, Shekhar)
- Avoid double-adjust in 64-bit reads (Shuicheng, Fixes)
- Allow VF to initialize MCR tables (Wajdeczko)
- Add Wa_14025883347 for GuC DMA failure on reset (Anirban)
- Add bounds check on pat_index to prevent OOB kernel read in madvise (Jia, Fixes)
- Fix the address range assert in ggtt_get_pte helper (Winiarski)
- XeCore fuse register changes (Roper)
- Add more info to powergate_info debugfs (Vinay)
- Separate out GuC RC code (Vinay)
- Fix g2g_test_array indexing (Pallavi)
- Mutual exclusivity between CCS-mode and PF (Nareshkumar, Fixes)
- Some more _types.h cleanups (Wajdeczko)
- Fix sysfs initialization (Wajdeczko, Fixes)
- Drop unnecessary goto in xe_device_create (Roper)
- Disable D3Cold for BMG only on specific platforms (Karthik, Fixes)
- Add sriov.admin_only_pf attribute (Wajdeczko)
- replace old wq(s), add WQ_PERCPU to alloc_workqueue (Marco)
- Make MMIO communication more robust (Wajdeczko)
- Fix warning of kerneldoc (Shuicheng, Fixes)
- Fix topology query pointer advance (Shuicheng, Fixes)
- use entry_dump callbacks for xe2+ PAT dumps (Xin Wang)
- Fix kernel-doc warning in GuC scheduler ABI header (Chaitanya, Fixes)
- Fix CFI violation in debugfs access (Daniele, Fixes)
- Apply WA_16028005424 to Media (Balasubramani)
- Fix typo in function kernel-doc (Wajdeczko)
- Protect priority against concurrent access (Niranjana)
- Fix nvm aux resource cleanup (Shuicheng, Fixes)
- Fix is_bound() pci_dev lifetime (Shuicheng, Fixes)
- Use CLASS() for forcewake in xe_gt_enable_comp_1wcoh (Shuicheng)
- Reset VF GuC state on fini (Wajdeczko)
- Move _THIS_IP_ usage from xe_vm_create() to dedicated function (Nathan Chancellor, Fixes)
- Unregister drm device on probe error (Shuicheng, Fixes)
- Disable DCC on PTL (Vinay, Fixes)
- Fix Wa_18022495364 (Tvrtko, Fixes)
- Skip address copy for sync-only execs (Shuicheng, Fixes)
- derive mem copy capability from graphics version (Nitin, Fixes)
- Use DRM_BUDDY_CONTIGUOUS_ALLOCATION for contiguous allocations (Sanjay)
- Context based TLB invalidations (Brost)
- Enable multi_queue on xe3p_xpc (Brost, Niranjana)
- Remove check for gt in xe_query (Nakshtra)
- Reduce LRC timestamp stuck message on VFs to notice (Brost, Fixes)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/aaYR5G2MHjOEMXPW@lstrano-desk.jf.intel.com
|
|
Free the newly allocated entry when xa_store() fails to avoid a memory
leak on the error path.
v2: use goto fail_free. (Bala)
Fixes: e5283bd4dfec ("drm/xe/reg_sr: Remove register pool")
Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patch.msgid.link/20260204172810.1486719-2-shuicheng.lin@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
Wa_16025250150 asks us to set five register fields of the register to
0x1 each. However we were just OR'ing this into the existing register
value (which has a default of 0x4 for each nibble-sized field) resulting
in final field values of 0x5 instead of the desired 0x1. Correct the
RTP programming (use FIELD_SET instead of SET) to ensure each field is
assigned to exactly the value we want.
Cc: Aradhya Bhatia <aradhya.bhatia@intel.com>
Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
Cc: stable@vger.kernel.org # v6.16+
Fixes: 7654d51f1fd8 ("drm/xe/xe2hpg: Add Wa_16025250150")
Reviewed-by: Ngai-Mint Kwan <ngai-mint.kwan@linux.intel.com>
Link: https://patch.msgid.link/20260227164341.3600098-2-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
Currently xe_lrc_init() does two things.
1. Allocates LRC bo based on exec queue parameters.
2. Initializes LRC bo with actual context details.
Introduce xe_lrc_ctx_init() and split these two implementations for
better maintainability.
Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260302082757.3516577-1-raag.jadav@intel.com
|
|
xe_gsc_proxy_remove undoes what is done in both xe_gsc_proxy_init and
xe_gsc_proxy_start; however, if we fail between those 2 calls, it is
possible that the HW forcewake access hasn't been initialized yet and so
we hit errors when the cleanup code tries to write GSC register. To
avoid that, split the cleanup in 2 functions so that the HW cleanup is
only called if the HW setup was completed successfully.
Since the HW cleanup (interrupt disabling) is now removed from
xe_gsc_proxy_remove, the cleanup on error paths in xe_gsc_proxy_start
must be updated to disable interrupts before returning.
Fixes: ff6cd29b690b ("drm/xe: Cleanup unwind of gt initialization")
Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patch.msgid.link/20260220225308.101469-1-zhanjun.dong@intel.com
|
|
Every call to queue init should have a corresponding fini call.
Skipping this would mean skipping removal of the queue from GuC list
(which is part of guc_id allocation). A damaged queue stored in
exec_queue_lookup list would lead to invalid memory reference,
sooner or later.
Call fini to free guc_id. This must be done before any internal
LRCs are freed.
Since the finalization with this extra call became very similar to
__xe_exec_queue_fini(), reuse that. To make this reuse possible,
alter xe_lrc_put() so it can survive NULL parameters, like other
similar functions.
v2: Reuse _xe_exec_queue_fini(). Make xe_lrc_put() aware of NULLs.
Fixes: 3c1fa4aa60b1 ("drm/xe: Move queue init before LRC creation")
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com> (v1)
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20260226212701.2937065-2-tomasz.lis@intel.com
(cherry picked from commit 393e5fea6f7d7054abc2c3d97a4cfe8306cd6079)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
ctx_restore_mid_bb memory is allocated in wa_bb_store(), but
xe_config_device_release() only frees ctx_restore_post_bb.
Free ctx_restore_mid_bb[0].cs as well to avoid leaking the allocation
when the configfs device is removed.
Fixes: b30d5de3d40c ("drm/xe/configfs: Add mid context restore bb")
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Nitin Gote <nitin.r.gote@intel.com>
Link: https://patch.msgid.link/20260225013448.3547687-2-shuicheng.lin@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit a235e7d0098337c3f2d1e8f3610c719a589e115f)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
If a batch buffer is complete, it makes little sense to preempt the
fence signaling instructions in the ring, as the largest portion of the
work (the batch buffer) is already done and fence signaling consists of
only a few instructions. If these instructions are preempted, the GuC
would need to perform a context switch just to signal the fence, which
is costly and delays fence signaling. Avoid this scenario by disabling
preemption immediately after the BB start instruction and re-enabling it
after executing the fence signaling instructions.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Carlos Santa <carlos.santa@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patch.msgid.link/20260115004546.58060-1-matthew.brost@intel.com
(cherry picked from commit 2bcbf2dcde0c839a73af664a3c77d4e77d58a3eb)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
drm-misc-next for v7.1:
UAPI Changes:
connector:
- Add panel_type property
fourcc:
- Add ARM interleaved 64k modifier
nouveau:
- Query Z-Cull info with DRM_IOCTL_NOUVEAU_GET_ZCULL_INFO
Cross-subsystem Changes:
coreboot:
- Clean up coreboot framebuffer support
dma-buf:
- Provide revoke mechanism for shared buffers
- Rename move_notify callback to invalidate_mappings and update users.
- Always enable move_notify
- Support dma_fence_was_initialized() test
- Protect dma_fence_ops by RCU and improve locking
- Fix sparse warnings
Core Changes:
atomic:
- Allocate drm_private_state via callback and convert drivers
atomic-helper:
- Use system_percpu_wq
buddy:
- Make buddy allocator available to all DRM drivers
- Document flags and structures
colorop:
- Add destroy helper and convert drivers
fbdev-emulation:
- Clean up
gem:
- Fix drm_gem_objects_lookup() error cleanup
Driver Changes:
amdgpu:
- Set panel_type to OELD for eDP
atmel-hlcdc:
- Support sana5d65 LCD controller
bridge:
- anx7625: Support USB-C plus DT bindings
- connector: Fix EDID detection
- dw-hdmi-qp: Support Vendor-Specfic and SDP Infoframes; improve others
- fsl-ldb: Fix visual artifacts plus related DT property 'enable-termination-resistor'
- imx8qxp-pixel-link: Improve bridge reference handling
- lt9611: Support Port-B-only input plus DT bindings
- tda998x: Support DRM_BRIDGE_ATTACH_NO_CONNECTOR; Clean up
- Support TH1520 HDMI plus DT bindings
- Clean up
imagination:
- Clean up
komeda:
- Fix integer overflow in AFBC checks
mcde:
- Improve bridge handling
nouveau:
- Provide Z-cull info to user space
- gsp: Support GA100
- Shutdown on PCI device shutdown
- Clean up
panel:
- panel-jdi-lt070me05000: Use mipi-dsi multi functions
- panel-edp: Support Add AUO B116XAT04.1 (HW: 1A); Support CMN N116BCL-EAK (C2); Support FriendlyELEC plus DT changes
- Fix Kconfig dependencies
panthor:
- Add tracepoints for power and IRQs
rcar-du:
- dsi: fix VCLK calculation
rockchip:
- vop2: Use drm_ logging functions
- Support DisplayPort on RK3576
sysfb:
- corebootdrm: Support system framebuffer on coreboot firmware; detect orientation
- Clean up pixel-format lookup
sun4i:
- Clean up
tilcdc:
- Use DT bindings scheme
- Use managed DRM interfaces
- Support DRM_BRIDGE_ATTACH_NO_CONNECTOR
- Clean up a lot of obsolete code
v3d:
- Clean up
vc4:
- Use system_percpu_wq
- Clean up
verisilicon:
- Support DC8200 plus DT bindings
virtgpu:
- Support PRIME imports with enabled 3D
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patch.msgid.link/20260226143615.GA47200@linux.fritz.box
|
|
If the xe module within a VM was creating a new LRC during save/
restore, this LRC will be invalid. The fixups procedure may not
be able to reach it, as there will be a race to add the new LRC
reference to an exec queue.
Even if the new LRC which was being created during VM migration is
added to EQ in time for fixups, said LRC may still remain damaged.
In a small percentage of specially crafted test cases, the resulting
LRC was still damaged and caused GPU hang.
Any LRC which could be created in such a situation, have to be
re-created.
Due to VM having arbitrarily set amount of CPU cores, it is possible
to limit the amount to 1. In such case, there is a possibility that
kernel will switch CPU contexts in a way which allows to miss
VF migration recovery running in parallel (by simply not switching
to the LRC creation thread during recovery). Therefore checking
if the migration is in progress just after LRC creation, is not
enough to ensure detection.
Free the incorrectly created LRC, and trigger a re-run of the
creation, but only after waiting for default LRC to get fixups.
Use additional atomic value increased after fixups, to ensure any VF
migration that avoided detection by just checking for recovery in
progress, will be caught.
v2: Merge marker and wait for default LRC, reducing amount of calls
within xe_init_eq(). Alter the LRC creation loop to remove a race
with post-migration fixups worker.
v3: Kerneldoc fixes. Rename fixups_complete_count.
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20260226212701.2937065-5-tomasz.lis@intel.com
|
|
When a context is being created during save/restore, the LRC creation
needs to wait for GGTT address space to be shifted. But it also needs
to have fixed default LRCs. This is mandatory to avoid the situation
where LRC will be created based on data from before the fixups, but
reference within exec queue will be set too late for fixups.
This fixes an issue where contexts created during save/restore have
a large chance of having one unfixed LRC, due to the xe_lrc_create()
being synced for equal start to race with default LRC fixups.
v2: Move the fixups confirmation further, behind all fixups.
Revert some renames.
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20260226212701.2937065-4-tomasz.lis@intel.com
|
|
There is a small but non-zero chance that VF post migration fixups
are running on an exec queue during teardown. The chances are
decreased by starting the teardown by releasing guc_id, but remain
non-zero. On the other hand the sync between fixups and EQ creation
(wait_valid_ggtt) drastically increases the chance for such parallel
teardown if queue creation error path is entered (err_lrc label).
The exec queue itself is not going to cause an issue, but LRCs have
a small chance of getting freed during the fixups.
Creating a setter and a getter makes it easier to protect the fixup
operations with a lock. For other driver activities, the original
access method (without any protection) can still be used.
v2: Separate lock, only for LRCs. Kerneldoc fixes. Subject tag fix.
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20260226212701.2937065-3-tomasz.lis@intel.com
|