| Age | Commit message (Collapse) | Author |
|
Each scheduler group can be independently configured with its own exec
quantum and preemption timeouts. The existing KLVs to configure those
parameters will apply the value to all groups (even if they're not
enabled at the moment).
When scheduler groups are disabled, the GuC uses the values from Group 0.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251218223846.1146344-23-daniele.ceraolospurio@intel.com
|
|
Under a new subfolder, an entry is created for each group to list the
engines assigned to them. We create enough entries for each possible
group, with the disabled groups just returning an empty list.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251218223846.1146344-22-daniele.ceraolospurio@intel.com
|
|
Reading the debugfs file lists the available configurations by name.
Writing the name of a configuration to the file will enable it.
Note that while this debugfs is PF-only, follow up patches will add some
debugfs files that are applicable to VF as well, so the function accepts
a vfid parameter to be ready for that.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251218223846.1146344-21-daniele.ceraolospurio@intel.com
|
|
VF can check if PF has enabled scheduler groups with a dedicated KLV
query. If scheduler groups are enabled, MLRC queue registrations are
forbidden.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251218223846.1146344-20-daniele.ceraolospurio@intel.com
|
|
Since engines in the same class can be divided across multiple groups,
the GuC does not allow scheduler groups to be active if there are
multi-lrc contexts. This means that:
1) if a MLRC context is registered when we enable scheduler groups, the
GuC will silently ignore the configuration
2) if a MLRC context is registered after scheduler groups are enabled,
the GuC will disable the groups and generate an adverse event.
The expectation is that the admin will ensure that all apps that use
MLRC on PF have been terminated before scheduler groups are created. A
check is added anyway to make sure we don't still have contexts waiting
to be cleaned up laying around. A check is also added at queue creation
time to block MLRC queue creation if scheduler groups have been enabled.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251218223846.1146344-19-daniele.ceraolospurio@intel.com
|
|
Scheduler groups are enabled by sending a specific policy configuration
KLV to the GuC. We don't allow changing this policy if there are VF
active, since the expectation is that the VF will only check if the
feature is enabled during driver initialization.
While the GuC interface supports a maximum of 8 groups, the actual
number of groups that can be enabled can be lower than that and
can be different on different devices. For now, all devices support up
to 2 groups, so we check that we do not have more groups than that.
The functions added by this patch will be used by sysfs/debugfs, coming
in follow up patches.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251218223846.1146344-18-daniele.ceraolospurio@intel.com
|
|
Scheduler groups (a.k.a. Engine Groups Scheduling, or EGS) is a GuC
feature that allows the driver to define groups of engines that are
independently scheduled across VFs, which allows different VFs to be
active on the HW at the same time on different groups. The feature is
available for BMG and newer HW starting on GuC 70.53.0, but some
required fixes have been added to GuC 70.55.1.
This is intended for specific scenarios where the admin knows that the
VFs are not going to fully utilize the HW and therefore assigning all of
it to a single VF would lead to part of it being permanently idle.
We do not allow the admin to decide how to divide the engines across
groups, but we instead support specific configurations that are designed
for specific use-cases. During PF initialization we detect which
configurations are possible on a given GT and create the relevant
groups. Since the GuC expect a mask for each class for each group, that
is what we save when we init the configs.
Right now we only have one use-case on the media GT. If the VFs are
running a frame render + encoding at a not-too-high resolution (e.g.
1080@30fps) the render can produce frames faster than the video engine
can encode them, which means that the maximum number of parallel VFs is
limited by the VCS bandwidth. Since our products can have multiple VCS
engines, allowing multiple VFs to be active on the different VCS engines
at the same time allows us to run more parallel VFs on the same HW.
Given that engines in the same media slice share some resources (e.g.
SFC), we assign each media slice to a different scheduling group. We
refer to this configuration as "media_slices", given that each slice
gets its own group. Since upcoming products have a different number of
video engines per-slice, for now we limit the media_slices mode to BMG,
but we expect to add support for newer HW soon.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251218223846.1146344-17-daniele.ceraolospurio@intel.com
|
|
Some upcoming KLVs are sized based on the engine counts, so we need
those defines to be moved to a separate file to include them from
guc_klv_abi.h (which is already included by guc_fwif.h).
Instead of moving just the engine-related defines, it is cleaner to
move all scheduler-related defines (i.e., everything engine or context
related). Note that the legacy GuC defines have not been moved and have
instead been dropped because Xe doesn't support any GuC old enough to
still use them.
While at it, struct guc_ctxt_registration_info has been moved to
guc_submit.c since it doesn't come from the GuC specs (we added it to
make things simpler in our code).
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251218223846.1146344-16-daniele.ceraolospurio@intel.com
|
|
Follow up patches will need the engine masks for VCS and VECS engines.
Since we already have a macro for the CCS engines, just extend the same
approach to all classes.
To avoid confusion with the XE_HW_ENGINE_*_MASK masks, the new macros
use the _INSTANCES suffix instead. For consistency, rename CCS_MASK to
CCS_INSTANCES as well.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251218223846.1146344-15-daniele.ceraolospurio@intel.com
|
|
With the i915 switch to generic fault injection, display no longer needs
the compat i915_utils.h. Remove it, along with a few includes.
Reviewed-by: Michał Grzelak <michal.grzelak@intel.com>
Link: https://patch.msgid.link/20251219104036.855258-1-jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
The initial plane parent interface ->alloc_obj hook no longer needs the
crtc for anything. Pass struct drm_device instead.
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patch.msgid.link/7a40381be6d98dc0916a5447be5dd6cba86cfd0a.1765812266.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
->setup
The initial plane parent interface ->setup hook no longer needs the crtc
for anything. Pass the struct drm_plane_state instead.
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patch.msgid.link/c3db101ef5fd13c56cb3a9329adecf521a807abc.1765812266.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Deduplicate more of the identical parts of i915 and xe initial plane
setup. This lets us reduce the core dependency on display internals.
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patch.msgid.link/1a2abbceedb9e7d03f262c44cd54a24556ef6b61.1765812266.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Move the modifier checks into common code to deduplicate.
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patch.msgid.link/3c62ad48595aa2306219b1d6a215cf7680a67da2.1765812266.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Move intel_reuse_initial_plane_obj() into common display code, and split
the ->find_obj hook into ->alloc_obj and ->setup hooks.
Return the struct drm_gem_object from ->alloc_obj in preparation for
moving more things to display.
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patch.msgid.link/c71011dbb11afaa5c4da30aa2627833374300d63.1765812266.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Initialize fb in the same level as the other code path.
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patch.msgid.link/47d3272cff13dc8f5d7323c32bfb3cc34c0c977d.1765812266.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
i915 and xe
Move some easy common parts to display. Initially, the
intel_find_initial_plane_obj() error path seems silly, but it'll be more
helpful this way for later changes.
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patch.msgid.link/950d308172443d5bae975aa1ab72111720134219.1765812266.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Move the common code to display. Retain empty xe_plane_config_fini() for
now, in case it's needed in the future.
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patch.msgid.link/14322386cdb1a0f4f6c7ff74a5a9696ea0ff84bf.1765812266.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Move the parent interface at one step lower level, allowing
deduplication.
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patch.msgid.link/0cb4077a5a39274c7a2dae95d548d7b33365a518.1765812266.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Add the initial plane handling functions to the display parent
interface. Add the call wrappers in dedicated intel_initial_plane.c
instead of intel_parent.c, as we'll be refactoring the calls heavily.
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patch.msgid.link/ab91c891677fe2bb83bf5aafa5ee984b2442b84d.1765812266.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Follow the more naturally flowing naming. Rename both the header and the
vblank wait function.
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patch.msgid.link/32c2d68a9ae7d2262ad2c63e873e522e67bc78df.1765812266.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Follow i915 with the more naturally flowing naming.
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patch.msgid.link/62eb56fe348a8fe7c17333d784192da701367cc7.1765812266.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Pass the correct alignment from intel_fb_pin_to_ggtt() down to
__xe_pin_fb_vma().
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Closes: https://lore.kernel.org/intel-xe/aNL_RgLy13fXJbYx@intel.com/
Cc: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Fixes: b0228a337de8 ("drm/xe/display: align framebuffers according to hw requirements")
Cc: <stable@vger.kernel.org> # v6.13+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/20251208181550.6618-1-tursulin@igalia.com
|
|
Print the GuC queue submission state when an engine reset occurs, as
this provides clues about the cause of the reset.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/20251218224546.4057424-1-matthew.brost@intel.com
|
|
Set the kernel log level for unhandled page faults to match the log
level (info) for engine resets. Currently, dmesg output can be confusing
because it shows an engine reset without indicating the page fault that
caused it. Without this change, the GuC log must be examined to
determine the source of the engine reset.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://patch.msgid.link/20251218223745.4045207-1-matthew.brost@intel.com
|
|
Fix static analysis tool reported issue. Add index bound check before
accessing info array to prevent out of bound.
Fixes: f4e9fc967afd ("drm/xe/xe_survivability: Redesign survivability mode")
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/20251219105224.871930-6-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Fix sparse warnings. Use static for survivability info attributes.
Fixes: f4e9fc967afd ("drm/xe/xe_survivability: Redesign survivability mode")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202512101919.G12cuhBJ-lkp@intel.com/
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/20251219105224.871930-5-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Replace sprintf() calls with sysfs_emit() to follow current kernel
coding standards.
sysfs_emit() is the preferred method for formatting sysfs output as it
provides better bounds checking and is more secure.
Signed-off-by: Madhur Kumar <madhurkumar004@gmail.com>
Link: https://patch.msgid.link/20251214083659.2412218-1-madhurkumar004@gmail.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Rodrigo adjusted commit message while pushing it]
|
|
Backmerging to bring in 6.19-rc1. An important upstream bugfix and
to help unblock PTL CI.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
Sphinx reports htmldocs warnings:
Documentation/gpu/xe/xe_firmware:31: ./drivers/gpu/drm/xe/xe_guc_pc.c:76: ERROR: A level 2 section cannot be used here.
Documentation/gpu/xe/xe_firmware:31: ./drivers/gpu/drm/xe/xe_guc_pc.c:87: ERROR: A level 2 section cannot be used here.
The xe_guc_pc.c documentation is included inside xe_firmware.rst.
The headers in the C file currently use '=' underlines, which conflict with
the parent document's section levels.
Fix this by demoting "Frequency management" and "Render-C States" headers
from '=' to '-' to correctly nest them as subsections.
Build environment: Python 3.13.7 Sphinx 8.2.3 docutils 0.22.3
Signed-off-by: Swaraj Gaikwad <swarajgaikwad1925@gmail.com>
Link: https://patch.msgid.link/20251209094836.18589-1-swarajgaikwad1925@gmail.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Remove unused index variable and fix for loop.
Fixes: f4e9fc967afd ("drm/xe/xe_survivability: Redesign survivability mode")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Closes: https://lore.kernel.org/intel-xe/20251210075757.GA1206705@ax162/
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/20251218105151.586575-5-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Mark CRI as one that have the CSC NVM device.
Update the writable override flow to take the information from
the scratch register for CRI.
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/20251216111034.3093507-1-alexander.usyskin@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
msleep is not very accurate in terms of how long it actually sleeps,
whereas usleep_range is precise. Replace the timeslice sleep for
long-running workloads with the more accurate usleep_range to avoid
jitter if the sleep period is less than 20ms.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: stable@vger.kernel.org
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/20251212182847.1683222-3-matthew.brost@intel.com
(cherry picked from commit ca415c4d4c17ad676a2c8981e1fcc432221dce79)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
When imported dma-bufs are destroyed, TTM is not fully
individualizing the dma-resv, but it *is* copying the fences that
need to be waited for before declaring idle. So in the case where
the bo->resv != bo->_resv we can still drop the preempt-fences, but
make sure we do that on bo->_resv which contains the fence-pointer
copy.
In the case where the copying fails, bo->_resv will typically not
contain any fences pointers at all, so there will be nothing to
drop. In that case, TTM would have ensured all fences that would
have been copied are signaled, including any remaining preempt
fences.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Fixes: fa0af721bd1f ("drm/ttm: test private resv obj on release/destroy")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.16+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Tested-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251217093441.5073-1-thomas.hellstrom@linux.intel.com
(cherry picked from commit 425fe550fb513b567bd6d01f397d274092a9c274)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
An EU stall property value of 0 is invalid and will cause a NPD.
Reported-by: Peter Senna Tschudin <peter.senna@linux.intel.com>
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6453
Fixes: 1537ec85ebd7 ("drm/xe/uapi: Introduce API for EU stall sampling")
Cc: stable@vger.kernel.org
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Harish Chegondi <harish.chegondi@intel.com>
Link: https://patch.msgid.link/20251212061850.1565459-4-ashutosh.dixit@intel.com
(cherry picked from commit 5bf763e908bf795da4ad538d21c1ec41f8021f76)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
An OA property value of 0 is invalid and will cause a NPD.
Reported-by: Peter Senna Tschudin <peter.senna@linux.intel.com>
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6452
Fixes: cc4e6994d5a2 ("drm/xe/oa: Move functions up so they can be reused for config ioctl")
Cc: stable@vger.kernel.org
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Harish Chegondi <harish.chegondi@intel.com>
Link: https://patch.msgid.link/20251212061850.1565459-3-ashutosh.dixit@intel.com
(cherry picked from commit 7a100e6ddcc47c1f6ba7a19402de86ce24790621)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
The xe_sriov_vfio_migration_supported() function is type bool so
returning -EPERM means returning true. Return false instead.
Fixes: bd45d46ffc8f ("drm/xe/pf: Export helpers for VFIO")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/aTLEZ4g-FD-iMQ2V@stanley.mountain
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
(cherry picked from commit 0a2404c8f6a3a120f79c57ef8a3302c8e8bc34d9)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
Reports can be written out to the OA buffer using ways other than periodic
sampling. These include mmio trigger and context switches. To support these
use cases, when periodic sampling is not enabled,
OAG_OAGLBCTXCTRL_COUNTER_RESUME must be set.
Fixes: 1db9a9dc90ae ("drm/xe/oa: OA stream initialization (OAG)")
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patch.msgid.link/20251205212613.826224-4-ashutosh.dixit@intel.com
(cherry picked from commit 88d98e74adf3e20f678bb89581a5c3149fdbdeaa)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
A 10ms timeslice for long-running workloads is far too long and causes
significant jitter in benchmarks when the system is shared. Adjust the
value to 5ms for preempt-fencing VMs, as the resume step there is quite
costly as memory is moved around, and set it to zero for pagefault VMs,
since switching back to pagefault mode after dma-fence mode is
relatively fast.
Also change min_run_period_ms to 'unsiged int' type rather than 's64' as
only positive values make sense.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: stable@vger.kernel.org
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/20251212182847.1683222-2-matthew.brost@intel.com
(cherry picked from commit 33a5abd9a68394aa67f9618b20eee65ee8702ff4)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
The OA open parameters did not validate num_syncs, allowing
userspace to pass arbitrarily large values, potentially
leading to excessive allocations.
Add check to ensure that num_syncs does not exceed DRM_XE_MAX_SYNCS,
returning -EINVAL when the limit is violated.
v2: use XE_IOCTL_DBG() and drop duplicated check. (Ashutosh)
Fixes: c8507a25cebd ("drm/xe/oa/uapi: Define and parse OA sync properties")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251205234715.2476561-6-shuicheng.lin@intel.com
(cherry picked from commit e057b2d2b8d815df3858a87dffafa2af37e5945b)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
The exec and vm_bind ioctl allow userspace to specify an arbitrary
num_syncs value. Without bounds checking, a very large num_syncs
can force an excessively large allocation, leading to kernel warnings
from the page allocator as below.
Introduce DRM_XE_MAX_SYNCS (set to 1024) and reject any request
exceeding this limit.
"
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1217 at mm/page_alloc.c:5124 __alloc_frozen_pages_noprof+0x2f8/0x2180 mm/page_alloc.c:5124
...
Call Trace:
<TASK>
alloc_pages_mpol+0xe4/0x330 mm/mempolicy.c:2416
___kmalloc_large_node+0xd8/0x110 mm/slub.c:4317
__kmalloc_large_node_noprof+0x18/0xe0 mm/slub.c:4348
__do_kmalloc_node mm/slub.c:4364 [inline]
__kmalloc_noprof+0x3d4/0x4b0 mm/slub.c:4388
kmalloc_noprof include/linux/slab.h:909 [inline]
kmalloc_array_noprof include/linux/slab.h:948 [inline]
xe_exec_ioctl+0xa47/0x1e70 drivers/gpu/drm/xe/xe_exec.c:158
drm_ioctl_kernel+0x1f1/0x3e0 drivers/gpu/drm/drm_ioctl.c:797
drm_ioctl+0x5e7/0xc50 drivers/gpu/drm/drm_ioctl.c:894
xe_drm_ioctl+0x10b/0x170 drivers/gpu/drm/xe/xe_device.c:224
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:598 [inline]
__se_sys_ioctl fs/ioctl.c:584 [inline]
__x64_sys_ioctl+0x18b/0x210 fs/ioctl.c:584
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xbb/0x380 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
...
"
v2: Add "Reported-by" and Cc stable kernels.
v3: Change XE_MAX_SYNCS from 64 to 1024. (Matt & Ashutosh)
v4: s/XE_MAX_SYNCS/DRM_XE_MAX_SYNCS/ (Matt)
v5: Do the check at the top of the exec func. (Matt)
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Reported-by: Koen Koning <koen.koning@intel.com>
Reported-by: Peter Senna Tschudin <peter.senna@linux.intel.com>
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6450
Cc: <stable@vger.kernel.org> # v6.12+
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Michal Mrozek <michal.mrozek@intel.com>
Cc: Carl Zhang <carl.zhang@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Ivan Briano <ivan.briano@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251205234715.2476561-5-shuicheng.lin@intel.com
(cherry picked from commit b07bac9bd708ec468cd1b8a5fe70ae2ac9b0a11c)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
When imported dma-bufs are destroyed, TTM is not fully
individualizing the dma-resv, but it *is* copying the fences that
need to be waited for before declaring idle. So in the case where
the bo->resv != bo->_resv we can still drop the preempt-fences, but
make sure we do that on bo->_resv which contains the fence-pointer
copy.
In the case where the copying fails, bo->_resv will typically not
contain any fences pointers at all, so there will be nothing to
drop. In that case, TTM would have ensured all fences that would
have been copied are signaled, including any remaining preempt
fences.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Fixes: fa0af721bd1f ("drm/ttm: test private resv obj on release/destroy")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.16+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Tested-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251217093441.5073-1-thomas.hellstrom@linux.intel.com
|
|
An EU stall property value of 0 is invalid and will cause a NPD.
Reported-by: Peter Senna Tschudin <peter.senna@linux.intel.com>
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6453
Fixes: 1537ec85ebd7 ("drm/xe/uapi: Introduce API for EU stall sampling")
Cc: stable@vger.kernel.org
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Harish Chegondi <harish.chegondi@intel.com>
Link: https://patch.msgid.link/20251212061850.1565459-4-ashutosh.dixit@intel.com
|
|
An OA property value of 0 is invalid and will cause a NPD.
Reported-by: Peter Senna Tschudin <peter.senna@linux.intel.com>
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6452
Fixes: cc4e6994d5a2 ("drm/xe/oa: Move functions up so they can be reused for config ioctl")
Cc: stable@vger.kernel.org
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Harish Chegondi <harish.chegondi@intel.com>
Link: https://patch.msgid.link/20251212061850.1565459-3-ashutosh.dixit@intel.com
|
|
De-referencing param.oa_unit, when an OA unit id is not provided during
stream open, results in NPD below.
Oops: general protection fault, probably for non-canonical address...
KASAN: null-ptr-deref in range...
RIP: 0010:xe_oa_stream_open_ioctl+0x169/0x38a0
xe_observation_ioctl+0x19f/0x270
drm_ioctl_kernel+0x1f4/0x410
Fix this by moving default oa unit assignment before the dereference.
Reported-by: Peter Senna Tschudin <peter.senna@linux.intel.com>
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6840
Fixes: c7e269aa565f ("drm/xe/oa: Allow exec_queue's to be specified only for OAG OA unit")
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Harish Chegondi <harish.chegondi@intel.com>
Link: https://patch.msgid.link/20251212061850.1565459-2-ashutosh.dixit@intel.com
|
|
Since it is illegal to register a MLRC context when scheduler groups are
enabled, the GuC consider the VF doing so as an adverse event. Like for
other adverse event, there is a threshold for how many times the event
can happen before the GuC throws an error, which we need to add support
for.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Acked-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251216214902.1429-5-michal.wajdeczko@intel.com
|
|
We want to extend our macro-based KLV list definitions with new
information about the version from which given KLV is supported.
Prepare our code generators to emit dedicated version check if
a KLV was defined with the version information.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Acked-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251216214902.1429-4-michal.wajdeczko@intel.com
|
|
There are already few places in the code where we need to check GuC
firmware version. Wrap existing raw conditions into a named helper
macro to make it clear and avoid explicit call of the MAKE_GUC_VER.
Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Acked-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251216214902.1429-3-michal.wajdeczko@intel.com
|
|
We want to extend our macro-based KLV list definitions with new
information about the version from which given KLV is supported.
Add utility IF_ARGS macro that can be used in code generators to
emit different code based on the presence of additional arguments.
Introduce macro itself and extend our kunit tests to cover it.
We will use this macro in next patch.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Acked-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251217224018.3490-1-michal.wajdeczko@intel.com
|
|
Helper function xe_sync_needs_wait expects sync->fence when accessing
flags, patch makes sure we call only when sync->fence exists.
v2: move null checking to xe_sync_needs_wait and make
xe_sync_entry_wait utilize this helper (Matthew Auld)
v3: further simplify code (Matthew Auld)
Fixes NULL pointer dereference seen with Vulkan workloads:
[ 118.410401] RIP: 0010:xe_sync_needs_wait+0x27/0x50 [xe]
Fixes: 4ac9048d0501 ("drm/xe: Wait on in-syncs when swicthing to dma-fence mode")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251217132412.435755-1-tapani.palli@intel.com
|