| Age | Commit message (Collapse) | Author |
|
After the context is unpinned the backing memory can also be unpinned,
so any accesses via the lrc_reg_state pointer can end up in unmapped
memory. To avoid that, make sure to only access that memory if the
context is pinned when printing its info.
v2: fix newline alignment
Fixes: 28ff6520a34d ("drm/i915/guc: Update GuC debugfs to support new GuC")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v5.15+
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250115001334.3875347-1-daniele.ceraolospurio@intel.com
|
|
This reverts commit 50554bf3e56dd0c78ef1eedb685d0ab36c9c9987.
DCC in LNL should be disabled. It was a mistake to decide
to go against GuC platform defaults in this case and this
could lead to regressions in some TDP limited scenarios
instead of helping.
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250128223248.660748-1-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
ast_dp_is_connected() used to also check for link training success
to report the DP connector as connected. Without this check, the
physical_status is always connected. So if no monitor is present, it
will fail to read the EDID and set the default resolution to 640x480
instead of 1024x768.
Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
Fixes: 2281475168d2 ("drm/ast: astdp: Perform link training during atomic_enable")
Reported-by: Jerry Hoemann <jerry.hoemann@hpe.com>
Tested-by: Jose Lopez <jose.lopez@hpe.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Dave Airlie <airlied@redhat.com>
Cc: dri-devel@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v6.12+
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124141142.2434138-1-jfalempe@redhat.com
|
|
This restores the original behavior that gets min/max freq from EDID and
only set DP/eDP connector as freesync capable if "sink device is capable
of rendering incoming video stream without MSA timing parameters", i.e.,
`allow_invalid_MSA_timing_params` is true. The condition was mistakenly
removed by 0159f88a99c9 ("drm/amd/display: remove redundant freesync
parser for DP").
CC: Mario Limonciello <mario.limonciello@amd.com>
CC: Alex Hung <alex.hung@amd.com>
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3915
Fixes: 0159f88a99c9 ("drm/amd/display: remove redundant freesync parser for DP")
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
The following page fault was observed duringthe KFD process release.
In this particular error case, the HIP test (./MemcpyPerformance -h)
does not require the queue. As a result, the process_context_addr was
not assigned when the KFD process was released, ultimately leading to
this page fault during the execution of the function
kfd_process_dequeue_from_all_devices().
[345962.294891] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:153 vmid:0 pasid:0)
[345962.295333] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10
[345962.295775] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000B33
[345962.296097] amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: CPC (0x5)
[345962.296394] amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x1
[345962.296633] amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x1
[345962.296876] amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[345962.297135] amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x1
[345962.297377] amdgpu 0000:03:00.0: amdgpu: RW: 0x0
[345962.297682] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:169 vmid:0 pasid:0)
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
[Why]
the offset address of mmCLK5_spll_field_8 was incorrect for dcn35
which causes SSC not to be enabled.
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Signed-off-by: Lo-An Chen <lo-an.chen@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
Aldebaran doesn't support querying MM activity percentage. Keep the
field as 0xFFs to mark it as unsupported.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
change the config of cgcg on gfx12
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.12.x
|
|
The purpose of halt_if_hws_hang is to preserve GPU state for driver
debugging when queue preemption fails. Issuing per-queue reset may
kill wavefronts which caused the preemption failure.
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Jonathan Kim <Jonathan.Kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.12.x
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core and debugfs updates from Greg KH:
"Here is the big set of driver core and debugfs updates for 6.14-rc1.
Included in here is a bunch of driver core, PCI, OF, and platform rust
bindings (all acked by the different subsystem maintainers), hence the
merge conflict with the rust tree, and some driver core api updates to
mark things as const, which will also require some fixups due to new
stuff coming in through other trees in this merge window.
There are also a bunch of debugfs updates from Al, and there is at
least one user that does have a regression with these, but Al is
working on tracking down the fix for it. In my use (and everyone
else's linux-next use), it does not seem like a big issue at the
moment.
Here's a short list of the things in here:
- driver core rust bindings for PCI, platform, OF, and some i/o
functions.
We are almost at the "write a real driver in rust" stage now,
depending on what you want to do.
- misc device rust bindings and a sample driver to show how to use
them
- debugfs cleanups in the fs as well as the users of the fs api for
places where drivers got it wrong or were unnecessarily doing
things in complex ways.
- driver core const work, making more of the api take const * for
different parameters to make the rust bindings easier overall.
- other small fixes and updates
All of these have been in linux-next with all of the aforementioned
merge conflicts, and the one debugfs issue, which looks to be resolved
"soon""
* tag 'driver-core-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (95 commits)
rust: device: Use as_char_ptr() to avoid explicit cast
rust: device: Replace CString with CStr in property_present()
devcoredump: Constify 'struct bin_attribute'
devcoredump: Define 'struct bin_attribute' through macro
rust: device: Add property_present()
saner replacement for debugfs_rename()
orangefs-debugfs: don't mess with ->d_name
octeontx2: don't mess with ->d_parent or ->d_parent->d_name
arm_scmi: don't mess with ->d_parent->d_name
slub: don't mess with ->d_name
sof-client-ipc-flood-test: don't mess with ->d_name
qat: don't mess with ->d_name
xhci: don't mess with ->d_iname
mtu3: don't mess wiht ->d_iname
greybus/camera - stop messing with ->d_iname
mediatek: stop messing with ->d_iname
netdevsim: don't embed file_operations into your structs
b43legacy: make use of debugfs_get_aux()
b43: stop embedding struct file_operations into their objects
carl9170: stop embedding file_operations into their objects
...
|
|
Previously, the RPS function was being used, which utilizes
raw frequency to calculate measured power. This commit introduces
a dedicated function specifically for measuring power in SLPC,
ensuring more accurate and reliable power measurements.
Signed-off-by: Sk Anirban <sk.anirban@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250113095912.356147-3-sk.anirban@intel.com
|
|
Fix the frequency calculation by ensuring it uses the raw frequency only.
Update live_rps_power test to use the correct frequency values for logging
and comparison.
Signed-off-by: Sk Anirban <sk.anirban@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250113095912.356147-2-sk.anirban@intel.com
|
|
Add Wa_22010465259 which points to an existing WA, but
was missing from the comment list. While at it, update
the other WAs and their applicable platforms as well.
v1: Initial commit.
v2: Add DG2 platform to Wa_22010465259.
v3: Removed DG2 platform to Wa_22010465259 since it
was for preproduction.
Signed-off-by: Ranu Maurya <ranu.maurya@intel.com>
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Reviewed-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250116093115.2437154-1-ranu.maurya@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
Currently we just define the display tracepoints with
TRACE_SYSTEM i915. However the code gets included separately
in i915 and xe, and now both modules are competing for the
same tracepoints. Apparently whichever module is loaded first
gets the tracepoints and the other guy is left with nothing.
Give each module its own set of display tracepoints so that
things work even when both modules are loaded.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250127213055.640-1-ville.syrjala@linux.intel.com
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
|
|
Make debugging a bit easier by including the pixel format in
the plane tracepoints.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241218173650.19782-5-ville.syrjala@linux.intel.com
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
|
|
Using the plane->state pointer in the tracepoints is incorrect
as technically a different state could already have been swapped
in (though in reality that is currently prevented by the stall
hacks in the commit machinery). But let's not leave such footguns
lying around when we can just pass in the correct state by hand.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241218173650.19782-4-ville.syrjala@linux.intel.com
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
|
|
Out plane names already include the "plane" part (or
"primary","sprite","cursor" in some cases). Don't duplicate
that in the tracepoints as that leadst to weird stuff like
"plane plane 1A".
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241218173650.19782-3-ville.syrjala@linux.intel.com
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
|
|
I'm seeing underruns with these 64bpp YUV formats on TGL.
The weird details:
- only happens on pipe B/C/D SDR planes, pipe A SDR planes
seem fine, as do all HDR planes
- somehow CDCLK related, higher CDCLK allows for bigger plane
with these formats without underruns. With 300MHz CDCLK I
can only go up to 1200 pixels wide or so, with 650MHz even
a 3840 pixel wide plane was OK
- ICL and ADL so far appear unaffected
So not really sure what's the deal with this, but bspec does
state "64-bit formats supported only on the HDR planes" so
let's just drop these formats from the SDR planes. We already
disallow 64bpp RGB formats.
Cc: stable@vger.kernel.org
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241218173650.19782-2-ville.syrjala@linux.intel.com
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
|
|
When converting to folios the cleanup path of shmem_get_pages() was
missed. When a DMA remap fails and the max segment size is greater than
PAGE_SIZE it will attempt to retry the remap with a PAGE_SIZEd segment
size. The cleanup code isn't properly using the folio apis and as a
result isn't handling compound pages correctly.
v2 -> v3:
(Ville) Just use shmem_sg_free_table() as-is in the failure path of
shmem_get_pages(). shmem_sg_free_table() will clear mapping unevictable
but it will be reset when it retries in shmem_sg_alloc_table().
v1 -> v2:
(Ville) Fixed locations where we were not clearing mapping unevictable.
Cc: stable@vger.kernel.org
Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Cc: Vidya Srinivas <vidya.srinivas@intel.com>
Link: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13487
Link: https://lore.kernel.org/lkml/20250116135636.410164-1-bgeffon@google.com/
Fixes: 0b62af28f249 ("i915: convert shmem_sg_free_table() to use a folio_batch")
Signed-off-by: Brian Geffon <bgeffon@google.com>
Suggested-by: Tomasz Figa <tfiga@google.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250127204332.336665-1-bgeffon@google.com
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Tested-by: Vidya Srinivas <vidya.srinivas@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
|
|
Initialize mei-gsc in survivability mode and disable HECI
interrupts. Also initialize vsec in survivability mode
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Alexander Usyskin <alexander.usyskin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250128095632.1294722-4-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Enable boot survivability mode if pcode initialization fails and
if boot status indicates a failure. In this mode, drm card is not
exposed and driver probe returns success after loading the bare minimum
to allow firmware to be flashed via mei.
v2: abstract survivability mode variable
add BMG check inside function (Jani, Rodrigo)
v3: return -EBUSY during system suspend (Anshuman)
check survivability mode in pci probe only
on error
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250128095632.1294722-3-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Boot Survivability is a software based workflow for recovering a system
in a failed boot state. Here system recoverability is concerned with
recovering the firmware responsible for boot.
This is implemented by loading the driver with bare minimum (no drm card)
to allow the firmware to be flashed through mei-gsc and collect telemetry.
The driver's probe flow is modified such that it enters survivability mode
when pcode initialization is incomplete and boot status denotes a failure.
In this mode, drm card is not exposed and presence of survivability_mode
entry in PCI sysfs is used to indicate survivability mode and
provide additional information required for debug
This patch adds initialization functions and exposes admin
readable sysfs entries
The new sysfs will have the below layout
/sys/bus/.../bdf
├── survivability_mode
v2: reorder headers
fix doc
remove survivability info and use mode to display information
use separate function for logging survivability information
for critical error (Rodrigo)
v3: use for loop
use dev logs instead of drm
use helper function for aux history(Rodrigo)
remove unnecessary error check of greater than max_scratch
as we are reading only 3 bit
v4: fix checkpatch warnings
fix space (Rodrigo)
rename register
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Acked-by: Ashwin Kumar Kulkarni <ashwin.kumar.kulkarni@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250128095632.1294722-2-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
The ASTDP transmitter sometimes takes up to 1 second for enabling the
video signal, while the timeout is only 200 msec. This results in a
kernel error message. Increase the timeout to 1 second. An example
of the error message is shown below.
[ 697.084433] ------------[ cut here ]------------
[ 697.091115] ast 0000:02:00.0: [drm] drm_WARN_ON(!__ast_dp_wait_enable(ast, enabled))
[ 697.091233] WARNING: CPU: 1 PID: 160 at drivers/gpu/drm/ast/ast_dp.c:232 ast_dp_set_enable+0x123/0x140 [ast]
[...]
[ 697.272469] RIP: 0010:ast_dp_set_enable+0x123/0x140 [ast]
[...]
[ 697.415283] Call Trace:
[ 697.420727] <TASK>
[ 697.425908] ? show_trace_log_lvl+0x196/0x2c0
[ 697.433304] ? show_trace_log_lvl+0x196/0x2c0
[ 697.440693] ? drm_atomic_helper_commit_modeset_enables+0x30a/0x470
[ 697.450115] ? ast_dp_set_enable+0x123/0x140 [ast]
[ 697.458059] ? __warn.cold+0xaf/0xca
[ 697.464713] ? ast_dp_set_enable+0x123/0x140 [ast]
[ 697.472633] ? report_bug+0x134/0x1d0
[ 697.479544] ? handle_bug+0x58/0x90
[ 697.486127] ? exc_invalid_op+0x13/0x40
[ 697.492975] ? asm_exc_invalid_op+0x16/0x20
[ 697.500224] ? preempt_count_sub+0x14/0xc0
[ 697.507473] ? ast_dp_set_enable+0x123/0x140 [ast]
[ 697.515377] ? ast_dp_set_enable+0x123/0x140 [ast]
[ 697.523227] drm_atomic_helper_commit_modeset_enables+0x30a/0x470
[ 697.532388] drm_atomic_helper_commit_tail+0x58/0x90
[ 697.540400] ast_mode_config_helper_atomic_commit_tail+0x30/0x40 [ast]
[ 697.550009] commit_tail+0xfe/0x1d0
[ 697.556547] drm_atomic_helper_commit+0x198/0x1c0
This is a cosmetical problem. Enabling the video signal still works
even with the error message. The problem has always been present, but
only recent versions of the ast driver warn about missing the timeout.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: 4e29cc7c5c67 ("drm/ast: astdp: Replace ast_dp_set_on_off()")
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Jocelyn Falempe <jfalempe@redhat.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: dri-devel@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v6.13+
Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250127134423.84266-1-tzimmermann@suse.de
|
|
Add the const qualifier to all the ctl_tables in the tree except for
watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls,
loadpin_sysctl_table and the ones calling register_net_sysctl (./net,
drivers/inifiniband dirs). These are special cases as they use a
registration function with a non-const qualified ctl_table argument or
modify the arrays before passing them on to the registration function.
Constifying ctl_table structs will prevent the modification of
proc_handler function pointers as the arrays would reside in .rodata.
This is made possible after commit 78eb4ea25cd5 ("sysctl: treewide:
constify the ctl_table argument of proc_handlers") constified all the
proc_handlers.
Created this by running an spatch followed by a sed command:
Spatch:
virtual patch
@
depends on !(file in "net")
disable optional_qualifier
@
identifier table_name != {
watchdog_hardlockup_sysctl,
iwcm_ctl_table,
ucma_ctl_table,
memory_allocation_profiling_sysctls,
loadpin_sysctl_table
};
@@
+ const
struct ctl_table table_name [] = { ... };
sed:
sed --in-place \
-e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table = \&uts_kern/" \
kernel/utsname_sysctl.c
Reviewed-by: Song Liu <song@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> # for kernel/trace/
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> # SCSI
Reviewed-by: Darrick J. Wong <djwong@kernel.org> # xfs
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Acked-by: Anna Schumaker <anna.schumaker@oracle.com>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
|
|
All other(hwsp, hwctx and vmas) binaries follow this format:
[name].length: 0x1000
[name].data: xxxxxxx
[name].error: errno
The error one is just in case by some reason it was not able to
capture the binary.
So this GuC binaries should follow the same patern.
v2:
- renamed GUC binary to LOG
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250123202307.95103-3-jose.souza@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Commit 70fb86a85dc9 ("drm/xe: Revert some changes that break a mesa
debug tool") partially reverted some changes to workaround breakage
caused to mesa tools. However, in doing so it also broke fetching the
GuC log via debugfs since xe_print_blob_ascii85() simply bails out.
The fix is to avoid the extra newlines: the devcoredump interface is
line-oriented and adding random newlines in the middle breaks it. If a
tool is able to parse it by looking at the data and checking for chars
that are out of the ascii85 space, it can still do so. A format change
that breaks the line-oriented output on devcoredump however needs better
coordination with existing tools.
v2: Add suffix description comment
v3: Reword explanation of xe_print_blob_ascii85() calling drm_puts()
in a loop
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Julia Filipchuk <julia.filipchuk@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: stable@vger.kernel.org
Fixes: 70fb86a85dc9 ("drm/xe: Revert some changes that break a mesa debug tool")
Fixes: ec1455ce7e35 ("drm/xe/devcoredump: Add ASCII85 dump helper function")
Link: https://patchwork.freedesktop.org/patch/msgid/20250123202307.95103-2-jose.souza@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Pull drm fixes from Simona Vetter:
"cgroup:
- fix Koncfig fallout from new dmem controller
Driver Changes:
- v3d NULL pointer regression fix in fence signalling race
- virtio: uaf in dma_buf free path
- xlnx: fix kerneldoc
- bochs: fix double-free on driver removal
- zynqmp: add missing locking to DP bridge driver
- amdgpu fixes all over:
- documentation, display, sriov, various hw block drivers
- use drm/sched helper
- mark some debug module options as unsafe
- amdkfd: mark some debug module options as unsafe, trap handler
updates, fix partial migration handling
DRM core:
- fix fbdev Kconfig select rules, improve tiled-based display
support"
* tag 'drm-next-2025-01-27' of https://gitlab.freedesktop.org/drm/kernel: (40 commits)
drm/amd/display: Optimize cursor position updates
drm/amd/display: Add hubp cache reset when powergating
drm/amd/amdgpu: Enable scratch data dump for mes 12
drm/amd: Clarify kdoc for amdgpu.gttsize
drm/amd/amdgpu: Prevent null pointer dereference in GPU bandwidth calculation
drm/amd/display: Fix error pointers in amdgpu_dm_crtc_mem_type_changed
drm/amdgpu: fix ring timeout issue in gfx10 sr-iov environment
drm/amd/pm: Fix smu v13.0.6 caps initialization
drm/amd/pm: Refactor SMU 13.0.6 SDMA reset firmware version checks
revert "drm/amdgpu/pm: add definition PPSMC_MSG_ResetSDMA2"
revert "drm/amdgpu/pm: Implement SDMA queue reset for different asic"
drm/amd/pm: Add capability flags for SMU v13.0.6
drm/amd/display: fix SUBVP DC_DEBUG_MASK documentation
drm/amd/display: fix CEC DC_DEBUG_MASK documentation
drm/amdgpu: fix the PCIe lanes reporting in the INFO IOCTL
drm/amdgpu: cache gpu pcie link width
drm/amd/display: mark static functions noinline_for_stack
drm/amdkfd: Clear MODE.VSKIP in gfx9 trap handler
drm/amdgpu: Refine ip detection log message
drm/amdgpu: Add handler for SDMA context empty
...
|
|
Having the exec queue snapshot inside a "GuC CT" section was always
wrong. Commit c28fd6c358db ("drm/xe/devcoredump: Improve section
headings and add tile info") tried to fix that bug, but with that also
broke the mesa tool that parses the devcoredump, hence it was reverted
in commit 70fb86a85dc9 ("drm/xe: Revert some changes that break a mesa
debug tool").
With the mesa tool also fixed, this can propagate as a fix on both
kernel and userspace side to avoid unnecessary headache for a debug
feature.
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Julia Filipchuk <julia.filipchuk@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: stable@vger.kernel.org
Fixes: 70fb86a85dc9 ("drm/xe: Revert some changes that break a mesa debug tool")
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250123051112.1938193-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
When running igt@gem_exec_balancer@individual for multiple iterations,
it is seen that the delta busyness returned by PMU is 0. The issue stems
from a combination of 2 implementation specific details:
1) gt_park is throttling __update_guc_busyness_stats() so that it does
not hog PCI bandwidth for some use cases. (Ref: 59bcdb564b3ba)
2) busyness implementation always returns monotonically increasing
counters. (Ref: cf907f6d29421)
If an application queried an engine while it was active,
engine->stats.guc.running is set to true. Following that, if all PM
wakeref's are released, then gt is parked. At this time the throttling
of __update_guc_busyness_stats() may result in a missed update to the
running state of the engine (due to (1) above). This means subsequent
calls to guc_engine_busyness() will think that the engine is still
running and they will keep updating the cached counter (stats->total).
This results in an inflated cached counter.
Later when the application runs a workload and queries for busyness, we
return the cached value since it is larger than the actual value (due to
(2) above)
All subsequent queries will return the same large (inflated) value, so
the application sees a delta busyness of zero.
Fix the issue by resetting the running state of engines each time
intel_guc_busyness_park() is called.
v2: (Rodrigo)
- Use the correct tag in commit message
- Drop the redundant wakeref check in guc_engine_busyness() and update
commit message
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13366
Fixes: cf907f6d2942 ("i915/guc: Ensure busyness counter increases motonically")
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250123193839.2394694-1-umesh.nerlige.ramappa@intel.com
|
|
The steering code needs to know slice/subslice counts and this
information should be retrieved from the hwconfig table. However,
earlier platforms don't have it, hence the KMD has a fallback path.
Newer platforms really should have the entries and if they are missing
that is a bug that needs to be fixed in the table.
So update the complaint to be an error on newer platforms and remove
it completely for older ones that we know are bad (but are not POR for
the Xe driver anyway). Also, re-word the message a little to make it
clearer what the issue is.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250118005403.2960807-1-John.C.Harrison@Intel.com
|
|
Avoid hardcoding the LSPCON settle timeout because it takes a longer
time on certain chips made by certain vendors. Use the function that
already exists to determine the timeout.
Reviewed-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com>
Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241017075725.207384-1-giedriuswork@gmail.com
Acked-by: Simona Vetter <simona.vetter@ffwll.ch>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Since the GuC is reset during GT reset, we need to re-send the
entire SR-IOV provisioning configuration to the GuC. But since
this whole configuration is protected by the PF master mutex and
we can't avoid making allocations under this mutex (like during
LMEM provisioning), we can't do this reprovisioning from gt-reset
path if we want to be reclaim-safe. Move VFs reprovisioning to a
async worker that we will start from the gt-reset path.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250125215505.720-1-michal.wajdeczko@intel.com
|
|
The last use of live_context_for_engine() was removed in 2021 by
commit 99919be74aa3 ("drm/i915/gem: Zap the i915_gem_object_blt code")
Remove it.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250125003846.228514-1-linux@treblig.org
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Start using GuC buffer cache for the SRIOV policy configuration
actions. This is a required step before we could declare SRIOV
PF as being a reclaim safe.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124185247.676-1-michal.wajdeczko@intel.com
|
|
The CMTG is a timing generator that runs in parallel with transcoders
timing generators and can be used as a reference for synchronization.
We have observed that we are inheriting from GOP a display configuration
with the CMTG enabled. Because our driver doesn't currently implement
any CMTG sequences, the CMTG ends up still enabled after our driver
takes over.
We need to make sure that the CMTG is not enabled if we are not going to
use it. For that, let's add a partial implementation in our driver that
only cares about disabling the CMTG if it was found enabled during
initial hardware readout. In the future, we can also implement sequences
for using the CMTG if that becomes a needed feature.
For now, we only deal with cases when it is possible to disable the CMTG
without requiring a modeset. For earlier display versions, we simply
skip if we find the CMTG enabled and we can't disable it without a
proper modeset. In the future, we need to properly handle that case.
v2:
- DG2 does not have the CMTG. Update HAS_CMTG() accordingly.
- Update logic to force disabling of CMTG only for initial commit.
v3:
- Add missing changes for v2 that were staged but not committed.
v4:
- Avoid if/else duplication in intel_cmtg_dump_state() by using "n/a"
for CMTG B enabled/disabled string for platforms without it. (Jani)
- Prefer intel_cmtg_readout_hw_state() over intel_cmtg_readout_state().
(Jani)
- Use display struct instead of i915 as first parameter for
TRANS_DDI_FUNC_CTL2(). (Jani)
- Fewer continuation lines in variable declaration/initialization for
better readability. (Jani)
- Coding style improvements. (Jani)
- Use drm_dbg_kms() instead of drm_info() for logging the disabling
of the CMTG.
- Make struct intel_cmtg_state entirely private to intel_cmtg.c.
v5:
- Do the disable sequence as part of the sanitization step after
hardware readout instead of initial modeset commit. (Jani)
- Adapt to commit 15133582465f ("drm/i915/display: convert global state
to struct intel_display") by using a display struct instead of i915
as argument for intel_atomic_global_obj_init().
v6:
- Do not track CMTG state as a global state. (Ville)
- Simplify the driver logic by only disabling the CMTG only on cases
when a modeset is not required. (Ville)
v7:
- Remove the call to drm_WARN_ON() when checking
intel_cmtg_disable_requires_modeset() and use a FIXME in the comment
instead.
- Remove the !HAS_CMTG() guard from intel_cmtg_get_config(), which is
static and its caller is already protected by that same condition.
- Also take the opportunity to put some Bspec references in the commit
trailers section.
v8:
- Use HAS_TRANSCODER() instead of intel_crtc_for_pipe(). (Ville)
- Ensure transcoder power well is enabled before reading
TRANS_DDI_FUNC_CTL2. (Ville)
Bspec: 68915, 49262
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124173956.46534-1-gustavo.sousa@intel.com
|
|
Provide a PMU interface for GT C6 residency counters. The interface is
similar to the one available for i915, but gt is passed in the config
when creating the event.
Sample usage and output:
$ perf list | grep gt-c6
xe_0000_00_02.0/gt-c6-residency/ [Kernel PMU event]
$ tail /sys/bus/event_source/devices/xe_0000_00_02.0/events/gt-c6-residency*
==> /sys/bus/event_source/devices/xe_0000_00_02.0/events/gt-c6-residency <==
event=0x01
==> /sys/bus/event_source/devices/xe_0000_00_02.0/events/gt-c6-residency.unit <==
ms
$ perf stat -e xe_0000_00_02.0/gt-c6-residency,gt=0/ -I1000
# time counts unit events
1.001196056 1,001 ms xe_0000_00_02.0/gt-c6-residency,gt=0/
2.005216219 1,003 ms xe_0000_00_02.0/gt-c6-residency,gt=0/
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Riana Tauro <riana.tauro@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124050411.2189060-6-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Add the generic support for defining new attributes. This only adds
the macros and common infra for the event counters, but no counters
yet. This is going to be added as follow up changes.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124050411.2189060-5-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
When the event is created, make sure runtime pm is taken and later put:
in order to read an event counter the GPU needs to remain accessible and
doing a get/put during perf's read is not possible it's holding a
raw_spinlock.
Suggested-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124050411.2189060-4-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Like other pmu drivers, keep the update separate from the read so it can
be called from other methods (like stop()) without side effects.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124050411.2189060-3-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
XE_PMU_MAX_GT needs to be used due to a circular dependency, but we
should make sure it doesn't go out of sync with XE_PMU_MAX_GT. Add a
compile check for that.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124050411.2189060-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Basic PMU enabling patch. Setup the basic framework
for adding events.
Based on previous versions by Bommu Krishnaiah, Aravind Iddamsetty and
Riana Tauro, using i915 and rapl as reference implementations.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124050411.2189060-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
"Mainly individually changelogged singleton patches. The patch series
in this pull are:
- "lib min_heap: Improve min_heap safety, testing, and documentation"
from Kuan-Wei Chiu provides various tightenings to the min_heap
library code
- "xarray: extract __xa_cmpxchg_raw" from Tamir Duberstein preforms
some cleanup and Rust preparation in the xarray library code
- "Update reference to include/asm-<arch>" from Geert Uytterhoeven
fixes pathnames in some code comments
- "Converge on using secs_to_jiffies()" from Easwar Hariharan uses
the new secs_to_jiffies() in various places where that is
appropriate
- "ocfs2, dlmfs: convert to the new mount API" from Eric Sandeen
switches two filesystems to the new mount API
- "Convert ocfs2 to use folios" from Matthew Wilcox does that
- "Remove get_task_comm() and print task comm directly" from Yafang
Shao removes now-unneeded calls to get_task_comm() in various
places
- "squashfs: reduce memory usage and update docs" from Phillip
Lougher implements some memory savings in squashfs and performs
some maintainability work
- "lib: clarify comparison function requirements" from Kuan-Wei Chiu
tightens the sort code's behaviour and adds some maintenance work
- "nilfs2: protect busy buffer heads from being force-cleared" from
Ryusuke Konishi fixes an issues in nlifs when the fs is presented
with a corrupted image
- "nilfs2: fix kernel-doc comments for function return values" from
Ryusuke Konishi fixes some nilfs kerneldoc
- "nilfs2: fix issues with rename operations" from Ryusuke Konishi
addresses some nilfs BUG_ONs which syzbot was able to trigger
- "minmax.h: Cleanups and minor optimisations" from David Laight does
some maintenance work on the min/max library code
- "Fixes and cleanups to xarray" from Kemeng Shi does maintenance
work on the xarray library code"
* tag 'mm-nonmm-stable-2025-01-24-23-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (131 commits)
ocfs2: use str_yes_no() and str_no_yes() helper functions
include/linux/lz4.h: add some missing macros
Xarray: use xa_mark_t in xas_squash_marks() to keep code consistent
Xarray: remove repeat check in xas_squash_marks()
Xarray: distinguish large entries correctly in xas_split_alloc()
Xarray: move forward index correctly in xas_pause()
Xarray: do not return sibling entries from xas_find_marked()
ipc/util.c: complete the kernel-doc function descriptions
gcov: clang: use correct function param names
latencytop: use correct kernel-doc format for func params
minmax.h: remove some #defines that are only expanded once
minmax.h: simplify the variants of clamp()
minmax.h: move all the clamp() definitions after the min/max() ones
minmax.h: use BUILD_BUG_ON_MSG() for the lo < hi test in clamp()
minmax.h: reduce the #define expansion of min(), max() and clamp()
minmax.h: update some comments
minmax.h: add whitespace around operators and after commas
nilfs2: do not update mtime of renamed directory that is not moved
nilfs2: handle errors that nilfs_prepare_chunk() may return
CREDITS: fix spelling mistake
...
|
|
The ADV7511 chip allows 24 bits samples max in I2S mode, excepted for
direct AES3 mode (SNDRV_PCM_FORMAT_IEC958_SUBFRAME_LE format).
However the HDMI codec exposes S32_LE format as supported.
Adapt ADV7511 HDMI I2S format list to expose formats actually supported.
Signed-off-by: Olivier Moysan <olivier.moysan@foss.st.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250108170356.413063-4-olivier.moysan@foss.st.com
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
|
|
Set no_i2s_capture and no_spdif_capture flags in hdmi_codec_pdata structure
to report that the ADV7511 HDMI bridge does not support i2s or spdif audio
capture.
Signed-off-by: Olivier Moysan <olivier.moysan@foss.st.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250108170356.413063-2-olivier.moysan@foss.st.com
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
|
|
As the GSP message recv path is able to handle the return of large GSP
message, consume the return of large GSP message in the sending path.
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-16-zhiw@nvidia.com
|
|
The max GSP message element size is 16 pages (including the headers). To
send a message larger than 16 pages, nvkm should split it into multiple
and send them accordingly. The first element has the expected function
number, while the rest are sent with function number as
NV_VGPU_MSG_FUNCTION_CONTINUATION_RECORD. GSP consumes the elements from
the cmdq and always writes the result back to the msgq. The result is also
formed as split elements.
However, nvkm is able to split the large GSP message and send them, but
totally not aware of handling the return of the large GSP message, which
are the split elements in the msgq. Thus, it keeps dumping the unknown RPC
messages from msgq, which is actually CONTINUATION_RECORD message,
discard them unexpectedly. Thus, the caller will not be able to consume
the result from GSP.
Introduce the handling of the return of large GSP message on the msgq path.
Slightly re-factor the low-level part of msg receiving routines. Merge the
split elements back into a large element before handling it to the upper
level. Thus, the upper-level of GSP RPC APIs don't need to be heavily
changed.
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-15-zhiw@nvidia.com
|
|
Prepare for supporting receive the large GSP RPC message.
Factor out r535_gsp_msgq_recv_one_elem(). Fold its params into a data
structure of params. Move the allocation of the GSP RPC message to its
caller. Refine the variable names in the re-factor.
No functional change is intended.
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-14-zhiw@nvidia.com
|
|
To receive a GSP message queue element from the GSP status queue, the
driver needs to make sure there are available elements in the queue.
The previous r535_gsp_msgq_wait() consists of three functions, which is
a little too complicated for a single function:
- wait for an available element.
- peek the message element header in the queue.
- recevice the element from the queue.
Factor out r535_gsp_msgq_peek() and divide the functions in
r535_gsp_msgq_wait() into three functions.
No functional change is intended.
Cc: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-13-zhiw@nvidia.com
|
|
Refine the name to align with the terms in the kernel doc.
No functional change is intended.
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-12-zhiw@nvidia.com
|
|
The variable "msg" in r535_gsp_msg_recv() actually means the GSP RPC.
Refine the names to align with the terms in the kernel doc.
No functional change is intended.
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-11-zhiw@nvidia.com
|