| Age | Commit message (Collapse) | Author |
|
Now that we introduced a new drm_output_color_format enum to represent
what DRM_COLOR_FORMAT_* bits were representing, we can switch to the new
enum.
The main difference is that while DRM_COLOR_FORMAT_ was a bitmask,
drm_output_color_format is a proper enum. However, the enum was done is
such a way than DRM_COLOR_FORMAT_X = BIT(DRM_OUTPUT_COLOR_FORMAT_X) so
the transitition is easier.
The only thing we need to consider is if the original code meant to use
that value as a bitmask, in which case we do need to keep the bit shift,
or as a discriminant in which case we don't.
Acked-by: Jani Nikula <jani.nikula@intel.com>
Tested-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
Link: https://lore.kernel.org/r/20260305-drm-rework-color-formats-v3-4-f3935f6db579@kernel.org
Signed-off-by: Maxime Ripard <mripard@kernel.org>
|
|
Now that we introduced a new drm_output_color_format enum to represent
what DRM_COLOR_FORMAT_* bits were representing, we can switch to the new
enum.
The main difference is that while DRM_COLOR_FORMAT_ was a bitmask,
drm_output_color_format is a proper enum. However, the enum was done is
such a way than DRM_COLOR_FORMAT_X = BIT(DRM_OUTPUT_COLOR_FORMAT_X) so
the transitition is easier.
The only thing we need to consider is if the original code meant to use
that value as a bitmask, in which case we do need to keep the bit shift,
or as a discriminant in which case we don't.
Acked-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260305-drm-rework-color-formats-v3-3-f3935f6db579@kernel.org
Signed-off-by: Maxime Ripard <mripard@kernel.org>
|
|
Now that we introduced a new drm_output_color_format enum to represent
what DRM_COLOR_FORMAT_* bits were representing, we can switch to the new
enum.
The main difference is that while DRM_COLOR_FORMAT_ was a bitmask,
drm_output_color_format is a proper enum. However, the enum was done is
such a way than DRM_COLOR_FORMAT_X = BIT(DRM_OUTPUT_COLOR_FORMAT_X) so
the transitition is easier.
The only thing we need to consider is if the original code meant to use
that value as a bitmask, in which case we do need to keep the bit shift,
or as a discriminant in which case we don't.
Acked-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260305-drm-rework-color-formats-v3-2-f3935f6db579@kernel.org
Signed-off-by: Maxime Ripard <mripard@kernel.org>
|
|
Our xe-vfio-pci component relies on the confirmation from the PF
that VF FLR processing has finished, but due to the notification
latency on the HW/FW side, PF might be unaware yet of the already
triggered VF FLR.
Update VF state machine with new FLR_PREPARE state that indicate
imminent VF FLR notification and treat that as a begin of the FLR
sequence. Also introduce function that xe-vfio-pci should call to
guarantee correct synchronization.
v2: move PREPARE into WIP, update commit msg (Michal)
Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Co-developed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patch.msgid.link/20260309152449.910636-2-piotr.piorkowski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
|
|
The firmware will send the context reset notification message as
part of handling hardware recovery (HWR) events deecoding the message
and printing via drm_info(). This eliminates the "Unknown FWCCB command"
message that was previously printed.
Co-developed-by: Sarah Walker <sarah.walker@imgtec.com>
Signed-off-by: Sarah Walker <sarah.walker@imgtec.com>
Signed-off-by: Alexandru Dadu <alexandru.dadu@imgtec.com>
Reviewed-by: Matt Coster <matt.coster@imgtec.com>
Link: https://patch.msgid.link/20260323-b4-firmware-context-reset-notification-handling-v3-3-1a66049a9a65@imgtec.com
Signed-off-by: Matt Coster <matt.coster@imgtec.com>
|
|
Update the reset_reason fwif structure fields from enum to u32 to remove
any ambiguity from the interface (enum is not a fixed size thus is unfit
for the purpose of the data type).
Fixes: a26f067feac1f ("drm/imagination: Add FWIF headers")
Signed-off-by: Alexandru Dadu <alexandru.dadu@imgtec.com>
Reviewed-by: Matt Coster <matt.coster@imgtec.com>
Link: https://patch.msgid.link/20260323-b4-firmware-context-reset-notification-handling-v3-2-1a66049a9a65@imgtec.com
Signed-off-by: Matt Coster <matt.coster@imgtec.com>
|
|
Update the context reset reason enum with the missing reset reasons in
the 6-11 value gap:
- CDM Mission/safety checksum mismatch;
- TRP checksum mismatch;
- GPU ECC error (corrected, OK);
- GPU ECC error (uncorrected, HWR);
- FW ECC error (corrected, OK);
- FW ECC error (uncorrected, ERR);
Co-developed-by: Sarah Walker <sarah.walker@imgtec.com>
Signed-off-by: Sarah Walker <sarah.walker@imgtec.com>
Signed-off-by: Alexandru Dadu <alexandru.dadu@imgtec.com>
Reviewed-by: Matt Coster <matt.coster@imgtec.com>
Link: https://patch.msgid.link/20260323-b4-firmware-context-reset-notification-handling-v3-1-1a66049a9a65@imgtec.com
Signed-off-by: Matt Coster <matt.coster@imgtec.com>
|
|
The existing DPLL compute clock callback for the XE3PLPD platform
(`xe3plpd_crtc_compute_clock`) was specific to that platform. Replace it
with the more generic Haswell (`hsw_crtc_compute_clock`) implementation
so that the compute clock path does not rely on the XE3PLPD hook.
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260312080657.2648265-25-mika.kahola@intel.com
|
|
xe3plpd platform is supported by dpll framework remove a separate
check for hw comparison and rely solely on dpll framework
hw comparison.
Finally, all required hooks are now in place so initialize
PLL manager for xe3plpd platform and remove the redirections
to the legacy code paths for clock enable/disable as well as
state mismatch checks that are no longer needed.
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260312101415.2669387-1-mika.kahola@intel.com
|
|
Remove LT PHY specific state verification as DPLL framework
has state verification check.
v2: Reuse intel_lt_phy_pll_compare_hw_state() as only config[0]
and config[0] parameters are reliable with LT PHY (Suraj)
v3: Rephrase handling of LT PHY case when verifying the state (CI)
v4: Fix checkpatch warning of line length exceeding 100 columns
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260312080657.2648265-23-mika.kahola@intel.com
|
|
Add the PLL hooks for the TBT PLL on xe3plpd. These are simple stubs
similar to the TBT PLL on earlier platforms, since this PLL is always
on from the display POV - so no PLL enable/disable programming is
required as opposed to the non-TBT PLLs - and the clocks for different
link rates are enabled/disabled at a different level, via the
intel_encoder::enable_clock()/disable_clock() interface.
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260312080657.2648265-22-mika.kahola@intel.com
|
|
Reuse mtl_ddi_*_get_config functions now that all hooks are in place.
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260312080657.2648265-21-mika.kahola@intel.com
|
|
Readout lane count back from HW. Reuse existing function
for Cx0 for LT PHY case with minor modification to add
lanes as function parameters.
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260312080657.2648265-20-mika.kahola@intel.com
|
|
To increase debuggability add lane count as part of HW state dump.
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260312080657.2648265-19-mika.kahola@intel.com
|
|
Add new pll_disable_clock functions so that they can be
hooked up to dpll->disable. This is just a wrapper over
the exitisting intel_xe3plpd_pll_disable to make it
compatible With dpll->disable function
v2: Revise commit message (Suraj)
Drop wrapper for TBT clock disabling and reuse
intel_mtl_pll_disable_clock() for DDI clock
disabling hook (Suraj)
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260312080657.2648265-18-mika.kahola@intel.com
|
|
Enable PLL clock on DDI by moving part of the PLL enabling
sequence into a DDI clock enabling function.
v2: Reuse intel_mtl_pll_enable_clock for DDI clock enabling
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260312080657.2648265-17-mika.kahola@intel.com
|
|
Add .crtc_get_dpll function pointer to support xe3plpd
platform.
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260312080657.2648265-16-mika.kahola@intel.com
|
|
Add .get_freq function hook to support dpll framework for xe3plpd platform.
v2: Restore port clock calculation (Suraj)
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260312080657.2648265-15-mika.kahola@intel.com
|
|
Add .get_hw_state hook to xe3plpd platform for dpll framework
and update intel_lt_phy_pll_readout_hw_state() function
accordingly to support dpll framework.
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260312080657.2648265-14-mika.kahola@intel.com
|
|
Add .compare_hw_state function pointer for xe3plpd platform
to support dpll framework.
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260312080657.2648265-13-mika.kahola@intel.com
|
|
Add .dump_hw_state function pointer for xe3plpd platform
to support dpll framework. While at it, switch to use
drm_printer structure to print hw state information.
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260312080657.2648265-12-mika.kahola@intel.com
|
|
Add .update_dpll_ref_clks function pointer to xe3plpd
platform to support dpll framework. Reuse ICL
function pointer.
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260312080657.2648265-11-mika.kahola@intel.com
|
|
Add .update_active_dpll function pointer to support
dpll framework for xe3plpd platform. Reuse ICL function
pointer.
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260312080657.2648265-10-mika.kahola@intel.com
|
|
Add .put_dplls function pointer to support xe3plpd platform
on dpll framework. Reuse ICL function pointer.
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260312080657.2648265-9-mika.kahola@intel.com
|
|
Add .get_dplls function pointer for xe3plpd platforms
to support dpll framework. Reuse the ICL function
pointer.
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260312080657.2648265-8-mika.kahola@intel.com
|
|
Add compute dpll hook for xe3plpd platform and bring
PLL state calculation to support PLL framework.
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260312080657.2648265-7-mika.kahola@intel.com
|
|
Cache lane count as part of PLL state.
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260312080657.2648265-6-mika.kahola@intel.com
|
|
The LT PHY implementation currently pulls PLL and port_clock
information directly from the CRTC state. This ties the PHY
programming logic too tightly to the CRTC state and makes it
harder to clearly express the PHY’s own PLL configuration.
Introduce an explicit "struct intel_lt_phy_pll_state" argument
for the PHY functions and update callers accordingly.
No functional change is intended — this is a preparatory cleanup for
to bring LT PHY PLL handling as part of PLL framework.
v2: DP, HDMI 2.0, and HDMI FRL modes are port of the VDR configuration 0
register. These modes are defined by bits 2:0. Decode these to
differentiate DP and HDMI modes when programming PLL's. (Imre, Suraj)
v3: Pass port_clock as argument instead of recalculating it (Suraj)
v4: Fix checkpatch warning of line length exceeding 100 columns
BSpec: 744921
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260312080657.2648265-5-mika.kahola@intel.com
|
|
Start bringing in xe3plpd as part of dpll framework. The work is
started by adding PLL information and related function hooks.
v2: Fix xe3plpd type (Suraj)
Remove empty line between BSpec link and Signed-off-by (Suraj)
BSpec: 74304
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260312080657.2648265-4-mika.kahola@intel.com
|
|
Add check for PLL enabling and return early if
PLL is not enabled.
v2: Use PCLK PLL ACK bit to check if PLL is enabled (Suraj)
v3: Check only if PCLK PLL ACK bit for lane 0 is enabled (Suraj)
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260312080657.2648265-3-mika.kahola@intel.com
|
|
Dump missing PLL structure members ssc_enabled and tbt_mode
in order to enhance debugging.
v2: Drop addr_lsb and addr_msb printouts
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patch.msgid.link/20260312080657.2648265-2-mika.kahola@intel.com
|
|
The command-queue structure has a `dma_handle` method that returns the
DMA handle to the memory segment shared with the GSP. This works, but is
not ideal for the following reasons:
- That method is effectively only ever called once, and is technically
an accessor method since the handle doesn't change over time,
- It feels a bit out-of-place with the other methods of `Cmdq` which
only deal with the sending or receiving of messages,
- The method has `pub(crate)` visibility, allowing other driver code to
access this highly-sensitive handle.
Address all these issues by turning `dma_handle` into a struct member
with `pub(super)` visibility. This keeps the method space focused, and
also ensures the member is not visible outside of the modules that need
it.
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Reviewed-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260319-b4-cmdq-dma-handle-v1-1-57840b4a4f90@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
|
|
Clearing the DP tunnel stream BW in the atomic state involves getting
the tunnel group state, which can fail. Handle the error accordingly.
This fixes at least one issue where drm_dp_tunnel_atomic_set_stream_bw()
failed to get the tunnel group state returning -EDEADLK, which wasn't
handled. This lead to the ctx->contended warn later in modeset_lock()
while taking a WW mutex for another object in the same atomic state, and
thus within the same already contended WW context.
Moving intel_crtc_state_alloc() later would avoid freeing saved_state on
the error path; this stable patch leaves that simplification for a
follow-up.
Cc: Uma Shankar <uma.shankar@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v6.9+
Fixes: a4efae87ecb2 ("drm/i915/dp: Compute DP tunnel BW during encoder state computation")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/7617
Reviewed-by: Michał Grzelak <michal.grzelak@intel.com>
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patch.msgid.link/20260320092900.13210-1-imre.deak@intel.com
(cherry picked from commit fb69d0076e687421188bc8103ab0e8e5825b1df1)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
|
XE_WEDGED_MODE_UPON_ANY_HANG_NO_RESET is intended for debugging hangs,
so wedge the device with 'none' recovery method and have it available
to the user for debugging.
Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260305130720.3685754-4-raag.jadav@intel.com
|
|
Update log for 'none' recovery method for wedged event where driver wants
to hint "no recovery" without resetting the device from driver context.
Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260305130720.3685754-3-raag.jadav@intel.com
|
|
Remove all usages of dma::CoherentAllocation and use the new
dma::Coherent type instead.
Signed-off-by: Gary Guo <gary@garyguo.net>
Co-developed-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Link: https://patch.msgid.link/20260320194626.36263-9-dakr@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
Convert libos (LibosMemoryRegionInitArgument) and rmargs
(GspArgumentsPadded) to use CoherentBox / Coherent::init() and simplify
the initialization. This also avoids separate initialization on the
stack.
Reviewed-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Link: https://patch.msgid.link/20260320194626.36263-8-dakr@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
Convert wpr_meta to use Coherent::init() and simplify the
initialization. It also avoids a separate initialization of
GspFwWprMeta on the stack.
Reviewed-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Link: https://patch.msgid.link/20260320194626.36263-7-dakr@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
We need the latest GPU buddy changes from drm-misc-next-2026-03-12 in
drm-rust-next as well, as the Rust abstractions are built on top of it.
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
The hardware teams noticed that the originally documented workaround
steps for Wa_16025250150 may not be sufficient to fully avoid a hardware
issue. The workaround documentation has been augmented to suggest
programming one additional register; make the corresponding change in
the driver.
Fixes: 7654d51f1fd8 ("drm/xe/xe2hpg: Add Wa_16025250150")
Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com>
Link: https://patch.msgid.link/20260319-wa_16025250150_part2-v1-1-46b1de1a31b2@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
Forcibly disable the OD_FAN_CURVE feature when temperature or PWM range is invalid,
otherwise PMFW will reject this configuration on smu v13.0.x
example:
$ sudo cat /sys/bus/pci/devices/<BDF>/gpu_od/fan_ctrl/fan_curve
OD_FAN_CURVE:
0: 0C 0%
1: 0C 0%
2: 0C 0%
3: 0C 0%
4: 0C 0%
OD_RANGE:
FAN_CURVE(hotspot temp): 0C 0C
FAN_CURVE(fan speed): 0% 0%
$ echo "0 50 40" | sudo tee fan_curve
kernel log:
[ 756.442527] amdgpu 0000:03:00.0: amdgpu: Fan curve temp setting(50) must be within [0, 0]!
[ 777.345800] amdgpu 0000:03:00.0: amdgpu: Fan curve temp setting(50) must be within [0, 0]!
Closes: https://github.com/ROCm/amdgpu/issues/208
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 470891606c5a97b1d0d937e0aa67a3bed9fcb056)
Cc: stable@vger.kernel.org
|
|
When SET_UCLK_MAX capability is absent, return -EOPNOTSUPP from
smu_v13_0_6_emit_clk_levels() for OD_MCLK instead of 0. This makes
unsupported OD_MCLK reporting consistent with other clock types
and allows callers to skip the entry cleanly.
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit d82e0a72d9189e8acd353988e1a57f85ce479e37)
Cc: stable@vger.kernel.org
|
|
Only reapply UCLK soft limits during PP_OD_RESTORE_DEFAULT when the
current max differs from the DPM table max. This avoids redundant
SMC updates and prevents -EINVAL on restore when no change is needed.
Fixes: b7a900344546 ("drm/amd/pm: Allow setting max UCLK on SMU v13.0.6")
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 17f11bbbc76c8e83c8474ea708316b1e3631d927)
|
|
[WHAT]
When a sink is connected, aconnector->drm_edid was overwritten without
freeing the previous allocation, causing a memory leak on resume.
[HOW]
Free the previous drm_edid before updating it.
Reviewed-by: Roman Li <roman.li@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 52024a94e7111366141cfc5d888b2ef011f879e5)
Cc: stable@vger.kernel.org
|
|
PASID resue could cause interrupt issue when process
immediately runs into hw state left by previous
process exited with the same PASID, it's possible that
page faults are still pending in the IH ring buffer when
the process exits and frees up its PASID. To prevent the
case, it uses idr cyclic allocator same as kernel pid's.
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 8f1de51f49be692de137c8525106e0fce2d1912d)
Cc: stable@vger.kernel.org
|
|
amdgpu_device_get_job_timeout_settings() passes a pointer directly
to the global amdgpu_lockup_timeout[] buffer into strsep().
strsep() destructively replaces delimiter characters with '\0'
in-place.
On multi-GPU systems, this function is called once per device.
When a multi-value setting like "0,0,0,-1" is used, the first
GPU's call transforms the global buffer into "0\00\00\0-1". The
second GPU then sees only "0" (terminated at the first '\0'),
parses a single value, hits the single-value fallthrough
(index == 1), and applies timeout=0 to all rings — causing
immediate false job timeouts.
Fix this by copying into a stack-local array before calling
strsep(), so the global module parameter buffer remains intact
across calls. The buffer is AMDGPU_MAX_TIMEOUT_PARAM_LENGTH
(256) bytes, which is safe for the stack.
v2: wrap commit message to 72 columns, add Assisted-by tag.
v3: use stack array with strscpy() instead of kstrdup()/kfree()
to avoid unnecessary heap allocation (Christian).
This patch was developed with assistance from Claude (claude-opus-4-6).
Assisted-by: Claude:claude-opus-4-6
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 94d79f51efecb74be1d88dde66bdc8bfcca17935)
Cc: stable@vger.kernel.org
|
|
Starting with commit 17ce8a6907f7 ("drm/amd/display: Add dsc pre-validation in
atomic check"), amdgpu resets the CRTC state mode_changed flag to false when
recomputing the DSC configuration results in no timing change for a particular
stream.
However, this is incorrect in scenarios where a change in MST/DSC configuration
happens in the same KMS commit as another (unrelated) mode change. For example,
the integrated panel of a laptop may be configured differently (e.g., HDR
enabled/disabled) depending on whether external screens are attached. In this
case, plugging in external DP-MST screens may result in the mode_changed flag
being dropped incorrectly for the integrated panel if its DSC configuration
did not change during precomputation in pre_validate_dsc().
At this point, however, dm_update_crtc_state() has already created new streams
for CRTCs with DSC-independent mode changes. In turn,
amdgpu_dm_commit_streams() will never release the old stream, resulting in a
memory leak. amdgpu_dm_atomic_commit_tail() will never acquire a reference to
the new stream either, which manifests as a use-after-free when the stream gets
disabled later on:
BUG: KASAN: use-after-free in dc_stream_release+0x25/0x90 [amdgpu]
Write of size 4 at addr ffff88813d836524 by task kworker/9:9/29977
Workqueue: events drm_mode_rmfb_work_fn
Call Trace:
<TASK>
dump_stack_lvl+0x6e/0xa0
print_address_description.constprop.0+0x88/0x320
? dc_stream_release+0x25/0x90 [amdgpu]
print_report+0xfc/0x1ff
? srso_alias_return_thunk+0x5/0xfbef5
? __virt_addr_valid+0x225/0x4e0
? dc_stream_release+0x25/0x90 [amdgpu]
kasan_report+0xe1/0x180
? dc_stream_release+0x25/0x90 [amdgpu]
kasan_check_range+0x125/0x200
dc_stream_release+0x25/0x90 [amdgpu]
dc_state_destruct+0x14d/0x5c0 [amdgpu]
dc_state_release.part.0+0x4e/0x130 [amdgpu]
dm_atomic_destroy_state+0x3f/0x70 [amdgpu]
drm_atomic_state_default_clear+0x8ee/0xf30
? drm_mode_object_put.part.0+0xb1/0x130
__drm_atomic_state_free+0x15c/0x2d0
atomic_remove_fb+0x67e/0x980
Since there is no reliable way of figuring out whether a CRTC has unrelated
mode changes pending at the time of DSC validation, remember the value of the
mode_changed flag from before the point where a CRTC was marked as potentially
affected by a change in DSC configuration. Reset the mode_changed flag to this
earlier value instead in pre_validate_dsc().
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/5004
Fixes: 17ce8a6907f7 ("drm/amd/display: Add dsc pre-validation in atomic check")
Signed-off-by: Yussuf Khalil <dev@pp3345.net>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit cc7c7121ae082b7b82891baa7280f1ff2608f22b)
|
|
Forcibly disable the OD_FAN_CURVE feature when temperature or PWM range is invalid,
otherwise PMFW will reject this configuration on smu v13.0.x
example:
$ sudo cat /sys/bus/pci/devices/<BDF>/gpu_od/fan_ctrl/fan_curve
OD_FAN_CURVE:
0: 0C 0%
1: 0C 0%
2: 0C 0%
3: 0C 0%
4: 0C 0%
OD_RANGE:
FAN_CURVE(hotspot temp): 0C 0C
FAN_CURVE(fan speed): 0% 0%
$ echo "0 50 40" | sudo tee fan_curve
kernel log:
[ 756.442527] amdgpu 0000:03:00.0: amdgpu: Fan curve temp setting(50) must be within [0, 0]!
[ 777.345800] amdgpu 0000:03:00.0: amdgpu: Fan curve temp setting(50) must be within [0, 0]!
Closes: https://github.com/ROCm/amdgpu/issues/208
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Instead of a dynamic allocation, use stack variable and let the caller
pass the maximum ranges that can be held in the buffer.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Rename the enum 'pixel_format' to 'dc_pixel_format' to avoid potential
name conflicts with the pixel_format struct defined in
include/video/pixel_format.h.
Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|