summaryrefslogtreecommitdiff
path: root/drivers/gpu
AgeCommit message (Collapse)Author
2026-04-17drm/amd/display: add const qualifiers to watermark params structWenjing Liu
[why] There are few non const input pointer fields. Setting them to const to prevent future modification of read-only data. Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amd/display: fix math_mod() using arg1 instead of arg2Wenjing Liu
[Why] math_mod() multiplied by arg1 instead of arg2, returning a wrong result for any non-trivial modulo operation. [How] Replace arg1 with arg2 in the subtraction term to correctly implement fmod(arg1, arg2). Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amd/display: Use overlay cursor when color pipeline is activeAlex Hung
Force overlay cursor mode when an underlying plane has a non-bypassed color pipeline to avoid incorrect cursor transformation. Reviewed-by: Sun peng (Leo) Li <sunpeng.li@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amd/display: Fix compiler warningsGaghik Khachatrian
[Why] Implicit conversions from wider integer types to byte-sized fields were generating compiler warnings. These warnings hide intentional protocol /storage boundaries and reduce signal quality during builds. Making conversion intent explicit improves readability and warning hygiene without changing behavior. [How] Added explicit, type-safe casts at intentional narrow-storage boundaries. Kept data models & runtime logic unchanged, only clarifying conversion intent. Functionality and behavior is unchanged; only type intent is explicit. Aligned warning cleanup with existing coding standards for explicit boundary conversions. Reviewed-by: Aric Cyr <aric.cyr@amd.com> Signed-off-by: Gaghik Khachatrian <gaghik.khachatrian@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amd/display: fix NULL ptr deref in ISM delayed workRay Wu
dc_destroy() sets dm->dc to NULL before amdgpu_dm_ism_fini() is called, leaving a window where in-flight ISM delayed work dereferences the stale pointer. Call amdgpu_dm_ism_fini() in amdgpu_dm_fini() before dc_destroy(). Fixes: 754003486c3c ("drm/amd/display: Add Idle state manager(ISM)") Reviewed-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amd/display: Add missing do_mccs parameter descriptionSrinivasan Shanmugam
Add missing description for do_mccs parameter in amdgpu_dm_update_freesync_caps. Fixes the below with gcc W=1: ../display/amdgpu_dm/amdgpu_dm.c:13269 function parameter 'do_mccs' not described in 'amdgpu_dm_update_freesync_caps' Fixes: 8dc88c6a5948 ("drm/amd/display: Avoid to do MCCS transaction if unnecessary") Cc: Harry Wentland <harry.wentland@amd.com> Cc: Wayne Lin <Wayne.Lin@amd.com> Cc: Roman Li <roman.li@amd.com> Cc: Alex Hung <alex.hung@amd.com> Cc: Tom Chung <chiahsuan.chung@amd.com> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amd/display: Remove redundant includes from DCRoman Li
[Why] The explicit include of linux/array_size.h in Display Core (DC) is redundant. The ARRAY_SIZE macro is already provided by dm_services.h (via os_types.h) which DC includes. [How] Remove the unnecessary #include <linux/array_size.h> from dc_hw_sequencer.c and dce_clock_source.c. Fixes: 2d2366176445 ("drm/amd/display: Replace inline NUM_ELEMENTS macro with ARRAY_SIZE") CC: Linus Probert <linus.probert@gmail.com> Signed-off-by: Roman Li <Roman.Li@amd.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amd/display: Promote DC to 3.2.377Taimur Hassan
This version brings along the following updates: - Enable sink freesync via MCCS with pcon whitelist adjustments - Rework YCbCr422 DSC policy - Update DML2.1 parameters - Fix coding style issues and compiler warnings Reviewed-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amd/display: Fix coding style issueChuanyu Tseng
[Why & How] Function logic should put after variable declare section, so let's move it. Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amd/display: Remove Duplicate Prefetch ParameterZheng, Austin
[Why/How] UrgLatency value is passed in twice to the prefetch calculations. Once through the UrgentLatency term and once through the Turg term. Only Turg is used in the prefetch calculation so remove the unused UrgentLatency parameter Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Zheng, Austin <Austin.Zheng@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amd/display: Add DCN42 PMO policy for DML2.1Nicholas Kazlauskas
[Why] The MinTTU policy in DML2.1 does not guarantee that we support p-state in blank. This is a delta vs dml2 and earlier revisions as the prefetch mode override has been removed in favor of a more configurable pstate optimizer. [How] Split off DCN42 with its own PMO helpers so that we can use a simpler strategy of only allowing the mode if we support p-state in vblank and if vactive has enough latency hiding. The actual hookup to use these helpers in the PMO factory will be done in a later patch to satisfy build system requirements. Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amd/display: move memory latency update to dml for dcn42Dmytro Laktyushkin
Memory latencies are soc specific and should be part of dml soc bounding box. This change removes them from clk_mgr and has latency update happen based on memory type when dml socbb is being updated. Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Reviewed-by: Charlene Liu <charlene.liu@amd.com> Signed-off-by: Dmytro Laktyushkin <dmytro.laktyushkin@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amd/display: Fix implicit narrowing conversions in modulesGaghik Khachatrian
[Why]: Implicit narrowing of wider integer types (unsigned int, uint64_t) into narrower fields (uint8_t, uint16_t, unsigned short) has potential truncation issues. [How]: For each warning site, added ASSERT(<value> <= 0xFFFF/0xFF) for debug-mode bounds verification followed by an explicit cast. Typed intermediate variables introduced where needed for clarity. No functional change intended. Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Gaghik Khachatrian <gaghik.khachatrian@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amd/display: update dcn42 memory latenciesDmytro Laktyushkin
Add latency update based on memory type to dml2.1 Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Dmytro Laktyushkin <dmytro.laktyushkin@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amd/display: Fix DCN42 gpuvm_min_page_size_kbytes in SOC BBNicholas Kazlauskas
[Why & How] To match the HW specification this should be 4, not 256. Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amd/display: Pass min page size from SOC BB to dml2_1 plane configNicholas Kazlauskas
[Why] Like dml2_0 this isn't guaranteed to be constant for every ASIC. This can cause corruption or underflow for linear surfaces due to a wrong PTE_ROW_HEIGHT_LINEAR value if not correctly specified. [How] Like dml2_0 pass in the SOC bb into the plane configuration population functions. Set both GPUVM and HostVM page sizes in the overrides. Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amd/display: Correct MALL parameters for DCN42 soc bbNicholas Kazlauskas
[Why & How] The MALL and DCC parameters were copied and pasted from a previous ASIC but the correct value per HW specification should all be 0. If not correct this can impact urgent bandwidth calculation and PMO. Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amd/display: Fix HostVMMinPageSize unit mismatch in DML2.1Nicholas Kazlauskas
[Why] This was found back on DML2 but was missed when creating DML2.1. The bottom layer calculation (CalculateHostVMDynamicLevels) expects a value in bytes, not KB, but we pass in the value in KB (eg. 4). This causes an extra page table level to be required in the prefetch bytes which can be significant overhead - preventing some modes from being supported that should otherwise be. [How] Correct the units by multiplying the input and override values by 1024. Reviewed-by: Austin Zheng <austin.zheng@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amd/display: Avoid to do MCCS transaction if unnecessaryWayne Lin
We don't have to do MCCS/DDCCI transactions with sink side every time by calling get_modes(). Limit it to be operated when hotplug occurs. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amd/display: Enable sink freesync via MCCSWayne Lin
If sink like HDMI indicates supporting freesync via MCCS, explicitly to send vcp set command on sink to enable freesync. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amd/display: Read sink freesync support via mccsWayne Lin
If EDID AMD VSDB declares that sink supports MCCS method for freesync usage, send mccs request to understand sink freesync current supporting state. If sink supports freesync but user toggles OSD to turn off it, disable freesync. If HDMI sink doesn't support MCCS method for freesync usage, disable freesync as well. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amd/display: Parse freesync mccs vcp codeWayne Lin
[Why & How] DMUB supports to parse freesynce mccs vcp code now. Store it for later freesync mccs manipulation. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amd/display: Adjust freesync pcon whitelistWayne Lin
Add more freesync supported pcon ID into the whitelist. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amd/display: Remove unnecessary Freesync w/a from DCN32George Shen
[Why/How] A workaround was previously used for certain Freesync cases that would override the vstartup_start value from DML to position the SDP correctly. This is no longer needed in DCN32 and above, so remove the workaround. Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: George Shen <george.shen@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amd/display: Rework YCbCr422 DSC policyRelja Vojvodic
- Reworked YCbCr4:2:2 Native/Simple policy decision making with DSC enabled based on DSC caps and stream signal type Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Relja Vojvodic <Relja.Vojvodic@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amd/display: update dcn42 bounding boxCharlene Liu
[why] update according hw spec. Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Charlene Liu <Charlene.Liu@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amd/display: Drop unused tiling formats from dml2Roman Li
Remove unused legacy tiling format support from dml2. Legacy asics don't use dml2. Fixes: e56e3cff2a1b ("drm/amd/display: Sync dcn42 with DC 3.2.373") Reviewed-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Roman Li <Roman.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amd/display: Fix unused parameters warnings in dml2_0Gaghik Khachatrian
[Why] Resolve warnings by marking unused parameters explicitly. [How] Keep parameter names in signatures and add a line with '(void)param;' inside the function body Preserved function signatures and avoids breaking code paths that may reference the parameter under conditional compilation. Reviewed-by: Dillon Varone <dillon.varone@amd.com> Reviewed-by: Clayton King <clayton.king@amd.com> Signed-off-by: Gaghik Khachatrian <gaghik.khachatrian@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amdgpu/mes_v12_1: Fix iterator reuse in mes_v12_1_test_ring()Srinivasan Shanmugam
This code waits for the MES self-test to complete by repeatedly checking a register or memory value until it becomes valid or a timeout occurs. The fix ensures the timeout counter works correctly by not reusing the same variable inside another loop. mes_v12_1_test_ring() uses 'i' as the outer timeout loop counter, but reuses the same variable for the inner XCC scan in cooperative mode. This makes the timeout counter ambiguous and can lead to incorrect timeout handling. It also triggers a Smatch warning about reusing the outer loop iterator. Fix this by introducing a separate iterator for the inner XCC loop so that 'i' continues to represent only the timeout wait duration. drivers/gpu/drm/amd/amdgpu/mes_v12_1.c:2080 mes_v12_1_test_ring() warn: reusing outside iterator: 'i' drivers/gpu/drm/amd/amdgpu/mes_v12_1.c 2069 atomic64_set((atomic64_t *)wptr_cpu_addr, wptr); 2070 WDOORBELL64(doorbell_idx, wptr); 2071 2072 for (i = 0; i < adev->usec_timeout; i++) { i is counting usec 2073 if (queue_type == AMDGPU_RING_TYPE_SDMA) { 2074 tmp = le32_to_cpu(*cpu_ptr); 2075 } else { 2076 if (!adev->mes.enable_coop_mode) { 2077 tmp = RREG32_SOC15(GC, GET_INST(GC, xcc_id), 2078 regSCRATCH_REG0); 2079 } else { --> 2080 for (i = 0; i < num_xcc; i++) { and then re-used to count something else Fixes: 44e5195fa3d4 ("drm/amdgpu/mes_v12_1: add mes self test") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Cc: Jack Xiao <Jack.Xiao@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Jack Xiao <Jack.Xiao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amd/pm: add od table upload error message parsing for smu v14.0.xYang Wang
parse and print detailed reasons for od table upload failures to help users understand error causes. example: $ echo "0 30 40" | sudo tee fan_curve $ echo "1 40 30" | sudo tee fan_curve $ echo "c" | sudo tee fan_curve kernel log: [ 75.040174] amdgpu 0000:0a:00.0: Failed to upload overdrive table, ret:-5 [ 75.040178] amdgpu 0000:0a:00.0: Invalid overdrive table content: OD_FAN_CURVE_PWM_ERROR (13) [ 75.040181] amdgpu 0000:0a:00.0: Failed to upload overdrive table! Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amd/pm: add read arg support to smu_cmn_update_tableYang Wang
Extend the smu_cmn_update_table function to support reading a 32-bit return argument from the SMU firmware during table transfer operations. - Rename the original function to smu_cmn_update_table_read_arg - Add a uint32_t *read_arg output parameter to capture firmware response - Pass the read_arg pointer to the SMU message command - Keep full backward compatibility using a macro wrapper for the old API This allows the driver to retrieve status codes, results, or configuration feedback from the SMU firmware after table data transfer. No functional changes for existing users of the original smu_cmn_update_table() API. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amd/pm: fix runtime PM imbalance issue in amdgpu_pm.cYang Wang
Fix runtime PM counter imbalance to prevent device from failing to enter low power state Fixes: a50d32c41fb2 ("drm/amd/pm: Deprecate print_clock_levels interface") Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amdgpu/sdma7.1: add support for disable_kqAlex Deucher
Plumb in support for disabling kernel queues and make it the default. For testing, kernel queues can be re-enabled by setting amdgpu.user_queue=0. Kernel queues are still created for use by the kernel driver for memory management, etc., just not user submissions. Reviewed-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amdgpu: fix IP discovery v0 handlingfilippor
Cyan skillfish uses IP discovery v0. This was broken when the IP discovery was refactored for newer versions. Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5189 Fixes: d0c647a6aae2 ("drm/amdgpu/discovery: support new discovery binary header") Signed-off-by: filippor <filippo.rossoni@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amd/pm: Fix mode2 reset ACK handling on aldebaran v2Srinivasan Shanmugam
aldebaran_mode2_reset() sends a mode2 reset message and waits for an acknowledgment from the SMU. The current ACK handling is incorrect. The wait loop runs only when ret is -ETIME. But after a successful async send, ret is 0. Because of this, the loop is skipped and the code does not wait for the reset acknowledgment. Also, the code checks for ret != 1 after calling smu_msg_wait_response(). However, smu_msg_wait_response() returns 0 on success and negative error codes on failure. So checking against 1 is wrong. Return -EOPNOTSUPP when the firmware does not support this reset message. Fix this by setting ret to -ETIME before entering the wait loop, checking for ret != 0 after getting the SMU response, and returning -EOPNOTSUPP when the firmware does not support the message. v2: - Update ACK check to use ret != 0 instead of ret != 1, since smu_msg_wait_response() returns 0 on success (Feifei) - Remove unnecessary handling for ret == 0 Fixes: e42569d02acb ("drm/amd/pm: Modify mode2 msg sequence on aldebaran") Reported-by: Dan Carpenter <error27@gmail.com> Cc: Feifei Xu <Feifei.Xu@amd.com> Cc: Lijo Lazar <lijo.lazar@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amd/pm: smu7: Remove stale error check in smu7_hwmgr_backend_initSrinivasan Shanmugam
smu7_hwmgr_backend_init() is responsible for initializing the SMU7 power management backend. It allocates and sets up the backend structure, initializes voltage tables, configures dependency tables, and prepares platform-specific power and clock parameters. The function follows a typical pattern where each initialization step returns a status in "result", and failures are handled via a common "goto fail" path that performs cleanup. Commit 2c21648bb814 ("drm/amd/pm/smu7: Remove non-functional SMU7 voltage dependency on DAL") removed a function call in this initialization sequence, but left behind the corresponding error check. As a result, "result" is checked twice without being updated in between: result = smu7_init_voltage_dependency_on_display_clock_table(hwmgr); if (result) goto fail; ... if (result) goto fail; The second check is redundant and unreachable for any new failure, since no operation modifies "result" between the two checks. This triggers a Smatch warning about a duplicate zero check and reduces code clarity. Remove the stale error check to keep the control flow correct and readable. Fixes: 9f49e3d4cb86 ("drm/amd/pm/smu7: Remove non-functional SMU7 voltage dependency on DAL") Reported-by: Dan Carpenter <error27@gmail.com> Cc: Timur Kristóf <timur.kristof@gmail.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amd/ras: Avoid ECC status update in hw_fini for VF unloadCe Sun
VF sends IDH_REQ_GPU_FINI_ACCESS before hw_fini during unload. PF no longer accepts requests, so skip ECC status update to prevent mailbox timeout. Signed-off-by: Ce Sun <cesun102@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amdgpu: fix CPER ring header parsingXiang Liu
amdgpu_cper_ring_get_ent_sz() parses CPER headers directly from the circular ring buffer to determine the current entry size. When the ring is full and the write pointer lands near the end of the buffer, the header can wrap across the ring boundary. The existing code treats the 4-byte CPER signature as a C string and uses strcmp() on in-ring binary data, then reads record_length through a direct struct pointer cast. Both assumptions are unsafe for wrapped entries and can read past the end of the ring mapping. Fix the parser by comparing the signature as raw bytes and by copying the header into a local buffer before reading record_length, handling wraparound explicitly in both cases. This avoids out-of-bounds reads in amdgpu_cper_ring_get_ent_sz() when the CPER ring is full or the current entry starts at the tail of the ring. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amdgpu: fix heap buffer overflow in amdgpu_coredump ring dumpVitaly Prosyak
The off variable in the ring content dump loop tracks a byte offset accumulated from ring->ring_size (which is in bytes), but it is used as an index into u32 *rings_dw. C pointer arithmetic on a u32 pointer automatically multiplies the index by sizeof(u32) = 4, so the actual byte address accessed is: &rings_dw[off] == (char *)rings_dw + off * 4 This means off is effectively quadrupled, causing a 4x overshoot. Concrete example -- two rings, each ring_size = 8 192 bytes (8 KB): total_ring_size = 16 384 bytes rings_dw = kzalloc(16 384) /* 16 KB buffer */ Ring 0: off = 0 memcpy(&rings_dw[0], ring0->ring, 8192) -> writes bytes 0 .. 8 191 OK off += ring->ring_size -> off = 8 192 (BUG) Ring 1: off = 8 192 memcpy(&rings_dw[8192], ring1->ring, 8192) -> actual byte offset = 8 192 * 4 = 32 768 -> writes bytes 32 768 .. 40 959 -> but buffer is only 16 384 bytes! OVERFLOW With the fix (off += ring->ring_size / 4): Ring 0: off = 0 memcpy(&rings_dw[0], ring0->ring, 8192) OK off += 8 192 / 4 -> off = 2 048 Ring 1: off = 2 048 memcpy(&rings_dw[2048], ring1->ring, 8192) -> byte offset = 2 048 * 4 = 8 192 -> writes bytes 8 192 .. 16 383 OK KASAN catches the overflow as a slab-use-after-free when the write lands on a quarantined slab object: BUG: KASAN: slab-use-after-free in amdgpu_coredump+0x775/0x13c0 [amdgpu] Write of size 8192 at addr ffff8890b2400000 by task kworker/u128:1/329 Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched] Call Trace: __asan_memcpy+0x3c/0x60 amdgpu_coredump+0x775/0x13c0 [amdgpu] amdgpu_job_timedout+0xdb5/0x1420 [amdgpu] The corrupted object was a 4 KB drm_exec buffer from a completed amdgpu_cs_ioctl -- the ring dump memcpy overshot into this freed slab region. Fix by accumulating off in dword units (ring->ring_size / 4) so the u32* indexing produces the correct byte address. The reader in amdgpu_devcoredump_format() already consumes the stored offset as a dword index (rings_dw[off + j / 4]), so no change is needed there. Fixes: eea85914d15b ("drm/amdgpu: save ring content before resetting the device") Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amdgpu: correct single device PCIe reset flow for DPCCe Sun
For triggering the dpc event with a single device, we still need to set the in_link_reset flag and the dpc status. Signed-off-by: Ce Sun <cesun102@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amdgpu: fix NULL pointer dereference in amdgpu_devcoredump_formatVitaly Prosyak
A race condition in the devcoredump code causes a NULL pointer dereference in amdgpu_devcoredump_format() when multiple GPU resets occur in quick succession. The sequence of events: 1. First reset calls amdgpu_coredump(), creates coredump1, sets adev->coredump = coredump1, and queues the deferred work. 2. The deferred work begins executing (work_pending() returns false since the work is now running, not just queued). 3. A second reset calls amdgpu_coredump(). work_pending() returns false because the work is running, so amdgpu_coredump() proceeds: creates coredump2, overwrites adev->coredump = coredump2, and re-queues the deferred work with queue_work(). 4. The first deferred work finishes and unconditionally sets adev->coredump = NULL, destroying the reference to coredump2. 5. The re-queued deferred work starts and reads adev->coredump = NULL. It then passes this NULL into amdgpu_devcoredump_format() which dereferences coredump->adev (offset 0 in the struct), triggering: KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] RIP: 0010:amdgpu_devcoredump_format+0xa6/0x36b0 [amdgpu] This was observed during the amd_deadlock IGT test where multiple subtests trigger rapid ring resets. The dmesg log shows four coredumps created within 120ms (at 102.377s, 104.424s, 104.492s, and 104.497s), with the crash occurring 13ms after the last one. Fix this with two changes: - Replace work_pending() with work_busy() in amdgpu_coredump() to also reject new coredumps while the deferred work is executing, not just when it is queued. This closes the main race window. - Add a defensive NULL check for adev->coredump at the start of amdgpu_devcoredump_deferred_work() to prevent the crash if the race still occurs (work_busy() is advisory, not a full barrier). v2: Drop the job->pasid NULL guard -- that fix was independently submitted and merged as commit 4c1f0a162da5 ("drm/amdgpu: add job->pasid in check as amdgpu_job could be NULL") by Sunil Khatri, reviewed by Christian König. Integrate with that patch as suggested by Christian. Fixes: 4bbba79a7f1d ("drm/amdgpu: move devcoredump generation to a worker") Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amdgpu: add job->pasid in check as amdgpu_job could be NULLSunil Khatri
In below stack job->pasid is accessed while job is NULL. Access it within the check when job is non NULL. Failure call stack. [ 222.653622] BUG: kernel NULL pointer dereference, address: 000000000000014c [ 222.653625] #PF: supervisor read access in kernel mode [ 222.653628] #PF: error_code(0x0000) - not-present page [ 222.653630] PGD 0 P4D 0 [ 222.653635] Oops: Oops: 0000 [#1] SMP NOPTI [ 222.653639] CPU: 1 UID: 0 PID: 12 Comm: kworker/u96:0 Not tainted 6.19.0-amd-staging-drm-next #271 PREEMPT(voluntary) [ 222.653644] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS ELITE/X570 AORUS ELITE, BIOS F37c 05/12/2022 [ 222.653646] Workqueue: amdgpu-reset-dev amdgpu_userq_reset_work [amdgpu] [ 222.653961] RIP: 0010:amdgpu_coredump+0x8b/0x470 [amdgpu] [ 222.654158] Code: 48 83 c4 20 5b 41 5c 41 5d 41 5e 41 5f 5d 31 c0 31 c9 31 ff 31 d2 31 f6 45 31 c0 45 31 db e9 8c a9 1a e2 88 58 48 44 88 68 49 <41> 8b b7 4c 01 00 00 89 b0 80 00 00 00 4d 85 ff 48 89 45 d0 0f 84 [ 222.654161] RSP: 0018:ffffce68c0147c00 EFLAGS: 00010282 [ 222.654165] RAX: ffff8bc337407740 RBX: 0000000000000000 RCX: 0000000000000000 [ 222.654167] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 222.654170] RBP: ffffce68c0147c48 R08: 0000000000000000 R09: 0000000000000000 [ 222.654172] R10: ffff8bc337407740 R11: ffffffffc10dda10 R12: ffff8bc2d2e00000 [ 222.654174] R13: 0000000000000001 R14: ffff8bc2d2e5b368 R15: 0000000000000000 [ 222.654176] FS: 0000000000000000(0000) GS:ffff8bc64a5fe000(0000) knlGS:0000000000000000 [ 222.654179] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 222.654182] CR2: 000000000000014c CR3: 0000000135eca000 CR4: 0000000000350ef0 [ 222.654184] Call Trace: [ 222.654187] <TASK> [ 222.654190] ? amdgpu_ip_block_resume+0x28/0x70 [amdgpu] [ 222.654376] ? srso_return_thunk+0x5/0x5f [ 222.654382] amdgpu_device_reinit_after_reset+0x184/0x320 [amdgpu] [ 222.654552] amdgpu_do_asic_reset+0x129/0x160 [amdgpu] [ 222.654720] amdgpu_device_asic_reset+0x92/0x710 [amdgpu] [ 222.654890] amdgpu_device_gpu_recover+0x2ae/0x3d0 [amdgpu] [ 222.655060] amdgpu_userq_reset_work+0x76/0xa0 [amdgpu] [ 222.655229] process_scheduled_works+0x1f0/0x450 [ 222.655235] worker_thread+0x27f/0x370 Fixes: 32ab301b89b3 ("drm/amdgpu: store ib info for devcoredump") Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amdkfd: Clear VRAM on allocation to prevent stale data exposureAmir Shetaia
KFD VRAM allocations set AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE but not AMDGPU_GEM_CREATE_VRAM_CLEARED, leaving freshly allocated VRAM with stale data from prior use observable by compute kernels. The GEM ioctl path already sets VRAM_CLEARED for all userspace allocations via amdgpu_gem_create_ioctl() and amdgpu_mode_dumb_create(). The KFD path was missing this flag, allowing stale page table remnants to leak into user buffers. This causes crashes in RCCL P2P transport where non-zero data in ptrExchange/head/tail fields corrupts the protocol handshake. Signed-off-by: Amir Shetaia <Amir.Shetaia@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2026-04-17drm/amdgpu: Use NBIF offset for register RCC_STRAP0_RCC_DEV0_EPF0_STRAP0 .Ramalingeswara Reddy, Kanala
Define and use regRCC_STRAP0_RCC_DEV0_EPF0_STRAP0_nbif_4_10, to get correct rev_id in nbif_v6_3_1_get_rev_id(). Reviewed-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com> Signed-off-by: Ramalingeswara Reddy, Kanala <Kanala.RamalingeswaraReddy@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2026-04-17drm/amd: Add missing firmware declaration for PSP v15.0.0Mario Limonciello
PSP v15.0.0 needs both TOC and TA firmware. Without the declaration it won't get included in initramfs and leads to following failure: ``` Direct firmware load for amdgpu/psp_15_0_0_ta.bin failed with error -2 early_init of IP block <psp> failed -19 Fatal error during GPU init ``` Fixes: 9b24f63d825e7 ("drm/amdgpu: Enable support for PSP 15_0_0") Reviewed-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2026-04-17amdgpu/jpeg: fix deepsleep register for jpeg 5_0_0 and 5_0_2David (Ming Qiang) Wu
PCTL0__MMHUB_DEEPSLEEP_IB is 0x69004 on MMHUB 4,1,0 and and 0x60804 on MMHUB 4,2,0. 0x62a04 is on MMHUB 1,8,0/1. The DS bits are adjusted to cover more JPEG engines and MMHUB version. Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2026-04-17drm/amdgpu: gate VM CPU HDP flush on reset lockChenglei Xie
During GPU reset, the application could still run CPU page table updates. Each commit called amdgpu_device_flush_hdp(), which on SR-IOV sends work through the KIQ ring. That can advance sync_seq while the GPU is being reset, leaving fence writeback out of sync and causing amdgpu_fence_emit_polling() to time out on later KIQ use. Fix: amdgpu_vm_cpu_commit(): Reset will flush HDP anyway, the HDP flush in amdgpu_vm_cpu_commit() can be skipped when a reset is ongoging. Take reset_domain->sem with down_read_trylock() before amdgpu_device_flush_hdp(). If the reset path holds the write lock, skip the HDP flush so no HDP-related HW access (including KIQ) runs during reset; state is re-established after reset. Signed-off-by: Chenglei Xie <Chenglei.Xie@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2026-04-17drm/amdgpu: Use SMUIO 15.0.0 offsets for TSC upper and lower count.Ramalingeswara Reddy, Kanala
Define and use regGOLDEN_TSC_COUNT_UPPER_smu_15_0_0 and regGOLDEN_TSC_COUNT_LOWER_smu_15_0_0 for TSC upper and lower count. Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com> Signed-off-by: Ramalingeswara Reddy, Kanala <Kanala.RamalingeswaraReddy@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2026-04-17drm/amdgpu: Remove sys file compute_partition_mem_alloc_mode at module unloadXiaogang Chen
Module reload would fail when create sys file that was not removed during module unload. Fixes: e0e9792ea2d4 ("drm/amdgpu: add an option to allow gpu partition allocate all available memory") Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com> Reviewed-by: Philip Yang <philip.yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-17drm/amd/pm: fix incorrect FeatureCtrlMask setting on smu v14.0.xYang Wang
OverDriveTable.FanMinimumPwm and FeatureCtrlMask.PP_OD_FEATURE_FAN_LEGACY_BIT have a hard dependency. Invalid handling of this dependency leads to disabled thermal monitoring and temperature boundary validation. v2: squash in typo fix (Yang) Fixes: 9710b84e2a6a ("drm/amd/pm: add overdrive support on smu v14.0.2/3") Cc: stable@vger.kernel.org Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>