path: root/drivers
Age | Commit message | Author
2025-11-04 | drm/amd/ras: Add ras support for nbio v7_9_1 | YiPeng Chai
Add ras support for nbio v7_9_1. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 | drm/amdgpu: Add ras ip block name | YiPeng Chai
Add ras ip block name. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 | drm/amd/ras: Increase ras switch control range | YiPeng Chai
Increase ras switch control range. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 | drm/amdgpu/smu: Handle S0ix for vangogh | Alex Deucher
Fix the flows for S0ix. There is no need to stop rlc or reinitialize PMFW in S0ix. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4659 Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Reported-by: Antheas Kapenekakis <lkml@antheas.dev> Tested-by: Antheas Kapenekakis <lkml@antheas.dev> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 | drm/amd/pm: Update SMUv13.0.12 partition metrics | Lijo Lazar
Update SMUv13.0.12 partition metrics to partition metrics v1.1 schema. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 | drm/amd/pm: Update SMUv13.0.6 partition metrics | Lijo Lazar
For SMU v13.0.6 SOCs, move to partition metrics v1.1 schema Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 | drm/amd/pm: Add schema v1.1 for parition metrics | Lijo Lazar
Use a schema similar to gpu metrics v1.9 for partition metrics also. It will have field type encoded followed by the field value(s). The attribute ids used will be shared with gpu metrics. The structure definition is only to distinguish between gpu metrics and partition metrics though both gpu metrics v1.9 and partition metrics v1.1 follow the same definition. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 | drm/amd/pm: Use gpu metrics 1.9 for SMUv13.0.12 | Lijo Lazar
Fill and publish GPU metrics in v1.9 format for SMUv13.0.12 SOCs Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 | drm/amdgpu: validate the bo from done list for NULL | Sunil Khatri
Make sure the bo is valid before using it. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 | drm/amdgpu: lock bo before calling amdgpu_vm_bo_update_shared | Pierre-Eric Pelloux-Prayer
BO's reservation object must be locked before using amdgpu_vm_bo_update_shared otherwise dma_resv_assert_held will complain in amdgpu_vm_update_shared. Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 | drm/amdgpu: grab a BO reference in vm_lock_done_list. | Christian König
Otherwise it is possible that between dropping the status lock and locking the BO that the BO is freed up. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 | drm/amd/ras: Fix format truncation | Xiang Liu
GCC reports the same diagnostics for both ‘cper_generate_fatal_record.isra’ and ‘cper_generate_runtime_record.isra’:
../ras/rascore/ras_cper.c:75:36: error: ‘%llX’ directive output may be truncated writing between 1 and 14 bytes into a region of size between 0 and 7 [-Werror=format-truncation=]
   75 | snprintf(record_id, 9, "%d:%llX", dev_info.socket_id,
   76 |          RAS_LOG_SEQNO_TO_BATCH_IDX(trace->seqno));
../ras/rascore/ras_cper.c:75:32: note: directive argument in the range [0, 72057594037927935]
../ras/rascore/ras_cper.c:75:9: note: ‘snprintf’ output between 4 and 27 bytes into a destination of size 9
cc1: all warnings being treated as errors
Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
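The commit message quotes the warning but does not say how it was resolved. As a minimal userspace sketch of the underlying problem, the hypothetical helper below (not taken from ras_cper.c) uses the return value of snprintf() to detect that "%d:%llX" does not fit into a 9-byte buffer when the second argument needs up to 14 hex digits.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper for illustration only: formats "<socket>:<batch>"
 * and reports whether the whole string fit into the destination.
 */
static int format_record_id(char *buf, size_t len, int socket_id,
                            unsigned long long batch_idx)
{
    int n;

    assert(buf != NULL && len > 0);
    /* snprintf() returns the length the untruncated string would have. */
    n = snprintf(buf, len, "%d:%llX", socket_id, batch_idx);
    return n >= 0 && (size_t)n < len; /* 1 = fits, 0 = truncated */
}
```

With a 56-bit argument (the upper bound GCC reports), the formatted string needs 16 characters plus the terminator, so a 9-byte destination keeps only the first 8 characters.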
2025-11-04 | drm/amd/display: Promote DC to 3.2.357 | Taimur Hassan
This version brings along the following updates: - HDCP2 FW locality check refactors - Fix black screen issue with HDMI output - Increase IB mem size - Revert max buffered cursor size to 64 - Extend inbox0 lock to run Replay / PSR - Refactor VActive implementation - Add Pstate viewport reduction - Persist stream refcount through restore Acked-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 | drm/amd/display: [FW Promotion] Release 0.1.34.0 | Taimur Hassan
Release highlights. DCN35/36: dynamically clock gate before and after prefetch. Acked-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 | drm/amd/display: Fix black screen with HDMI outputs | Alex Hung
[Why & How] This fixes the black screen issue on certain APUs with HDMI, accompanied by the following messages: amdgpu 0000:c4:00.0: amdgpu: [drm] Failed to setup vendor info frame on connector DP-1: -22 amdgpu 0000:c4:00.0: [drm] Cannot find any crtc or sizes [drm] Cannot find any crtc or sizes Fixes: 489f0f600ce2 ("drm/amd/display: Fix DVI-D/HDMI adapters") Suggested-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 | drm/amd/display: Increase IB mem size | Alvin Lee
[Why & How] Increase IB mem size to match size of largest structure that will use IB transfer between driver and DMU. Reviewed-by: Oleh Kuzhylnyi <oleh.kuzhylnyi@amd.com> Signed-off-by: Alvin Lee <Alvin.Lee2@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 | drm/amd/display: Revert DCN4 max buffered cursor size to 64 | Dillon Varone
[Why & How] The buffered cursor cap is expressed assuming a square cursor, and usage of the cursor buffer is limited by the request size. For greater than 32 pixels, the request size is fixed at 256 bytes, so the maximum width must be floored to the nearest 256th byte. At 4bpp this means even with 24kB DCN4 can only hold a 64x64 cursor in the buffer as even 65 pixels would require 512 bytes per line instead of 256. Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Signed-off-by: Dillon Varone <Dillon.Varone@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
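The arithmetic in this message can be checked with a small sketch. It assumes that "4bpp" means 4 bytes per pixel (which makes a 64-pixel line exactly 256 bytes) and that each line is fetched in whole 256-byte requests; the constants and function names are illustrative and not read from the DCN4 code.

```c
#include <assert.h>

/* Values taken from the commit message or assumed for illustration. */
#define CURSOR_BUF_BYTES (24 * 1024) /* "24kB" cursor buffer */
#define BYTES_PER_PIXEL  4           /* "4bpp" read as 4 bytes per pixel */
#define REQUEST_BYTES    256         /* fixed request size above 32 pixels */

/* Bytes fetched per cursor line, rounded up to whole requests. */
static unsigned int line_bytes(unsigned int width)
{
    unsigned int raw = width * BYTES_PER_PIXEL;

    return (raw + REQUEST_BYTES - 1) / REQUEST_BYTES * REQUEST_BYTES;
}

/* Returns 1 if a square width x width cursor fits in the buffer. */
static int square_cursor_fits(unsigned int width)
{
    assert(width > 32); /* the 256-byte request size only applies here */
    return width * line_bytes(width) <= CURSOR_BUF_BYTES;
}
```

Under these assumptions a 64x64 cursor needs 64 * 256 = 16384 bytes and fits, while a 65x65 cursor needs 65 * 512 = 33280 bytes and exceeds the 24576-byte buffer.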
2025-11-04 | drm/amd/display: Persist stream refcount through restore | Joshua Aberback
[Why & How] Overwriting the refcount on stream restore can lead to double-free errors or memory leaks if an unbalanced number of retains and releases occurs between a backup and restore. Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Joshua Aberback <joshua.aberback@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 | drm/amd/display: Add Pstate viewport reduction | Austin Zheng
[Why/How] Add struct to hold calculated reduced viewport pstate recout reduction lines per plane Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Austin Zheng <Austin.Zheng@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 | drm/amd/display: Refactor VActive implementation | Austin Zheng
[Why & How] Refactors VActive accounting in PMO, and breaks down fill time requirement by P-State type as it can result in drastically different bandwidth requirements depending on the blackout length. Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Austin Zheng <Austin.Zheng@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 | drm/amd/display: Update P-state naming for clarity. | Austin Zheng
[Why & How] P-state can refer to different things like UCLK P-state, PPT, or temp read. Update naming for clarity. Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Austin Zheng <Austin.Zheng@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 | drm/amd/display: Remove old PMO options | Austin Zheng
[Why & How] Removes deprecated or unused PMO options. Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Austin Zheng <Austin.Zheng@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 | drm/amd/display: Add pte_buffer_mode and force_one_row_for_frame in dchub reg | Austin Zheng
[Why & How] Update structs for rq regs Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Austin Zheng <Austin.Zheng@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 | drm/amd/display: Extend inbox0 lock to run Replay/PSR | Andrew Mazour
[Why] The inbox1 infrastructure is deprecated, so to support display power features requiring a DMUB interlock moving forward extend the inbox0 locking conditions to also include Replay or PSR. [How] Implemented a series of changes to improve HW lock handling: - Deprecated should_use_dmub_inbox1_lock() and guarded it with DCN401 flag. - Migrated lock checks into inbox0 helpers and added PSR/Replay enablement checks to ensure correct behavior. - Updated HWSS fast update path to acquire HW lock as needed using the new helpers. Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Andrew Mazour <Andrew.Mazour@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 | drm/amd/display: fw locality check refactors | Wenjing Liu
[why] There are some new changes for the HDCP2 firmware locality check. The implementation doesn't perfectly fit the intended design and clarity.
1. Clarify and consolidate variable responsibilities. The previous implementation introduced the following variables: - config.ddc.funcs.atomic_write_poll_read_i2c (optional pointer) - hdcp->config.ddc.funcs.atomic_write_poll_read_aux (optional pointer) - hdcp->connection.link.adjust.hdcp2.force_sw_locality_check (bool) - hdcp->config.debug.lc_enable_sw_fallback (bool) - use_fw (bool) They are used together to determine two operations: - Whether to use FW locality check - Whether to use SW fallback on FW locality check failure The refactor streamlines this by introducing two variables in the hdcp2 link adjustment, while ensuring function pointers are always assigned and remain independent from policy decisions: - use_fw_locality_check (bool) -> true if fw locality should be used. - use_sw_locality_fallback (bool) -> true to reset use_fw_locality_check back to false and retry on fw locality check failure.
2. Mixed meanings of the l_prime_read transition input. l_prime_read originally indicates whether l_prime has been read when the sw locality check is used. When the FW locality check is used, l_prime_read indicates whether the combined lc init write, l_prime poll and l_prime read operation was successful. The mix of meanings is confusing. The refactor introduces a new variable, l_prime_combo_read, to isolate the second meaning into its own variable.
3. Missing specific error code on firmware locality error. The original change reuses the generic DDC failure error code when firmware fails to return the locality check result. This is not ideal, as a DDC failure indicates an error occurred during an I2C/AUX transaction. A FW locality failure could be caused by a polling timeout in firmware or a failure to acquire firmware access, which sits at a higher level of abstraction above the DDC hardware. An incorrect error code could lead debugging in the wrong direction.
4. Correcting misplaced comments. The previous implementation of the firmware locality check resulted in some comments in hdcp2_transition being incorrectly positioned. This refactor relocates those comments to their appropriate locations for better clarity.
Reviewed-by: Aric Cyr <aric.cyr@amd.com> Signed-off-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 | drm/amdgpu: Implement user queue reset functionality | Jesse.Zhang
This patch adds robust reset handling for user queues (userq) to improve recovery from queue failures. The key components include: 1. Queue detection and reset logic: - amdgpu_userq_detect_and_reset_queues() identifies failed queues - Per-IP detect_and_reset callbacks for targeted recovery - Falls back to full GPU reset when needed 2. Reset infrastructure: - Adds userq_reset_work workqueue for async reset handling - Implements pre/post reset handlers for queue state management - Integrates with existing GPU reset framework 3. Error handling improvements: - Enhanced state tracking with HUNG state - Automatic reset triggering on critical failures - VRAM loss handling during recovery 4. Integration points: - Added to device init/reset paths - Called during queue destroy, suspend, and isolation events - Handles both individual queue and full GPU resets The reset functionality works with both gfx/compute and sdma queues, providing better resilience against queue failures while minimizing disruption to unaffected queues. v2: add detection and reset calls when preemption/unmapping fails. add a per device userq counter for each user queue type. (Alex) v3: make sure we hold the adev->userq_mutex when we call amdgpu_userq_detect_and_reset_queues. (Alex) warn if the adev->userq_mutex is not held. v4: make sure we have all of the uqm->userq_mutex held. warn if the uqm->userq_mutex is not held. v5: Use array for user queue type counters. (Alex) all of the uqm->userq_mutex need to be held when calling detect and reset. (Alex) v6: fix lock dep warning in amdgpu_userq_fence_dence_driver_process v7: add the queue types in an array and use a loop in amdgpu_userq_detect_and_reset_queues (Lijo) v8: remove atomic_set(&userq_mgr->userq_count[i], 0). It should already be 0 since we kzalloc the structure (Alex) v9: For consistency with kernel queues, we may want something like: amdgpu_userq_is_reset_type_supported (Alex) Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 | drm/amd/display: Don't stretch non-native images by default in eDP | Mario Limonciello (AMD)
Commit 978fa2f6d0b12 ("drm/amd/display: Use scaling for non-native resolutions on eDP") started using the GPU scaler hardware to scale when a non-native resolution was picked on eDP. This scaling was done to fill the screen instead of maintaining the aspect ratio. The idea was that if a different scaling behavior is preferred, the compositor would request it. Ignoring the aspect ratio isn't desirable, however, so adjust the default to preserve the aspect ratio while still filling as much of the screen as possible. Note: This will lead to black bars in some cases for non-native resolutions. Compositors can request the previous behavior if desired. Fixes: 978fa2f6d0b1 ("drm/amd/display: Use scaling for non-native resolutions on eDP") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4538 Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
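To illustrate why aspect-preserving scaling produces black bars, here is a minimal geometry sketch. It is not the driver's scaler code; the struct and function names are hypothetical, and it only computes the largest centered rectangle with the source aspect ratio that fits the panel.

```c
#include <assert.h>

/* Hypothetical destination rectangle on the panel, in pixels. */
struct rect_sketch {
    unsigned int x, y, w, h;
};

/* Largest centered rectangle with the source aspect ratio inside the panel. */
static struct rect_sketch fit_aspect(unsigned int src_w, unsigned int src_h,
                                     unsigned int dst_w, unsigned int dst_h)
{
    struct rect_sketch r;

    assert(src_w > 0 && src_h > 0);
    /* Compare src_w/src_h against dst_w/dst_h by cross-multiplying. */
    if ((unsigned long long)src_w * dst_h >= (unsigned long long)dst_w * src_h) {
        /* Source is at least as wide: fill the width, bars above and below. */
        r.w = dst_w;
        r.h = (unsigned int)((unsigned long long)dst_w * src_h / src_w);
    } else {
        /* Source is narrower: fill the height, bars left and right. */
        r.h = dst_h;
        r.w = (unsigned int)((unsigned long long)dst_h * src_w / src_h);
    }
    r.x = (dst_w - r.w) / 2;
    r.y = (dst_h - r.h) / 2;
    return r;
}
```

For example, a 1024x768 mode on a 1920x1080 panel maps to a 1440x1080 rectangle with 240-pixel bars on each side, whereas stretching would distort the 4:3 image to 16:9.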
2025-11-04 | drm/amdkfd: Fix Unchecked Return Values | Sunday Clement
Properly check the return values from calls to debug functions in runtime_disable(). v2: store the last non-zero returned value from the loop. Signed-off-by: Sunday Clement <Sunday.Clement@amd.com> Reviewed-by: Jonathan Kim <Jonathan.Kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
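The v2 note describes a common pattern: keep iterating so every item is processed, but remember the last non-zero return value. A minimal sketch with hypothetical step functions (not the actual debug calls in runtime_disable()):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical steps standing in for the per-process debug calls. */
static int step_ok_sketch(void)    { return 0; }
static int step_eio_sketch(void)   { return -5; }  /* e.g. -EIO */
static int step_ebusy_sketch(void) { return -16; } /* e.g. -EBUSY */

/* Run every step; report the last non-zero result, or 0 if all succeeded. */
static int run_all_keep_last_error(int (*const steps[])(void), size_t count)
{
    int ret = 0;
    size_t i;

    assert(steps != NULL || count == 0);
    for (i = 0; i < count; i++) {
        int r = steps[i]();

        if (r)
            ret = r; /* do not stop: later steps still need to run */
    }
    return ret;
}
```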
2025-11-04 | drm/amd: Unwind for failed device suspend | Mario Limonciello (AMD)
If device suspend has failed, add a recovery flow that will attempt to unwind the suspend and get things back up and running. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4627 Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 | drm/amd: Add an unwind for failures in amdgpu_device_ip_suspend_phase2() | Mario Limonciello (AMD)
If any hardware IPs involved with the second phase of suspend fail, unwind all steps to restore back to original state. Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 | drm/amd: Add an unwind for failures in amdgpu_device_ip_suspend_phase1() | Mario Limonciello (AMD)
If any hardware IPs involved with the first phase of suspend fail, unwind all steps to restore back to original state. Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
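The unwind idea shared by the two suspend-phase commits can be sketched as follows. The block structure, callbacks and the reverse iteration order are assumptions for illustration; the real functions operate on the device's IP block list and have more cases to handle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical IP block with argument-free callbacks to keep the sketch small. */
struct ip_block_sketch {
    int (*suspend)(void);
    int (*resume)(void);
    int suspended;
};

static int sketch_ok(void)   { return 0; }
static int sketch_fail(void) { return -16; }

/* Suspend blocks last to first; on failure, resume those already suspended. */
static int suspend_all_or_unwind(struct ip_block_sketch *blocks, size_t count)
{
    size_t i;
    int r;

    assert(blocks != NULL || count == 0);
    for (i = count; i-- > 0; ) {
        r = blocks[i].suspend();
        if (r)
            goto unwind; /* i is the index of the failing block */
        blocks[i].suspended = 1;
    }
    return 0;

unwind:
    /* Blocks i+1 .. count-1 were suspended; bring them back in order. */
    for (i = i + 1; i < count; i++) {
        blocks[i].resume();
        blocks[i].suspended = 0;
    }
    return r;
}
```

The caller still sees the original error code, but the blocks are left in their pre-suspend state instead of a half-suspended one.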
2025-11-04 | drm/amdgpu: Drop PMFW RLC notifier from amdgpu_device_suspend() | Alex Deucher
For S3 on vangogh, PMFW needs to be notified before the driver powers down RLC. This already happens in smu_disable_dpms() so drop the superfluous call in amdgpu_device_suspend(). Co-developed-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 | drm/amd/ras: Use correct severity for BP threshold exceed event | Xiang Liu
The severity of CPER for BP threshold exceed event should be set as FATAL to match the OOB implementation. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 | drm/amd/ras: Correct info field of bad page threshold exceed CPER | Xiang Liu
Correct valid_bits and ms_chk_bits of section info field for bad page threshold exceed CPER to match OOB's behavior. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 | drm/amd/display: remove unneeded semicolon | Jiapeng Chong
No functional modification involved. ./drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c:7392:3-4: Unneeded semicolon. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=26821 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 | drm/amd/display: remove unneeded semicolon | Jiapeng Chong
No functional modification involved. ./drivers/gpu/drm/amd/display/dc/resource/dcn32/dcn32_resource.c:1850:3-4: Unneeded semicolon. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=26821 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 | drm/amd/display: remove unneeded semicolon | Jiapeng Chong
No functional modification involved. ./drivers/gpu/drm/amd/display/dc/resource/dcn401/dcn401_resource.c:1674:3-4: Unneeded semicolon. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=26821 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 | drm/amd/pm/si: Delete unused structs and fields | Timur Kristóf
The contents of si_dpm.h seem to have been copied from the old radeon driver, including a lot of structs and fields which were only relevant to GPU generations even older than SI. A lot of these can be deleted without causing much churn to the actual SI DPM code. Let's delete them to make the code easier to understand. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 | drm/amd/pm: Use gpu metrics 1.9 for SMUv13.0.6 | Lijo Lazar
Fill and publish GPU metrics in v1.9 format for SMUv13.0.6 SOCs Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 | drm/amd/pm: Add helper functions for gpu metrics | Lijo Lazar
Add helper macros to define metrics struct definitions. It will define structs with field type followed by actual field. A helper macro is also added to initialize the field encoding for all fields and to initialize the field members to 0xFFs. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 | drm/amd/pm: fix missing device_attr cleanup in amdgpu_pm_sysfs_init() | Yang Wang
Use the correct label to complete all cleanup work. Fixes: 4d154b1ca580 ("drm/amd/pm: Add support for DPM policies") Fixes: 25e82f2e2c59 ("drm/amd/pm: Add temperature metrics sysfs entry") Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 | drm/amd/pm: fix the issue of size calculation error for smu 13.0.6 | Yang Wang
v1: the driver should handle return value of smu_v13_0_6_printk_clk_levels() to return the correct size for sysfs reads. v2: fix the issue of size calculation error in smu_v13_0_6_print_clks() Fixes: cdfdec6f1608 ("drm/amd/pm: Avoid writing nulls into `pp_od_clk_voltage`") Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 | drm/amd/ras: Update IPID value for bad page threshold CPER | Xiang Liu
The IPID register value for bad page threshold CPER holds socket_id info now according to the latest definition. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 | drm/amd/display: Fix null pointer on analog detection | Harry Wentland
Check if we have an amdgpu_dm_connector->dc_sink first before adding common modes for analog outputs. If we don't have a sink yet we can safely skip this. Fixes: 70181ad96ec2 ("drm/amd/display: Add common modes to analog displays without EDID") Signed-off-by: Harry Wentland <harry.wentland@amd.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 | drm/amdgpu: Fix error injection parameter error | YiPeng Chai
Fix error injection parameter error. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 | drm/amd/ras: Fix the error of undefined reference to `__udivdi3' | YiPeng Chai
Fix the error: drivers/gpu/drm/amd/amdgpu/../ras/ras_mgr/amdgpu_ras_mgr.c:132:undefined reference to `__udivdi3' Fixes: fa0b203cd902 ("drm/amd/ras: Add amdgpu ras management function.") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202510272144.6SUHUoWx-lkp@intel.com/ Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 | spi: tegra210-quad: Check hardware status on timeout | Vishwaroop A
Under high system load, QSPI interrupts can be delayed or blocked on the target CPU, causing wait_for_completion_timeout() to report failure even though the hardware successfully completed the transfer. When a timeout occurs, check the QSPI_RDY bit in QSPI_TRANS_STATUS to determine if the hardware actually completed the transfer. If so, manually invoke the completion handler to process the transfer successfully instead of failing it. This distinguishes lost/delayed interrupts from real hardware timeouts, preventing unnecessary failures of transfers that completed successfully. Signed-off-by: Vishwaroop A <va@nvidia.com> Acked-by: Thierry Reding <treding@nvidia.com> Link: https://patch.msgid.link/20251028155703.4151791-4-va@nvidia.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-11-04 | spi: tegra210-quad: Refactor error handling into helper functions | Vishwaroop A
Extract common cleanup code into dedicated helper functions to simplify the code and improve readability. This refactoring includes: - tegra_qspi_reset(): Device reset and interrupt cleanup - tegra_qspi_dma_stop(): DMA termination and disable - tegra_qspi_pio_stop(): PIO mode disable No functional changes. This is purely a code reorganization to prepare for improved timeout handling in subsequent patches. Signed-off-by: Vishwaroop A <va@nvidia.com> Acked-by: Thierry Reding <treding@nvidia.com> Link: https://patch.msgid.link/20251028155703.4151791-3-va@nvidia.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-11-04 | spi: tegra210-quad: Fix timeout handling | Vishwaroop A
When the CPU that the QSPI interrupt handler runs on (typically CPU 0) is excessively busy, it can lead to rare cases of the IRQ thread not running before the transfer timeout is reached. While handling the timeouts, any pending transfers are cleaned up and the message that they correspond to is marked as failed, which leaves the curr_xfer field pointing at stale memory. To avoid this, clear curr_xfer to NULL upon timeout and check for this condition when the IRQ thread is finally run. While at it, also make sure to clear interrupts on failure so that new interrupts can be run. A better, more involved, fix would move the interrupt clearing into a hard IRQ handler. Ideally we would also want to signal that the IRQ thread no longer needs to be run after the timeout is hit to avoid the extra check for a valid transfer. Fixes: 921fc1838fb0 ("spi: tegra210-quad: Add support for Tegra210 QSPI controller") Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Vishwaroop A <va@nvidia.com> Link: https://patch.msgid.link/20251028155703.4151791-2-va@nvidia.com Signed-off-by: Mark Brown <broonie@kernel.org>
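A minimal single-threaded sketch of the guard described above. The structures and function names are hypothetical, and the locking that the real driver needs around curr_xfer is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the transfer and controller state. */
struct xfer_sketch {
    int len;
};

struct qspi_state_sketch {
    struct xfer_sketch *curr_xfer;
    int irq_pending; /* stands in for the interrupt status bits */
};

/* Timeout path: drop the stale pointer and clear pending interrupts. */
static void on_timeout_sketch(struct qspi_state_sketch *q)
{
    assert(q != NULL);
    q->curr_xfer = NULL;
    q->irq_pending = 0;
}

/* IRQ thread: returns 1 if a transfer was handled, 0 if there was none. */
static int irq_thread_sketch(struct qspi_state_sketch *q)
{
    assert(q != NULL);
    if (!q->curr_xfer)
        return 0; /* ran after the timeout: nothing valid to touch */
    q->curr_xfer = NULL; /* transfer finished normally */
    return 1;
}
```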
2025-11-04 | drm/amdgpu: Remove invalidate and flush hdp macros | Asad Kamal
Remove amdgpu_asic_flush_hdp & amdgpu_asic_invalidate_hdp functions and directly use the mapped ones Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>