path: root/drivers/gpu
Age | Commit message | Author
2023-11-10 | drm/amdgpu: move kfd_resume before the ip late init | Tim Huang
kfd_resume needs to touch GC registers to enable the interrupts, so it must run before GFXOFF is enabled to ensure that GFX is not off and the GC registers can be touched. So move kfd_resume before amdgpu_device_ip_late_init, which enables CGPG/GFXOFF. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-10 | drm/amd: Explicitly check for GFXOFF to be enabled for s0ix | Mario Limonciello
If a user has disabled GFXOFF this may cause problems for the suspend sequence. Ensure that it is enabled in amdgpu_acpi_is_s0ix_active(). The system won't reach the deepest state but it also won't hang. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-10 | Merge tag 'drm-misc-fixes-2023-11-08' of git://anongit.freedesktop.org/drm/drm-misc into drm-next | Daniel Vetter
drm-misc-fixes for v6.7-rc1: qxl: - qxl memory leak fix. syncobj: - Fix waiting for DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE vc4: - Fix UAF in mock helpers Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> [sima: Stitch together both changelogs from Maarten. Also because of branch history this contains a few more bugfixes which are already in v6.6, but I didn't feel like this justifies some backmerge since there wasn't any real conflict.] Link: https://patchwork.freedesktop.org/patch/msgid/bc8598ee-d427-4616-8ebd-64107ab9a2d8@linux.intel.com
2023-11-10 | Merge tag 'drm-intel-next-fixes-2023-11-08' of git://anongit.freedesktop.org/drm/drm-intel into drm-next | Daniel Vetter
drm/i915 fixes for v6.7-rc1: - Fix null dereference when perf interface is not available - Fix a -Wstringop-overflow warning - Fix a -Wformat-truncation warning in intel_tc_port_init - Flush WC GGTT only on required platforms - Fix MTL HBR3 rate support on C10 phy and eDP - Fix MTL notify_guc for multi-GT - Bump GLK CDCLK frequency when driving multiple pipes - Fix potential spectre vulnerability Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/878r78xrxd.fsf@intel.com
2023-11-10 | drm/panfrost: Set regulators on/off during system sleep on MediaTek SoCs | AngeloGioacchino Del Regno
All of the MediaTek SoCs supported by Panfrost can completely cut power to the GPU during full system sleep without any user-noticeable delay in the resume operation, as shown by measurements taken on multiple MediaTek SoCs (MT8183/86/92/95). As an example, for MT8195 - a "before" with only runtime PM operations (so, without turning on/off regulators), and an "after" executing both the system sleep .resume() handler and .runtime_resume() (so the time refers to T_Resume + T_Runtime_Resume): Average Panfrost-only system sleep resume time, before: ~33500ns Average Panfrost-only system sleep resume time, after: ~336200ns Keep in mind that this additional ~308200 nanoseconds delay happens only in resume from a full system suspend, and not in runtime PM operations, hence it is acceptable. Measurements were also taken on MT8186, showing a delay of ~312000 ns. Testing of this happened on all of the aforementioned MediaTek SoCs, but: MT8183 got tested only by KernelCI with <=10 suspend/resume cycles MT8186, MT8192, MT8195 were tested manually with over 100 suspend/resume cycles with GNOME DE (Mutter + Wayland). Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231109102543.42971-7-angelogioacchino.delregno@collabora.com
2023-11-10 | drm/panfrost: Implement ability to turn on/off regulators in suspend | AngeloGioacchino Del Regno
Some platforms/SoCs can power off the GPU entirely by completely cutting off power, greatly enhancing battery time during system suspend: add a new pm_feature GPU_PM_VREG_OFF to allow turning off the GPU regulators during full suspend only on selected platforms. Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231109102543.42971-6-angelogioacchino.delregno@collabora.com
2023-11-10 | drm/panfrost: Set clocks on/off during system sleep on MediaTek SoCs | AngeloGioacchino Del Regno
All of the MediaTek SoCs supported by Panfrost can switch the clocks off and on during system sleep to save some power without any user experience penalty. Measurements taken on multiple MediaTek SoCs (MT8183/8186/8192/8195) show that adding this will not prolong the time that is required to resume the system in any meaningful way. As an example, for MT8195 - a "before" with only runtime PM operations (so, without turning on/off GPU clocks), and an "after" executing both the system sleep .resume() handler and .runtime_resume() (so the time refers to T_Resume + T_Runtime_Resume): Average Panfrost-only system sleep resume time, before: ~28000ns Average Panfrost-only system sleep resume time, after: ~33500ns Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231109102543.42971-5-angelogioacchino.delregno@collabora.com
2023-11-10 | drm/panfrost: Implement ability to turn on/off GPU clocks in suspend | AngeloGioacchino Del Regno
Currently, the GPU is being internally powered off for runtime suspend and turned back on for runtime resume through commands sent to it, but note that the GPU doesn't need to be clocked during the poweroff state, hence it is possible to save some power on selected platforms. Add suspend and resume handlers for full system sleep and then add a new panfrost_gpu_pm enumeration and a pm_features variable in the panfrost_compatible structure: BIT(GPU_PM_CLK_DIS) will be used to enable this power saving technique only on SoCs that are able to safely use it. Note that this was implemented only for the system sleep case and not for runtime PM because testing on one of my MediaTek platforms showed issues when turning on and off clocks aggressively (in PM runtime) resulting in a full system lockup. Doing this only for full system sleep never showed issues during my testing by suspending and resuming the system continuously for more than 100 cycles. Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231109102543.42971-4-angelogioacchino.delregno@collabora.com
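The opt-in feature bit described in this commit message can be sketched as follows. This is a minimal Python model, not the driver's C code: `GPU_PM_CLK_DIS`, `pm_features` and `BIT()` are named in the message, while the enum value 0 and the `Compatible` class are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class GpuPm(IntEnum):
    # Bit positions, not masks. The value 0 is an assumption of this sketch.
    GPU_PM_CLK_DIS = 0


def BIT(n: int) -> int:
    """Return a mask with only bit n set, modelled on the kernel's BIT() macro."""
    return 1 << n


@dataclass
class Compatible:
    """Hypothetical stand-in for a per-SoC compatible description."""
    name: str
    pm_features: int = 0  # no power-saving features unless a SoC opts in


def clocks_off_in_suspend(comp: Compatible) -> bool:
    # Only SoCs that set the bit in pm_features get their clocks disabled.
    return bool(comp.pm_features & BIT(GpuPm.GPU_PM_CLK_DIS))
```

Keeping the enum as bit positions and building masks with `BIT()` lets further features be added without renumbering existing ones.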
2023-11-10 | drm/panfrost: Tighten polling for soft reset and power on | AngeloGioacchino Del Regno
In many cases, soft reset takes more than 1 microsecond, but definitely less than 10; moreover in the poweron flow, tilers, shaders and l2 will become ready (each) in less than 10 microseconds as well. Even in the cases (at least on my platforms, rarely) in which those take more than 10 microseconds, it's very unlikely to see both soft reset and poweron to take more than 70 microseconds. Shorten the polling delay to 10 microseconds to consistently reduce the runtime resume time of the GPU. As an indicative example, measurements taken on a MediaTek MT8195 SoC Average runtime resume time in nanoseconds before this commit: GDM, user selection up/down: 88435ns GDM, Text Entry (typing user/password): 91489ns GNOME Desktop, idling, GKRELLM running: 73200ns After this commit: GDM: user selection up/down: 26690ns GDM: Text Entry (typing user/password): 27917ns GNOME Desktop, idling, GKRELLM running: 25304ns Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231109102543.42971-3-angelogioacchino.delregno@collabora.com
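Why a shorter polling delay shortens resume time can be illustrated with a small model. The sketch below assumes a loop that checks the condition at t = 0 and then once per polling interval; the function name is hypothetical, and the 100 microsecond interval used for comparison is an arbitrary value chosen for the example, since the message does not state the previous delay.

```python
import math


def detection_time_us(ready_at_us: float, poll_interval_us: float) -> float:
    """Time at which a poll loop first sees a condition that became true at ready_at_us.

    Assumes the loop checks at t = 0 and then every poll_interval_us.
    """
    if ready_at_us <= 0:
        return 0.0
    # The first check at or after ready_at_us is the one that succeeds.
    return math.ceil(ready_at_us / poll_interval_us) * poll_interval_us


# A reset that completes after 7 us is noticed at 10 us with a 10 us interval,
# but only at 100 us with the (arbitrary) 100 us interval.
fine = detection_time_us(7, 10)
coarse = detection_time_us(7, 100)
```

In this model the time wasted per poll is bounded by the interval, which is consistent with the message's reasoning that events finishing in under 10 microseconds benefit from a 10 microsecond delay.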
2023-11-10 | drm/panfrost: Perform hard reset to recover GPU if soft reset fails | AngeloGioacchino Del Regno
Even though soft reset should ideally never fail, during development of some power management features I managed to get some bits wrong: this resulted in GPU soft reset failures, where the GPU was never able to recover, not even after suspend/resume cycles, meaning that the only way to get functionality back was to reboot the machine. Perform a hard reset after a soft reset failure to be able to recover the GPU during runtime (so, without any machine reboot). Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231109102543.42971-2-angelogioacchino.delregno@collabora.com
2023-11-10 | drm/panfrost: Really power off GPU cores in panfrost_gpu_power_off() | AngeloGioacchino Del Regno
The layout of the registers {TILER,SHADER,L2}_PWROFF_LO, used to request powering off cores, is the same as the {TILER,SHADER,L2}_PWRON_LO ones: this means that in order to request poweroff of cores, we are supposed to write a bitmask of cores that should be powered off! This means that the panfrost_gpu_power_off() function has always been doing nothing. Fix powering off the GPU by writing a bitmask of the cores to poweroff to the relevant PWROFF_LO registers and then check that the transition (from ON to OFF) has finished by polling the relevant PWRTRANS_LO registers. While at it, in order to avoid code duplication, move the core mask logic from panfrost_gpu_power_on() to a new panfrost_get_core_mask() function, used in both poweron and poweroff. Fixes: f3ba91228e8e ("drm/panfrost: Add initial panfrost driver") Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231102141507.73481-1-angelogioacchino.delregno@collabora.com
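The write-then-poll sequence described in this message can be modelled in a few lines. This is a toy register model in Python under stated assumptions: `FakeGpu`, its fields and `power_off()` are hypothetical names, and real hardware transitions are not instantaneous as they are here. It only shows that a power-off request acts on the cores named in the bitmask, so an empty mask requests nothing.

```python
class FakeGpu:
    """Toy model of one core type; not the driver's register interface."""

    def __init__(self, present_mask: int):
        self.ready = present_mask  # cores currently powered on
        self.trans = 0             # cores with a power transition in progress

    def write_pwroff(self, mask: int) -> None:
        # Only powered-on cores named in the mask start powering off.
        self.trans |= self.ready & mask

    def tick(self) -> None:
        # One polling step: transitioning cores finish powering off.
        self.ready &= ~self.trans
        self.trans = 0


def power_off(gpu: FakeGpu, core_mask: int, max_polls: int = 10) -> bool:
    """Request power-off for core_mask, then poll until no transition is pending."""
    gpu.write_pwroff(core_mask)
    for _ in range(max_polls):
        if gpu.trans == 0:
            return True
        gpu.tick()
    return gpu.trans == 0
```

Calling `power_off(gpu, 0)` returns immediately and leaves every core on, while passing the mask of present cores actually turns them off.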
2023-11-10 | drm/i915: Implement fdinfo memory stats printing | Tvrtko Ursulin
Use the newly added drm_print_memory_stats helper to show memory utilisation of our objects in drm/driver specific fdinfo output. To collect the stats we walk the per memory regions object lists and accumulate object size into the respective drm_memory_stats categories. v2: * Only account against the active region. * Use DMA_RESV_USAGE_BOOKKEEP when testing for active. (Tejas) v3: * Update commit text. (Aravind) * Update to use memory regions uabi names. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Aravind Iddamsetty <aravind.iddamsetty@intel.com> Cc: Rob Clark <robdclark@gmail.com> Cc: Andi Shyti <andi.shyti@linux.intel.com> Cc: Tejas Upadhyay <tejas.upadhyay@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231107101806.608990-6-tvrtko.ursulin@linux.intel.com
2023-11-10 | drm/i915: Add stable memory region names | Tvrtko Ursulin
At the moment memory region names are a bit too varied and too inconsistent to be used for ABI purposes, like for upcoming fdinfo memory stats. System memory can be either system or system-ttm. Local memory has the instance number appended, others do not. Not only is this inconsistent, but this kind of implementation detail is uninteresting for intended users of fdinfo memory stats. Add a stable name always formed as $type$instance. Could have chosen a different stable scheme, but I think any consistent and stable scheme should do just fine. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231107101806.608990-5-tvrtko.ursulin@linux.intel.com
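The `$type$instance` scheme can be illustrated with a one-line formatter. This is a sketch only: the function name and the example type strings are assumptions, not the identifiers used by the driver.

```python
def stable_region_name(region_type: str, instance: int) -> str:
    """Form a name as <type><instance>, always including the instance number."""
    return f"{region_type}{instance}"
```

Because the instance number is always appended, every region type follows the same pattern, which is the consistency the message asks for.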
2023-11-10 | drm/i915: Account ring buffer and context state storage | Tvrtko Ursulin
Account ring buffers and logical context space against the owning client memory usage stats. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231107101806.608990-4-tvrtko.ursulin@linux.intel.com
2023-11-10 | drm/i915: Track page table backing store usage | Tvrtko Ursulin
Account page table backing store against the owning client memory usage stats. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231107101806.608990-3-tvrtko.ursulin@linux.intel.com
2023-11-10 | drm/i915: Record which client owns a VM | Tvrtko Ursulin
To enable accounting of indirect client memory usage (such as page tables) in the following patch, let's start recording the creator of each PPGTT. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231107101806.608990-2-tvrtko.ursulin@linux.intel.com
2023-11-10 | drm/i915: Add ability for tracking buffer objects per client | Tvrtko Ursulin
In order to show per client memory usage, let's add some infrastructure which enables tracking buffer objects owned by clients. We add a per client list protected by a new per client lock, and to support delayed destruction (post client exit) we make tracked objects hold references to the owning client. Also, object memory region teardown is moved to the existing RCU free callback to allow safe dereference from the fdinfo RCU read section. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231107101806.608990-1-tvrtko.ursulin@linux.intel.com
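The ownership scheme in this message (a per-client list under a per-client lock, with each tracked object holding a reference to its client) can be sketched in Python. All class and function names below are hypothetical, the reference count is a plain integer rather than a kernel kref, and the RCU aspect is not modelled.

```python
import threading


class Client:
    """Hypothetical client; starts with one reference held by the open file."""

    def __init__(self, name: str):
        self.name = name
        self.lock = threading.Lock()  # protects self.objects
        self.objects = []
        self.refs = 1

    def get(self) -> "Client":
        self.refs += 1
        return self

    def put(self) -> bool:
        """Drop one reference; return True when the client can be freed."""
        self.refs -= 1
        return self.refs == 0


class BufferObject:
    def __init__(self, size: int):
        self.size = size
        self.client = None


def track(client: Client, obj: BufferObject) -> None:
    # The object pins its client so the client outlives it.
    obj.client = client.get()
    with client.lock:
        client.objects.append(obj)


def untrack(obj: BufferObject) -> bool:
    client = obj.client
    with client.lock:
        client.objects.remove(obj)
    obj.client = None
    return client.put()


def total_size(client: Client) -> int:
    with client.lock:
        return sum(o.size for o in client.objects)
```

In this model, closing the file drops only the file's reference, so the client stays alive until its last tracked object is released, which is the delayed destruction the message refers to.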
2023-11-10 | drm/i915/panelreplay: enable/disable panel replay | Animesh Manna
TRANS_DP2_CTL register is programmed to enable panel replay from source and sink is enabled through panel replay dpcd configuration address. Bspec: 1407940617 v1: Initial version. v2: - Use pr_* flags instead psr_* flags. [Jouni] - Remove intel_dp_is_edp check as edp1.5 also has panel replay. [Jouni] v3: Cover letter updated and selective fetch condition check is added before updating its bit in PSR2_MAN_TRK_CTL register. [Jouni] v4: Selective fetch related PSR2_MAN_TRK_CTL programming dropped. [Jouni] v5: Added PSR2_MAN_TRK_CTL programming as needed for Continuous Full Frame (CFF) update. v6: Rebased on latest. Note: Initial plan is to enable panel replay in full-screen live active frame update mode. In an incremental approach, panel replay will be enabled in selective update mode if there is any gap in the current implementation. Cc: Jouni Högander <jouni.hogander@intel.com> Cc: Arun R Murthy <arun.r.murthy@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Arun R Murthy <arun.r.murthy@intel.com> Signed-off-by: Animesh Manna <animesh.manna@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231108072303.3414118-6-animesh.manna@intel.com
2023-11-10 | drm/i915/panelreplay: Enable panel replay dpcd initialization for DP | Animesh Manna
Due to similarity panel replay dpcd initialization got added in psr function which is specific for edp panel. This patch enables panel replay initialization for dp connector. Cc: Jouni Högander <jouni.hogander@intel.com> Cc: Arun R Murthy <arun.r.murthy@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Arun R Murthy <arun.r.murthy@intel.com> Signed-off-by: Animesh Manna <animesh.manna@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231108072303.3414118-5-animesh.manna@intel.com
2023-11-10 | drm/i915/panelreplay: Initializaton and compute config for panel replay | Animesh Manna
Modify existing PSR implementation to enable panel replay feature of DP 2.0 which is similar to PSR feature of EDP panel. There is a different DPCD address to check panel capability compared to PSR, and the vsc sdp header is different. v1: Initial version. v2: - Set source_panel_replay_support flag under HAS_PANEL_REPLAY() condition check. [Jouni] - Code restructured around intel_panel_replay_init and renamed to intel_panel_replay_init_dpcd. [Jouni] - Remove the initial code modification around has_psr2 flag. [Jouni] - Add CAN_PANEL_REPLAY() in intel_encoder_can_psr which is used to enable in intel_psr_post_plane_update. [Jouni] v3: - Initialize both psr and panel-replay. [Jouni] - Initialize both panel replay and psr if detected. [Jouni] - Refactoring psr function by introducing _psr_compute_config(). [Jouni] - Add check for !is_edp while deriving source_panel_replay_support. [Jouni] - Enable panel replay dpcd initialization in a separate patch. [Jouni] v4: - HAS_PANEL_REPLAY() check not needed during sink capability check. [Jouni] - Set either panel replay source support or psr. [Jouni] v5: - HAS_PANEL_REPLAY() removed and use HAS_DP20() instead. [Jouni] - Move psr related code to intel_psr.c. [Jani] - Reset sink_panel_replay_support flag during disconnection. [Jani] v6: return statement restored which was removed by mistake. [Jouni] v7: cosmetic changes. [Arun] Cc: Jouni Högander <jouni.hogander@intel.com> Cc: Arun R Murthy <arun.r.murthy@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Arun R Murthy <arun.r.murthy@intel.com> Signed-off-by: Animesh Manna <animesh.manna@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231108072303.3414118-4-animesh.manna@intel.com
2023-11-10 | drm/i915/psr: Move psr specific dpcd init into own function | Jouni Högander
This patch prepares for adding panel replay specific dpcd init. Cc: Arun R Murthy <arun.r.murthy@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Arun R Murthy <arun.r.murthy@intel.com> Signed-off-by: Jouni Högander <jouni.hogander@intel.com> Signed-off-by: Animesh Manna <animesh.manna@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231108072303.3414118-3-animesh.manna@intel.com
2023-11-10 | drm/sched: implement dynamic job-flow control | Danilo Krummrich
Currently, job flow control is implemented simply by limiting the number of jobs in flight. Therefore, a scheduler is initialized with a credit limit that corresponds to the number of jobs which can be sent to the hardware. This implies that for each job, drivers need to account for the maximum job size possible in order to not overflow the ring buffer. However, there are drivers, such as Nouveau, where the job size has a rather large range. For such drivers it can easily happen that job submissions not even filling the ring by 1% can block subsequent submissions, which, in the worst case, can lead to the ring running dry. In order to overcome this issue, allow for tracking the actual job size instead of the number of jobs. Therefore, add a field to track a job's credit count, which represents the number of credits a job contributes to the scheduler's credit limit. Signed-off-by: Danilo Krummrich <dakr@redhat.com> Reviewed-by: Luben Tuikov <ltuikov89@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231110001638.71750-1-dakr@redhat.com
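The credit accounting described in this message can be sketched as below. This is an illustrative Python model, not the scheduler's API: the class and method names are hypothetical, and a credit is simply treated as one unit of ring space.

```python
class CreditScheduler:
    """Toy flow control: admit a job only if its credits fit under the limit."""

    def __init__(self, credit_limit: int):
        self.credit_limit = credit_limit
        self.in_flight = 0  # credits held by jobs submitted but not completed

    def can_queue(self, credits: int) -> bool:
        return self.in_flight + credits <= self.credit_limit

    def submit(self, credits: int) -> bool:
        if not self.can_queue(credits):
            return False
        self.in_flight += credits
        return True

    def complete(self, credits: int) -> None:
        # A finished job returns its credits to the pool.
        self.in_flight -= credits


# With a limit of 100 credits, two 50-credit jobs fill the ring,
# and so do one hundred 1-credit jobs.
big = CreditScheduler(100)
big_ok = [big.submit(50), big.submit(50), big.submit(1)]
```

Under a pure job-count limit sized for the largest possible job, the small jobs in this example would be throttled after two submissions even though the ring is almost empty; tracking credits per job avoids that.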
2023-11-09 | drm/panel-edp: drm/panel-edp: Add several generic edp panels | Hsin-Yi Wang
Add a few generic edp panels used by mt8186 chromebooks. Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Acked-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20231107204611.3082200-4-hsinyi@chromium.org
2023-11-09 | drm/panel-edp: drm/panel-edp: Fix AUO B116XTN02 name | Hsin-Yi Wang
Rename AUO 0x235c B116XTN02 to B116XTN02.3 according to decoding edid. Fixes: 3db2420422a5 ("drm/panel-edp: Add AUO B116XTN02, BOE NT116WHM-N21,836X2, NV116WHM-N49 V8.0") Cc: stable@vger.kernel.org Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Acked-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20231107204611.3082200-3-hsinyi@chromium.org
2023-11-09 | drm/panel-edp: drm/panel-edp: Fix AUO B116XAK01 name and timing | Hsin-Yi Wang
Rename AUO 0x405c B116XAK01 to B116XAK01.0 and adjust the timing of auo_b116xak01: T3=200, T12=500, T7_max = 50 according to decoding edid and datasheet. Fixes: da458286a5e2 ("drm/panel: Add support for AUO B116XAK01 panel") Cc: stable@vger.kernel.org Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Acked-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20231107204611.3082200-2-hsinyi@chromium.org
2023-11-09 | drm/sched: Qualify drm_sched_wakeup() by drm_sched_entity_is_ready() | Luben Tuikov
Don't "wake up" the GPU scheduler unless the entity is ready, as well as we can queue to the scheduler, i.e. there is no point in waking up the scheduler for the entity unless the entity is ready. Signed-off-by: Luben Tuikov <ltuikov89@gmail.com> Fixes: bc8d6a9df99038 ("drm/sched: Don't disturb the entity when in RR-mode scheduling") Reviewed-by: Danilo Krummrich <dakr@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231110000123.72565-2-ltuikov89@gmail.com
2023-11-09 | drm/amdgpu: Change WREG32_RLC to WREG32_SOC15_RLC where inst != 0 (v2) | Victor Lu
W/RREG32_RLC is hardcoded to use instance 0. W/RREG32_SOC15_RLC should be used instead when inst != 0. v2: rebase Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 | drm/amdgpu: Use correct KIQ MEC engine for gfx9.4.3 (v5) | Victor Lu
amdgpu_kiq_wreg/rreg is hardcoded to use MEC engine 0. Add an xcc_id parameter to amdgpu_kiq_wreg/rreg, define W/RREG32_XCC and amdgpu_device_xcc_wreg/rreg to use the new xcc_id parameter. Using amdgpu_sriov_runtime to determine whether to access via kiq or RLC is sufficient for now. v5: add condition in amdgpu_device_xcc_w/rreg, remove trace func call v4: avoid using amdgpu_sriov_w/rreg v3: use W/RREG32_XCC to handle non-kiq case v2: define amdgpu_device_xcc_wreg/rreg instead of changing parameters of amdgpu_device_wreg/rreg Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 | drm/amdgpu: add smu v13.0.6 pcs xgmi ras error query support | Yang Wang
add pcs xgmi ras error query support for smu v13.0.6. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 | drm/amdgpu: fix software pci_unplug on some chips | Vitaly Prosyak
When a software 'pci unplug' is executed using IGT, the sysfs directory entry is NULL for different ras blocks like hdp, umc, etc. Before calling 'sysfs_remove_file_from_group' and 'sysfs_remove_group', check that 'sd' is not NULL.
[ +0.000001] RIP: 0010:sysfs_remove_group+0x83/0x90
[ +0.000002] Code: 31 c0 31 d2 31 f6 31 ff e9 9a a8 b4 00 4c 89 e7 e8 f2 a2 ff ff eb c2 49 8b 55 00 48 8b 33 48 c7 c7 80 65 94 82 e8 cd 82 bb ff <0f> 0b eb cc 66 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90
[ +0.000001] RSP: 0018:ffffc90002067c90 EFLAGS: 00010246
[ +0.000002] RAX: 0000000000000000 RBX: ffffffff824ea180 RCX: 0000000000000000
[ +0.000001] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ +0.000001] RBP: ffffc90002067ca8 R08: 0000000000000000 R09: 0000000000000000
[ +0.000001] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ +0.000001] R13: ffff88810a395f48 R14: ffff888101aab0d0 R15: 0000000000000000
[ +0.000001] FS: 00007f5ddaa43a00(0000) GS:ffff88841e800000(0000) knlGS:0000000000000000
[ +0.000002] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0.000001] CR2: 00007f8ffa61ba50 CR3: 0000000106432000 CR4: 0000000000350ef0
[ +0.000001] Call Trace:
[ +0.000001] <TASK>
[ +0.000001] ? show_regs+0x72/0x90
[ +0.000002] ? sysfs_remove_group+0x83/0x90
[ +0.000002] ? __warn+0x8d/0x160
[ +0.000001] ? sysfs_remove_group+0x83/0x90
[ +0.000001] ? report_bug+0x1bb/0x1d0
[ +0.000003] ? handle_bug+0x46/0x90
[ +0.000001] ? exc_invalid_op+0x19/0x80
[ +0.000002] ? asm_exc_invalid_op+0x1b/0x20
[ +0.000003] ? sysfs_remove_group+0x83/0x90
[ +0.000001] dpm_sysfs_remove+0x61/0x70
[ +0.000002] device_del+0xa3/0x3d0
[ +0.000002] ? ktime_get_mono_fast_ns+0x46/0xb0
[ +0.000002] device_unregister+0x18/0x70
[ +0.000001] i2c_del_adapter+0x26d/0x330
[ +0.000002] arcturus_i2c_control_fini+0x25/0x50 [amdgpu]
[ +0.000236] smu_sw_fini+0x38/0x260 [amdgpu]
[ +0.000241] amdgpu_device_fini_sw+0x116/0x670 [amdgpu]
[ +0.000186] ? mutex_lock+0x13/0x50
[ +0.000003] amdgpu_driver_release_kms+0x16/0x40 [amdgpu]
[ +0.000192] drm_minor_release+0x4f/0x80 [drm]
[ +0.000025] drm_release+0xfe/0x150 [drm]
[ +0.000027] __fput+0x9f/0x290
[ +0.000002] ____fput+0xe/0x20
[ +0.000002] task_work_run+0x61/0xa0
[ +0.000002] exit_to_user_mode_prepare+0x150/0x170
[ +0.000002] syscall_exit_to_user_mode+0x2a/0x50
Cc: Hawking Zhang <hawking.zhang@amd.com> Cc: Luben Tuikov <luben.tuikov@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian Koenig <christian.koenig@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 | drm/amd/display: remove duplicated argument | José Pekkarinen
Spotted by coccicheck, there is a redundant check for v->SourcePixelFormat[k] != dm_444_16. This patch will remove it. The corresponding output follows. drivers/gpu/drm/amd/display/dc/dml/dcn30/display_mode_vba_30.c:5130:86-122: duplicated argument to && or || Signed-off-by: José Pekkarinen <jose.pekkarinen@foxhound.fi> Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 | drm/amdgpu: correct mca debugfs dump reg list | Yang Wang
Avoid having the driver touch invalid mca registers. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 | drm/amdgpu: correct acclerator check architecutre dump | Hawking Zhang
So driver doesn't touch invalid aca entries. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 | drm/amdgpu: add pcs xgmi v6.4.0 ras support | Yang Wang
add pcs xgmi v6.4.0 ras support Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 | drm/amdgpu: Change extended-scope MTYPE on GC 9.4.3 | David Yat Sin
Change local memory type to MTYPE_UC on revision id 0 Signed-off-by: David Yat Sin <David.YatSin@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 | drm/amdgpu: disable smu v13.0.6 mca debug mode by default | Yang Wang
disable mca debug mode for smu v13.0.6 by default. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 | drm/amdgpu: Support multiple error query modes | Hawking Zhang
Direct error query mode and firmware error query mode are supported for now. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 | drm/amdgpu: refine smu v13.0.6 mca dump driver | Yang Wang
refine smu mca driver to support query ras error from pmfw path. - correct gfx smu bank hwid (from mp5 to smu bank) - retire unused callback function in amdgpu_mca_smu_funcs{} - add new mca_bank_set{} structure to collect mca bank - move enum mca_reg_idx into amdgpu_mca.h header - add mca status register field decode macro Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 | drm/amdgpu: Do not program PF-only regs in hdp_v4_0.c under SRIOV (v2) | Victor Lu
The following regs can only be programmed by the PF: HDP_MISC_CNTL HDP_NONSURFACE_BASE HDP_NONSURFACE_BASE_HI v2: update commit message Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Reviewed-by: Samir Dhume <samir.dhume@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 | drm/amdgpu: Skip PCTL0_MMHUB_DEEPSLEEP_IB write in jpegv4.0.3 under SRIOV | Victor Lu
PCTL0_MMHUB_DEEPSLEEP_IB is blocked for VF access Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Reviewed-by: Samir Dhume <samir.dhume@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 | drm: amd: Resolve Sphinx unexpected indentation warning | Hunter Chasens
Resolves Sphinx unexpected indentation warning when compiling documentation (e.g. `make htmldocs`). Replaces tabs with spaces and adds a literal block to keep vertical formatting of the example power state list. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> (v2) Acked-by: Randy Dunlap <rdunlap@infradead.org> (v2) Signed-off-by: Hunter Chasens <hunter.chasens18@ncf.edu> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 | drm/amdgpu: correct smu v13.0.6 umc ras error check | Yang Wang
correct smu v13.0.6 umc ras error check Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 | drm/amdgpu: Add xcc param to SRIOV kiq write and WREG32_SOC15_IP_NO_KIQ (v4) | Victor Lu
WREG32/RREG32_SOC15_IP_NO_KIQ and amdgpu_virt_kiq_reg_write_reg_wait are not using the correct rlcg interface or mec engine, respectively. Add xcc instance parameter to them. v4: Use GET_INST and squash commit with: "drm/amdgpu: Add xcc_inst param to amdgpu_virt_kiq_reg_write_reg_wait" v3: xcc not needed for MMHUB v2: rebase Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 | drm/amdgpu: Add flag to enable indirect RLCG access for gfx v9.4.3 | Victor Lu
The "rlcg_reg_access_supported" flag is missing. Add it back in. Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 | drm/amd/pm: raise the deep sleep clock threshold for smu 13.0.6 | Le Ma
The DS clock may exceed the limit as sclk dfll divider is 16 to target freq. Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 | drm/amdgpu: correct amdgpu ip block rev info | Yang Wang
correct following amdgpu ip block version information: - gfx_v9_4_3 - sdma_v4_4_2 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 | drm/amd/pm: Hide pp_dpm_pcie device attribute | Lijo Lazar
Hide PCIe DPM attribute on SOCs with GC v9.4.2 and GC v9.4.3. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 | drm/amdgpu: Don't warn for unsupported set_xgmi_plpd_mode | Tao Zhou
set_xgmi_plpd_mode may be unsupported and this isn't an error, so there is no need to print a warning for it. v2: add ret2 to save the status of psp_ras_trigger_error. Suggested-by: lijo.lazar@amd.com Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 | drm/amdgpu: lower CS errors to debug severity | Christian König
Otherwise userspace can spam the logs by using incorrect input values. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-11-09 | drm/amdgpu: fix error handling in amdgpu_bo_list_get() | Christian König
We should not leak a pointer on which we couldn't grab a reference to the caller, because the caller's error handling may still try to put the reference. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org