summaryrefslogtreecommitdiff
path: root/drivers
AgeCommit message (Collapse)Author
2025-11-04net: stmmac: imx: convert to PHY_INTF_SEL_xxxRussell King (Oracle)
Convert dwmac-imx to use the PHY_INTF_SEL_xxx definitions rather than constants via: - ensuring that the prefix for the MASK and value definitions is the same. - using FIELD_PREP() to shift the PHY_INTF_SEL_xxx definition to the appropriate bitfield. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vFt4n-0000000Choy-0IeG@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04net: stmmac: add support for configuring the phy_intf_sel inputsRussell King (Oracle)
When dwmac is synthesised with support for multiple PHY interfaces, the core provides phy_intf_sel inputs, sampled on reset, to configure the PHY facing interface. Use stmmac_get_phy_intf_sel() in core code to determine the dwmac phy_intf_sel input value, and provide a new platform method called with this value just before we issue a soft reset to the dwmac core. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vFt4h-0000000Chos-3wxX@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04net: stmmac: add stmmac_get_phy_intf_sel()Russell King (Oracle)
Provide a function to translate the PHY interface mode to the phy_intf_sel pin configuration for dwmac1000 and dwmac4 cores that support multiple interfaces. We currently handle MII, GMII, RGMII, SGMII, RMII and REVMII, but not TBI, RTBI nor SMII as drivers do not appear to use these three and the driver doesn't currently support these. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vFt4c-0000000Choe-3SII@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04net: stmmac: add phy_intf_sel and ACTPHYIF definitionsRussell King (Oracle)
Add definitions for the active PHY interface found in DMA hardware feature register 0, and also used to configure the core in multi- interface designs via phy_intf_sel. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/E1vFt4X-0000000ChoY-30p9@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04net: stmmac: s32: move PHY_INTF_SEL_x definitions out of the wayRussell King (Oracle)
S32's PHY_INTF_SEL_x definitions conflict with those for the dwmac cores as they use a different bitmapping. Add a S32 prefix so that they are unique. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Jan Petrous (OSS) <jan.petrous@oss.nxp.com> Link: https://patch.msgid.link/E1vFt4S-0000000ChoS-2Ahi@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04net: stmmac: imx: use phylink's interface mode for set_clk_tx_rate()Russell King (Oracle)
imx_dwmac_set_clk_tx_rate() is passed the interface mode from phylink which will be the same as plat_dat->phy_interface. Use the passed-in interface mode rather than plat_dat->phy_interface. Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vFt4N-0000000ChoM-1llp@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04Input: cros_ec_keyb - fix an invalid memory accessTzung-Bi Shih
If cros_ec_keyb_register_matrix() isn't called (due to `buttons_switches_only`) in cros_ec_keyb_probe(), `ckdev->idev` remains NULL. An invalid memory access is observed in cros_ec_keyb_process() when receiving an EC_MKBP_EVENT_KEY_MATRIX event in cros_ec_keyb_work() in such case. Unable to handle kernel read from unreadable memory at virtual address 0000000000000028 ... x3 : 0000000000000000 x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000 Call trace: input_event cros_ec_keyb_work blocking_notifier_call_chain ec_irq_thread It's still unknown about why the kernel receives such malformed event, in any cases, the kernel shouldn't access `ckdev->idev` and friends if the driver doesn't intend to initialize them. Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org> Link: https://patch.msgid.link/20251104070310.3212712-1-tzungbi@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2025-11-04ti-sysc: allow OMAP2 and OMAP4 timers to be reserved on AM33xxMatthias Schiffer
am33xx.dtsi has the same clock setup as am35xx.dtsi, setting ti,no-reset-on-init and ti,no-idle on timer1_target and timer2_target, so AM33 needs the same workaround as AM35 to avoid ti-sysc probe failing on certain target modules. Signed-off-by: Matthias Schiffer <matthias.schiffer@tq-group.com> Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> Link: https://lore.kernel.org/r/20250825131114.2206804-1-alexander.stein@ew.tq-group.com Signed-off-by: Kevin Hilman <khilman@baylibre.com>
2025-11-05Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Jason Gunthorpe: - Memory leak in bnxt GSI qp path - Failure in irdma registering large MRs - Failure to clean out the right CQ table entry in irdma - Invalid vf_id in some cases - Incorrect error unwind in EFA CQ create - hns doesn't use the optimal cq/qp relationships for it's HW banks - hns reports the wrong SGE size to userspace for its QPs - Corruption of the hns work queue entries in some cases * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: MAINTAINERS: Update irdma maintainers RDMA/irdma: Fix vf_id size to u16 to avoid overflow RDMA/hns: Remove an extra blank line RDMA/hns: Fix wrong WQE data when QP wraps around RDMA/hns: Fix the modification of max_send_sge RDMA/hns: Fix recv CQ and QP cache affinity RDMA/uverbs: Fix umem release in UVERBS_METHOD_CQ_CREATE RDMA/irdma: Set irdma_cq cq_num field during CQ create RDMA/irdma: Fix SD index calculation RDMA/bnxt_re: Fix a potential memory leak in destroy_gsi_sqp
2025-11-04Input: imx_sc_key - fix memory corruption on unloadDan Carpenter
This is supposed to be "priv" but we accidentally pass "&priv" which is an address in the stack and so it will lead to memory corruption when the imx_sc_key_action() function is called. Remove the &. Fixes: 768062fd1284 ("Input: imx_sc_key - use devm_add_action_or_reset() to handle all cleanups") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/aQYKR75r2VMFJutT@stanley.mountain Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2025-11-04drm/amd/display: Fix NULL deref in debugfs odm_combine_segmentsRong Zhang
When a connector is connected but inactive (e.g., disabled by desktop environments), pipe_ctx->stream_res.tg will be destroyed. Then, reading odm_combine_segments causes kernel NULL pointer dereference. BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 16 UID: 0 PID: 26474 Comm: cat Not tainted 6.17.0+ #2 PREEMPT(lazy) e6a17af9ee6db7c63e9d90dbe5b28ccab67520c6 Hardware name: LENOVO 21Q4/LNVNB161216, BIOS PXCN25WW 03/27/2025 RIP: 0010:odm_combine_segments_show+0x93/0xf0 [amdgpu] Code: 41 83 b8 b0 00 00 00 01 75 6e 48 98 ba a1 ff ff ff 48 c1 e0 0c 48 8d 8c 07 d8 02 00 00 48 85 c9 74 2d 48 8b bc 07 f0 08 00 00 <48> 8b 07 48 8b 80 08 02 00> RSP: 0018:ffffd1bf4b953c58 EFLAGS: 00010286 RAX: 0000000000005000 RBX: ffff8e35976b02d0 RCX: ffff8e3aeed052d8 RDX: 00000000ffffffa1 RSI: ffff8e35a3120800 RDI: 0000000000000000 RBP: 0000000000000000 R08: ffff8e3580eb0000 R09: ffff8e35976b02d0 R10: ffffd1bf4b953c78 R11: 0000000000000000 R12: ffffd1bf4b953d08 R13: 0000000000040000 R14: 0000000000000001 R15: 0000000000000001 FS: 00007f44d3f9f740(0000) GS:ffff8e3caa47f000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000006485c2000 CR4: 0000000000f50ef0 PKRU: 55555554 Call Trace: <TASK> seq_read_iter+0x125/0x490 ? __alloc_frozen_pages_noprof+0x18f/0x350 seq_read+0x12c/0x170 full_proxy_read+0x51/0x80 vfs_read+0xbc/0x390 ? __handle_mm_fault+0xa46/0xef0 ? do_syscall_64+0x71/0x900 ksys_read+0x73/0xf0 do_syscall_64+0x71/0x900 ? count_memcg_events+0xc2/0x190 ? handle_mm_fault+0x1d7/0x2d0 ? do_user_addr_fault+0x21a/0x690 ? exc_page_fault+0x7e/0x1a0 entry_SYSCALL_64_after_hwframe+0x6c/0x74 RIP: 0033:0x7f44d4031687 Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00> RSP: 002b:00007ffdb4b5f0b0 EFLAGS: 00000202 ORIG_RAX: 0000000000000000 RAX: ffffffffffffffda RBX: 00007f44d3f9f740 RCX: 00007f44d4031687 RDX: 0000000000040000 RSI: 00007f44d3f5e000 RDI: 0000000000000003 RBP: 0000000000040000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000202 R12: 00007f44d3f5e000 R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000040000 </TASK> Modules linked in: tls tcp_diag inet_diag xt_mark ccm snd_hrtimer snd_seq_dummy snd_seq_midi snd_seq_oss snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device x> snd_hda_codec_atihdmi snd_hda_codec_realtek_lib lenovo_wmi_helpers think_lmi snd_hda_codec_generic snd_hda_codec_hdmi snd_soc_core kvm snd_compress uvcvideo sn> platform_profile joydev amd_pmc mousedev mac_hid sch_fq_codel uinput i2c_dev parport_pc ppdev lp parport nvme_fabrics loop nfnetlink ip_tables x_tables dm_cryp> CR2: 0000000000000000 ---[ end trace 0000000000000000 ]--- RIP: 0010:odm_combine_segments_show+0x93/0xf0 [amdgpu] Code: 41 83 b8 b0 00 00 00 01 75 6e 48 98 ba a1 ff ff ff 48 c1 e0 0c 48 8d 8c 07 d8 02 00 00 48 85 c9 74 2d 48 8b bc 07 f0 08 00 00 <48> 8b 07 48 8b 80 08 02 00> RSP: 0018:ffffd1bf4b953c58 EFLAGS: 00010286 RAX: 0000000000005000 RBX: ffff8e35976b02d0 RCX: ffff8e3aeed052d8 RDX: 00000000ffffffa1 RSI: ffff8e35a3120800 RDI: 0000000000000000 RBP: 0000000000000000 R08: ffff8e3580eb0000 R09: ffff8e35976b02d0 R10: ffffd1bf4b953c78 R11: 0000000000000000 R12: ffffd1bf4b953d08 R13: 0000000000040000 R14: 0000000000000001 R15: 0000000000000001 FS: 00007f44d3f9f740(0000) GS:ffff8e3caa47f000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000006485c2000 CR4: 0000000000f50ef0 PKRU: 55555554 Fix this by checking pipe_ctx->stream_res.tg before dereferencing. Fixes: 07926ba8a44f ("drm/amd/display: Add debugfs interface for ODM combine info") Signed-off-by: Rong Zhang <i@rong.moe> Reviewed-by: Mario Limoncello <mario.limonciello@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit f19bbecd34e3c15eed7e5e593db2ac0fc7a0e6d8) Cc: stable@vger.kernel.org
2025-11-04drm/amdkfd: Don't clear PT after process killedPhilip Yang
If process is killed. the vm entity is stopped, submit pt update job will trigger the error message "*ERROR* Trying to push to a killed entity", job will not execute. Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 10c382ec6c6d1e11975a11962bec21cba6360391) Cc: stable@vger.kernel.org
2025-11-04drm/amdgpu/smu: Handle S0ix for vangoghAlex Deucher
Fix the flows for S0ix. There is no need to stop rlc or reintialize PMFW in S0ix. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4659 Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Reported-by: Antheas Kapenekakis <lkml@antheas.dev> Tested-by: Antheas Kapenekakis <lkml@antheas.dev> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit fd39b5a5830d8f2553e0c09d4d50bdff28b10080) Cc: <stable@vger.kernel.org> # c81f5cebe849: drm/amdgpu: Drop PMFW RLC notifier from amdgpu_device_suspend() Cc: <stable@vger.kernel.org>
2025-11-04drm/amdgpu: Drop PMFW RLC notifier from amdgpu_device_suspend()Alex Deucher
For S3 on vangogh, PMFW needs to be notified before the driver powers down RLC. This already happens in smu_disable_dpms() so drop the superfluous call in amdgpu_device_suspend(). Co-developed-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 960e30a61e1a7ca5341a6cf9481e770e1cda24aa)
2025-11-04drm/amd/display: Fix black screen with HDMI outputsAlex Hung
[Why & How] This fixes the black screen issue on certain APUs with HDMI, accompanied by the following messages: amdgpu 0000:c4:00.0: amdgpu: [drm] Failed to setup vendor info frame on connector DP-1: -22 amdgpu 0000:c4:00.0: [drm] Cannot find any crtc or sizes [drm] Cannot find any crtc or sizes Fixes: 489f0f600ce2 ("drm/amd/display: Fix DVI-D/HDMI adapters") Suggested-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 678c901443a6d2e909e3b51331a20f9d8f84ce82)
2025-11-04drm/amd/display: Don't stretch non-native images by default in eDPMario Limonciello (AMD)
commit 978fa2f6d0b12 ("drm/amd/display: Use scaling for non-native resolutions on eDP") started using the GPU scaler hardware to scale when a non-native resolution was picked on eDP. This scaling was done to fill the screen instead of maintain aspect ratio. The idea was supposed to be that if a different scaling behavior is preferred then the compositor would request it. The not following aspect ratio behavior however isn't desirable, so adjust it to follow aspect ratio and still try to fill screen. Note: This will lead to black bars in some cases for non-native resolutions. Compositors can request the previous behavior if desired. Fixes: 978fa2f6d0b1 ("drm/amd/display: Use scaling for non-native resolutions on eDP") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4538 Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 825df7ff4bb1a383ad4827545e09aec60d230770)
2025-11-04drm/amd/pm: fix missing device_attr cleanup in amdgpu_pm_sysfs_init()Yang Wang
Use the correct label to complete all cleanup work. Fixes: 4d154b1ca580 ("drm/amd/pm: Add support for DPM policies") Fixes: 25e82f2e2c59 ("drm/amd/pm: Add temperature metrics sysfs entry") Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 4c4c138a1c86775c4d47e24f26357a1f8b64d0a3)
2025-11-04drm/amdgpu: set default gfx reset masks for gfx6-8Alex Deucher
These were not set so soft recovery was inadvertantly disabled. Fixes: 6ac55eab4fc4 ("drm/amdgpu: move reset support type checks into the caller") Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1972763505d728c604b537180727ec8132e619df)
2025-11-04drm/xe: Remove unused GT page fault codeMatthew Brost
With the Xe page fault layer and GuC page layer in place, this is now dead code and can be removed. ACC code is also removed, but this was dead code. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Tested-by: Francois Dugast <francois.dugast@intel.com> Link: https://patch.msgid.link/20251031165416.2871503-8-matthew.brost@intel.com
2025-11-04drm/xe: Add xe_guc_pagefault layerMatthew Brost
Add xe_guc_pagefault layer (producer) which parses G2H fault messages messages into struct xe_pagefault, forwards them to the page fault layer (consumer) for servicing, and provides a vfunc to acknowledge faults to the GuC upon completion. Replace the old (and incorrect) GT page fault layer with this new layer throughout the driver. As part of this change, the ACC handling code has been removed, as it is dead code that is currently unused. v2: - Include engine instance (Stuart) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Tested-by: Francois Dugast <francois.dugast@intel.com> Link: https://patch.msgid.link/20251031165416.2871503-7-matthew.brost@intel.com
2025-11-04drm/xe: Implement xe_pagefault_queue_workMatthew Brost
Implement a worker that services page faults, using the same implementation as in xe_gt_pagefault.c. v2: - Rebase on exhaustive eviction changes - Include engine instance in debug prints (Stuart) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Tested-by: Francois Dugast <francois.dugast@intel.com> Link: https://patch.msgid.link/20251031165416.2871503-6-matthew.brost@intel.com
2025-11-04drm/xe: Implement xe_pagefault_handlerMatthew Brost
Enqueue (copy) the input struct xe_pagefault into a queue (i.e., into a memory buffer) and schedule a worker to service it. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Tested-by: Francois Dugast <francois.dugast@intel.com> Link: https://patch.msgid.link/20251031165416.2871503-5-matthew.brost@intel.com
2025-11-04drm/xe: Implement xe_pagefault_resetMatthew Brost
Squash any pending faults on the GT being reset by setting the GT field in struct xe_pagefault to NULL. v4: - Only do reset it page faults queues initialized (CI) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Tested-by: Francois Dugast <francois.dugast@intel.com> Link: https://patch.msgid.link/20251031165416.2871503-4-matthew.brost@intel.com
2025-11-04drm/xe: Implement xe_pagefault_initMatthew Brost
Create pagefault queues and initialize them. v2: - Fix kernel doc + add comment for number PF queue (Francois) v4: - Move init after GT init (CI, Francois) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Tested-by: Francois Dugast <francois.dugast@intel.com> Link: https://patch.msgid.link/20251031165416.2871503-3-matthew.brost@intel.com
2025-11-04accel/amdxdna: Add IOCTL parameter for telemetry dataLizhi Hou
Extend DRM_IOCTL_AMDXDNA_GET_INFO to include additional parameters that allow collection of telemetry data. Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20251104062546.833771-3-lizhi.hou@amd.com
2025-11-04accel/amdxdna: Add IOCTL parameter for resource dataLizhi Hou
Extend DRM_IOCTL_AMDXDNA_GET_INFO to include additional parameters that allow collection of resource data. Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20251104062546.833771-2-lizhi.hou@amd.com
2025-11-04drm/xe: Stub out new pagefault layerMatthew Brost
Stub out the new page fault layer and add kernel documentation. This is intended as a replacement for the GT page fault layer, enabling multiple producers to hook into a shared page fault consumer interface. v2: - Fix kernel doc typo (checkpatch) - Remove comment around GT (Stuart) - Add explaination around reclaim (Francois) - Add comment around u8 vs enum (Francois) - Include engine instance (Stuart) v3: - Fix XE_PAGEFAULT_TYPE_ATOMIC_ACCESS_VIOLATION kernel doc (Stuart) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Tested-by: Francois Dugast <francois.dugast@intel.com> Link: https://patch.msgid.link/20251031165416.2871503-2-matthew.brost@intel.com
2025-11-04accel/amdxdna: Add hardware specific attributesLizhi Hou
Add three hardware specific attributes to describe device capabilities: hwctx_limit: The maximum number of hardware context supported. max_tops: The maximum TOPS supported. curr_tops: The TOPS achievable with the current power and frequency configuration. Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20251104062546.833771-1-lizhi.hou@amd.com
2025-11-04drm/amdgpu: fix possible fence leaks from job structureAlex Deucher
If we don't end up initializing the fences, free them when we free the job. We can't set the hw_fence to NULL after emitting it because we need it in the cleanup path for the submit direct case. v2: take a reference to the fences if we emit them v3: handle non-job fence in error paths Fixes: db36632ea51e ("drm/amdgpu: clean up and unify hw fence handling") Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> (v1) Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04drm/amdgpu: suspend ras module before gpu resetYiPeng Chai
During gpu reset, all GPU-related resources are inaccessible. To avoid affecting ras functionality, suspend ras module before gpu reset and resume it after gpu reset is complete. V2: Rename functions to avoid misunderstanding. V3: Move flush_delayed_work to amdgpu_ras_process_pause, Move schedule_delayed_work to amdgpu_ras_process_unpause. V4: Rename functions. V5: Move the function to amdgpu_ras.c. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04drm/amdgpu: add wrapper functions for pmfw eeprom interfaceGangliang Xie
add wrapper functions for pmfw eeprom interface, for these interfaces to be easily and safely called Signed-off-by: Gangliang Xie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04drm/amdgpu: add function to check if pmfw eeprom is supportedGangliang Xie
add function to check if pmfw is supported, skip eeprom check and recover when pmfw eeprom is supported Signed-off-by: Gangliang Xie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04drm/amd/pm: add smu ras driver frameworkGangliang Xie
add functions to get smu ras driver Signed-off-by: Gangliang Xie <ganglxie@amd.com> Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04drm/amd/pm: implement ras_smu_drv interface for smu v13.0.12Gangliang Xie
implement ras_smu_drv interface for smu v13.0.12 Signed-off-by: Gangliang Xie <ganglxie@amd.com> Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04drm/amd/pm: add new message definitions for pmfw eeprom interfaceGangliang Xie
Add new message definitions for pmfw eeprom interface Signed-off-by: Gangliang Xie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04Revert "drm/amdkfd: Improve signal event slow path"Philip Yang
To fix regression report on gfx8, which requires the exhaustive search path for signaled event. The high CPU usage of KFD interrupt wq issue is gone after HIP/ROCr add option to reduce HW event interrupts, safe to revert this optimization patch now. This reverts commit de844846f72b152119faaef1b363448dc8ea368f. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04drm/amd/display: Fix NULL deref in debugfs odm_combine_segmentsRong Zhang
When a connector is connected but inactive (e.g., disabled by desktop environments), pipe_ctx->stream_res.tg will be destroyed. Then, reading odm_combine_segments causes kernel NULL pointer dereference. BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 16 UID: 0 PID: 26474 Comm: cat Not tainted 6.17.0+ #2 PREEMPT(lazy) e6a17af9ee6db7c63e9d90dbe5b28ccab67520c6 Hardware name: LENOVO 21Q4/LNVNB161216, BIOS PXCN25WW 03/27/2025 RIP: 0010:odm_combine_segments_show+0x93/0xf0 [amdgpu] Code: 41 83 b8 b0 00 00 00 01 75 6e 48 98 ba a1 ff ff ff 48 c1 e0 0c 48 8d 8c 07 d8 02 00 00 48 85 c9 74 2d 48 8b bc 07 f0 08 00 00 <48> 8b 07 48 8b 80 08 02 00> RSP: 0018:ffffd1bf4b953c58 EFLAGS: 00010286 RAX: 0000000000005000 RBX: ffff8e35976b02d0 RCX: ffff8e3aeed052d8 RDX: 00000000ffffffa1 RSI: ffff8e35a3120800 RDI: 0000000000000000 RBP: 0000000000000000 R08: ffff8e3580eb0000 R09: ffff8e35976b02d0 R10: ffffd1bf4b953c78 R11: 0000000000000000 R12: ffffd1bf4b953d08 R13: 0000000000040000 R14: 0000000000000001 R15: 0000000000000001 FS: 00007f44d3f9f740(0000) GS:ffff8e3caa47f000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000006485c2000 CR4: 0000000000f50ef0 PKRU: 55555554 Call Trace: <TASK> seq_read_iter+0x125/0x490 ? __alloc_frozen_pages_noprof+0x18f/0x350 seq_read+0x12c/0x170 full_proxy_read+0x51/0x80 vfs_read+0xbc/0x390 ? __handle_mm_fault+0xa46/0xef0 ? do_syscall_64+0x71/0x900 ksys_read+0x73/0xf0 do_syscall_64+0x71/0x900 ? count_memcg_events+0xc2/0x190 ? handle_mm_fault+0x1d7/0x2d0 ? do_user_addr_fault+0x21a/0x690 ? exc_page_fault+0x7e/0x1a0 entry_SYSCALL_64_after_hwframe+0x6c/0x74 RIP: 0033:0x7f44d4031687 Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00> RSP: 002b:00007ffdb4b5f0b0 EFLAGS: 00000202 ORIG_RAX: 0000000000000000 RAX: ffffffffffffffda RBX: 00007f44d3f9f740 RCX: 00007f44d4031687 RDX: 0000000000040000 RSI: 00007f44d3f5e000 RDI: 0000000000000003 RBP: 0000000000040000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000202 R12: 00007f44d3f5e000 R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000040000 </TASK> Modules linked in: tls tcp_diag inet_diag xt_mark ccm snd_hrtimer snd_seq_dummy snd_seq_midi snd_seq_oss snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device x> snd_hda_codec_atihdmi snd_hda_codec_realtek_lib lenovo_wmi_helpers think_lmi snd_hda_codec_generic snd_hda_codec_hdmi snd_soc_core kvm snd_compress uvcvideo sn> platform_profile joydev amd_pmc mousedev mac_hid sch_fq_codel uinput i2c_dev parport_pc ppdev lp parport nvme_fabrics loop nfnetlink ip_tables x_tables dm_cryp> CR2: 0000000000000000 ---[ end trace 0000000000000000 ]--- RIP: 0010:odm_combine_segments_show+0x93/0xf0 [amdgpu] Code: 41 83 b8 b0 00 00 00 01 75 6e 48 98 ba a1 ff ff ff 48 c1 e0 0c 48 8d 8c 07 d8 02 00 00 48 85 c9 74 2d 48 8b bc 07 f0 08 00 00 <48> 8b 07 48 8b 80 08 02 00> RSP: 0018:ffffd1bf4b953c58 EFLAGS: 00010286 RAX: 0000000000005000 RBX: ffff8e35976b02d0 RCX: ffff8e3aeed052d8 RDX: 00000000ffffffa1 RSI: ffff8e35a3120800 RDI: 0000000000000000 RBP: 0000000000000000 R08: ffff8e3580eb0000 R09: ffff8e35976b02d0 R10: ffffd1bf4b953c78 R11: 0000000000000000 R12: ffffd1bf4b953d08 R13: 0000000000040000 R14: 0000000000000001 R15: 0000000000000001 FS: 00007f44d3f9f740(0000) GS:ffff8e3caa47f000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000006485c2000 CR4: 0000000000f50ef0 PKRU: 55555554 Fix this by checking pipe_ctx->stream_res.tg before dereferencing. Fixes: 07926ba8a44f ("drm/amd/display: Add debugfs interface for ODM combine info") Signed-off-by: Rong Zhang <i@rong.moe> Reviewed-by: Mario Limoncello <mario.limonciello@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04drm/amdkfd: Don't clear PT after process killedPhilip Yang
If process is killed. the vm entity is stopped, submit pt update job will trigger the error message "*ERROR* Trying to push to a killed entity", job will not execute. Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04drm/amd/ras: Add ras support for umc v12_5_0YiPeng Chai
Add ras support for umc v12_5_0. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04drm/amd/ras: Add ras support for nbio v7_9_1YiPeng Chai
Add ras support for nbio v7_9_1. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04drm/amdgpu: Add ras ip block nameYiPeng Chai
Add ras ip block name. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04drm/amd/ras: Increase ras switch control rangeYiPeng Chai
Increase ras switch control range. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04drm/amdgpu/smu: Handle S0ix for vangoghAlex Deucher
Fix the flows for S0ix. There is no need to stop rlc or reintialize PMFW in S0ix. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4659 Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Reported-by: Antheas Kapenekakis <lkml@antheas.dev> Tested-by: Antheas Kapenekakis <lkml@antheas.dev> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04drm/amd/pm: Update SMUv13.0.12 partition metricsLijo Lazar
Update SMUv13.0.12 partition metrics to partition metrics v1.1 schema. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04drm/amd/pm: Update SMUv13.0.6 partition metricsLijo Lazar
For SMU v13.0.6 SOCs, move to partition metrics v1.1 schema Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04drm/amd/pm: Add schema v1.1 for parition metricsLijo Lazar
Use a schema similar to gpu metrics v1.9 for partition metrics also. It will have field type encoded followed by the field value(s). The attribute ids used will be shared with gpu metrics. The structure definition is only to distinguish between gpu metrics and partition metrics though both gpu metrics v1.9 and partition metrics v1.1 follow the same definition. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04drm/amd/pm: Use gpu metrics 1.9 for SMUv13.0.12Lijo Lazar
Fill and publish GPU metrics in v1.9 format for SMUv13.0.12 SOCs Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04drm/amdgpu: validate the bo from done list for NULLSunil Khatri
Make sure the bo is valid before using it. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04drm/amdgpu: lock bo before calling amdgpu_vm_bo_update_sharedPierre-Eric Pelloux-Prayer
BO's reservation object must be locked before using amdgpu_vm_bo_update_shared otherwise dma_resv_assert_held will complain in amdgpu_vm_update_shared. Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04drm/amdgpu: grab a BO reference in vm_lock_done_list.Christian König
Otherwise it is possible that between dropping the status lock and locking the BO that the BO is freed up. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>