2026-01-10drm/amd/pm: Remove unused legacy message functionsLijo Lazar
Messaging functions are now moved to message control block. Remove unused legacy functions around messaging. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-10drm/amd/pm: Replace without wait with async callsLijo Lazar
Use the new async locked message function instead of without_waiting messaging function. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-10drm/amd/pm: Add async message call supportLijo Lazar
Add asynchronous messaging (message which doesn't wait for response) using message control block. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-10drm/amd/pm: Use message control in messagingLijo Lazar
Use message control block operations in common message functions. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-10drm/amd/pm: Add message control for SMUv15Lijo Lazar
Initialize smu message control in SMUv15 SOCs. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-10drm/amd/pm: Add message control for SMUv14Lijo Lazar
Initialize smu message control in SMUv14 SOCs. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-10drm/amd/pm: Add message control for SMUv13Lijo Lazar
Initialize smu message control in SMUv13 SOCs. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-10drm/amd/pm: Add message control for SMUv12Lijo Lazar
Initialize smu message control in SMUv12 SOCs. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-10drm/amd/pm: Add message control for SMUv11Lijo Lazar
Initialize smu message control in SMUv11 SOCs. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-10drm/amd/pm: Add smu message control blockLijo Lazar
Add message control block to abstract PMFW message protocol. Message control block primarily carries message config which is set of register addresses and message ops which abstracts the protocol of sending messages. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
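The split described above (a config holding register addresses, plus ops that hide the send protocol) can be pictured with a small userspace sketch. Every name here (`msg_config`, `msg_ops`, `msg_ctl`, `simple_send`, `msg_ctl_send`) is hypothetical and the "registers" are a plain array; this is not the amdgpu definition, only an illustration of the pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical config: indices of the registers used for messaging. */
struct msg_config {
	uint32_t msg_reg;   /* receives the message ID */
	uint32_t arg_reg;   /* receives the message argument */
	uint32_t resp_reg;  /* firmware writes its response here */
};

struct msg_ctl;

/* Hypothetical ops: the protocol lives behind a function pointer. */
struct msg_ops {
	int (*send)(struct msg_ctl *ctl, uint32_t msg, uint32_t arg,
		    uint32_t *resp);
};

struct msg_ctl {
	struct msg_config config;
	const struct msg_ops *ops;
	uint32_t *regs;     /* fake register file for this sketch */
};

/* A toy protocol: write argument, write message, pretend firmware says OK. */
static int simple_send(struct msg_ctl *ctl, uint32_t msg, uint32_t arg,
		       uint32_t *resp)
{
	ctl->regs[ctl->config.arg_reg] = arg;
	ctl->regs[ctl->config.msg_reg] = msg;
	ctl->regs[ctl->config.resp_reg] = 1; /* fake firmware response */
	if (resp)
		*resp = ctl->regs[ctl->config.resp_reg];
	return 0;
}

const struct msg_ops simple_ops = { .send = simple_send };

/* Common code only talks to the control block, never to registers. */
int msg_ctl_send(struct msg_ctl *ctl, uint32_t msg, uint32_t arg,
		 uint32_t *resp)
{
	assert(ctl && ctl->ops && ctl->ops->send);
	return ctl->ops->send(ctl, msg, arg, resp);
}
```

With this shape, each SoC generation would only need to fill in its own config and pick an ops table, which matches the per-version "Add message control for SMUvNN" commits listed above.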
2026-01-10drm/amdgpu: Use correct address to setup gart page table for vram accessXiaogang Chen
Use dst input parameter to setup gart page table entries instead of using fixed location. Fixes: 237d623ae659 ("drm/amdgpu/gart: Add helper to bind VRAM pages (v2)") Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-10drm/amdgpu: Skip loading SDMA_RS64 in VFYuBiao Wang
VFs use the PF SDMA ucode and are unable to load SDMA_RS64. Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com> Signed-off-by: Victor Skvortsov <Victor.Skvortsov@amd.com> Reviewed-by: Gavin Wan <gavin.wan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-10block: account for bi_bvec_done in bio_may_need_split()Ming Lei
When checking if a bio fits in a single segment, bio_may_need_split() compares bi_size against the current bvec's bv_len. However, for partially consumed bvecs (bi_bvec_done > 0), such as in cloned or split bios, the remaining bytes in the current bvec are actually (bv_len - bi_bvec_done), not bv_len. This could cause bio_may_need_split() to incorrectly return false, leading to nr_phys_segments being set to 1 when the bio actually spans multiple segments. This triggers the WARN_ON in __blk_rq_map_sg() when the actual mapped segments exceed the expected count. Fix by subtracting bi_bvec_done from bv_len in the comparison. Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Closes: https://lore.kernel.org/linux-block/9687cf2b-1f32-44e1-b58d-2492dc6e7185@linux.ibm.com/ Reported-and-bisected-by: Christoph Hellwig <hch@infradead.org> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Tested-by: Christoph Hellwig <hch@infradead.org> Fixes: ee623c892aa5 ("block: use bvec iterator helper for bio_may_need_split()") Cc: Nitesh Shetty <nj.shetty@samsung.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
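The size comparison at the heart of this fix can be modelled in isolation. The sketch below uses hypothetical stand-in types (`seg`, `iter`) rather than the kernel's `bio_vec` and `bvec_iter`, and it covers only the "does the remaining I/O fit in the current segment" test, not the other conditions the real function checks.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a bio_vec: only the length matters here. */
struct seg {
	size_t len;
};

/* Stand-in for the iterator state of a (possibly cloned or split) bio. */
struct iter {
	size_t size; /* bytes left in the whole I/O (like bi_size) */
	size_t idx;  /* index of the current segment */
	size_t done; /* bytes already consumed in it (like bi_bvec_done) */
};

/* Old comparison: ignores the consumed part of the current segment. */
bool fits_one_segment_buggy(const struct seg *v, const struct iter *it)
{
	return it->size <= v[it->idx].len;
}

/* Fixed comparison: only the unconsumed remainder is available. */
bool fits_one_segment_fixed(const struct seg *v, const struct iter *it)
{
	return it->size <= v[it->idx].len - it->done;
}
```

For two 4096-byte segments with 2048 bytes of the first already consumed and 4096 bytes left to transfer, the old comparison claims a single segment suffices while the fixed one correctly reports that the I/O crosses into the second segment.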
2026-01-10block: use pi_tuple_size in bi_offload_capable()Caleb Sander Mateos
bi_offload_capable() returns whether a block device's metadata size matches its PI tuple size. Use pi_tuple_size instead of switching on csum_type. This makes the code considerably simpler and less branchy. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Anuj Gupta <anuj20.g@samsung.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-10block: zero non-PI portion of auto integrity bufferCaleb Sander Mateos
The auto-generated integrity buffer for writes needs to be fully initialized before being passed to the underlying block device, otherwise the uninitialized memory can be read back by userspace or anyone with physical access to the storage device. If protection information is generated, that portion of the integrity buffer is already initialized. The integrity data is also zeroed if PI generation is disabled via sysfs or the PI tuple size is 0. However, this misses the case where PI is generated and the PI tuple size is nonzero, but the metadata size is larger than the PI tuple. In this case, the remainder ("opaque") of the metadata is left uninitialized. Generalize the BLK_INTEGRITY_CSUM_NONE check to cover any case when the metadata is larger than just the PI tuple. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Fixes: c546d6f43833 ("block: only zero non-PI metadata tuples in bio_integrity_prep") Reviewed-by: Anuj Gupta <anuj20.g@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
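The cases enumerated in this message reduce to a small predicate. The following is a sketch only: `needs_zeroing` and its parameters are hypothetical names, and the real code works on a block integrity profile rather than three loose arguments.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Decide whether the auto-allocated integrity buffer must be zeroed
 * before it is handed to the device.
 *
 * - PI generation disabled: nothing will initialize the buffer, so zero it.
 * - Metadata larger than the PI tuple (including a tuple size of 0):
 *   the bytes outside the tuple would stay uninitialized, so zero it.
 * - Metadata exactly the PI tuple: PI generation fills every byte.
 */
bool needs_zeroing(bool generate_pi, size_t pi_tuple_size,
		   size_t metadata_size)
{
	if (!generate_pi)
		return true;
	return metadata_size > pi_tuple_size;
}
```

The previously missed case is the middle one with a nonzero tuple, for example an 8-byte PI tuple inside 16 bytes of metadata.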
2026-01-10Merge tag 'iommu-fixes-v6.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linuxLinus Torvalds
Pull iommu fixes from Joerg Roedel: - several Kconfig-related build fixes - fix for when gcc 8.5 on PPC refuses to inline a function from a header file * tag 'iommu-fixes-v6.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: iommupt: Make pt_feature() always_inline iommufd/selftest: Prevent module/builtin conflicts in kconfig iommufd/selftest: Add missing kconfig for DMA_SHARED_BUFFER iommupt: Fix the kunit building
2026-01-10accel/rocket: rocket_accel.h: fix kernel-doc warningsRandy Dunlap
Fix all kernel-doc warnings in rocket_accel.h: Warning: include/uapi/drm/rocket_accel.h:35 Incorrect use of kernel-doc format: * Output: DMA address for the BO in the NPU address space. This address and 22 warnings like these: Warning: include/uapi/drm/rocket_accel.h:43 struct member 'size' not described in 'drm_rocket_create_bo' Warning: include/uapi/drm/rocket_accel.h:60 struct member 'handle' not described in 'drm_rocket_prep_bo' Warning: include/uapi/drm/rocket_accel.h:73 struct member 'handle' not described in 'drm_rocket_fini_bo' Warning: include/uapi/drm/rocket_accel.h:86 struct member 'regcmd' not described in 'drm_rocket_task' Warning: include/uapi/drm/rocket_accel.h:116 struct member 'tasks' not described in 'drm_rocket_job' Warning: include/uapi/drm/rocket_accel.h:135 struct member 'jobs' not described in 'drm_rocket_submit' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Tomeu Vizoso <tomeu@tomeuvizoso.net> Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net> Link: https://patch.msgid.link/20251023062440.4093661-1-rdunlap@infradead.org
2026-01-10accel/rocket: factor out code with find_core_for_dev in rocket_removeQuentin Schulz
There already is a function to return the offset of the core for a given struct device, so let's reuse that function instead of reimplementing the same logic. There's one change in behavior when a struct device is passed which doesn't match any core's. Before, we would continue through rocket_remove() but now we exit early, to match what other callers of find_core_for_dev() (rocket_device_runtime_resume/suspend()) are doing. This however should never happen. Aside from that, no intended change in behavior. Signed-off-by: Quentin Schulz <quentin.schulz@cherry.de> Reviewed-by: Tomeu Vizoso <tomeu@tomeuvizoso.net> Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net> Link: https://patch.msgid.link/20251215-rocket-reuse-find-core-v1-1-be86a1d2734c@cherry.de
2026-01-10accel/rocket: fix unwinding in error path in rocket_probeQuentin Schulz
When rocket_core_init() fails (as could be the case with EPROBE_DEFER), we need to properly unwind by decrementing the counter we just incremented and if this is the first core we failed to probe, remove the rocket DRM device with rocket_device_fini() as well. This matches the logic in rocket_remove(). Failing to properly unwind results in out-of-bounds accesses. Fixes: 0810d5ad88a1 ("accel/rocket: Add job submission IOCTL") Cc: stable@vger.kernel.org Signed-off-by: Quentin Schulz <quentin.schulz@cherry.de> Reviewed-by: Tomeu Vizoso <tomeu@tomeuvizoso.net> Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net> Link: https://patch.msgid.link/20251215-rocket-error-path-v1-2-eec3bf29dc3b@cherry.de
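The unwinding rule described here (undo the counter increment, and tear down the shared device if the failing core was the first one) can be sketched in userspace as follows. `fake_rocket`, `fake_core_init` and `fake_probe` are hypothetical stand-ins, and `-EAGAIN` substitutes for the kernel-internal `-EPROBE_DEFER`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical shared state: one DRM device serving several cores. */
struct fake_rocket {
	int num_cores;
	bool drm_ready;
};

/* Stand-in for per-core initialization; fails on request. */
static int fake_core_init(bool fail)
{
	return fail ? -EAGAIN : 0;
}

int fake_probe(struct fake_rocket *rdev, bool fail_core)
{
	int err;

	assert(rdev);

	/* The first core to probe brings up the shared device. */
	if (rdev->num_cores == 0)
		rdev->drm_ready = true;

	rdev->num_cores++;

	err = fake_core_init(fail_core);
	if (err) {
		/* Undo the increment made just above. */
		rdev->num_cores--;
		/* If that was the first core, drop the shared device too. */
		if (rdev->num_cores == 0)
			rdev->drm_ready = false;
		return err;
	}

	return 0;
}
```

Without the decrement, a deferred and later retried probe would leave the counter one too high, which is the kind of state that leads to the out-of-bounds accesses mentioned in the message.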
2026-01-10accel/rocket: fix unwinding in error path in rocket_core_initQuentin Schulz
When rocket_job_init() is called, iommu_group_get() has already been called, therefore we should call iommu_group_put() and make the iommu_group pointer NULL. This aligns with what's done in rocket_core_fini(). If pm_runtime_resume_and_get() somehow fails, not only should rocket_job_fini() be called but we should also unwind everything done before that, that is, disable PM, put the iommu_group, NULLify it and then call rocket_job_fini(). This is exactly what's done in rocket_core_fini() so let's call that function instead of duplicating the code. Fixes: 0810d5ad88a1 ("accel/rocket: Add job submission IOCTL") Cc: stable@vger.kernel.org Signed-off-by: Quentin Schulz <quentin.schulz@cherry.de> Reviewed-by: Tomeu Vizoso <tomeu@tomeuvizoso.net> Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net> Link: https://patch.msgid.link/20251215-rocket-error-path-v1-1-eec3bf29dc3b@cherry.de
2026-01-10erofs: fix file-backed mounts no longer working on EROFS partitionsGao Xiang
Sheng Yong reported [1] that Android APEX images didn't work with commit 072a7c7cdbea ("erofs: don't bother with s_stack_depth increasing for now") because "EROFS-formatted APEX file images can be stored within an EROFS-formatted Android system partition." In response, I sent a quick fat-fingered [PATCH v3] to address the report. Unfortunately, the updated condition was incorrect: if (erofs_is_fileio_mode(sbi)) { - sb->s_stack_depth = - file_inode(sbi->dif0.file)->i_sb->s_stack_depth + 1; - if (sb->s_stack_depth > FILESYSTEM_MAX_STACK_DEPTH) { - erofs_err(sb, "maximum fs stacking depth exceeded"); + inode = file_inode(sbi->dif0.file); + if ((inode->i_sb->s_op == &erofs_sops && !sb->s_bdev) || + inode->i_sb->s_stack_depth) { The condition `!sb->s_bdev` is always true for all file-backed EROFS mounts, making the check effectively a no-op. The real fix tested and confirmed by Sheng Yong [2] at that time was [PATCH v3 RESEND], which correctly ensures the following EROFS^2 setup works: EROFS (on a block device) + EROFS (file-backed mount) But sadly I screwed it up again by upstreaming the outdated [PATCH v3]. This patch applies the same logic as the delta between the upstream [PATCH v3] and the real fix [PATCH v3 RESEND]. Reported-by: Sheng Yong <shengyong1@xiaomi.com> Closes: https://lore.kernel.org/r/3acec686-4020-4609-aee4-5dae7b9b0093@gmail.com [1] Fixes: 072a7c7cdbea ("erofs: don't bother with s_stack_depth increasing for now") Link: https://lore.kernel.org/r/243f57b8-246f-47e7-9fb1-27a771e8e9e8@gmail.com [2] Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-01-10clk: microchip: core: remove unused include asm/traps.hBrian Masney
The asm/traps.h include file is not actually used, so let's go ahead and remove it. Signed-off-by: Brian Masney <bmasney@redhat.com> Link: https://lore.kernel.org/r/20251205-clk-microchip-fixes-v3-3-a02190705e47@redhat.com Signed-off-by: Claudiu Beznea <claudiu.beznea@tuxon.dev>
2026-01-10clk: microchip: core: correct return value on *_get_parent()Brian Masney
roclk_get_parent() and sclk_get_parent() have the possibility of returning -EINVAL, however the framework expects this call to always succeed since the return value is unsigned. If there is no parent map defined, then the current value programmed in the hardware is used. Let's use that same value in the case where -EINVAL is currently returned. This index is only used by clk_core_get_parent_by_index(), and it validates that it doesn't overflow the number of available parents. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202512050233.R9hAWsJN-lkp@intel.com/ Signed-off-by: Brian Masney <bmasney@redhat.com> Reviewed-by: Claudiu Beznea <claudiu.beznea@tuxon.dev> Link: https://lore.kernel.org/r/20251205-clk-microchip-fixes-v3-2-a02190705e47@redhat.com Signed-off-by: Claudiu Beznea <claudiu.beznea@tuxon.dev>
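Why a negative errno cannot survive an unsigned return type is easy to show outside the kernel. The sketch below uses hypothetical helpers (`get_parent_buggy`, `get_parent_fixed`) with a `uint8_t` return value and an optional parent map; it is not the pic32 driver code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Translate the value read from hardware into a parent index.
 * With no map, the hardware value is used as the index directly.
 */
uint8_t get_parent_buggy(const uint8_t *map, size_t n, uint8_t hw)
{
	if (!map)
		return hw;
	for (size_t i = 0; i < n; i++)
		if (map[i] == hw)
			return (uint8_t)i;
	/* The error code is silently truncated to a large index. */
	return (uint8_t)-EINVAL;
}

uint8_t get_parent_fixed(const uint8_t *map, size_t n, uint8_t hw)
{
	if (!map)
		return hw;
	for (size_t i = 0; i < n; i++)
		if (map[i] == hw)
			return (uint8_t)i;
	/* Fall back to the hardware value, as in the no-map case. */
	return hw;
}
```

In both variants the caller still has to range-check the index, which the message says clk_core_get_parent_by_index() already does; the fixed variant merely stops pretending that an error can be reported through this return value.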
2026-01-10ARM: at91: remove unnecessary of_platform_default_populate callsRob Herring (Arm)
The DT core will call of_platform_default_populate, so it is not necessary for machine specific code to call it unless there are custom match entries, auxdata or parent device. Neither of those apply here, so remove the call. Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Tested-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Link: https://lore.kernel.org/r/20260105-at91-probe-v3-3-594013ff2965@kernel.org Signed-off-by: Claudiu Beznea <claudiu.beznea@tuxon.dev>
2026-01-10ARM: at91: Move PM init functions to .init_late hookRob Herring (Arm)
Move the AT91 PM init functions to .init_late hook to ensure driver dependencies have probed. Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Tested-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Link: https://lore.kernel.org/r/20260105-at91-probe-v3-2-594013ff2965@kernel.org Signed-off-by: Claudiu Beznea <claudiu.beznea@tuxon.dev>
2026-01-10clk: microchip: core: remove duplicate determine_rate on pic32_sclk_opsBrian Masney
pic32_sclk_ops previously had a sclk_round_rate() member, and this was recently converted over to sclk_determine_rate() with the help of a Coccinelle semantic patch. pic32_sclk_ops now has two conflicting determine_rate ops members. Prior to the conversion, pic32_sclk_ops already had a determine_rate member that points to __clk_mux_determine_rate(). When both the round_rate() and determine_rate() ops are defined, the clk core only uses the determine_rate() op. Let's go ahead and drop the recently converted sclk_determine_rate() to match the previous functionality prior to the conversion. Fixes: e9f039c08cdc ("clk: microchip: core: convert from round_rate() to determine_rate()") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202511222115.uvHrP95A-lkp@intel.com/ Signed-off-by: Brian Masney <bmasney@redhat.com> Reviewed-by: Claudiu Beznea <claudiu.beznea@tuxon.dev> Link: https://lore.kernel.org/r/20251205-clk-microchip-fixes-v3-1-a02190705e47@redhat.com Signed-off-by: Claudiu Beznea <claudiu.beznea@tuxon.dev>
2026-01-10platform/surface: aggregator_registry: Add Surface Pro 11 (QCOM)Dale Whinham
This enables support for the Qualcomm-based Surface Pro 11. Signed-off-by: Dale Whinham <daleyo@gmail.com> Signed-off-by: Jérôme de Bretagne <jerome.debretagne@gmail.com> Reviewed-by: Maximilian Luz <luzmaximilian@gmail.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Link: https://patch.msgid.link/20251220-surface-sp11-for-next-v6-3-81f7451edb77@gmail.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2026-01-10x86/resctrl: Enable RDT_RESOURCE_PERF_PKGTony Luck
Since telemetry events are enumerated on resctrl mount the RDT_RESOURCE_PERF_PKG resource is not considered "monitoring capable" during early resctrl initialization. This means that the domain list for RDT_RESOURCE_PERF_PKG is not built when the CPU hotplug notifiers are registered and run for the first time right after resctrl initialization. Mark the RDT_RESOURCE_PERF_PKG as "monitoring capable" upon successful telemetry event enumeration to ensure future CPU hotplug events include this resource and initialize its domain list for CPUs that are already online. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
2026-01-10fs/resctrl: Move RMID initialization to first mountTony Luck
L3 monitor features are enumerated during resctrl initialization and rmid_ptrs[] that tracks all RMIDs and depends on the number of supported RMIDs is allocated during this time. Telemetry monitor features are enumerated during first resctrl mount and may support a different number of RMIDs compared to L3 monitor features. Delay allocation and initialization of rmid_ptrs[] until first mount. Since the number of RMIDs cannot change on later mounts, keep the same set of rmid_ptrs[] until resctrl_exit(). This is required because the limbo handler keeps running after resctrl is unmounted and needs to access rmid_ptrs[] as it keeps tracking busy RMIDs after unmount. Rename routines to match what they now do: dom_data_init() -> setup_rmid_lru_list() dom_data_exit() -> free_rmid_lru_list() Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
2026-01-10x86,fs/resctrl: Compute number of RMIDs as minimum across resourcesTony Luck
resctrl assumes that only the L3 resource supports monitor events, so it simply takes the rdt_resource::num_rmid from RDT_RESOURCE_L3 as the system's number of RMIDs. The addition of telemetry events in a different resource breaks that assumption. Compute the number of available RMIDs as the minimum value across all mon_capable resources (analogous to how the number of CLOSIDs is computed across alloc_capable resources). Note that mount time enumeration of the telemetry resource means that this number can be reduced. If this happens, then some memory will be wasted as the allocations for rdt_l3_mon_domain::mbm_states[] and rdt_l3_mon_domain::rmid_busy_llc created during resctrl initialization will be larger than needed. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
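The "minimum across all mon_capable resources" rule can be written as a short loop. This is a sketch with a hypothetical `fake_resource` type and function name; the real code iterates over resctrl's resource array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down view of a resctrl resource. */
struct fake_resource {
	bool mon_capable;
	uint32_t num_rmid;
};

/*
 * The system-wide RMID count is limited by the most constrained
 * monitoring-capable resource. Returns UINT32_MAX if none is capable.
 */
uint32_t system_num_rmid(const struct fake_resource *res, size_t n)
{
	uint32_t min = UINT32_MAX;

	for (size_t i = 0; i < n; i++) {
		if (!res[i].mon_capable)
			continue;
		if (res[i].num_rmid < min)
			min = res[i].num_rmid;
	}
	return min;
}
```

If a telemetry resource becomes monitoring-capable at mount time with fewer RMIDs than L3, the result shrinks, which is the situation the message describes as wasting some of the earlier L3 allocations.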
2026-01-10fs/resctrl: Move allocation/free of closid_num_dirty_rmid[]Tony Luck
closid_num_dirty_rmid[] and rmid_ptrs[] are allocated together during resctrl initialization and freed together during resctrl exit. Telemetry events are enumerated on resctrl mount so only at resctrl mount will the number of RMID supported by all monitoring resources and needed as size for rmid_ptrs[] be known. Separate closid_num_dirty_rmid[] and rmid_ptrs[] allocation and free in preparation for rmid_ptrs[] to be allocated on resctrl mount. Keep the rdtgroup_mutex protection around the allocation and free of closid_num_dirty_rmid[] as ARM needs this to guarantee memory ordering. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
2026-01-10x86/resctrl: Handle number of RMIDs supported by RDT_RESOURCE_PERF_PKGTony Luck
There are now three meanings for "number of RMIDs": 1) The number for legacy features enumerated by CPUID leaf 0xF. This is the maximum number of distinct values that can be loaded into MSR_IA32_PQR_ASSOC. Note that systems with Sub-NUMA Cluster mode enabled will force scaling down the CPUID enumerated value by the number of SNC nodes per L3-cache. 2) The number of registers in MMIO space for each event. This is enumerated in the XML files and is the value initialized into event_group::num_rmid. 3) The number of "hardware counters" (this isn't a strictly accurate description of how things work, but serves as a useful analogy that does describe the limitations) feeding to those MMIO registers. This is enumerated in telemetry_region::num_rmids returned by intel_pmt_get_regions_by_feature(). Event groups with insufficient "hardware counters" to track all RMIDs are difficult for users to use, since the system may reassign "hardware counters" at any time. This means that users cannot reliably collect two consecutive event counts to compute the rate at which events are occurring. Disable such event groups by default. The user may override this with a command line "rdt=" option. In this case limit an under-resourced event group's number of possible monitor resource groups to the lowest number of "hardware counters". Scan all enabled event groups and assign the RDT_RESOURCE_PERF_PKG resource "num_rmid" value to the smallest of these values as this value will be used later to compare against the number of RMIDs supported by other resources to determine how many monitoring resource groups are supported. N.B. Change type of resctrl_mon::num_rmid to u32 to match its usage and the type of event_group::num_rmid so that min(r->num_rmid, e->num_rmid) won't complain about mixing signed and unsigned types. 
Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
2026-01-10KVM: arm64: Invert KVM_PGTABLE_WALK_HANDLE_FAULT to fix pKVM walkersWill Deacon
Commit ddcadb297ce5 ("KVM: arm64: Ignore EAGAIN for walks outside of a fault") introduced a new walker flag ('KVM_PGTABLE_WALK_HANDLE_FAULT') to KVM's page-table code. When set, the walk logic maintains its previous behaviour of terminating a walk as soon as the visitor callback returns an error. However, when the flag is clear, the walk will continue if the visitor returns -EAGAIN and the error is then suppressed and returned as zero to the caller. Clearing the flag is beneficial when write-protecting a range of IPAs with kvm_pgtable_stage2_wrprotect() but is not useful in any other cases, either because we are operating on a single page (e.g. kvm_pgtable_stage2_mkyoung() or kvm_phys_addr_ioremap()) or because the early termination is desirable (e.g. when mapping pages from a fault in user_mem_abort()). Subsequently, commit e912efed485a ("KVM: arm64: Introduce the EL1 pKVM MMU") hooked up pKVM's hypercall interface to the MMU code at EL1 but failed to propagate any of the walker flags. As a result, page-table walks at EL2 fail to set KVM_PGTABLE_WALK_HANDLE_FAULT even when the early termination semantics are desirable on the fault handling path. Rather than complicate the pKVM hypercall interface, invert the flag so that the whole thing can be simplified and only pass the new flag ('KVM_PGTABLE_WALK_IGNORE_EAGAIN') from the wrprotect code. Cc: Fuad Tabba <tabba@google.com> Cc: Quentin Perret <qperret@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oupton@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Fixes: fce886a60207 ("KVM: arm64: Plumb the pKVM MMU in KVM") Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Quentin Perret <qperret@google.com> Link: https://msgid.link/20260105154939.11041-2-will@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
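The effect of inverting the flag can be seen in a toy walker. Everything below (`WALK_IGNORE_EAGAIN`, `walk`, `eagain_at_one`) is a hypothetical userspace model of the semantics described in the message, not the KVM page-table code.

```c
#include <errno.h>

/* Inverted flag: callers opt in to skipping -EAGAIN. */
#define WALK_IGNORE_EAGAIN (1u << 0)

typedef int (*visit_fn)(int idx, void *arg);

/*
 * Visit entries 0..n-1. By default any error stops the walk and is
 * returned. With WALK_IGNORE_EAGAIN, -EAGAIN is suppressed and the
 * walk carries on to the next entry.
 */
int walk(int n, unsigned int flags, visit_fn cb, void *arg)
{
	for (int i = 0; i < n; i++) {
		int ret = cb(i, arg);

		if (ret == -EAGAIN && (flags & WALK_IGNORE_EAGAIN))
			continue;
		if (ret)
			return ret;
	}
	return 0;
}

/* Example visitor: counts visits and reports -EAGAIN for entry 1. */
int eagain_at_one(int idx, void *arg)
{
	int *visited = arg;

	(*visited)++;
	return idx == 1 ? -EAGAIN : 0;
}
```

The design point is that a caller which passes no flags at all, as the pKVM hypercall path effectively does, now gets the early-termination behaviour wanted on the fault path, and only the write-protect path has to pass the new flag.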
2026-01-10iommu/amd: Drop incorrect NULL check for iommu in alloc_irq_table()Rakuram Eswaran
alloc_irq_table() contains a conditional check for a NULL iommu pointer when computing the NUMA node, but the function dereferences iommu in multiple places afterwards. All callers ensure that a valid iommu pointer is passed in, and a NULL iommu is not expected by the current callers. Remove the incorrect NULL check to make the assumptions consistent and address the Smatch warning. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202512191724.meqJENXe-lkp@intel.com/ Signed-off-by: Rakuram Eswaran <rakuram.e96@gmail.com> Reviewed-by: Ankit Soni <Ankit.Soni@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-10iommu: simplify list initialization in iommu_create_device_direct_mappings()Can Peng
Use LIST_HEAD() to declare and initialize the 'mappings' list head in iommu_create_device_direct_mappings() instead of separate declaration and INIT_LIST_HEAD(). This simplifies the code by combining declaration and initialization into a single idiomatic form, improving readability without changing functionality. Signed-off-by: Can Peng <pengcan@kylinos.cn> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
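For readers unfamiliar with the idiom, the two forms are equivalent. The sketch below re-creates the relevant macros in userspace, mirroring the well-known definitions from the kernel's list header; `list_is_empty` is a local helper name chosen for this example.

```c
/* Minimal userspace copy of the kernel's circular list head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Declare and initialize in one step: the head points at itself. */
#define LIST_HEAD_INIT(name) { &(name), &(name) }
#define LIST_HEAD(name) struct list_head name = LIST_HEAD_INIT(name)

/* Two-step form: declare first, then initialize at run time. */
static inline void INIT_LIST_HEAD(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

/* An empty list is a head whose next pointer is itself. */
static inline int list_is_empty(const struct list_head *head)
{
	return head->next == head;
}
```

Writing `LIST_HEAD(mappings);` therefore replaces the pair `struct list_head mappings;` followed by `INIT_LIST_HEAD(&mappings);` with no change in the resulting state.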
2026-01-10dt-bindings: soc: fsl: qe: Add an interrupt controller for QUICC Engine PortsChristophe Leroy (CS GROUP)
The QUICC Engine provides interrupts for a few I/O ports. This is handled via a separate interrupt ID and managed via a triplet of dedicated registers hosted by the SoC. Implement an interrupt driver for it so that those IRQs can then be linked to the related GPIOs. Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/7708243d6cca21004de8b3da87369c06dbee3848.1767804922.git.chleroy@kernel.org Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> [moved from bindings/soc/fsl/cpm_qe/ to bindings/interrupt-controller/ while applying]
2026-01-10iommu/amd: move wait_on_sem() out of spinlockAnkit Soni
With iommu.strict=1, the existing completion wait path can cause soft lockups under stressed environment, as wait_on_sem() busy-waits under the spinlock with interrupts disabled. Move the completion wait in iommu_completion_wait() out of the spinlock. wait_on_sem() only polls the hardware-updated cmd_sem and does not require iommu->lock, so holding the lock during the busy wait unnecessarily increases contention and extends the time with interrupts disabled. Signed-off-by: Ankit Soni <Ankit.Soni@amd.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-10soc: fsl: qe: Add an interrupt controller for QUICC Engine PortsChristophe Leroy (CS GROUP)
The QUICC Engine provides interrupts for a few I/O ports. This is handled via a separate interrupt ID and managed via a triplet of dedicated registers hosted by the SoC. Implement an interrupt driver for it so that those IRQs can then be linked to the related GPIOs. Link: https://lore.kernel.org/r/63f19db21a91729d91b3df336a56a7eb4206e561.1767804922.git.chleroy@kernel.org Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
2026-01-10iommu: debug-pagealloc: Check mapped/unmapped kernel memoryMostafa Saleh
Now that page_ext holds a count of IOMMU mappings, we can use it to assert that any page allocated/freed is indeed not in the IOMMU. The sanitizer doesn’t protect against mapping/unmapping during this period. However, that’s less harmful as the page is not used by the kernel. Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Mostafa Saleh <smostafa@google.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-10iommu: debug-pagealloc: Track IOMMU pagesMostafa Saleh
Using the new calls, use an atomic refcount to track how many times a page is mapped in any of the IOMMUs. For unmap we need to use iova_to_phys() to get the physical address of the pages. We use the smallest supported page size as the granularity of tracking per domain. This is important as it is possible to map pages and unmap them with larger sizes (as in map_sg() cases). Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Mostafa Saleh <smostafa@google.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
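The reason for tracking at the smallest page size shows up as soon as map and unmap sizes differ. The sketch below is a userspace model with hypothetical names (`track_map`, `track_unmap`, `page_is_mapped`), a fixed 4 KiB granule, and a tiny 16-page "physical memory"; it takes physical addresses directly, whereas the message notes that the real unmap path has to translate with iova_to_phys() first.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define TRACK_SHIFT 12                          /* assumed smallest page size: 4 KiB */
#define TRACK_SIZE  ((size_t)1 << TRACK_SHIFT)
#define NR_PAGES    16                          /* toy physical memory */

/* One counter per smallest-size page, like a per-page refcount. */
static atomic_int map_count[NR_PAGES];

/* Walk the range in smallest-page steps, adjusting each page's count. */
static void adjust(uint64_t phys, size_t size, int delta)
{
	for (size_t off = 0; off < size; off += TRACK_SIZE) {
		uint64_t pfn = (phys + off) >> TRACK_SHIFT;

		assert(pfn < NR_PAGES);
		atomic_fetch_add(&map_count[pfn], delta);
	}
}

void track_map(uint64_t phys, size_t size)
{
	adjust(phys, size, 1);
}

void track_unmap(uint64_t phys, size_t size)
{
	adjust(phys, size, -1);
}

/* What an allocator-side check would ask before freeing a page. */
int page_is_mapped(uint64_t phys)
{
	return atomic_load(&map_count[phys >> TRACK_SHIFT]) > 0;
}
```

Two separate 4 KiB maps followed by a single 8 KiB unmap leave both counters at zero, which would not hold if counts were kept per mapping call instead of per smallest-size page.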
2026-01-10iommu: Add calls for IOMMU_DEBUG_PAGEALLOCMostafa Saleh
Add calls for the new iommu debug config IOMMU_DEBUG_PAGEALLOC: - iommu_debug_init: Enable the debug mode if configured by the user. - iommu_debug_map: Track iommu pages mapped, using physical address. - iommu_debug_unmap_begin: Track start of iommu unmap operation, with IOVA and size. - iommu_debug_unmap_end: Track the end of unmap operation, passing the actual unmapped size versus the tracked one at unmap_begin. We have to do the unmap_begin/end as once pages are unmapped we lose the information of the physical address. This is racy, but the API is racy by construction as it uses refcounts and doesn't attempt to lock/synchronize with the IOMMU API as that will be costly, meaning that possibility of false negative exists. Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Mostafa Saleh <smostafa@google.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-10iommu: Add page_ext for IOMMU_DEBUG_PAGEALLOCMostafa Saleh
Add a new config IOMMU_DEBUG_PAGEALLOC, which registers new data to page_ext. This config will be used by the IOMMU API to track pages mapped in the IOMMU to catch drivers trying to free kernel memory that they still map in their domains, causing all types of memory corruption. This behaviour is disabled by default and can be enabled using kernel cmdline iommu.debug_pagealloc. Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Mostafa Saleh <smostafa@google.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-10iommupt: Make pt_feature() always_inlineJason Gunthorpe
gcc 8.5 on powerpc does not automatically inline these functions even though they evaluate to constants in key cases. Since the constant propagation is essential for some code elimination and build-time checks this causes a build failure: ERROR: modpost: "__pt_no_sw_bit" [drivers/iommu/generic_pt/fmt/iommu_amdv1.ko] undefined! Caused by this: if (pts_feature(&pts, PT_FEAT_DMA_INCOHERENT) && !pt_test_sw_bit_acquire(&pts, SW_BIT_CACHE_FLUSH_DONE)) flush_writes_item(&pts); Where pts_feature() evaluates to a constant false. Mark them as __always_inline to force it to evaluate to a constant and trigger the code elimination. Fixes: 7c5b184db714 ("genpt: Generic Page Table base API") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202512230720.9y9DtWIo-lkp@intel.com/ Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-01-10  iommufd/selftest: Prevent module/builtin conflicts in kconfig  Jason Gunthorpe
The selftest now depends on the AMDv1 page table; however, the selftest kconfig itself is just a sub-option of the main IOMMUFD module kconfig. This means it cannot be modular, and so kconfig allowed a modular IOMMU_PT_AMDV1 with a built-in IOMMUFD. This causes link failures:

  ld: vmlinux.o: in function `mock_domain_alloc_pgtable.isra.0':
  selftest.c:(.text+0x12e8ad3): undefined reference to `pt_iommu_amdv1_init'
  ld: vmlinux.o: in function `BSWAP_SHUFB_CTL':
  sha1-avx2-asm.o:(.rodata+0xaa36a8): undefined reference to `pt_iommu_amdv1_read_and_clear_dirty'
  ld: sha1-avx2-asm.o:(.rodata+0xaa36f0): undefined reference to `pt_iommu_amdv1_map_pages'
  ld: sha1-avx2-asm.o:(.rodata+0xaa36f8): undefined reference to `pt_iommu_amdv1_unmap_pages'
  ld: sha1-avx2-asm.o:(.rodata+0xaa3720): undefined reference to `pt_iommu_amdv1_iova_to_phys'

Adjust the kconfig to disable IOMMUFD_TEST if IOMMU_PT_AMDV1 is incompatible.

Fixes: e93d5945ed5b ("iommufd: Change the selftest to use iommupt instead of xarray")
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202512210135.freQWpxa-lkp@intel.com/
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-10  iommufd/selftest: Add missing kconfig for DMA_SHARED_BUFFER  Jason Gunthorpe
The test doesn't build without it, because dma-buf.h does not provide stub functions if it is not enabled. Compilation can fail with:

  ERROR:root:ld: vmlinux.o: in function `iommufd_test':
  (.text+0x3b1cdd): undefined reference to `dma_buf_get'
  ld: (.text+0x3b1d08): undefined reference to `dma_buf_put'
  ld: (.text+0x3b2105): undefined reference to `dma_buf_export'
  ld: (.text+0x3b211f): undefined reference to `dma_buf_fd'
  ld: (.text+0x3b2e47): undefined reference to `dma_buf_move_notify'

Add the missing select.

Fixes: d2041f1f11dd ("iommufd/selftest: Add some tests for the dmabuf flow")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-10  iommupt: Fix the kunit building  Jason Gunthorpe
The kunit doesn't work since the below commit made GENERIC_PT unselectable:

  $ make ARCH=x86_64 O=build_kunit_x86_64 olddefconfig
  ERROR:root:Not all Kconfig options selected in kunitconfig were in the generated .config.
  This is probably due to unsatisfied dependencies.
  Missing: CONFIG_DEBUG_GENERIC_PT=y, CONFIG_IOMMUFD_TEST=y, CONFIG_IOMMU_PT_X86_64=y, CONFIG_GENERIC_PT=y, CONFIG_IOMMU_PT_AMDV1=y, CONFIG_IOMMU_PT_VTDSS=y, CONFIG_IOMMU_PT=y, CONFIG_IOMMU_PT_KUNIT_TEST=y

Also remove the unneeded CONFIG_IOMMUFD_TEST reference, as the iommupt kunit doesn't interact with iommufd, and it doesn't currently build for the kunit due to problems with DMA_SHARED_BUFFER either.

Fixes: 01569c216dde ("genpt: Make GENERIC_PT invisible")
Fixes: 1dd4187f53c3 ("iommupt: Add a kunit test for Generic Page Table")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-10  HID: logitech-hidpp: Check maxfield in hidpp_get_report_length()  Günther Noack
Do not crash when a report has no fields.

Fake USB gadgets can send their own HID report descriptors and can define report structures without valid fields. This can be used to crash the kernel over USB.

Cc: stable@vger.kernel.org
Signed-off-by: Günther Noack <gnoack@google.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
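The kind of guard the subject line refers to can be sketched with mock types. This is a minimal illustration assuming a report structure with a field count and a field array; the `demo_*` structs and function are hypothetical stand-ins, not the HID core definitions or the driver's actual function body.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for HID report structures. */
struct demo_field {
	unsigned int report_count;
};

struct demo_report {
	unsigned int maxfield;       /* number of valid entries in field[] */
	struct demo_field *field[4];
};

/*
 * Return the report length, or 0 when the report is missing or has no
 * fields. Without the maxfield test, field[0] would be dereferenced even
 * though a device-supplied descriptor never populated it.
 */
static unsigned int demo_report_length(const struct demo_report *report)
{
	if (!report || report->maxfield == 0)
		return 0;

	return report->field[0]->report_count + 1;
}
```

The caller then has to treat a length of 0 as "report not usable" instead of assuming every registered report carries at least one field.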
2026-01-10  HID: prodikeys: Check presence of pm->input_ep82  Günther Noack
Fake USB devices can send their own report descriptors for which the input_mapping() hook does not get called. In this case, pm->input_ep82 stays NULL, which leads to a crash later.

This does not happen with the real device, but can be provoked by posing as one.

Cc: stable@vger.kernel.org
Signed-off-by: Günther Noack <gnoack@google.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
2026-01-10  PCI: Suspend iommu function prior to resetting a device  Nicolin Chen
PCIe permits a device to ignore ATS invalidation TLPs while processing a reset. This creates a problem visible to the OS where an ATS invalidation command will time out: e.g. an SVA domain will have no coordination with a reset event and can racily issue ATS invalidations to a resetting device.

The PCIe r6.0, sec 10.3.1 IMPLEMENTATION NOTE recommends SW to disable and block ATS before initiating a Function Level Reset. It also mentions that other reset methods could have the same vulnerability as well.

The IOMMU subsystem provides pci_dev_reset_iommu_prepare/done() callback helpers for this matter. Use them in all the existing reset functions.

This will attach the device to its iommu_group->blocking_domain during the device reset, so as to allow the IOMMU driver to:
- invoke pci_disable_ats() and pci_enable_ats(), if necessary
- wait for all ATS invalidations to complete
- stop issuing new ATS invalidations
- fence any incoming ATS queries

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
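The way a reset function is enclosed by the prepare/done helpers can be sketched as a userspace model. Everything here is hypothetical: the `demo_*` functions only mimic the shape of the pattern (prepare may fail, done always runs once prepare succeeded) and do not reproduce the signatures or behaviour of pci_dev_reset_iommu_prepare/done().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device model: tracks whether translation is blocked. */
struct demo_dev {
	bool blocked;       /* attached to the blocking domain */
	bool prepare_fails; /* force the prepare step to fail */
	int resets;         /* number of resets actually performed */
};

/* Model of "prepare": block ATS/translation before the reset starts. */
static int demo_iommu_prepare(struct demo_dev *d)
{
	if (d->prepare_fails)
		return -1;
	d->blocked = true;
	return 0;
}

/* Model of "done": restore the previous attachment after the reset. */
static void demo_iommu_done(struct demo_dev *d)
{
	d->blocked = false;
}

/* Model of the reset itself; it must only run while blocked. */
static int demo_do_reset(struct demo_dev *d)
{
	if (!d->blocked)
		return -1;
	d->resets++;
	return 0;
}

/* The enclosing pattern: prepare, reset, then done regardless of outcome. */
static int demo_reset_function(struct demo_dev *d)
{
	int ret = demo_iommu_prepare(d);

	if (ret)
		return ret;

	ret = demo_do_reset(d);
	demo_iommu_done(d);
	return ret;
}
```

The design point is that the done step is not skipped on a failed reset, so the device never stays attached to the blocking domain by accident.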
2026-01-10  iommu: Introduce pci_dev_reset_iommu_prepare/done()  Nicolin Chen
PCIe permits a device to ignore ATS invalidation TLPs while processing a reset. This creates a problem visible to the OS where an ATS invalidation command will time out. E.g. an SVA domain will have no coordination with a reset event and can racily issue ATS invalidations to a resetting device.

The OS should do something to mitigate this as we do not want production systems to be reporting critical ATS failures, especially in a hypervisor environment. Broadly, the OS could arrange to ignore the timeouts, block page table mutations to prevent invalidations, or disable and block ATS.

The PCIe r6.0, sec 10.3.1 IMPLEMENTATION NOTE recommends SW to disable and block ATS before initiating a Function Level Reset. It also mentions that other reset methods could have the same vulnerability as well.

Provide a callback from the PCI subsystem that will enclose the reset and have the iommu core temporarily change all the attached RID/PASID domains to group->blocking_domain so that the IOMMU hardware would fence any incoming ATS queries. IOMMU drivers should also synchronously stop issuing new ATS invalidations and wait for all ATS invalidations to complete. This can avoid any ATS invalidation timeouts.

However, if there is a domain attachment/replacement happening during an ongoing reset, ATS routines may be re-activated between the two function calls. So, introduce a new resetting_domain in the iommu_group structure to reject any concurrent attach_dev/set_dev_pasid call during a reset, out of concern for compatibility failures. Since this changes the behavior of an attach operation, update the uAPI accordingly.

Note that there are two corner cases:
1. Devices in the same iommu_group
   Since an attachment is always per iommu_group, this means that any sibling devices in the iommu_group cannot change domain, to prevent race conditions.
2. An SR-IOV PF that is being reset while its VF is not
   In such a case, the VF itself is already broken. So, there is no point in preventing the PF from going through the iommu reset.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
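The role of resetting_domain can be sketched as a small state model. This is an illustration under assumptions, not the iommu core code: the `grp_*` names are hypothetical, domains are reduced to integer ids, locking is omitted, and returning -EBUSY for an attach during reset is an assumed error code chosen for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical id for the group's blocking domain. */
enum { GRP_DOMAIN_BLOCKING = 0 };

struct grp_model {
	int domain;           /* currently attached domain id */
	int resetting_domain; /* domain saved across the reset */
	bool resetting;       /* true between prepare and done */
};

/* Start of reset: remember the current domain and switch to blocking. */
static int grp_reset_prepare(struct grp_model *g)
{
	if (g->resetting)
		return -EBUSY;
	g->resetting_domain = g->domain;
	g->domain = GRP_DOMAIN_BLOCKING;
	g->resetting = true;
	return 0;
}

/* End of reset: re-attach the domain that was in place before. */
static void grp_reset_done(struct grp_model *g)
{
	if (!g->resetting)
		return;
	g->domain = g->resetting_domain;
	g->resetting = false;
}

/* Attach requests are rejected while a reset is in flight (assumed -EBUSY). */
static int grp_attach(struct grp_model *g, int new_domain)
{
	if (g->resetting)
		return -EBUSY;
	g->domain = new_domain;
	return 0;
}
```

Because the state lives in the group rather than in a single device, the model also reflects the first corner case: a sibling in the same group cannot change domain while the reset is in progress.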