path: root/include/linux
Age    Commit message    Author
2026-05-18sched/cache: Calculate the LLC size and store it in sched_domainChen Yu
Cache aware scheduling needs to know the LLC size that a process can use, so as to prevent memory-intensive tasks from being over-aggregated on a single LLC. Introduce a preparation patch to add get_effective_llc_bytes() to get the LLC size that a CPU can use. The function can be further enhanced by subtracting the LLC cache ways reserved by resctrl (CAT in Intel RDT, etc). Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Co-developed-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Tingyin Duan <tingyin.duan@gmail.com> Link: https://patch.msgid.link/37afee09ff608034da0ce149e72d33b6f4698edf.1778703694.git.tim.c.chen@linux.intel.com
2026-05-18sched/cache: Disable cache aware scheduling for processes with high thread countsChen Yu
A performance regression was observed by Prateek when running hackbench with many threads per process (high fd count). To avoid this, processes with a large number of active threads are excluded from cache-aware scheduling. With sched_cache enabled, record the number of active threads in each process during the periodic task_cache_work(). While iterating over CPUs, if the currently running task belongs to the same process as the task that launched task_cache_work(), increment the active thread count. If the number of active threads within the process exceeds the number of Cores (divided by the SMT number) in the LLC, do not enable cache-aware scheduling. However, on systems with a smaller number of CPUs within 1 LLC, like Power10/Power11 with SMT4 and an LLC size of 4, this check effectively disables cache-aware scheduling for any process. One possible solution suggested by Peter is to use an LLC-mask instead of a single LLC value for preference. Once there are a 'few' LLCs as preference, this constraint becomes a little easier. It could be an enhancement in the future. For users who wish to perform task aggregation regardless, a debugfs knob is provided for tuning in a subsequent change. Suggested-by: K Prateek Nayak <kprateek.nayak@amd.com> Suggested-by: Aaron Lu <ziqianlu@bytedance.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Co-developed-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Tingyin Duan <tingyin.duan@gmail.com> Link: https://patch.msgid.link/d076cd21a8e6c6341d1e2d927e118db770ebb650.1778703694.git.tim.c.chen@linux.intel.com
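The thread-count threshold described in this commit message can be sketched in plain userspace C. This is only an illustration of the stated rule, not the kernel code: demo_llc_core_count() and demo_cache_aware_allowed() are hypothetical names, and the inputs are assumed to be the number of logical CPUs sharing the LLC and the SMT width.

```c
#include <stdbool.h>

/* Hypothetical helper: number of cores in an LLC, given the number of
 * logical CPUs sharing it and the SMT width. */
static unsigned int demo_llc_core_count(unsigned int cpus_in_llc,
					unsigned int smt_width)
{
	if (smt_width == 0)
		smt_width = 1;	/* assumption: treat unknown SMT width as no SMT */
	return cpus_in_llc / smt_width;
}

/* Cache-aware scheduling stays enabled only while the process has no more
 * active threads than there are cores in the LLC. */
static bool demo_cache_aware_allowed(unsigned int active_threads,
				     unsigned int cpus_in_llc,
				     unsigned int smt_width)
{
	return active_threads <= demo_llc_core_count(cpus_in_llc, smt_width);
}
```

With 4 CPUs per LLC and SMT4, the core count is 1, so any process with two or more active threads is excluded, which matches the Power10/Power11 concern raised in the message.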
2026-05-18sched/cache: Allow only 1 thread of the process to calculate the LLC occupancyJianyong Wu
Scanning online CPUs to calculate the occupancy might be time-consuming. Only allow 1 thread of the process to scan the CPUs at the same time, which is similar to what NUMA balance does in task_numa_work(). Signed-off-by: Jianyong Wu <wujianyong@hygon.cn> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/5672b52e588b855b01e5a1a17822f7c6c7237a3d.1778703694.git.tim.c.chen@linux.intel.com
2026-05-18cgroup/rstat: validate cpu before css_rstat_cpu() accessQing Ming
css_rstat_updated() is exposed as a BPF kfunc and accepts a caller-provided cpu argument. The function uses cpu for per-cpu rstat lookups without checking whether it refers to a valid possible CPU. A BPF iter/cgroup program with CAP_BPF and CAP_PERFMON can pass an invalid cpu value. On an unfixed UBSAN_BOUNDS test kernel, cpu == 0x7fffffff triggers: UBSAN: array-index-out-of-bounds in kernel/cgroup/rstat.c:31:9 index 2147483647 is out of range for type 'long unsigned int [64]' Call Trace: css_rstat_updated bpf_iter_run_prog cgroup_iter_seq_show bpf_seq_read Add cpu validation to the BPF-facing css_rstat_updated() kfunc and move the common implementation to __css_rstat_updated() for in-kernel callers. Fixes: a319185be9f5 ("cgroup: bpf: enable bpf programs to integrate with rstat") Signed-off-by: Qing Ming <a0yami@mailbox.org> Signed-off-by: Tejun Heo <tj@kernel.org>
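The split between a validating entry point and an unchecked internal helper can be sketched as follows. This is a userspace analogue under stated assumptions, not the patch itself: every demo_* name, the fixed array size of 64 and the -EINVAL return value are hypothetical choices made for illustration.

```c
#include <errno.h>

#define DEMO_NR_CPUS 64	/* hypothetical stand-in for the possible-CPU count */

static unsigned long demo_rstat[DEMO_NR_CPUS];

/* Internal variant: trusted callers guarantee that cpu is in range. */
static void demo_rstat_updated_unchecked(int cpu)
{
	demo_rstat[cpu]++;
}

/* Externally reachable variant: validate the index before touching the
 * per-CPU array, then defer to the common implementation. */
static int demo_rstat_updated(int cpu)
{
	if (cpu < 0 || cpu >= DEMO_NR_CPUS)
		return -EINVAL;
	demo_rstat_updated_unchecked(cpu);
	return 0;
}
```

Keeping the check only in the externally reachable wrapper means trusted internal callers do not pay for a bounds test they cannot fail.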
2026-05-18mailbox: Make mbox_send_message() return error code when tx failsJoonwon Kang
When the mailbox controller failed to transmit a message, the error code was only passed to the client's tx done handler and not to mbox_send_message() in blocking mode. For this reason, the function could return a false success. This commit resolves the issue by introducing the tx status and checking it before mbox_send_message() returns. This commit works on the premise that multi-threaded access to a channel in blocking mode is serialized by clients, not by the mailbox APIs, since the current mbox_send_message() in blocking mode does not support multiple threads. Signed-off-by: Joonwon Kang <joonwonkang@google.com> Reviewed-by: Sudeep Holla <sudeep.holla@kernel.org> Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
2026-05-18ASoC: Add support for GPIOs driven amplifiersMark Brown
Herve Codina <herve.codina@bootlin.com> says: On some embedded system boards, audio amplifiers are designed using discrete components such as an op-amp, several resistors and switches to either adjust the gain (switching resistors) or fully switch the audio signal path (mute and/or bypass features). Those switches are usually driven by simple GPIOs. This kind of amplifier is not handled in ASoC and the fallback is to let user space handle those GPIOs outside of the ALSA world. In order to have this kind of amplifier fully integrated in the audio stack, this series introduces the audio-gpio-amp to handle them. This new ASoC component allows the amplifiers to be seen as ASoC auxiliary devices and so allows them to be controlled through audio mixer controls. In order to ease the review, I chose to split modifications related to the merge of the gpio-audio-amp part into the simple-amplifier driver in several commits. Link: https://patch.msgid.link/20260513081702.317117-1-herve.codina@bootlin.com
2026-05-18of: Introduce of_property_read_s32_index()Herve Codina
Signed integers can be read from single value properties using of_property_read_s32() but nothing exists to read signed integers from multi-value properties. Fill this gap by adding of_property_read_s32_index(). Signed-off-by: Herve Codina <herve.codina@bootlin.com> Acked-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20260513081702.317117-2-herve.codina@bootlin.com Signed-off-by: Mark Brown <broonie@kernel.org>
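The idea of reading one signed value out of a multi-value property can be sketched in userspace C. This is an analogue under stated assumptions, not the kernel helper: demo_read_s32_index() is a hypothetical name, the cells are assumed to be already decoded into host-order 32-bit values, and the error codes are illustrative choices.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical analogue: fetch the 32-bit cell at `index` from a
 * multi-value property and reinterpret its bits as a signed integer. */
static int demo_read_s32_index(const uint32_t *cells, size_t ncells,
			       size_t index, int32_t *out)
{
	if (!cells || !out)
		return -EINVAL;
	if (index >= ncells)
		return -EOVERFLOW;	/* index past the end of the property */
	/* memcpy keeps the bit pattern without relying on
	 * implementation-defined unsigned-to-signed conversion. */
	memcpy(out, &cells[index], sizeof(*out));
	return 0;
}
```

For a property holding the cells 5 and 0xFFFFFFFD, index 1 yields -3 on a two's complement machine.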
2026-05-18Merge tag 'soc_fsl-7.1-2' of https://git.kernel.org/pub/scm/linux/kernel/git/chleroy/linux into soc/driversArnd Bergmann
FSL SOC Changes for 7.1 Freescale QUICC Engine: - Add missing cleanup on device removal and switch to irq_domain_create_linear() in interrupt controller for IO Ports - Panic on ioremap() failure in qe_reset() Freescale Management Complex: - Move fsl-mc over to device MSI infrastructure - Wait for the MC firmware to complete its boot Freescale Hypervisor: - Fix header kernel-doc warnings * tag 'soc_fsl-7.1-2' of https://git.kernel.org/pub/scm/linux/kernel/git/chleroy/linux: bus: fsl-mc: wait for the MC firmware to complete its boot soc: fsl: qe: panic on ioremap() failure in qe_reset() soc: fsl: qe_ports_ic: switch to irq_domain_create_linear() soc: fsl: qe_ports_ic: Add missing cleanup on device removal virt: fsl_hypervisor: fix header kernel-doc warnings platform-msi: Remove stale comment fsl-mc: Remove legacy MSI implementation fsl-mc: Switch over to per-device platform MSI irqchip/gic-v3-its: Add fsl_mc device plumbing to the msi-parent handling fsl-mc: Add minimal infrastructure to use platform MSI fsl-mc: Remove MSI domain propagation to sub-devices Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2026-05-18Merge tag 'vfs-7.1-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfsLinus Torvalds
Pull vfs fixes from Christian Brauner: "This contains fixes for the current development cycle. Note that AI related review sometimes delays fixes a bit because we find more fixes for the fixes. I might try and send smaller but more fixes PRs if this trend keeps up. - Fix various netfslib bugs - Fix an out-of-bounds write when listing idmappings - Fix the return values in jfs_mkdir() and orangefs_mkdir() - Fix a writeback array overflow in fuse - Fix a forced iversion increment on lazytime timestamp updates - Reject a negative timeval component in kern_select() - Fix error return when vfs_mkdir() fails in the cachefiles code - Fix wrong error code returned for pidns ioctls" * tag 'vfs-7.1-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (31 commits) cachefiles: Fix error return when vfs_mkdir() fails afs: Fix the locking used by afs_get_link() netfs, afs: Fix write skipping in dir/link writepages netfs: Fix netfs_read_folio() to wait on writeback netfs: Fix folio->private handling in netfs_perform_write() netfs: Fix partial invalidation of streaming-write folio netfs: Fix potential UAF in netfs_unlock_abandoned_read_pages() netfs: Fix leak of request in netfs_write_begin() error handling netfs: Fix early put of sink folio in netfs_read_gaps() netfs: Fix write streaming disablement if fd open O_RDWR netfs: Fix read-gaps to remove netfs_folio from filled folio netfs: Fix potential deadlock in write-through mode netfs: Fix streaming write being overwritten netfs: Defer the emission of trace_netfs_folio() netfs: Fix netfs_invalidate_folio() to clear dirty bit if all changes gone netfs: Fix overrun check in netfs_extract_user_iter() netfs: fix error handling in netfs_extract_user_iter() netfs: Fix potential uninitialised var in netfs_extract_user_iter() netfs: fix VM_BUG_ON_FOLIO() issue in netfs_write_begin() call netfs: Fix zeropoint update where i_size > remote_i_size ...
2026-05-18iomap: don't make REQ_POLLED imply REQ_NOWAITChristoph Hellwig
As described in commit 2bc057692599 ("block: don't make REQ_POLLED imply REQ_NOWAIT"), which fixed the same issue for the block device node, there are valid cases to poll for I/O completion without REQ_NOWAIT. Additionally, using REQ_NOWAIT for file system writes is currently not supported as file system writes are not idempotent and would need a retry of just the bio and not the entire operation to be fully supported. Switch iomap to set REQ_POLLED and remove the now unused bio_set_polled helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260518062917.506483-1-hch@lst.de Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-05-18coresight: Move CPU hotplug callbacks to core layerLeo Yan
This commit moves CPU hotplug callbacks from the ETMv4 driver to the core layer. The motivation is that the core layer can control all components on an activated path rather than only managing the tracer in the ETMv4 driver. The perf event layer will disable CoreSight PMU event 'cs_etm' when a CPU is hotplugged off. That means perf mode will always be converted to disabled mode during CPU hotplug. Arm CoreSight CPU hotplug callbacks only need to handle the Sysfs mode and ignore the perf mode. Add a 'mode' argument to coresight_pm_get_active_path() so it only returns active paths for the relevant mode. Define the enum with bit flags so it is safe for bitwise operations. Change CPUHP_AP_ARM_CORESIGHT_STARTING to CPUHP_AP_ARM_CORESIGHT_ONLINE so that the CPU hotplug callback runs in the online state and thread context, allowing coresight_disable_sysfs() to be called directly to disable the path. Tested-by: James Clark <james.clark@linaro.org> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Tested-by: Jie Gan <jie.gan@oss.qualcomm.com> Signed-off-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20260515-arm_coresight_path_power_management_improvement-v14-27-f88c4a3ecfe9@arm.com
2026-05-18coresight: sysfs: Increment refcount only for software sourceLeo Yan
Except for software sources (e.g. STM), other sources treat multiple enables as equivalent to a single enable. The device mode already tracks the binary state, so it is redundant to operate a refcount. Introduce a helper coresight_is_software_source() to check for a software source. Refactor to maintain the refcount only for software sources. This simplifies future CPU PM handling without refcount logic. Tested-by: James Clark <james.clark@linaro.org> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Tested-by: Jie Gan <jie.gan@oss.qualcomm.com> Signed-off-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20260515-arm_coresight_path_power_management_improvement-v14-26-f88c4a3ecfe9@arm.com
2026-05-18ata: libata-scsi: do not needlessly defer commands when using PMP with FBSNiklas Cassel
The ACS specification does not allow a non-NCQ command to be issued while an NCQ command is outstanding. Commit 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ command starvation") introduced a feature where a deferred non-NCQ command gets issued from a workqueue. The design stores a single non-NCQ command per port. However, when using Port Multipliers (PMPs), specifically PMPs that support FIS-Based Switching (FBS), non-NCQ and NCQ commands can be mixed on the same port, just not for the same link, see e.g. ata_std_qc_defer() which operates, and always has operated, on a per-link basis. Therefore, move the deferred_qc from struct ata_port to struct ata_link. This way, when using a PMP with FBS, we will not needlessly defer commands to all other links, just because one link issued a non-NCQ command while having an NCQ command outstanding. Only commands for that specific link will be deferred. This is in line with how PMPs with FBS worked before commit 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ command starvation"). Fixes: 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ command starvation") Tested-by: Tommy Kelly <linux@tkel.ly> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Niklas Cassel <cassel@kernel.org>
2026-05-18ata: libata-scsi: do not use the deferred QC feature on PMPs with CBSNiklas Cassel
When using Port Multipliers (PMPs) with Command-Based Switching (CBS), you can only issue commands to one link at a time. For PMPs with CBS, there is already code to handle commands being sent to different links in sata_pmp_qc_defer_cmd_switch() using ap->excl_link. sata_sil24 also makes use of ap->excl_link. A user on the list reported that commit 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ command starvation") broke PMPs with CBS. The commit introduced code that stores a deferred qc in ap->deferred_qc, to later be issued via a workqueue. It turns out that this change is incompatible with the existing ap->excl_link handling used by PMPs with CBS. Thus, modify sata_pmp_qc_defer_cmd_switch() and sil24_qc_defer() to return ATA_DEFER_LINK_EXCL, and make sure that the deferred QC handling via workqueue is not used for this return value. This way, PMPs with CBS will work once again. Note that the starvation referenced in commit 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ command starvation") can only happen on libsas ports, and libsas does not support Port Multipliers, thus there is no harm of reverting back to the previous way of deferring commands for PMPs with CBS. Non-libsas ports connected to anything but a PMP with CBS (e.g. a normal drive or a PMP with FBS) will continue using the deferred workqueue, since it does result in lower completion latencies for non-NCQ commands, even though the workqueue is not strictly needed to avoid starvation for non-libsas ports. If we want to modify the scope of the workqueue issuing to also handle PMPs with CBS, then we should ensure that we can save both NCQ and non-NCQ commands in ap->deferred_qc, while also removing the existing PMP CBS handling using ap->excl_link, such that we don't duplicate features. While at it, also add a comment explaining how the ap->excl_link mechanism works. 
Fixes: 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ command starvation") Tested-by: Tommy Kelly <linux@tkel.ly> Reported-by: Tommy Kelly <linux@tkel.ly> Closes: https://lore.kernel.org/linux-ide/ce09cc21-a8e9-4845-b205-35411e22fba9@tkel.ly/ Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Niklas Cassel <cassel@kernel.org>
2026-05-18spi: switch to managed controller allocation (part 3/3)Mark Brown
Johan Hovold <johan@kernel.org> says: In preparation for fixing the SPI controller API so that it no longer drops a reference when deregistering (non-managed) controllers (cf. [1]), this series converts drivers using managed registration to also use managed allocation. Included is also a related cleanup of a lp8841-rtc. This leaves us with 18 drivers using non-managed allocation, which is few enough to be able to fix the API in a tree-wide change. Johan [1] https://lore.kernel.org/lkml/20260325145319.1132072-1-johan@kernel.org/ Link: https://patch.msgid.link/20260511150408.796155-1-johan@kernel.org
2026-05-18coresight: Save active path for system tracersLeo Yan
This commit only sets the path pointer for system tracers (e.g. STM) in coresight_{enable|disable}_source(). Later changes will set the path pointer locally for per-CPU sources. This is because the mode and path pointer must be set together, so that they are observed atomically by the CPU PM notifier. Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Tested-by: James Clark <james.clark@linaro.org> Tested-by: Jie Gan <jie.gan@oss.qualcomm.com> Signed-off-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20260515-arm_coresight_path_power_management_improvement-v14-19-f88c4a3ecfe9@arm.com
2026-05-18coresight: Register CPU PM notifier in core layerLeo Yan
The current implementation only saves and restores the context for ETM sources while ignoring the context of links. However, if funnels or replicators on a linked path reside in a CPU or cluster power domain, the hardware context for the link will be lost after resuming from low power states. To support context management for links during CPU low power modes, a better way is to implement CPU PM callbacks in the Arm CoreSight core layer, as the core layer has sufficient information for linked paths, from tracers to links, which can be used for power management. As a first step, this patch registers a CPU PM notifier in the core layer. If a source device provides callbacks for saving and restoring context, these callbacks will be invoked in CPU suspend and resume. Reviewed-by: James Clark <james.clark@linaro.org> Tested-by: James Clark <james.clark@linaro.org> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Tested-by: Jie Gan <jie.gan@oss.qualcomm.com> Signed-off-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20260515-arm_coresight_path_power_management_improvement-v14-11-f88c4a3ecfe9@arm.com
2026-05-18coresight: Remove .cpu_id() callback from source opsLeo Yan
The CPU ID can be fetched directly from the coresight_device structure, so the .cpu_id() callback is no longer needed. Remove the .cpu_id() callback from source ops and update callers accordingly. Tested-by: Jie Gan <jie.gan@oss.qualcomm.com> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20260515-arm_coresight_path_power_management_improvement-v14-5-f88c4a3ecfe9@arm.com
2026-05-18coresight: Populate CPU ID into coresight_deviceLeo Yan
Add a new flag CORESIGHT_DESC_CPU_BOUND to indicate components that are CPU bound. Populate CPU ID into the coresight_device structure; otherwise, set CPU ID to -1 for non CPU bound devices. Use the {0} initializer to clear coresight_desc structures to avoid uninitialized values. Tested-by: Jie Gan <jie.gan@oss.qualcomm.com> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20260515-arm_coresight_path_power_management_improvement-v14-4-f88c4a3ecfe9@arm.com
2026-05-18dio: Update DIO_SCMAX commentGeert Uytterhoeven
DIO-II support was added in 2004, update a comment to reflect this. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://patch.msgid.link/5aa3901baaa5d145804e1a836dd8ee3fb07ea144.1777897387.git.geert@linux-m68k.org
2026-05-18clocksource: Add devm_clocksource_register_*() helpersDaniel Lezcano
Introduce device-managed helpers for clocksource registration. The clocksource framework currently provides __clocksource_register_scale() along with convenience wrappers for Hz and kHz registration. However, drivers must handle error paths and cleanup manually, typically by pairing registration with an explicit clocksource_unregister() call. Add a devm-based variant, __devm_clocksource_register_scale(), along with devm_clocksource_register_hz() and devm_clocksource_register_khz() helpers. These helpers register the clocksource and attach a devres action to automatically unregister it on driver detach or probe failure. This simplifies driver code by removing explicit cleanup paths, ensuring correct teardown ordering, and aligning with the devm-based resource management model widely used across the kernel. While drivers can open-code devm_add_action_or_reset(), providing a dedicated helper avoids duplication, reduces boilerplate, and ensures consistent usage across drivers, following patterns used in other subsystems. This is also particularly useful for drivers built as modules, where device-managed resource handling avoids manual cleanup in remove paths and ensures correct teardown on module unload. This helper is self-contained and can be adopted progressively by drivers. No functional change. Signed-off-by: Daniel Lezcano <daniel.lezcano@oss.qualcomm.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Link: https://patch.msgid.link/20260506153831.605159-1-daniel.lezcano@oss.qualcomm.com
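The register-then-attach-cleanup pattern described in this commit message can be modelled in a few lines of userspace C. This is a toy model under stated assumptions, not the kernel's devres implementation: every demo_* name is hypothetical, the action list is a fixed-size array, and "registration" is reduced to setting a flag.

```c
#include <stddef.h>

#define DEMO_MAX_ACTIONS 8	/* hypothetical fixed capacity for the toy model */

struct demo_dev {
	void (*action[DEMO_MAX_ACTIONS])(void *);
	void *data[DEMO_MAX_ACTIONS];
	size_t nr;
};

struct demo_clocksource {
	int registered;
};

static void demo_cs_unregister(void *data)
{
	((struct demo_clocksource *)data)->registered = 0;
}

/* Queue a cleanup action on the device; fails when the list is full. */
static int demo_add_action(struct demo_dev *dev, void (*fn)(void *), void *data)
{
	if (dev->nr >= DEMO_MAX_ACTIONS)
		return -1;
	dev->action[dev->nr] = fn;
	dev->data[dev->nr] = data;
	dev->nr++;
	return 0;
}

/* Register, then attach the matching cleanup. If attaching fails, undo the
 * registration immediately so the caller never has to. */
static int demo_devm_cs_register(struct demo_dev *dev, struct demo_clocksource *cs)
{
	cs->registered = 1;
	if (demo_add_action(dev, demo_cs_unregister, cs)) {
		demo_cs_unregister(cs);
		return -1;
	}
	return 0;
}

/* Device teardown: run the queued actions in reverse order of registration. */
static void demo_dev_release(struct demo_dev *dev)
{
	while (dev->nr > 0) {
		dev->nr--;
		dev->action[dev->nr](dev->data[dev->nr]);
	}
}
```

Running the actions in reverse order is what gives the teardown ordering guarantee: resources registered later are released first.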
2026-05-17bpf,x86: Fix exception unwinding with outgoing stack argumentsYonghong Song
When a main program with exception_boundary has outgoing stack arguments (e.g. from calling subprogs with >5 args), bpf_throw() fails to correctly restore callee-saved registers, causing a kernel crash. The x86 JIT allocates the outgoing stack arg area below the callee-saved registers via 'sub rsp, outgoing_rsp' in the prologue. When bpf_throw() unwinds, it captures the main program's sp (which includes this outgoing area) and passes it to the exception callback. The callback gets rsp and rbp, followed by pop_callee_regs, but rsp points into the outgoing arg area rather than the callee-saved registers, so the pops restore garbage values. Returning to the kernel with corrupted callee-saved registers causes a crash. Fix this by adjusting the sp (adding stack_arg_sp_adjust) passed to the exception callback, so it points to the bottom of the callee-saved registers instead of the outgoing arg area. When stack_arg_sp_adjust is 0 (the common case), this is a no-op. Fixes: 324c3ca6eed6 ("bpf,x86: Implement JIT support for stack arguments") Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20260517150702.288031-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-05-17bpf: Check global subprog exception pathsKumar Kartikeya Dwivedi
Global subprogs are verified independently and are not descended into when their callers are symbolically executed. This means a caller can hold references or locks across a global subprog call that may throw, while the verifier only checks the non-exceptional return path at the call site. Record whether a subprog might throw in the CFG summary pass, alongside the existing might_sleep and packet-data-changing summaries, and propagate that effect through reachable callees. When a global subprog is marked as possibly throwing, push the normal continuation and validate the exceptional path immediately at the call site, avoiding a synthetic exception state and associated special case in the pruning checks. Fixes: f18b03fabaa9 ("bpf: Implement BPF exceptions") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20260517075530.3461166-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-05-17Merge tag 'sched-urgent-2026-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds
Pull scheduler fix from Ingo Molnar: - Fix ARM64-specific rseq regressions (Mark Rutland) * tag 'sched-urgent-2026-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: arm64/entry: Fix arm64-specific rseq brokenness
2026-05-17Merge tag 'irq-urgent-2026-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds
Pull IRQ fixes from Ingo Molnar: - Fix use-after-free in irq_work_single() on PREEMPT_RT (Jiayuan Chen) - Don't call add_interrupt_randomness() for NMIs in handle_percpu_devid_irq() (Mark Rutland) - Remove unused function in the ath79-cpu irqchip driver causing LKP CI build warnings (Rosen Penev) - Fix IRQ allocation/teardown leakage regressions in the GICv5 irqchip driver (Sascha Bischoff) - Fix an IRQ trigger type regression in the Meson S4 SoC irqchip driver (Xianwei Zhao) - Fix CPU offlining regression in the RiscV IMSIC irqchip driver (Yong-Xuan Wang) * tag 'irq-urgent-2026-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irq_work: Fix use-after-free in irq_work_single() on PREEMPT_RT irqchip/riscv-imsic: Clear interrupt move state during CPU offlining irqchip/meson-gpio: Use the correct register in meson_s4_gpio_irq_set_type() irqchip/ath79-cpu: Remove unused function genirq/chip: Don't call add_interrupt_randomness() for NMIs irqchip/gic-v5: Allocate ITS parent LPIs as a range irqchip/gic-v5: Support range allocation for LPIs irqchip/gic-v5: Move LPI allocation into the LPI domain
2026-05-17firmware: arm_ffa: Set the core device as FF-A device parentSudeep Holla
Pass a parent device into ffa_device_register() and use the synthetic arm-ffa platform device as the parent for each registered FF-A device. This keeps the enumerated FF-A partition devices anchored below the FF-A core device in the driver model, matching the platform-driver conversion of the core transport. Suggested-by: Yeoreum Yun <yeoreum.yun@arm.com> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Link: https://patch.msgid.link/20260508-b4-ffa_plat_dev-v1-3-c5a30f8cf7b8@kernel.org Signed-off-by: Sudeep Holla <sudeep.holla@kernel.org>
2026-05-16filelock: move LEASE_BREAK_* flags out of #ifdef CONFIG_FILE_LOCKING [refs/merge-window/a7e15565268e294de1ce6b56f6eb3e3b6edc43ab]Jeff Layton
This was causing a build break when CONFIG_FILE_LOCKING was disabled. Move the LEASE_BREAK_* flags into the non-#ifdef'ed part of the file. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202605161232.1lY6pZoM-lkp@intel.com/ Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20260516-dir-deleg-fix-v1-1-1b68f0aa990a@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-05-15Merge tag 'vfio-v7.1-rc4' of https://github.com/awilliam/linux-vfioLinus Torvalds
Pull VFIO fixes from Alex Williamson: - Convert vfio-pci BAR resource requests and iomaps initialization from a lazy, on-demand model to an eager pre-allocation model to avoid races while preserving legacy error behavior. Fix unchecked barmap access in dma-buf export path (Matt Evans) - Introduce an implicit unsigned cast in converting vfio-pci device offsets to region indexes, closing a potential out-of-bounds access through the vfio_pci_ioeventfd() interface (Matt Evans) - Fix a dma-buf kref underflow and stuck wait_for_completion() when closing a previously revoked dma-buf (Alex Williamson) * tag 'vfio-v7.1-rc4' of https://github.com/awilliam/linux-vfio: vfio/pci: Check BAR resources before exporting a DMABUF vfio/pci: Set up BAR resources and maps in vfio_pci_core_enable() vfio/pci: Make VFIO_PCI_OFFSET_TO_INDEX() return unsigned vfio/pci: fix dma-buf kref underflow after revoke
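The effect of making the offset-to-index conversion unsigned, as in the VFIO_PCI_OFFSET_TO_INDEX() change listed above, can be illustrated with a small userspace sketch. The shift width, the region count and all demo_* names here are hypothetical values chosen for the example, not taken from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_OFFSET_SHIFT 40	/* hypothetical: region index lives in the high bits */
#define DEMO_NUM_REGIONS  9	/* hypothetical number of valid regions */

/* Convert through an unsigned type so that a negative offset becomes a very
 * large index instead of a negative one. */
static uint64_t demo_offset_to_index(int64_t off)
{
	return (uint64_t)off >> DEMO_OFFSET_SHIFT;
}

/* A plain upper-bound check is now enough to reject negative offsets too. */
static bool demo_index_ok(int64_t off)
{
	return demo_offset_to_index(off) < DEMO_NUM_REGIONS;
}
```

If the index were computed as a signed value, a negative offset would typically shift to a negative index, which passes an `index < DEMO_NUM_REGIONS` comparison and can then be used to read before the start of an array.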
2026-05-15Merge tag 'block-7.1-20260515' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linuxLinus Torvalds
Pull block fixes from Jens Axboe: - NVMe merge request via Keith: - Fix memory leak on a passthrough integrity mapping failure (Keith) - Hide secrets behind debug option (Hannes) - Fix pci use-after-free for host memory buffer (Chia-Lin Kao) - Fix tcp target use-after-free for data digest (Sagi) - Revert a mistaken quirk (Alan Cui) - Fix uevent and controller state race condition (Maurizio) - Fix apple submission queue re-initialization (Nick Chan) - Three fixes for blk-integrity, fixing an issue with the user data mapping and two problems with recomputing number of segments - Two fixes for the iov_iter bounce buffering - Fix for the handling of dead zoned write plugs - ublk max_sectors validation fix, with associated selftest addition * tag 'block-7.1-20260515' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: nvme-apple: Reset q->sq_tail during queue init block: align down bounces bios block: pass a minsize argument to bio_iov_iter_bounce selftests: ublk: cap nthreads to kernel's actual nr_hw_queues block: fix handling of dead zone write plugs block: bio-integrity: Fix null-ptr-deref in bio_integrity_map_user() block: recompute nr_integrity_segments in blk_insert_cloned_request block: don't overwrite bip_vcnt in bio_integrity_copy_user() nvme: fix race condition between connected uevent and STARTED_ONCE flag Revert "nvme: add quirk NVME_QUIRK_IGNORE_DEV_SUBNQN for 144d:a808" nvmet-tcp: Fix potential UAF when ddgst mismatch nvme-pci: fix use-after-free in nvme_free_host_mem() nvmet-auth: Do not print DH-HMAC-CHAP secrets nvme: fix bio leak on mapping failure nvme: make prp passthrough usage less scary ublk: reject max_sectors smaller than PAGE_SECTORS in parameter validation
2026-05-15Merge tag 'platform-drivers-x86-v7.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86Linus Torvalds
Pull x86 platform driver fixes from Ilpo Järvinen: - asus-nb-wmi: - Use existing keyboard quirk for ASUS Zenbook Duo UX8407AA - hp-wmi: - Add support for Victus 16-r0xxx (8BC2) - intel/vsec_tpmi: - Move debugfs register before creating devices - Prevent fault during unbind - lenovo-wmi-*: - Fix memory leak in lwmi_dev_evaluate_int() - Balance IDA id allocation and free - Balance component bind and unbind - Prevent sending uninitialized WMI arguments to the device - Decouple lenovo-wmi-gamezone and lenovo-wmi-other to simplify module dependency graph - Limit adding attributes to supported devices - samsung-galaxybook: - Handle kbd backlight, mic mute and camera block hotkeys * tag 'platform-drivers-x86-v7.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/x86: asus-nb-wmi: add DMI quirk for ASUS Zenbook Duo UX8407AA platform/x86: lenovo-wmi-other: Limit adding attributes to supported devices platform/x86: lenovo-wmi-other: Add Attribute ID helper functions platform/x86: lenovo-wmi-helpers: Move gamezone enums to wmi-helpers platform/x86: lenovo: Decouple lenovo-wmi-gamezone and lenovo-wmi-other platform/x86: lenovo-wmi-other: Fix tunable_attr_01 struct members platform/x86: lenovo-wmi-other: Zero initialize WMI arguments platform/x86: lenovo-wmi-other: Balance component bind and unbind platform/x86: lenovo-wmi-other: Balance IDA id allocation and free platform/x86: lenovo-wmi-helpers: Fix memory leak in lwmi_dev_evaluate_int() platform/x86: hp-wmi: Add support for Victus 16-r0xxx (8BC2) platform/x86/intel/tpmi/plr: Prevent fault during unbind platform/x86: intel: Add notifiers support platform/x86: intel: Move debugfs register before creating devices platform/x86: samsung-galaxybook: Handle ACPI hotkey notifications platform/x86: samsung-galaxybook: Refactor camera lens cover input device
2026-05-15Merge tag 'v7.1-p4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:

 - Fix potential dead-lock in rhashtable when used by xattr
 - Avoid calling kvfree on atomic path in rhashtable

* tag 'v7.1-p4' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  rhashtable: Add bucket_table_free_atomic() helper
  mm/slab: Add kvfree_atomic() helper
  rhashtable: drop ht->mutex in rhashtable_free_and_destroy()
2026-05-15fsnotify: add FSNOTIFY_EVENT_RENAME data typeJeff Layton
Add a new fsnotify_rename_data struct and FSNOTIFY_EVENT_RENAME data type that carries both the moved dentry and the inode that was overwritten by the rename (if any). Update fsnotify_data_inode(), fsnotify_data_dentry(), and fsnotify_data_sb() to handle the new type, and add a new fsnotify_data_rename_target() helper for extracting the overwritten target inode. Update fsnotify_move() to use the new data type for FS_RENAME and FS_MOVED_TO events, passing the overwritten target inode through the event data. FS_MOVED_FROM is unchanged since the source directory doesn't need overwrite information. This is done so that fsnotify consumers like nfsd can atomically observe the overwritten file when a rename replaces an existing entry, without needing a separate FS_DELETE event. Assisted-by: Claude (Anthropic Claude Code) Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20260428-dir-deleg-v3-7-5a0780ba9def@kernel.org Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-05-15fsnotify: add fsnotify_modify_mark_mask()Jeff Layton
nfsd needs to be able to modify the mask on an existing mark when new directory delegations are set or unset. Add an exported function that allows the caller to set and clear bits in the mark->mask, and does the recalculation if something changed. Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20260428-dir-deleg-v3-6-5a0780ba9def@kernel.org Acked-by: Jan Kara <jack@suse.cz> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
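The set/clear/recalculate pattern described in this commit can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct demo_mark`, `demo_recalc()` and `demo_modify_mark_mask()` are hypothetical names, not the kernel's fsnotify API, and the choice to apply `clear` before `set` (so a bit present in both ends up set) is an assumption rather than something the commit message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a mark; only the mask matters for this sketch. */
struct demo_mark {
	uint32_t mask;
	unsigned int recalc_count;	/* how many times derived state was recomputed */
};

/* Stand-in for the recalculation step the commit message mentions. */
static void demo_recalc(struct demo_mark *mark)
{
	mark->recalc_count++;
}

/*
 * Clear the bits in @clear, then set the bits in @set (assumed ordering).
 * The recalculation runs only when the mask actually changed.
 * Returns true if the mask changed.
 */
static bool demo_modify_mark_mask(struct demo_mark *mark,
				  uint32_t set, uint32_t clear)
{
	uint32_t old_mask, updated;

	assert(mark);

	old_mask = mark->mask;
	updated = (old_mask & ~clear) | set;
	if (updated == old_mask)
		return false;

	mark->mask = updated;
	demo_recalc(mark);
	return true;
}
```

Skipping the recalculation when nothing changed keeps repeated calls with the same arguments cheap, which is presumably useful when delegations are set and unset frequently.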
2026-05-15filelock: add an inode_lease_ignore_mask helperJeff Layton
Add a new routine that returns a mask of all dir change events that are currently ignored by any leases. nfsd will use this to determine how to configure the fsnotify_mark mask. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20260428-dir-deleg-v3-4-5a0780ba9def@kernel.org Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-05-15filelock: add support for ignoring deleg breaks for dir change eventsJeff Layton
If an NFS client requests a directory delegation with a notification bitmask covering directory change events, the server shouldn't recall the delegation. Instead the client will be notified of the change after the fact. Add support for ignoring lease breaks on directory changes. Add a new flags parameter to try_break_deleg() and teach __break_lease how to ignore certain types of delegation break events. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20260428-dir-deleg-v3-2-5a0780ba9def@kernel.org Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-05-15cgroup: Add per-subsys-css kill_css_finish deferralTejun Heo
93618edf7538 ("cgroup: Defer css percpu_ref kill on rmdir until cgroup is depopulated") deferred kill_css_finish() at the cgroup level: rmdir waits for the entire cgroup's populated count to drop to zero, then fires kill_css_finish() on every subsystem css at once. Replace that with per-subsys-css deferral. Each subsystem css now tracks its own hierarchical populated count and independently defers its kill_css_finish() until its own subtree drains. The rmdir-race fix carries through unchanged in shape. The dying css's ->css_offline() still waits until no PF_EXITING task references it, and v2's cgroup-level machinery goes away. cgroup_apply_control_disable() has the same race shape (PF_EXITING tasks pinning a css whose ->css_offline() is about to run) and stays synchronous here. This patch lays the groundwork for fixing it - per-cgroup waiting can't gate one subsys css being killed while the rest of the cgroup stays live, but per-css can. Subtree-wide invariant preserved: a dying ancestor css stays populated through nr_populated_children until every dying descendant's task drains, so the walker fires the ancestor's kill_finish_work only after all descendants have drained. Add paired smp_mb()s in kill_css_sync() and css_update_populated() to fence the StoreLoad on (CSS_DYING, populated counter), guaranteeing that either the walker queues kill_finish_work or the caller fires synchronously. cgroup_destroy_locked() was implicitly fenced by an unrelated css_set_lock pair; cgroup_apply_control_disable() in the next patch is not. Signed-off-by: Tejun Heo <tj@kernel.org>
2026-05-15cgroup: Move populated counters to cgroup_subsys_stateTejun Heo
Later patches replace the cgroup-level finish_destroy_work deferral added by 93618edf7538 ("cgroup: Defer css percpu_ref kill on rmdir until cgroup is depopulated") with a per-subsys-css deferral. That needs each subsystem css to track its own populated count. Move the populated counters from cgroup onto cgroup_subsys_state. cgroup->self is itself a cgroup_subsys_state and self.parent walks the same chain as cgroup_parent(), so cgroup_update_populated() generalizes to a single css_update_populated() taking a css. The cgroup-side bookkeeping runs only when the walk started from a self css. Keep nr_populated_{domain,threaded}_children on cgroup. The two sum to self.nr_populated_children, but they stay as dedicated fields so that readers like cgroup_can_be_thread_root() can access them unlocked. css_set_update_populated() also walks the per-subsys-css chain so each subsystem css's hierarchical populated count is maintained. No reader consumes those counts yet. Signed-off-by: Tejun Heo <tj@kernel.org>
2026-05-15cgroup: Annotate unlocked nr_populated_* accesses with READ_ONCE/WRITE_ONCETejun Heo
cgroup_update_populated() updates nr_populated_csets, nr_populated_domain_children, and nr_populated_threaded_children under css_set_lock, but cgroup_has_tasks(), cgroup_is_populated(), and cgroup_can_be_thread_root() read them without holding it. Use READ_ONCE/WRITE_ONCE. Signed-off-by: Tejun Heo <tj@kernel.org>
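The annotation pattern from this commit can be approximated in userspace as follows. This is a rough sketch, not the kernel's implementation: the `DEMO_*` macros are simplified volatile-cast stand-ins for `READ_ONCE`/`WRITE_ONCE` that only handle scalar types and rely on the GCC/Clang `__typeof__` extension, and `struct demo_cgroup` and both helper functions are hypothetical.

```c
#include <assert.h>

/*
 * Simplified approximations: force exactly one access through a
 * volatile-qualified lvalue so the compiler cannot tear, fuse or
 * re-read the access. Scalar types only.
 */
#define DEMO_WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
#define DEMO_READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in holding one of the annotated counters. */
struct demo_cgroup {
	int nr_populated_csets;
};

/*
 * Writer side. In the kernel the update runs under css_set_lock, so the
 * plain read on the right-hand side is safe; only the store is annotated
 * for the benefit of lockless readers.
 */
static void demo_update_populated(struct demo_cgroup *cgrp, int delta)
{
	assert(cgrp);
	DEMO_WRITE_ONCE(cgrp->nr_populated_csets,
			cgrp->nr_populated_csets + delta);
}

/* Lockless reader: a single annotated load, no lock taken. */
static int demo_has_tasks(struct demo_cgroup *cgrp)
{
	return DEMO_READ_ONCE(cgrp->nr_populated_csets) > 0;
}
```

The annotations do not add ordering or atomic read-modify-write semantics. They only document the intentional data race and constrain what the compiler may do with the access.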
2026-05-15cgroup: Inline cgroup_has_tasks() in cgroup.hTejun Heo
cpuset reads cs->css.cgroup->nr_populated_csets directly in two places to test whether a cgroup has tasks. cgroup.c already has a matching helper, cgroup_has_tasks(). Move it to cgroup.h as static inline and use that instead. This is to prepare for relocation of cgroup->nr_populated_csets. No semantic change. Signed-off-by: Tejun Heo <tj@kernel.org>
2026-05-15Merge patch series "io_uring related epoll cleanups"Christian Brauner
Jens Axboe <axboe@kernel.dk> says:

One of the nastier things about epoll is how it allows nesting contexts inside each other, leading to the necessity of loop detection and the issues that have come with that. I don't believe there's any reason to support nesting on the io_uring side, in fact IORING_OP_EPOLL_CTL is a historical mistake, imho. But let's at least try and contain the damage and disallow nested contexts from our side.

Christian Brauner <brauner@kernel.org> says:

Bring in the eventpoll specific io_uring changes together with the eventpoll cleanup I did this cycle. The io_uring changes can go on top of both through the block tree.

* patches from https://patch.msgid.link/20260514140817.623026-1-axboe@kernel.dk:
  eventpoll: rename struct epoll_filefd to epoll_key
  eventpoll: add file based control interface
  eventpoll: export is_file_epoll()
  eventpoll: pass struct epoll_filefd through ep_find() and ep_insert()

Link: https://patch.msgid.link/20260514140817.623026-1-axboe@kernel.dk
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
2026-05-15eventpoll: rename struct epoll_filefd to epoll_keyJens Axboe
This more accurately describes what purpose this structure serves, as a lookup key. Suggested-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://patch.msgid.link/20260514140817.623026-5-axboe@kernel.dk Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-05-15eventpoll: add file based control interfaceJens Axboe
Add do_epoll_ctl_file(), which takes a pre-resolved epoll file and a struct epoll_filefd for the target rather than two integer file descriptors. do_epoll_ctl() remains as a thin wrapper. In preparation for using the file based interface from io_uring. Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://patch.msgid.link/20260514140817.623026-4-axboe@kernel.dk Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-05-15eventpoll: export is_file_epoll()Jens Axboe
Make is_file_epoll() available outside of epoll. This is in preparation for using it from io_uring. Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://patch.msgid.link/20260514140817.623026-3-axboe@kernel.dk Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-05-15block: unexport blk_status_to_strChristoph Hellwig
Only used in core block code, so unexport and move the prototype to blk.h. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260515045547.3790129-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-05-15block: remove bio_copy_data_iterChristoph Hellwig
Only used by bio_copy_data, so implement that directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260515045547.3790129-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-05-15block: remove zero_fill_bio_iterChristoph Hellwig
Only used to implement zero_fill_bio, so directly implement that. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260515045547.3790129-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-05-15Merge branch 'for-linus' into for-nextTakashi Iwai
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2026-05-14net: block MSG_NO_SHARED_FRAGS in sendmsg()Jann Horn
This change should cause no difference in behavior; it just cleans up some hazardous code that could have become a problem in the future. MSG_NO_SHARED_FRAGS is a kernel-internal flag that cancels the effect of MSG_SPLICE_PAGES, another kernel-internal flag that influences the data-sharing semantics of SKBs. Prevent passing this flag in from userspace via sendmsg() by adding it to MSG_INTERNAL_SENDMSG_FLAGS. This is not currently an observable problem because MSG_NO_SHARED_FRAGS only has an effect if kernel code adds MSG_SPLICE_PAGES to it. The only codepath that adds MSG_SPLICE_PAGES to user-supplied flags from which MSG_NO_SHARED_FRAGS hasn't been cleared is the path tcp_bpf_sendmsg -> tcp_bpf_send_verdict -> tcp_bpf_push, and that is not a problem because tcp_bpf_sendmsg always intentionally sets MSG_NO_SHARED_FRAGS anyway. Signed-off-by: Jann Horn <jannh@google.com> Link: https://patch.msgid.link/20260512-msg_no_shared_frags-v1-1-55ea46760331@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
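The idea of keeping kernel-internal flags out of user-supplied sendmsg() flags can be illustrated with a small mask. This sketch makes several assumptions: the `DEMO_*` bit values are invented for the example and are not the kernel's, `demo_sanitize_sendmsg_flags()` is a hypothetical helper, and stripping the bits (rather than rejecting the call) is assumed here because the commit message does not spell out which of the two the syscall path does.

```c
#include <assert.h>

/* Invented bit values for illustration only; not the kernel's definitions. */
#define DEMO_MSG_DONTWAIT        0x01u	/* ordinary user-visible flag */
#define DEMO_MSG_SPLICE_PAGES    0x10u	/* kernel-internal */
#define DEMO_MSG_NO_SHARED_FRAGS 0x20u	/* kernel-internal, newly added to the mask */

/* All flags that must never be accepted from userspace. */
#define DEMO_MSG_INTERNAL_SENDMSG_FLAGS \
	(DEMO_MSG_SPLICE_PAGES | DEMO_MSG_NO_SHARED_FRAGS)

/* The internal mask must not overlap user-visible flags. */
static_assert((DEMO_MSG_INTERNAL_SENDMSG_FLAGS & DEMO_MSG_DONTWAIT) == 0,
	      "internal flags overlap a user-visible flag");

/* Drop every kernel-internal bit from flags that came from userspace. */
static unsigned int demo_sanitize_sendmsg_flags(unsigned int user_flags)
{
	return user_flags & ~DEMO_MSG_INTERNAL_SENDMSG_FLAGS;
}
```

With a single mask, protecting a new internal flag is a one-line change to the macro instead of an audit of every entry point.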
2026-05-14Merge tag 'hid-for-linus-2026051401' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid

Pull HID fixes from Jiri Kosina:

 - fixes for a few OOB/UAF in several HID drivers (Florian Pradines, Lee Jones, Michael Zaidman, Rosalie Wanders, Sangyun Kim and Tomasz Pakuła)
 - more general sanitation of input data, dealing with potentially malicious hardware in hid-core (Benjamin Tissoires)
 - a few device-specific quirks and fixups

* tag 'hid-for-linus-2026051401' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: (22 commits)
  HID: logitech-hidpp: Add support for newer Bluetooth keyboards
  HID: pidff: Fix integer overflow in pidff_rescale
  HID: i2c-hid: add reset quirk for BLTP7853 touchpad
  HID: core: introduce hid_safe_input_report()
  HID: pass the buffer size to hid_report_raw_event
  HID: google: hammer: stop hardware on devres action failure
  HID: appletb-kbd: run inactivity autodim from workqueues
  HID: appletb-kbd: fix UAF in inactivity-timer cleanup path
  HID: playstation: Clamp num_touch_reports
  HID: magicmouse: Prevent out-of-bounds (OOB) read during DOUBLE_REPORT_ID
  HID: mcp2221: fix OOB write in mcp2221_raw_event()
  HID: quirks: really enable the intended work around for appledisplay
  HID: hid-sjoy: race between init and usage
  HID: uclogic: Fix regression of input name assignment
  HID: intel-thc-hid: Intel-quickspi: Fix some error codes
  HID: hid-lenovo-go-s: restore OS_TYPE after resume from s2idle
  HID: elan: Add support for ELAN SB974D touchpad
  HID: sony: add missing size validation for Rock Band 3 Pro instruments
  HID: sony: add missing size validation for SMK-Link remotes
  HID: sony: remove unneeded WARN_ON() in sony_leds_init()
  ...
2026-05-14cgroup/rdma: add rdma.events.local for per-cgroup allocation failure attributionTao Cui
Add per-cgroup local event counters to track RDMA resource limit exhaustion from the perspective of individual cgroups.

The rdma.events.local file reports two per-resource counters:

 - max: number of times this cgroup's limit was the one that blocked an allocation in the subtree
 - alloc_fail: number of allocation attempts originating from this cgroup that failed due to an ancestor's limit

This mirrors the design of pids.events.local, where events are attributed to the cgroup that imposed the limit, not necessarily the cgroup where the allocation was attempted.

Also extend rdma.events with a hierarchical alloc_fail counter that tracks allocation failures propagating upward from the requesting cgroup, complementing the existing max counter, so that rdma.events and rdma.events.local share the same output format.

Signed-off-by: Tao Cui <cuitao@kylinos.cn>
Signed-off-by: Tejun Heo <tj@kernel.org>
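The attribution rule for the local counters in this commit (max goes to the cgroup whose limit blocked the allocation, alloc_fail goes to the cgroup that asked) can be sketched with a toy hierarchy. Everything here is hypothetical: `struct demo_rdma_cg` and `demo_try_charge()` are not the rdma controller's real structures, a single resource with unit-sized charges is assumed, and the sketch counts alloc_fail at the requester even when its own limit is the blocker, which the commit message does not settle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy cgroup node tracking a single resource. */
struct demo_rdma_cg {
	struct demo_rdma_cg *parent;
	int limit;			/* maximum allowed usage at this level */
	int usage;			/* current hierarchical usage */
	unsigned long local_max;	/* times this level's limit blocked a charge */
	unsigned long local_alloc_fail;	/* failed charges that originated here */
};

/*
 * Try to charge one unit on behalf of @requester. The walk goes from the
 * requester up to the root; the first level whose limit would be exceeded
 * is blamed via local_max, and the requester records the failure.
 */
static bool demo_try_charge(struct demo_rdma_cg *requester)
{
	struct demo_rdma_cg *cg;

	assert(requester != NULL);

	/* First pass: check every level before changing any usage. */
	for (cg = requester; cg; cg = cg->parent) {
		if (cg->usage + 1 > cg->limit) {
			cg->local_max++;
			requester->local_alloc_fail++;
			return false;
		}
	}

	/* Second pass: all levels have room, so commit the charge. */
	for (cg = requester; cg; cg = cg->parent)
		cg->usage++;

	return true;
}
```

With a parent limited to one unit and a child limited to ten, a second charge from the child fails: the parent's local_max becomes 1 while the child's local_alloc_fail becomes 1, which is the split between "who imposed the limit" and "who attempted the allocation".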