linux.git - Linux kernel source tree

Age	Commit message (Collapse)	Author
3 days	Merge tag 'pm-7.2-rc1-2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix the schedutil cpufreq governor and drop a bogus warning from the cpuidle core: - Remove a misguided warning along with an inaccurate comment next to it from the cpuidle core (Rafael Wysocki) - Clear need_freq_update as appropriate in the .adjust_perf() path of the schedutil cpufreq governor to avoid calling cpufreq_driver_adjust_perf() unnecessarily on every scheduler utilization update (Zhongqiu Han)" * tag 'pm-7.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpuidle: Allow exit latency to exceed target residency cpufreq: schedutil: Fix uncleared need_freq_update on the .adjust_perf() path
4 days	Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf	Linus Torvalds
	Pull bpf fixes from Alexei Starovoitov: - Fix effective prog array index with BPF_F_PREORDER (Amery Hung) - Zero-initialize the fib lookup flow struct (Avinash Duduskar) - Disable xfrm_decode_session hook attachment (Bradley Morgan) - Allow type tag BTF records to succeed other modifier records (Emil Tsalapatis) - Fix build_id caching in stack_map_get_build_id_offset() (Ihor Solodrai) - Add missing access_ok call to copy_user_syms (Jiri Olsa) - Fix stack slot index in nospec checks (Nuoqi Gui) - Preserve pointer spill metadata during half-slot cleanup (Nuoqi Gui) - Fix partial copy of non-linear test_run output (Sun Jian) - Fix BPF_PROG_ASSOC_STRUCT_OPS last field check (Thiébaud Weksteen) - Reset register bounds before narrowing retval range (Tristan Madani) - Fix vmlinux BTF leak in bpftool cgroup commands (Yichong Chen) - Guard error writes in conntrack kfuncs (Yiyang Chen) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Disable xfrm_decode_session hook attachment selftests/bpf: Add test for stale bounds on LSM retval context load bpf: Reset register bounds before narrowing retval range in check_mem_access() selftests/bpf: Cover small conntrack opts error writes bpf: Guard conntrack opts error writes selftests/bpf: Cover half-slot cleanup of pointer spills bpf: Preserve pointer spill metadata during half-slot cleanup selftests/bpf: Test cgroup link replace with BPF_F_PREORDER bpf: Fix effective prog array index with BPF_F_PREORDER bpf: Fix BPF_PROG_ASSOC_STRUCT_OPS last field check bpf: zero-initialize the fib lookup flow struct bpftool: Fix vmlinux BTF leak in cgroup commands bpf: Add missing access_ok call to copy_user_syms bpf: Allow type tag BTF records to succeed other modifier records bpf: Emit verbose message when prog-specific btf_struct_access rejects a write bpf: Fix build_id caching in stack_map_get_build_id_offset() bpf: Fix partial copy of non-linear test_run output selftests/bpf: Cover stack nospec slot indexing bpf: Fix stack slot index in nospec checks
4 days	Merge tag 'block-7.2-20260625' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull block fixes from Jens Axboe: - blk-cgroup locking rework and fixes: - fix a use-after-free in __blkcg_rstat_flush() - defer freeing policy data until after an RCU grace period - defer the blkcg css_put until the blkg is unlinked from the queue - unwind the queue_lock nesting under RCU / blkcg->lock across the lookup, create, associate and destroy paths - NVMe fixes via Keith: - Fix a crash and memory leak during invalid cdev teardown, and related cdev cleanups (Maurizio, John) - nvmet fixes: handle TCP_CLOSING in the tcp state_change handler, reject short AUTH_RECEIVE buffers, handle inline data with a nonzero offset in rdma, fix an sq refcount leak, and allocate ana_state with the port (Maurizio, Michael, Bryam, Wentao, Rosen) - nvme-fc fix to not cancel requests on an IO target before it is initialized (Mohamed) - nvme-apple fix to prevent shared tags across queues on Apple A11 (Nick) - Various smaller fixes and cleanups (John) - MD fixes via Yu Kuai: - raid1/raid10 fixes for writes_pending and barrier reference leaks on write and discard failures, plus REQ_NOWAIT handling fixes (Abd-Alrhman) - raid5 discard accounting and validation, and a batch of fixes for stripe batch races (Yu Kuai, Chen) - Protect raid1 head_position during read balancing (Chen) - block bio-integrity fixes: correct an error injection static key decrement, fix GFP flag confusion in bio_integrity_alloc_buf(), and handle REQ_OP_ZONE_APPEND in __bio_integrity_action() (Christoph) - Fixes for bio_iov_iter_bounce_write(): revert the iov_iter after a short copy, and respect the iov_iter nofault flag (Qu) - Invalidate the cached plug timestamp after a task switch, and clear PF_BLOCK_TS in copy_process() (Usama) - Fix the IORING_URING_CMD_REISSUE flags check in blkdev_uring_cmd() (Yitang) - Remove a redundant plug in __submit_bio() (Wen) - Don't warn when reclassifying a busy socket lock in nbd (Deepanshu) * tag 'block-7.2-20260625' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (45 commits) block: handle REQ_OP_ZONE_APPEND in __bio_integrity_action block: fix GFP_ flags confusion in bio_integrity_alloc_buf block, bfq: don't grab queue_lock to initialize bfq mm/page_io: don't nest queue_lock under rcu in bio_associate_blkg_from_page() blk-cgroup: don't nest queue_lock under blkcg->lock in blkcg_destroy_blkgs() blk-cgroup: don't nest queue_lock under rcu in bio_associate_blkg() blk-cgroup: don't nest queue_lock under rcu in blkg_lookup_create() blk-cgroup: don't nest queue_lock under rcu in blkcg_print_blkgs() blk-cgroup: delay freeing policy data after rcu grace period blk-cgroup: protect iterating blkgs with blkcg->lock in blkcg_print_stat() md/raid5: avoid R5_Overlap races while breaking stripe batches md/raid5: use stripe state snapshot in break_stripe_batch_list() blk-cgroup: defer blkcg css_put until blkg is unlinked from queue blk-cgroup: fix UAF in __blkcg_rstat_flush() block, bfq: protect async queue reset with blkcg locks nbd: don't warn when reclassifying a busy socket lock block: fix incorrect error injection static key decrement md/raid5: let stripe batch bm_seq comparison wrap-safe md/raid1: protect head_position for read balance md/raid1: free r1_bio when REQ_NOWAIT is set and read would block on retry ...
5 days	Merge tag 'pci-v7.2-changes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci updates from Bjorn Helgaas: "Enumeration: - Remove MPS/MRRS Kconfig settings (CONFIG_PCIE_BUS_) that worked around a WiFi device defect; use a quirk or boot-time "pci=pcie_bus_tune_" kernel parameter instead (Bjorn Helgaas) - Always lift 2.5GT/s restriction in PCIe failed link retraining to avoid clamping a link to 2.5GT/s after hot-plug changes the device (Maciej W. Rozycki) - Request bus reassignment when not probe-only to fix an enumeration regression on Marvell CN106XX and possibly other DT-based systems (Ratheesh Kannoth) - Fix procfs race between pci_proc_init() and pci_bus_add_device() that resulted in 'proc_dir_entry ... already registered' warnings and pointer corruption (Krzysztof Wilczyński) - Fix sysfs race that causes 'duplicate filename' warnings and boot panics by converting PCI resource files to static attributes (Krzysztof Wilczyński) - Expose sysfs 'resourceN_resize' attributes only on platforms with PCI mmap (Krzysztof Wilczyński) - Require CAP_SYS_ADMIN to write to sysfs 'resourceN_resize' attributes (Krzysztof Wilczyński) - Add security_locked_down(LOCKDOWN_PCI_ACCESS) to alpha PCI resource mmap path to match the generic path (Krzysztof Wilczyński) - Use kstrtobool() to parse the 'rom' attribute input to avoid the unexpected behavior of enabling the ROM when writing '0' with no trailing newline (Krzysztof Wilczyński) Resource management: - Improve resource claim logging for debuggability (Ilpo Järvinen) - Clean up several uses of const parameters (Ilpo Järvinen) - Check option ROM header signatures and lengths before accessing to avoid page faults and alignment faults (Guixin Liu) ASPM: - Don't reconfigure ASPM when entering low-power D-state; only do it when returning back to D0 (Carlos Bilbao) Power management: - During suspend, set power state to 'unknown' for all devices, not just those with drivers (Lukas Wunner) - Skip restoring Resizable BARs and VF Resizable BARs if device doesn't respond to config reads, to avoid invalid array accesses (Marco Nenciarini) - Add pci_suspend_retains_context() so drivers can tell whether devices retain internal state across suspend/resume, since some platforms reset devices on suspend; use this in nvme to avoid issues on Qcom RCs (Manivannan Sadhasivam) Power control: - Only to power on/off devices that actually support power control to avoid poking at incompatible devices mentioned in DT (Manivannan Sadhasivam) Virtualization and resets: - Log device readiness timeouts as errors, not warnings, because the device is likely unusable in this case (Bjorn Helgaas) - Wait for device readiness after soft reset (D3hot -> D0uninitialized transition), when the device may respond with Request Retry Status (RRS) if it needs more time to initialize (Bjorn Helgaas) - Drop unnecessary retries when restoring BARs because resets should now already include all required delays (Lukas Wunner) - Avoid FLR for MediaTek MT7925 WiFi, where FLR fails after a VM terminates uncleanly (Jose Ignacio Tornos Martinez) - Avoid SBR for Qualcomm WCN6855/WCN7850 WiFi, SDX62/SDX65 modems, which seem not to support it correctly (Jose Ignacio Tornos Martinez) Peer-to-peer DMA: - Prevent P2PDMA as well as CPU access to non-mappable BARs, e.g., s390 ISM BARs (Matt Evans) - Add Intel QAT, DSA, IAA devices to whitelist (Lukas Wunner) Endpoint framework: - Add endpoint controller APIs for use by function drivers to discover auxiliary blocks like DMA engines (Koichiro Den) - Remember DesignWare eDMA engine base/size and expose them via the EPC aux-resource API (Koichiro Den) - Add endpoint embedded doorbell fallback, used if MSI allocation fails (Koichiro Den) - Validate BAR index and remove dead BAR read in endpoint doorbell test (Carlos Bilbao) - Unwind MSI/MSI-X vectors if NTB initialization fails part-way through (Koichiro Den) - Cache sleepable pci_irq_vector() value at ISR setup to avoid calling it from hardirq context (Koichiro Den) - Call sleepable pci_epc_raise_irq() from a work item instead of atomic context, e.g., when setting bits in NTB peer doorbells in the ntb_peer_db_set() path (Koichiro Den) - Report 0-based vNTB doorbell vector to account for link event 0 and historically skipped slot 1 (Koichiro Den) - Prevent configfs writes to vNTB db_count and other values that are already in use after EPC attach (Koichiro Den) - Account for vNTB db_valid reserved slots (link event 0 and historically skipped slot 1) so they don't appear as valid doorbells (Koichiro Den) - Implement vNTB .db_vector_count()/mask() for doorbells so clients can use multiple vectors and avoid thundering herds (Koichiro Den) - Report 0-based NTB doorbell vector to account for link event 0 and historically skipped slot 1 (Koichiro Den) - Fix doorbell bitmask and IRQ vector handling to clear only specified bits, use the correct vector for non-contiguous Linux IRQ numbers, and validate incoming vectors (Koichiro Den) - Implement NTB .db_vector_count()/mask() for doorbells so clients can use multiple vectors (Koichiro Den) Native PCIe controller infrastructure: - Add pci_host_common_link_train_delay() for the mandatory delay after > 5GT/s Link training completes and use it for cadence HPA, j721e, LGA; dwc; aardvark, mediatek-gen3, rzg3s (Hans Zhang) - Protect root bus removal with rescan lock in altera, brcmstb, cadence, dwc, iproc, mediatek, plda, rockchip to prevent use-after-free or crashes when racing with sysfs rescan or hotplug (Hans Zhang) - Add pci_host_common_parse_ports() for use by any native driver to parse Root Port properties (per-Link features like width, speed, PHY, power and reset control, etc should be described in Root Port stanzas, not the host bridge; currently only reset GPIOs implemented) (Sherry Sun) New native PCIe controller drivers: - Add DT binding and driver for UltraRISC DP1000 PCIe controller (Xincheng Zhang, Jia Wang) Altera PCIe controller driver: - Do not dispose of the parent IRQ mapping, which belongs to the parent interrupt controller (Mahesh Vaidya) - Fix chained IRQ handler ordering issue and resource leaks on probe failure (Mahesh Vaidya) AMD MDB PCIe controller driver: - Assert PERST# on shutdown so any connected Endpoints are held in reset during shutdown (Sai Krishna Musham) Amlogic Meson PCIe controller driver: - Propagate devm_add_action_or_reset() failure to fix probe error path (Shuvam Pandey) - Add .remove() callback to deinitialize the host bridge and power off the PHY (Shuvam Pandey) Broadcom iProc PCIe controller driver: - Restore .map_irq() assignment; its removal broke INTx on the iproc platform bus driver (Mark Tomlinson) Broadcom STB PCIe controller driver: - No change, but products using certain WiFi devices may be affected by removal of CONFIG_PCIE_BUS_* (see above) Freescale i.MX6 PCIe controller driver: - Move IMX6SX_GPR12_PCIE_TEST_POWERDOWN handling into the core reset functions (Richard Zhu) - Assert PERST# before enabling regulators to ensure that even if power is enabled, endpoint stays inactive until REFCLK is stable (Sherry Sun) - Parse reset properties in Root Port nodes (falling back to host bridge) to help support Key E connectors and the pwrctrl framework (Sherry Sun) - Configure i.MX95 REF_USE_PAD before PHY reset (Richard Zhu) - Assert i.MX95 ref_clk_en after reference clock stabilizes (Richard Zhu) - Integrate new pwrctrl API for DTs with Root Port-level power supplies (Sherry Sun) Intel Gateway PCIe controller driver: - Enable clock before PHY init for correct ordering (Florian Eckert) - Add .start_link() callback so the driver works again (Florian Eckert) - Stop overwriting the ATU base address discovered by dw_pcie_get_resources() (Florian Eckert) - Add DT 'atu' region since this is hardware-specific, and fall back to driver default if lacking (Florian Eckert) Loongson PCIe controller driver: - Ignore downstream devices only on internal bridges to avoid Loongson hardware issue (Rong Zhang) - Quirk old Loongson-3C6000 bridges that advertise incorrect supported link speeds (Ziyao Li) Marvell MVEBU PCIe controller driver: - Use fixed-width interrupt masks to avoid truncation in 64-bit builds (Rosen Penev) MediaTek PCIe controller driver: - Use FIELD_PREP() to fix incorrect operator precedence in PCIE_FTS_NUM_L0 (Li RongQing) - Fix IRQ domain leak when port fails to enable (Manivannan Sadhasivam) - Use actual physical address for MSI message address instead of virt_to_phys() (Manivannan Sadhasivam) - Add EcoNet EN7528 to DT binding (Caleb James DeLisle) MediaTek PCIe Gen3 controller driver: - Deassert PCIE_PHY_RSTB so REFCLK is stable for at least 100ms (PCIE_T_PVPERL_MS) before deasserting PERST# (Jian Yang) - Add .shutdown() to assert PERST# before powering down device (Jian Yang) - Do full device power down on removal, including asserting PERST#, when removing driver (Chen-Yu Tsai) - Fix a 'failed to create pwrctrl devices' error message that was inadvertently skipped (Chen-Yu Tsai) NVIDIA Tegra194 PCIe controller driver: - Program the DesignWare PORT_AFR L1 entrance latency based on the 'aspm-l1-entry-delay-ns' DT property (Manikanta Maddireddy) Qualcomm PCIe controller driver: - Add Eliza SoC compatible in DT binding (Krishna Chaitanya Chundru) - Set max OPP during resume so DBI register accesses don't fail with NoC errors (Qiang Yu) - Add pci_host_common_d3cold_possible() to determine whether downstream devices are already in D3hot and wakeup-enabled devices are capable of generating PME from D3cold (Krishna Chaitanya Chundru) - Add .get_ltssm() callback to get the LTSSM status without DBI, since DBI may be inaccessible after PME_Turn_Off (Krishna Chaitanya Chundru) - Power down PHY via PARF_PHY_CTRL before disabling rails/clocks to avoid power leakage (Krishna Chaitanya Chundru) - Decide whether suspend should put the link in L2 and power down using pci_host_common_d3cold_possible() instead of checking whether ASPM L1 is enabled (Krishna Chaitanya Chundru) - Add qcom D3cold support to tear down interconnect bandwidth and OPP votes (Krishna Chaitanya Chundru) - Handle unsupported mixed PERST#/PHY DT configurations, e.g., PHY in RP node while PERST# is in the RC node, but warn about the DT issue (Qiang Yu) - Program T_POWER_ON based on DT 't-power-on-us' property in case hardware advertises incorrect values (Krishna Chaitanya Chundru) - Disable ASPM L0s for SA8775P (Shawn Guo) - Initialize DWC MSI lock for firmware-managed ECAM hosts, which don't use the dw_pcie_host_init() path that initializes the lock (Yadu M G) Renesas RZ/G3S PCIe controller driver: - Add RZ/V2N DT support (Lad Prabhakar) SOPHGO PCIe controller driver: - Add 'dma-coherent' DT property for sg2042-pcie driver (Han Gao) Synopsys DesignWare PCIe controller driver: - Apply ECRC TLP Digest workaround for all DesignWare cores prior to 5.10a, not just 4.90a and 5.00a (Manikanta Maddireddy) - Use common struct dw_pcie 'mode' rather than duplicating it in artpec6, dra7xx, dwc-pcie, and keembay driver structs (Hans Zhang) - Use DEFINE_SHOW_ATTRIBUTE for ltssm_status debugfs to reduce boilerplate and fix a seq_file memory leak by including a .release() callback (Hans Zhang) - Fix a signedness bug in fault injection test code (Dan Carpenter) - Avoid NULL pointer dereference when tearing down debugfs for controller that lacks RAS DES capability (Shuvam Pandey) MicroSemi Switchtec management driver: - Add Gen6 Device IDs (Ben Reed) Miscellaneous: - Remove unused gpio.h include from amd-mdb, designware-plat, fu740, visconti drivers (Andy Shevchenko) - Fix typos in documentation (josh ziegler) - Use FIELD_MODIFY() instead of open-coding it (Hans Zhang)" * tag 'pci-v7.2-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (168 commits) PCI/sysfs: Use kstrtobool() to parse the ROM attribute input PCI/sysfs: Limit BAR resize attribute scope to platforms with PCI mmap PCI/sysfs: Remove pci_create_legacy_files() and pci_sysfs_init() PCI/sysfs: Convert legacy I/O and memory attributes to static definitions PCI/sysfs: Add __weak pci_legacy_has_sparse() helper alpha/PCI: Compute legacy size in pci_mmap_legacy_page_range() PCI: Add macros for legacy I/O and memory address space sizes PCI/sysfs: Remove pci_{create,remove}_sysfs_dev_files() alpha/PCI: Convert resource files to static attributes alpha/PCI: Add static PCI resource attribute macros alpha/PCI: Remove WARN from __pci_mmap_fits() and __legacy_mmap_fits() alpha/PCI: Fix __pci_mmap_fits() overflow for zero-length BARs alpha/PCI: Use PCI resource accessor macros alpha/PCI: Use BAR index in sysfs attr->private instead of resource pointer alpha/PCI: Add security_locked_down() check to pci_mmap_resource() PCI/sysfs: Limit pci_sysfs_init() late_initcall compile scope PCI/sysfs: Add stubs for pci_{create,remove}_sysfs_dev_files() PCI/sysfs: Warn about BAR resize failure in __resource_resize_store() PCI/sysfs: Convert PCI resource files to static attributes PCI/proc: Fix race between pci_proc_init() and pci_bus_add_device() ...
6 days	Merge tag 'timers-urgent-2026-06-23' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc timer fixes from Ingo Molnar: - Fix timekeeping locking order bug in the timekeeping init code (Mikhail Gavrilov) - Fix u64 multiplication bug in the posix-cpu-timers code on 32-bit kernels (Zhan Xusheng) - Fix macro name in comment block (Ethan Nelson-Moore) - Fix off-by-one bug in the compat settimeofday() usecs validation code (Wang Yan) * tag 'timers-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: time: Fix off-by-one in compat settimeofday() usec validation hrtimer: Correct CONFIG_NO_HZ_COMMON macro name in comment posix-cpu-timers: Use u64 multiplication in update_rlimit_cpu() timekeeping: Register default clocksource before taking tk_core.lock
6 days	Merge tag 'smp-urgent-2026-06-23' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc CPU hotplug fixes from Ingo Molnar: - Fix CPU hotplug error handling rollback bug (Bradley Morgan) - Fix possible output OOB write bug in the sysfs hotplug states printing code (Bradley Morgan) * tag 'smp-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cpu: hotplug: Bound hotplug states sysfs output cpu: hotplug: Preserve per instance callback errors
6 days	Merge tag 'perf-urgent-2026-06-23' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf event fix from Ingo Molnar: - Fix event::addr_filter_ranges lifetime bug (Peter Zijlstra) * tag 'perf-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Fix addr_filter_ranges lifetime
6 days	Merge tag 'locking-urgent-2026-06-23' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Ingo Molnar: - Fix the incorrect RCU protection in rt_spin_unlock() (Thomas Gleixner) * tag 'locking-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/rt: Fix the incorrect RCU protection in rt_spin_unlock()
6 days	Merge tag 'core-urgent-2026-06-23' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc core fixes from Ingo Molnar: - Fix an MM-CID race that can cause an OOB write (Rik van Riel) - Fix a debugobjects OOM handling race (Thomas Gleixner) * tag 'core-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: debugobjects: Plug race against a concurrent OOM disable sched/mmcid: Fix OOB clear_bit when CID is MM_CID_UNSET in fixup path
6 days	Merge tag 'irq-urgent-2026-06-23' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc irqchip driver fixes from Ingo Molnar: - Fix indexing bug in the Crossbar irqchip driver (Bhargav Joshi) - Fix a parent domain resource leak in the Crossbar irqchip driver (Bhargav Joshi) - Fix resource leak in the ImgTec PDC irqchip driver's exit logic (Qingshuang Fu) - Fix macro name in comment block (Ethan Nelson-Moore) * tag 'irq-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/msi: Correct CONFIG_PCI_MSI_ARCH_FALLBACKS macro name in comment irqchip/imgpdc: Fix resource leak, add missing chained handler cleanup on remove irqchip/crossbar: Fix parent domain resource leak irqchip/crossbar: Use correct index in crossbar_domain_free()
6 days	Merge tag 'sched_ext-for-7.2-1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext Pull sched_ext tree reorg from Tejun Heo: "Pure source reorganization with no functional change: - the kernel/sched/ext* files move into a new kernel/sched/ext/ subdirectory - the headers and sources are made self-contained so editor tooling can parse each file on its own" * tag 'sched_ext-for-7.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: sched_ext: Move shared helpers from ext.c into internal.h and cid.h sched_ext: Make kernel/sched/ext/ sources self-contained for clangd sched_ext: Move sources under kernel/sched/ext/
6 days	resource: Make resource_alignment() input const resource	Ilpo Järvinen
	resource_alignment() does not need to change resource so it can be made const. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20260429122617.7324-6-ilpo.jarvinen@linux.intel.com
6 days	bpf: Disable xfrm_decode_session hook attachment	Bradley Morgan
	BPF LSM programs can currently attach to xfrm_decode_session(). That hook may return an error, but security_skb_classify_flow() calls it from a void path and triggers BUG_ON() if an error is returned. Disable BPF attachment to the hook to prevent a BPF LSM program from turning packet classification into a full panic. Fixes: 9e4e01dfd325 ("bpf: lsm: Implement attach, detach and execution") Signed-off-by: Bradley Morgan <include@grrlz.net> Link: https://lore.kernel.org/r/20260619130305.27779-1-include@grrlz.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 days	bpf: Reset register bounds before narrowing retval range in check_mem_access()	Tristan Madani
	When the BPF verifier processes a context load of an LSM hook return value, it calls __mark_reg_s32_range() to narrow the register to the hook's valid range. However, __mark_reg_s32_range() intersects the new range with the register's existing bounds using max_t()/min_t() rather than replacing them. If the destination register carries stale bounds from a prior instruction (e.g. BPF_MOV64_IMM), the intersection can produce a range narrower than reality. The verifier then believes it knows the register's exact value, while at runtime the actual hook return value is loaded, creating a verifier/runtime mismatch that can be used to bypass BPF memory safety checks. The else branch already calls mark_reg_unknown() to reset register state before any narrowing. Apply the same reset in the is_retval path so stale bounds are cleared before __mark_reg_s32_range() intersects. Fixes: 5d99e198be27 ("bpf, lsm: Add check for BPF LSM return value") Cc: stable@vger.kernel.org Signed-off-by: Tristan Madani <tristan@talencesecurity.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20260622230123.3695446-2-tristmd@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 days	sched_ext: Move shared helpers from ext.c into internal.h and cid.h	Tejun Heo
	idle.c and cid.c are included into build_policy.c together with ext.c and use helpers that ext.c defines. Because the helpers live in ext.c, the two files can not parse as standalone units and clangd reports errors in them. Move the helpers to the headers they belong to. The op-dispatch macros and helpers plus scx_parent() to internal.h, and scx_cpu_arg()/scx_cpu_ret() to cid.h. No functional change. idle.c and cid.c now parse clean standalone. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Andrea Righi <arighi@nvidia.com>
7 days	sched_ext: Make kernel/sched/ext/ sources self-contained for clangd	Tejun Heo
	The sources under kernel/sched/ext/ build as a single translation unit: build_policy.c includes the source files and headers. An LSP/clangd editor parses each as a standalone unit, sees no types, and reports a flood of errors. Give each header its dependencies and include guard, and have each source include the headers it uses. ext.c, arena.c and the ext headers now parse clean standalone. idle.c and cid.c still reference a few macros and helpers defined in ext.c. The next patch moves those to shared headers. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Andrea Righi <arighi@nvidia.com>
7 days	bpf: Preserve pointer spill metadata during half-slot cleanup	Nuoqi Gui
	__clean_func_state() cleans dead stack slots in 4-byte halves. When the high half of a STACK_SPILL slot is dead and the low half remains live, cleanup converts the live low half to STACK_MISC or STACK_ZERO and clears the saved spilled_ptr metadata. That conversion is safe only for scalar spills. For a pointer spill, this metadata clear lets a later 32-bit fill from the still-live half avoid the normal non-scalar register-fill check and be treated as an ordinary scalar stack read. Leave non-scalar spill slots intact in this half-live shape. This is conservative for pruning and preserves the existing check_stack_read_fixed_off() rejection path for partial fills from pointer spills. Fixes: be23266b4a08 ("bpf: 4-byte precise clean_verifier_state") Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Nuoqi Gui <gnq25@mails.tsinghua.edu.cn> Link: https://lore.kernel.org/r/20260617-f01-06-half-slot-pointer-spill-v2-1-42b9cdc3cf64@mails.tsinghua.edu.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 days	genirq/msi: Correct CONFIG_PCI_MSI_ARCH_FALLBACKS macro name in comment	Ethan Nelson-Moore
	A comment in kernel/irq/msi.c incorrectly refers to CONFIG_PCI_MSI_ARCH_FALLBACK instead of CONFIG_PCI_MSI_ARCH_FALLBACKS. Correct it. Discovered while searching for CONFIG_* symbols referenced in code but not defined in any Kconfig file. Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Link: https://patch.msgid.link/20260613213544.90613-1-enelsonmoore@gmail.com
7 days	sched_ext: Move sources under kernel/sched/ext/	Tejun Heo
	The sched_ext sources had grown to ten ext* files directly under kernel/sched/. Move them into a new kernel/sched/ext/ subdirectory and drop the now-redundant ext_ prefix. ext.c/h keep their names. kernel/sched/ext.{c,h} -> kernel/sched/ext/ext.{c,h} kernel/sched/ext_internal.h -> kernel/sched/ext/internal.h kernel/sched/ext_types.h -> kernel/sched/ext/types.h kernel/sched/ext_idle.{c,h} -> kernel/sched/ext/idle.{c,h} kernel/sched/ext_cid.{c,h} -> kernel/sched/ext/cid.{c,h} kernel/sched/ext_arena.{c,h} -> kernel/sched/ext/arena.{c,h} The include paths in build_policy.c and sched.h, the MAINTAINERS glob, and a few documentation and comment references are updated to match. No code or symbol changes. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
7 days	Merge tag 's390-7.2-2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull more s390 updates from Alexander Gordeev: - consolidate s390 idle time accounting by moving all CPU time tracking to the architecture backend and eliminate the mix of architecture- specific and common code accounting - Add missing EXPORT_SYMBOL_GPL() to kcpustat_field_idle() and kcpustat_field_iowait() functions - Finalize ptep_get() conversion by replacing direct page table entry dereferencing with proper accessors (ptep_get(), pmdp_get(), etc.) - Explicitly check the buffer length in PKEY_VERIFYPROTK ioctl and pkey_pckmo implementations and fail if the length is exceeded * tag 's390-7.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/pkey: Check length in pkey_pckmo handler implementation s390/pkey: Check length in PKEY_VERIFYPROTK ioctl s390/idle: Add missing EXPORT_SYMBOL_GPL() s390/mm: Complete ptep_get() conversion s390/idle: Remove idle time and count sysfs files s390/idle: Provide arch specific kcpustat_field_idle()/kcpustat_field_iowait() s390/irq/idle: Use stcke instead of stckf for time stamps s390/timex: Move union tod_clock type to separate header
7 days	time: Fix off-by-one in compat settimeofday() usec validation	Wang Yan
	The compat version of settimeofday() uses '>' instead of '>=' when validating tv_usec against USEC_PER_SEC, allowing the value 1000000 to pass the check. After the subsequent conversion to nanoseconds (tv_nsec *= NSEC_PER_USEC), this results in tv_nsec == NSEC_PER_SEC, which violates the timespec invariant that tv_nsec must be strictly less than NSEC_PER_SEC. The native settimeofday() was already fixed in commit ce4abda5e126 ("time: Fix off-by-one in settimeofday() usec validation"), but the compat counterpart was missed. Fix it by using '>=' to reject tv_usec values outside the valid range [0, USEC_PER_SEC - 1]. Fixes: 5e0fb1b57bea ("y2038: time: avoid timespec usage in settimeofday()") Signed-off-by: Wang Yan <wangyan01@kylinos.cn> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260622103348.120255-1-wangyan01@kylinos.cn
7 days	bpf: Fix effective prog array index with BPF_F_PREORDER	Amery Hung
	replace_effective_prog() and purge_effective_progs() located the slot in the effective array by walking the program hlist and counting entries linearly. That count does not match the array layout: compute_effective_ progs() places BPF_F_PREORDER programs at the front (ancestor cgroup first, attach order within a cgroup) and the rest after them (descendant cgroup first). So when a preorder program is present, the linear hlist position no longer equals the program's index in the effective array. For replace_effective_prog() (bpf_link_update()) this overwrote the wrong slot, corrupting the effective order. For purge_effective_progs(), it could dummy out a slot belonging to a different program and leave the detached program in the array while bpf_prog_put() drops its reference, i.e. a use-after-free. Fix both by replaying compute_effective_progs()'s placement (including the per-cgroup preorder reversal) in a shared effective_prog_pos() helper. Identify the entry by its struct bpf_prog_list pointer rather than by (prog, link) value, so the lookup resolves to exactly the attachment the syscall selected even when the same bpf_prog is attached to several cgroups in the hierarchy. Fixes: 4b82b181a26c ("bpf: Allow pre-ordering for bpf cgroup progs") Signed-off-by: Amery Hung <ameryhung@gmail.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20260619063520.2690547-2-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 days	bpf: Fix BPF_PROG_ASSOC_STRUCT_OPS last field check	Thiébaud Weksteen
	When struct prog_assoc_struct_ops was added, BPF_PROG_ASSOC_STRUCT_OPS_LAST_FIELD referenced prog_fd instead of the actual last field, flags. Fixes: b5709f6d26d6 ("bpf: Support associating BPF program with struct_ops") Signed-off-by: Thiébaud Weksteen <tweek@google.com> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20260618040934.4113938-1-tweek@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 days	bpf: Add missing access_ok call to copy_user_syms	Jiri Olsa
	As reported by sashiko we use __get_user without prior access_ok call on the user space pointer. Adding the missing call for the whole pointer array. Plus removing the err check in the error path, because it's not needed and also we can return -ENOMEM directly from the first kvmalloc_array fail path. Cc: stable@vger.kernel.org [1] https://lore.kernel.org/bpf/20260611115503.AC16D1F00893@smtp.kernel.org/ Fixes: 0236fec57a15 ("bpf: Resolve symbols with ftrace_lookup_symbols for kprobe multi link") Reported-by: Sashiko <sashiko-bot@kernel.org> Closes: https://lore.kernel.org/bpf/20260611115503.AC16D1F00893@smtp.kernel.org/ Signed-off-by: Jiri Olsa <jolsa@kernel.org> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Link: https://lore.kernel.org/r/20260616083056.405652-1-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 days	bpf: Allow type tag BTF records to succeed other modifier records	Emil Tsalapatis
	llvm commit [1] allowed attaching type tag records to modifier BTF records. This is useful for using typedefs that encompass a base type and a type tag, e.g.: typedef struct rbtree __arena rbtree_t; Modify btf_check_type_tags() so that it allows this sequence of records. The function now only checks for record loops in BTF modifier record chains. Rename to btf_check_modifier_chain_length to reflect this. Also expand the BTF modifier traversal code to take into account that type record can be interleaved with other modifier records. In effect this means traversing all modifiers to collect the type tags. Also modify existing selftests to now accept modifier records (const, typedef) that point to type tag records. [1] https://github.com/llvm/llvm-project/pull/203089 Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com> Link: https://lore.kernel.org/r/20260616061454.7869-1-emil@etsalapatis.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 days	bpf: Emit verbose message when prog-specific btf_struct_access rejects a write	Alexei Starovoitov
	When BPF_WRITE goes through a PTR_TO_BTF_ID register, check_ptr_to_btf_access() delegates to env->ops->btf_struct_access(). Most implementations (bpf_scx_btf_struct_access, tc_cls_act_btf_struct_access, etc.) return -EACCES for disallowed fields without logging anything, so the verifier rejects the program with an empty message. For example a scx program doing 1: R1=trusted_ptr_task_struct() ... 4: (7b) (u64 )(r1 +0) = r2 verification time 83 usec the program is rejected leaves the user guessing which field is off-limits. Emit verbose message. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20260615232146.5491-1-alexei.starovoitov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 days	bpf: Fix build_id caching in stack_map_get_build_id_offset()	Ihor Solodrai
	This patch is a follow up to recent implementation of stack_map_get_build_id_offset_sleepable() [1]. stack_map_get_build_id_offset() and its sleepable variant each cached only the last successfully resolved VMA, with separate bookkeeping in each function. A run of IPs in a VMA with no usable build ID will repeat the lookup for every frame: find_vma() in the non-sleepable path, a VMA lock and a blocking build_id_parse_file() in the sleepable. Factor the per-call cache into a shared struct stack_map_build_id_cache with two independent slots [2][3], used by both functions: * resolved - last VMA that produced a build ID (file, build_id and range), reused to skip the lookup and the parse; * unresolved - last VMA with no usable build ID (range only), reused to emit a raw IP without another lookup or parse. Keeping the slots independent means a build-ID-less VMA no longer evicts the last resolved build ID, so a trace alternating between a binary and a region without one stops re-resolving the binary on every return. The shared lookup tests [vm_start, vm_end), matching the sleepable path; the non-sleepable path previously reused a build ID for ip == vm_end (range_in_vma() is inclusive) and now re-resolves it correctly. [1] https://lore.kernel.org/bpf/20260525223948.1920986-1-ihor.solodrai@linux.dev/ [2] https://lore.kernel.org/bpf/CAEf4Bza2fRDGhLQoPE-EzM7F34xaEJfi5Exmxb-iWVUN3F06=g@mail.gmail.com/ [3] https://lore.kernel.org/bpf/CAEf4BzZXJFr=1iiVx937ht=4PYQkQHg=eFk810zhMDzXQG3ihw@mail.gmail.com/ Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> Link: https://lore.kernel.org/r/20260615195536.1065107-1-ihor.solodrai@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
8 days	bpf: Fix stack slot index in nospec checks	Nuoqi Gui
	check_stack_write_fixed_off() computes the byte slot for a fixed-offset stack write as -off - 1, and records each written byte in slot_type[] with (slot - i) % BPF_REG_SIZE. The Spectre v4 sanitization pre-check uses slot_type[i] instead. For a 4-byte write at fp-8 after the lower half of fp-8 has been zeroed, the pre-check scans bytes 0..3 and sees STACK_ZERO while the actual write updates bytes 7..4. That can leave the second half-slot write without nospec_result even though the bytes being overwritten still require sanitization. Use the same slot index in the sanitization pre-check that the write path uses when updating slot_type[]. Fixes: 2039f26f3aca ("bpf: Fix leakage due to insufficient speculative store bypass mitigation") Acked-by: Luis Gerhorst <luis.gerhorst@fau.de> Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Signed-off-by: Nuoqi Gui <gnq25@mails.tsinghua.edu.cn> Link: https://lore.kernel.org/r/20260618-f01-11-stack-nospec-slot-index-v3-1-780297041721@mails.tsinghua.edu.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
8 days	Merge tag 'mm-nonmm-stable-2026-06-21-10-22' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - "taskstats: fix TGID dead-thread stat retention" (Yiyang Chen) Fix a taskstats TGID aggregation bug where fields added in the TGID query path were not preserved after thread exit, and adds a kselftest covering the regression. - "lib/tests: string_helpers: Slight improvements" (Andy Shevchenko) Improve lib/tests/string_helpers_kunit.c a little - "lib/base64: decode fixes" (Josh Law) Address minor issues in lib/base64.c - "selftests/filelock: Make output more kselftestish" (Mark Brown) Make the output from the ofdlocks test a bit easier for tooling to work with. Also ignore the generated file - "uaccess: unify inline vs outline copy_{from,to}_user() selection" (Yury Norov) Simplify the usercopy code by removing the selectability of inlining copy_{from,to}_user(). - "ocfs2: validate inline xattr header consumers" (ZhengYuan Huang) Fix a number of possible issues in the ocfs2 xattr code - "lib and lib/cmdline enhancements" (Dmitry Antipov) Provide additional robustness checking in the cmdline handling code and its in-kernel testing and selftests - "cleanup the RAID6 P/Q library" (Christoph Hellwig) Clean up the RAID6 P/Q library to match the recent updates to the RAID 5 XOR library and other CRC/crypto libraries - "ocfs2: harden inode validators against forged metadata" (Michael Bommarito) Add three structural checks to OCFS2 dinode validation so malformed on-disk fields are rejected before ocfs2_populate_inode() copies them into the in-core inode - "lib/raid: replace __get_free_pages() call with kmalloc()" (Mike Rapoport) Clean up the lib/raid code by using kmalloc() in more places * tag 'mm-nonmm-stable-2026-06-21-10-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (108 commits) ocfs2: fix circular locking dependency in ocfs2_dio_end_io_write ocfs2: fix NULL h_transaction deref in ocfs2_assure_trans_credits lib: interval_tree_test: validate benchmark parameters ocfs2: avoid moving extents to occupied clusters treewide: fix transposed "sign" typos and update spelling.txt ocfs2: fix UBSAN array-index-out-of-bounds in ocfs2_sum_rightmost_rec fat: reject BPB volumes whose data area starts beyond total sectors selftests/uevent: increase __UEVENT_BUFFER_SIZE to avoid ENOBUFS on busy systems lib/test_firmware: allocate the configured into_buf size fs: efs: remove unneeded debug prints checkpatch: cuppress warnings when Reported-by: is followed by Link: MAINTAINERS: add Alexander as a kcov reviewer mailmap: update Alexander Sverdlin's Email addresses fs: fat: inode: replace sprintf() with scnprintf() ocfs2: fix out-of-bounds write in ocfs2_remove_refcount_extent ocfs2: fix race between ocfs2_control_install_private() and ocfs2_control_release() ocfs2/dlm: require a ref for locking_state debugfs open ocfs2: reject FITRIM ranges shorter than a cluster ocfs2: validate fast symlink target during inode read ocfs2: add journal NULL check in ocfs2_checkpoint_inode() ...
8 days	cpu: hotplug: Bound hotplug states sysfs output	Bradley Morgan
	states_show() adds CPU hotplug state names into a single sysfs buffer using sprintf(). With enough registered states, this can write past the end of the PAGE_SIZE buffer. Use sysfs_emit_at() so output is bounded. Fixes: 98f8cdce1db5 ("cpu/hotplug: Add sysfs state interface") Signed-off-by: Bradley Morgan <include@grrlz.net> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260619163719.12103-2-include@grrlz.net
8 days	cpu: hotplug: Preserve per instance callback errors	Bradley Morgan
	cpuhp_invoke_callback() unwinds earlier callbacks for the same hotplug state when one instance fails. The rollback path currently reuses ret, so a successful rollback can hide the original error and make the failed transition look successful. Keep the rollback result separate from the original error. Fixes: 724a86881d03 ("smp/hotplug: Callback vs state-machine consistency") Signed-off-by: Bradley Morgan <include@grrlz.net> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260619163719.12103-1-include@grrlz.net
8 days	Merge tag 'liveupdate-v7.2-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/liveupdate/linux Pull liveupdate updates from Mike Rapoport: "Kexec Handover (KHO): - make memory preservation compatible with deferred initialization of the memory map Live Update Orchestrator (LUO): - add LIVEUPDATE_SESSION_GET_NAME ioctl and parameter verification for LIVEUPDATE_IOCTL_CREATE_SESSION ioctl - documentation updates for liveupdate=on command line option, systemd support and the current compatibility status - remove the fixed limits on the number of files that can be preserved within a single session, and the total number of sessions managed by the LUO Misc fixes: - reference count incoming File-Lifecycle-Bound (FLB) data so it cannot be freed while a subsystem is still using it - fixes for a TOCTOU race in luo_session_retrieve(), a use- after-free in the file finish and unpreserve paths, concurrent session mutations during reboot and serialization on preserve_context kexec - make sure ioctls for incoming LUO sessions are blocked for outgoing sessions and vice versa - make sure KHO scratch size is always aligned by CMA_MIN_ALIGNMENT_BYTES - fix memblock tests build issue introduced by KHO changes" * tag 'liveupdate-v7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/liveupdate/linux: (36 commits) liveupdate: Document that retrieve failure is permanent docs: memfd_preservation: fix rendering of ABI documentation selftests/liveupdate: Add stress-files kexec test selftests/liveupdate: Add stress-sessions kexec test selftests/liveupdate: Test session and file limit removal liveupdate: Remove limit on the number of files per session liveupdate: Remove limit on the number of sessions liveupdate: defer session block allocation and physical address setting kho: add support for linked-block serialization liveupdate: Extract luo_session_deserialize_one helper liveupdate: Extract luo_file_deserialize_one helper liveupdate: register luo_ser as KHO subtree liveupdate: centralize state management into struct luo_ser liveupdate: avoid mixing cleanup guards with goto in luo_session_retrieve_fd liveupdate: change file_set->count type to u64 for type safety liveupdate: Remove unused ser field from struct luo_session liveupdate: fix u-a-f in luo_file_unpreserve_files() and luo_file_finish() liveupdate: block session mutations during reboot liveupdate: fix TOCTOU race in luo_session_retrieve() liveupdate: skip serialization for context-preserving kexec ...
8 days	locking/rt: Fix the incorrect RCU protection in rt_spin_unlock()	Thomas Gleixner
	rt_spin_unlock() releases the RCU protection before unlocking the lock. That opens the door for the following UAF scenario: T1 T2 spin_lock(&p->lock); rcu_read_lock(); invalidate(p); p = rcu_dereference(ptr); rcu_assign_pointer(ptr, NULL); if (!p) return; spin_unlock(&p->lock); spin_lock(&p->lock) lock(&lock->lock); rcu_read_lock(); kfree_rcu(p); rcu_read_unlock(); .... spin_unlock(&p->lock) rcu_read_unlock(); // Ends grace period rcu_do_batch() kfree(p); UAF -> rt_mutex_cmpxchg_release(&lock->lock...) Regular spinlocks keep preemption disabled accross the unlock operation, which provides full RCU protection, but the RT substitution fails to resemble that. Same applies for the rwlock substitution. Move the rcu_read_unlock() invocation past the unlock operations to match the non-RT semantics. This makes it asymmetric vs. rt_xxx_lock(), but that's harmless as the caller needs to hold RCU read lock across the lock operation. The migrate_enable() call stays before the unlock operation because there is no per CPU operation in the unlock path which would require migration to be kept disabled. Fixes: 0f383b6dc96e ("locking/spinlock: Provide RT variant") Reported-by: syzbot+000c800a02097aaa10ed@syzkaller.appspotmail.com Decoded-by: Jann Horn <jannh@google.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/87jyrud75z.ffs@fw13
10 days	sched/mmcid: Fix OOB clear_bit when CID is MM_CID_UNSET in fixup path	Rik van Riel
	In mm_cid_fixup_cpus_to_tasks(), when rq->curr has the target mm and mm_cid.active is set, the CID is checked with cid_in_transit() before setting the transition bit. In per-CPU mode a newly forked or exec'd task can be running with mm_cid.cid == MM_CID_UNSET because CIDs are assigned lazily on schedule-in. With cid_in_transit() the guard passes for MM_CID_UNSET (no transit bit), converts it to MM_CID_UNSET \| MM_CID_TRANSIT and stores it back; later mm_cid_schedout() feeds this to clear_bit() with MM_CID_UNSET as the bit number, triggering an out-of-bounds write. Symptoms: this is genuine memory corruption, but a bounded out-of-bounds write, not an arbitrary one. MM_CID_UNSET is the fixed sentinel BIT(31), so once the bad value reaches mm_cid_schedout() the cid_from_transit_cid() strip leaves MM_CID_UNSET, which fails the "cid < max_cids" convergence test and falls into mm_drop_cid() -> clear_bit(MM_CID_UNSET, mm_cidmask(mm)). The cid bitmap is embedded in the mm_struct slab object (after cpu_bitmap and mm_cpus_allowed) and is only num_possible_cpus() bits wide, so clearing bit 31 is a deterministic OOB bit-clear at a fixed offset of 2^31 / 8 == 256 MiB past the bitmap base. The address is not attacker-influenced (fixed sentinel -> fixed offset) and the op only clears a single bit; what sits 256 MiB further along the direct map is whatever kernel object happens to live there, so this corrupts one bit of unpredictable kernel memory -- it is not an arbitrary-address or arbitrary-value write. It triggers only in per-CPU CID mode, when a CPU is running an active task of the target mm whose cid is still MM_CID_UNSET -- the fork()/execve() window before that task's next schedule-in assigns it a real CID -- and a per-CPU -> per-task fixup walks over it (the mode fallback driven by a thread exit, sched_mm_cid_exit(), or by the deferred max_cids recompute in mm_cid_work_fn()). In practice syzkaller surfaced it as a KASAN use-after-free reported in __schedule -> mm_cid_switch_to, where the offending clear_bit() is inlined via mm_cid_schedout() -> mm_drop_cid(). Guard the transition-bit assignment against MM_CID_UNSET, in addition to the existing cid_in_transit() check, so the bit is only set on a genuine task-owned CID. A CPU-owned (MM_CID_ONCPU) CID of a running active task is handled by the cid_on_cpu(pcp->cid) branch above and never reaches this path, so excluding MM_CID_UNSET (and the already-transitioning case) is sufficient. Fixes: fbd0e71dc370 ("sched/mmcid: Provide CID ownership mode fixup functions") Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Assisted-by: Claude:claude-opus-4-8 syzkaller Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260616203818.1516263-1-riel@surriel.com
10 days	Merge tag 'mm-stable-2026-06-18-09-26' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "selftests/mm: clean up build output and verbosity" (Li Wang) Remove some noise from the MM selftests build - "mm: Free contiguous order-0 pages efficiently" (Ryan Roberts) Speed up the freeing of a batch of 0-order pages by first scanning them for coalescing opportunities. This is applicable to vfree() and to the releasing of frozen pages - "mm/damon: introduce DAMOS failed region quota charge ratio" (SeongJae Park) Address a DAMOS usability issue: The DAMOS quota often exhausts prematurely because it charges for all memory attempted, causing slow and inconsistent performance when actions fail on unreclaimable memory. To fix this, a new feature lets users set a smaller, flexible quota charge ratio (via a numerator and denominator) for failed regions. Since failed actions cause less overhead, reducing their quota cost ensures more predictable and efficient DAMOS processing - "selftests/cgroup: improve zswap tests robustness and support large page sizes" (Li Wang) Fix various spurious failures and improves the overall robustness of the cgroup zswap selftests - "fix MAP_DROPPABLE not supported errno" (Anthony Yznaga) Fix an issue in the mlock selftests on arm32 - "mm: huge_memory: clean up defrag sysfs with shared" (Breno Leitao) Some maintenance work in the huge_memory code - "treewide: fixup gfp_t printks" (Brendan Jackman) Use the special vprintf() gfp_t conversion in various places - "mm: Fix vmemmap optimization accounting and initialization" (Muchun Song) Fix several bugs in the vmemmap optimization, mainly around incorrect page accounting and memmap initialization in the DAX and memory hotplug paths. It also fixes pageblock migratetype initialization and struct page initialization for ZONE_DEVICE compound pages - "mm/damon: repost non-hotfix reviewed patches in damon/next tree" A sprinkle of unrelated minor bugfixes for DAMON - "mm: remove page_mapped()" (David Hildenbrand) Remove this function from the tree, replacing it with folio_mapped() - "mm/damon: let DAMON be paused and resumed" (SeongJae Park) Allow DAMON to be paused and resumed without losing its current state - "kasan: hw_tags: Disable tagging for stack and page-tables" (Muhammad Usama Anjum) Simplify and speed up kasan by removing its ineffective tagging of stacks and page tables - "mm/damon/reclaim,lru_sort: monitor all system rams by default" (SeongJae Park) Simplify deployment on diverse hardware like NUMA systems by updating DAMON_RECLAIM and DAMON_LRU_SORT to automatically monitor the physical address range covering all System RAM areas by default, replacing the overly restrictive behavior that only targeted the single largest memory block to save on negligible overhead - "mm/damon/sysfs: document filters/ directory as deprecated" (SeongJae Park) Update some DAMON docs - "mm: use spinlock guards for zone lock" (Dmitry Ilvokhin) Switch zone->lock handling over to using the guard() mechanisms - "mm/filemap: tighten mmap_miss hit accounting" (fujunjie) Fix a flaw where the mmap_miss counter over-credited page cache hits during fault-arounds and page-fault retries. This results in significant reduction of redundant synchronous mmap readahead I/O, drastically cutting down execution time and gigabytes read for sparse random or strided memory access workloads - "selftests/cgroup: Fix false positive failures in test_percpu_basic" (Li Wang) Fix a couple of false-positives in the cgroup kmem selftests - "mm/damon/reclaim: support monitoring intervals auto-tuning" (SeongJae Park) Add a new parameter to DAMON permitting DAMON_RECLAIM to automatically tune DAMON's sampling and aggregation intervals - "mm/damon/stat: add kdamond_pid parameter" (SeongJae Park) Change DAMON_STAT to provide the pid of its kdamond - "mm/kmemleak: dedupe verbose scan output" (Breno Leitao) Remove large amounts of duplicated backtraces from the verbose-mode kmemleak output - "mm: remove CONFIG_HAVE_BOOTMEM_INFO_NODE (Part 1)" (David Hildenbrand) Reduce our use of CONFIG_HAVE_BOOTMEM_INFO_NODE, with a view to removing it entirely in a later series - "mm/damon: validate min_region_size to be power of 2" (Liew Rui Yan) Prevent users from passing a non-power-of-2 value of `addr_unit', as this later results in undesirable behavior - "mm: document read_pages and simplify usage" (Frederick Mayle) - "tools/mm/page-types: Fix misc bugs" (Ye Liu) Fix three issues in tools/mm/page-types.c - "mm: misc cleanups from __GFP_UNMAPPED series" (Brendan Jackman) Implement several cleanups in the page allocator and related code - "mm, swap: swap table phase IV: unify allocation" (Kairui Song) Unify the allocation and charging of anon and shmem swap in folios, provides better synchronization, consolidates the metadata management, hence dropping the static array and map, and improves performance - "mm/damon: introduce data attributes monitoring" (SeongJae Park( Extend DAMON to monitor general data attributes other than accesses - "mm/vmalloc: free unused pages on vrealloc() shrink" (Shivam Kalra) Implement the TODO in vrealloc() to unmap and free unused pages when shrinking across a page boundary - "mm/damon: documentation and comment fixes" (niecheng) - "remove mmap_action success, error hooks" (Lorenzo Stoakes) Eliminate custom hooks from mmap_action by removing the problematic success_hook which allowed drivers to improperly access uninitialized VMAs. It replaces the error_hook with a simple error-code field and updates the memory char driver accordingly - "mm/damon: minor improvements for code readability and tests" (SeongJae Park) - "mm/damon: fix macro arguments and clarify quota goals doc" (Maksym Shcherba) - "userfaultfd: merge fs/userfaultfd.c into mm/userfaultfd.c" (Mike Rapoport) - "mm/mglru: improve reclaim loop and dirty folio" (Kairui Song and others) Clean up and slightly improves MGLRU's reclaim loop and dirty writeback handling. Large performance improvements are measured - "use vma locks for proc/pid/{smaps\|numa_maps} reads" (Suren Baghdasaryan) Use per-vma locks when reading /proc/pid/smaps and numa_maps similar to reduce contention on central mmap_lock - "refactors thpsize_shmem_enabled_store() and thpsize_shmem_enabled_show()" (Ran Xiaokai) Some cleanup work in the THP code - "selftests/memfd: fix compilation warnings" (Konstantin Khorenko) Fix a few build glitches in the memfd selftest code. - "memcg: shrink obj_stock_pcp and cache multiple objcgs" (Shakeel Butt) Resolve a 68% performance regression caused by NUMA-node cache thrashing around struct obj_stock_pcp by shrinking its existing fields and expanding it into a multi-slot array that caches up to five obj_cgroup pointers per CPU, allowing per-node variants of the same memcg to coexist within a single 64-byte cache line. - "zram: writeback fixes" (Sergey Senozhatsky) address a couple of unrelated zram writeback issues - "mm: switch THP shrinker to list_lru" (Johannes Weiner) Resolve NUMA-awareness issues and streamlines callsite interaction by refactoring and extending the list_lru API to completely replace the complex, open-coded deferred split queue for Transparent Huge Pages - "mm: improve large folio readahead for exec memory" (Usama Arif) Improve large-folio readahead on systems like 64K-page arm64 by preventing the mmap_miss check from permanently disabling target-oriented VM_EXEC readahead, and by generalizing the force_thp_readahead gate to support mappings with any usefully large maximum folio order under the cache cap. - "userfaultfd/pagemap: pre-existing fixes" (Kiryl Shutsemau) Fix a bunch of minor issues in the userfaultfd/pagemap, all of which were flagged by Sashiko review of proposed new material - "mm/sparse-vmemmap: Provide generic vmemmap_set_pmd() and vmemmap_check_pmd()" (Muchun Song) Provide generic versions of these two functions so the four arch-specific implementations can be removed. - "mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap device" (Youngjun Park) Address a uswsusp-vs-swapoff race and reduces the swap device reference taking/releasing frequency. - "mm/hmm: A fix and a selftest" (Dev Jain) * tag 'mm-stable-2026-06-18-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits) selftests/mm/hmm-tests: test pagemap reads of PMD device-private entries fs/proc/task_mmu: do not warn on seeing non-migration pmd entry lib/test_hmm: check alloc_page_vma() return value and handle OOM mm/compaction: cap compact_gap() at COMPACT_CLUSTER_MAX mm/swap: remove redundant swap device reference in alloc/free mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap device mm/filemap: use folio_next_index() for start vmalloc: fix NULL pointer dereference in is_vm_area_hugepages() sparc/mm: drop vmemmap_check_pmd helper and use generic code loongarch/mm: drop vmemmap_check_pmd helper and use generic code riscv/mm: drop vmemmap_pmd helpers and use generic code arm64/mm: drop vmemmap_pmd helpers and use generic code mm/sparse-vmemmap: provide generic vmemmap_set_pmd() and vmemmap_check_pmd() rust: page: mark Page::nid as inline userfaultfd: build __VMA_UFFD_FLAGS from config-gated masks userfaultfd: gate must_wait writability check on pte_present() mm/huge_memory: preserve pmd_swp_uffd_wp on device-private PMD downgrade fs/proc/task_mmu: fix hugetlb self-deadlock in pagemap_scan_pte_hole() fs/proc/task_mmu: use huge_page_size() in pagemap_scan_hugetlb_entry() fs/proc/task_mmu: fix make_uffd_wp_huge_pte() prot-update race ...
10 days	perf: Fix addr_filter_ranges lifetime	Peter Zijlstra
	Lee Jia Jie reported that since event::addr_filter_ranges is used under RCU, it should be RCU freed. Reported-by: Lee Jia Jie <jiajie.lee@starlabs.sg> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
10 days	Merge tag 'trace-ring-buffer-v7.2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull ring-buffer updates from Steven Rostedt - Do not invalidate entire buffer for invalid sub-buffers For the persistent ring buffer, if one sub-buffer is found to be invalid, it invalidates the entire per CPU ring buffer. This can lose a lot of valuable data if there's some corruption with the writes to the buffer not syncing properly on a hard crash. Instead, if a sub-buffer is found to be invalid, simply zero it out and mark it for "missed events". When the persistent ring buffer is read and a sub-buffer that was cleared due to being invalid on boot up is discovered, the output will show "[LOST EVENTS]" to let the user know that events were missing at that location. Displaying the events from valid buffers can still be useful. - Add a test to be able to test corrupted sub-buffers If a persistent ring buffer is created as "ptraingtest" and the new config that adds the test is enabled, when a panic happens, the kernel will randomly corrupt one of the per CPU ring buffers. On boot up, the sub-buffers with the corruption should be cleared and flagged. When reading this buffer, the missed events should should [LOST EVENTS]. - Add commit number in the sub-buffer meta debug info The commit is used to know the content of a meta page. Add it to the buffer_meta file that is shown for each per CPU buffer. - Clean up the persistent ring buffer validation code Add some helper functions and make variable names more consistent. * tag 'trace-ring-buffer-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: ring-buffer: Better comment the use of RB_MISSED_EVENTS ring-buffer: Show persistent buffer dropped events in trace_pipe file ring-buffer: Show persistent buffer dropped events in trace file ring-buffer: Have dropped subbuffers be persistent across reboots ring-buffer: Cleanup buffer_data_page related code ring-buffer: Cleanup persistent ring buffer validation ring-buffer: Show commit numbers in buffer_meta file ring-buffer: Add persistent ring buffer invalid-page inject test ring-buffer: Skip invalid sub-buffers when rewinding persistent ring buffer ring-buffer: Skip invalid sub-buffers when validating persistent ring buffer
10 days	Merge tag 'trace-v7.2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing updates from Steven Rostedt: - Remove a redundant IS_ERR() check trace_pipe_open() already checks for IS_ERR() and does it again in the return path. Remove the return check. - Export seq_buf_putmem_hex() to allow kunit tests against them To add Kunit tests on seq_buf_putmem_hex(), it needs to be exported. - Replace strcat() and strcpy() with seq_buf() logic The code for synthetic events uses a series of strcat() and strcpy() which can be error prone. Replace them with seq_buf() logic that does all the necessary bound checking. - Add a lockdep rcu_is_watching() to trace_##event##_enabled() call The trace_##event##_enabled() is a static branch that is true if the "event" is enabled. But this can hide bugs if this logic is in a location where RCU is disabled and not "watching". It would only trigger if lockdep is enabled and the event is enabled. Add a "rcu_is_watching()" warning if lockdep is enabled in that helper function to trigger regardless if the event is enabled or not. - Remove the local variable in the trace_printk() macro For name space integrity, remove the _______STR variable in the trace_printk() macro for using the sizeof() macro directly. - Use guard()s for the trace_recursion_record.c file - Fix typo in a comment of eventfs_callback() kerneldoc - Use trace_call__##event() in events within trace_##event##_enabled() A couple of events are called within an if block guarded by trace_##event##_enabled(). That is a static key that is only enabled when the event is enabled. The trace_call_##event() calls the tracepoint code directly without adding a redundant static key for that check. - Allow perf to read synthetic events Currently, perf does not have the ability to enable a synthetic event. If it does, it will either cause a kernel warning or error with "No such device". Synthetic events are not much different than kprobes and perf can handle fine with a few modifications. - Replace printk(KERN_WARNING ...) with pr_warn() - Replace krealloc() on an array with krealloc_array() - Fix README file path name for synthetic events - Change tracing_map tracing_map_array to use a flexible array Instead of allocating a separate pointer to hold the pages field of tracing_map_array, allocate the pages field as a flexible array when allocating the structure. - Fold trace_iterator_increment() into trace_find_next_entry_inc() The function trace_iterator_increment() was only used by trace_find_next_entry_inc(). It's not big enough to be a helper function for one user. Fold it into its caller. - Make field_var_str field a flexible array of hist_elt_data Instead of allocating a separate pointer for the field_var_str array of the hist_elt_data structure, allocate it as a flexible array when allocating the structure. - Disable KCOV for trace_irqsoff.c Like trace_preemptirq.c, trace_irqsoff.c has code that will crash when KCOV is enabled on ARM. The irqsoff tracing can be called on ARM because the irqsoff tracing code can be run from early interrupt code and produce coverage unrelated to syscall inputs. - Fix warning in __unregister_ftrace_function() called by perf Perf calls unregister_ftrace_function() without checking if its ftrace_ops has already been unregistered. There's an error path where on clean up it will unregister the ftrace_ops even if it wasn't registered and causes a warning. * tag 'trace-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: perf/ftrace: Fix WARNING in __unregister_ftrace_function tracing: Disable KCOV instrumentation for trace_irqsoff.o tracing: Turn hist_elt_data field_var_str into a flexible array tracing: Move trace_iterator_increment() into trace_find_next_entry_inc() tracing: Simplify pages allocation for tracing_map logic tracing: Fix README path for synthetic_events tracing: Use krealloc_array() for trace option array growth tracing/branch: Use pr_warn() instead of printk(KERN_WARNING) tracing: Allow perf to read synthetic events HID: Use trace_call__##name() at guarded tracepoint call sites cpufreq: amd-pstate: Use trace_call__##name() at guarded tracepoint call site tracefs: Fix typo in a comment of eventfs_callback() kerneldoc tracing: Switch trace_recursion_record.c code over to use guard() tracing: Remove local variable for argument detection from trace_printk() tracepoint: Add lockdep rcu_is_watching() check to trace_##name##_enabled() tracing: Bound synthetic-field strings with seq_buf seq_buf: Export seq_buf_putmem_hex() and add KUnit tests tracing: Remove redundant IS_ERR() check in trace_pipe_open()
12 days	treewide: fix transposed "sign" typos and update spelling.txt	Shardul Deshpande
	Several comments transpose the letters in "assigned" and "unsigned", spelling them with "sing" instead of "sign". Correct all of them. Of these, the misspelling of "assigned" is not yet flagged by checkpatch, so also add it to scripts/spelling.txt. The remaining matches of `grep -ri singed` are RISINGEDGE register and enum names, not typos. Link: https://lore.kernel.org/20260612181633.734458-1-iamsharduld@gmail.com Signed-off-by: Shardul Deshpande <iamsharduld@gmail.com> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 days	Merge tag 'dma-mapping-7.2-2026-06-16' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping updates from Marek Szyprowski: - added checks for DMA attributes in the debug code, especially to ensure that mappings are created and released with matching attributes (Leon Romanovsky) - better default configuration for CMA on NUMA machines (Feng Tang) - code cleanup in dma benchmark tool (Rosen Penev) * tag 'dma-mapping-7.2-2026-06-16' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: dma: map_benchmark: turn dma_sg_map_param buf into a flexible array dma-contiguous: simplify numa cma area handling dma-contiguous: add kconfig option to setup numa cma area if not configured explicitly dma-debug: Ensure mappings are created and released with matching attributes dma-debug: Feed DMA attribute for unmapping flows too dma-debug: Record DMA attributes in debug entry dma-debug: Remove unused DMA attribute parameter ntb: Use consistent DMA attributes when freeing DMA mappings ntb: Store original DMA address for future release
12 days	Merge tag 'printk-for-7.2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Add upper case flavor for printing MAC addresses (%p[mM][U]) and use it in the nintendo driver - Fix matching of hash_pointers= parameter modes - Fix size check of vsprintf() field_width and precision values - Add check of size returned by vsprintf() - Add KUnit test for restricted pointer printing (%pK) - Some code cleanup * tag 'printk-for-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: HID: nintendo: Use %pM format specifier for MAC addresses vsprintf: Add upper case flavour to %p[mM] lib/vsprintf: replace min_t/max_t with min/max printk: fix typos in comments lib/vsprintf: Require exact hash_pointers mode matches vsprintf: Add test for restricted kernel pointers vsprintf: Only export no_hash_pointers to test module lib/vsprintf: Limit the returning size to INT_MAX lib/vsprintf: Fix to check field_width and precision
12 days	Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost	Linus Torvalds
	Pull virtio updates from Michael Tsirkin: - new virtio CAN driver - support for LoongArch architecture in fw_cfg - support for firmware notifications in vdpa/octeon_ep - support for VFs in virtio core - fixes, cleanups all over the place, notably: - vhost: fix vhost_get_avail_idx for a non empty ring fixing an significant old perf regression - READ_ONCE() annotations mean virtio ring is now free of KCSAN warnings * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (37 commits) can: virtio: Fix comment in UAPI header can: virtio: Add virtio CAN driver virtio: add num_vf callback to virtio_bus fw_cfg: Add support for LoongArch architecture vdpa/octeon_ep: fix IRQ-to-ring mapping in interrupt handler vdpa/octeon_ep: Add vDPA device event handling for firmware notifications vdpa/octeon_ep: Use 4 bytes for mailbox signature vdpa/octeon_ep: Fix PF->VF mailbox data address calculation vhost_task_create: kill unnecessary .exit_signal initialization vhost: remove unnecessary module_init/exit functions vdpa/mlx5: Use kvzalloc_flex() for MTT command memory vdpa_sim_net: switch to dynamic root device vdpa_sim_blk: switch to dynamic root device virtio-mem: Destroy mutex before freeing virtio_mem virtio-balloon: Destroy mutex before freeing virtio_balloon tools/virtio: fix build for kmalloc_obj API and missing stubs virtio_ring: Add READ_ONCE annotations for device-writable fields vduse: fix compat handling for VDUSE_IOTLB_GET_FD/VDUSE_VQ_GET_INFO tools/virtio: check mmap return value in vringh_test vhost/net: complete zerocopy ubufs only once ...
12 days	cpufreq: schedutil: Fix uncleared need_freq_update on the .adjust_perf() path	Zhongqiu Han
	The need_freq_update flag makes sugov_should_update_freq() return true regardless of the rate_limit_us throttling, and is cleared in sugov_update_next_freq(). sugov_update_single_freq() and sugov_update_shared() go through that helper, so the flag does not persist there. However, sugov_update_single_perf(), used by drivers implementing the .adjust_perf() callback (e.g. intel_pstate or amd-pstate in passive mode) calls cpufreq_driver_adjust_perf() directly and never goes through sugov_update_next_freq(), so the need_freq_update flag is not cleared in that path. Before commit 75da043d8f88 ("cpufreq/sched: Set need_freq_update in ignore_dl_rate_limit()"), this was effectively harmless because sugov_should_update_freq() still honored the rate limit even when need_freq_update was set. After that change, the flag forces sugov_should_update_freq() to always return true, so once set, it stays effective indefinitely on the .adjust_perf() path. As a result, cpufreq_driver_adjust_perf() gets called on every scheduler utilization update (with the runqueue lock held) rather than being throttled by rate_limit_us, even if the driver itself may skip redundant hardware updates. Clear need_freq_update at the end of the adjust_perf path as well. Fixes: 75da043d8f88 ("cpufreq/sched: Set need_freq_update in ignore_dl_rate_limit()") Signed-off-by: Zhongqiu Han <zhongqiu.han@oss.qualcomm.com> Reviewed-by: Hongyan Xia <hongyan.xia@transsion.com> Reviewed-by: Christian Loehle <christian.loehle@arm.com> Cc: All applicable <stable@vger.kernel.org> [ rjw: Subject and changelog edits ] Link: https://patch.msgid.link/20260616154733.2405236-1-zhongqiu.han@oss.qualcomm.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
12 days	hrtimer: Correct CONFIG_NO_HZ_COMMON macro name in comment	Ethan Nelson-Moore
	A comment in kernel/time/hrtimer.c incorrectly refers to CONFIG_NOHZ_COMMON instead of CONFIG_NO_HZ_COMMON. Correct it. Discovered while searching for CONFIG_* symbols referenced in code but not defined in any Kconfig file. Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Link: https://patch.msgid.link/20260609034314.25029-1-enelsonmoore@gmail.com
12 days	posix-cpu-timers: Use u64 multiplication in update_rlimit_cpu()	Zhan Xusheng
	update_rlimit_cpu() converts the RLIMIT_CPU value to nanoseconds with u64 nsecs = rlim_new * NSEC_PER_SEC; On 32-bit kernels both rlim_new (unsigned long) and NSEC_PER_SEC (1000000000L) are 32-bit, so the multiplication is performed in unsigned long and truncated for rlim_new > 4 seconds before being widened to u64. The same file already casts to u64 for the matching computation in check_process_timers(): u64 softns = (u64)soft * NSEC_PER_SEC; As a result, the truncated value is installed into the CPUCLOCK_PROF expiry cache (nextevt), causing the process CPU timer to be programmed to fire prematurely for any RLIMIT_CPU soft limit >= 5 seconds. The actual SIGXCPU/SIGKILL decision in check_process_timers() already casts to u64 and is therefore correct, so limit enforcement is not broken; only the expiry-cache programming is wrong. Apply the same cast here so both paths convert rlim_cur identically. 64-bit kernels are unaffected. Fixes: 858cf3a8c599 ("timers/itimer: Convert internal cputime_t units to nsec") Signed-off-by: Zhan Xusheng <zhanxusheng@xiaomi.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260616112017.1681372-1-zhanxusheng@xiaomi.com
12 days	timekeeping: Register default clocksource before taking tk_core.lock	Mikhail Gavrilov
	Commit f24df84cbe05 ("time/jiffies: Register jiffies clocksource before usage") moved the jiffies clocksource registration into clocksource_default_clock(), so that it is registered lazily on the first call. __clocksource_register() acquires clocksource_mutex, but the first caller is timekeeping_init(), which invokes clocksource_default_clock() while holding tk_core.lock, a raw spinlock. Acquiring a sleeping mutex while holding a raw spinlock is invalid. The default clocksource only has to be registered before tk_setup_internals() consumes its mult/shift/maxadj. Neither clocksource_default_clock(), the ->enable() callback, nor the registration itself need tk_core.lock, so fetch and enable the clock before acquiring the lock. This preserves the "register before usage" ordering while keeping clocksource_mutex out of the raw spinlock section. clocksource_default_clock() has a second caller, clocksource_done_booting(), which invokes it with clocksource_mutex already held. That path avoids a recursive lock because timekeeping_init() has already run and set cs_jiffies_registered, so the registration is skipped there. This change does not alter that; it only fixes the invalid wait context in timekeeping_init(). Fixes: f24df84cbe05 ("time/jiffies: Register jiffies clocksource before usage") Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Reported-by: Breno Leitao <leitao@debian.org> Reported-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Breno Leitao <leitao@debian.org> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260616070914.65818-1-mikhail.v.gavrilov@gmail.com
12 days	Merge tag 'audit-pr-20260615' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit updates from Paul Moore: - Fix a recursive deadlock when duplicating executable file rules Avoid multiple lookups and attempted I_MUTEX_PARENT locks when moving watched files by passing the already resolved inodes through the audit code. - Fix removal of executable watch rules after the file is deleted Prior to this fix we were unable to remove an executable file watch where the file had been previously deleted due to a negative dentry check in the code that performs the lookup on the file watches. - Convert our basic "unsigned" type usage to "unsigned int". * tag 'audit-pr-20260615' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: fix recursive locking deadlock in audit_dupe_exe() audit: fix removal of dangling executable rules audit: use 'unsigned int' instead of 'unsigned'
12 days	Merge tag 'sched_ext-for-7.2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext Pull sched_ext updates from Tejun Heo: "Most of this continues the in-development sub-scheduler support, which lets a root BPF scheduler delegate to nested sub-schedulers. The dispatch-path building blocks landed in 7.1. A follow-up patchset in development will complete enqueue-path support for hierarchical scheduling. This cycle adds most of that infrastructure: - Topological CPU IDs (cids): a dense, topology-ordered CPU numbering where the CPUs of a core, LLC, or NUMA node form contiguous ranges, so a topology unit becomes a (start, length) slice. Raw CPU numbers are sparse and don't track topological closeness, which makes them clumsy for sharding work across sub-schedulers and awkward in BPF. - cmask: bitmaps windowed over a slice of cid space, so a sub-scheduler can track, for example, the idle cids of its shard without a full NR_CPUS cpumask. - A struct_ops variant that cid-form sub-schedulers register with, along with the cid-form kfuncs they call. - BPF arena integration, which sub-scheduler support is built on. The bpf-next additions let the kernel read and write the BPF scheduler's arena directly, turning it into a real kernel/BPF shared-memory channel. Shared state like the per-CPU cmask now lives there. - scx_qmap is reworked to exercise the new arena and cid interfaces. Additionally: - Exit-dump improvements: dump the faulting CPU first, expose the exit CPU to BPF and userspace, and normalize the dump header. - Misc kfuncs and cleanups: a task-ID lookup kfunc, __printf checking on the error and dump formatters, header reorganization, and assorted fixes" * tag 'sched_ext-for-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (59 commits) sched_ext: Add scx_arena_to_kaddr() / scx_kaddr_to_arena() sched_ext: Make scx_bpf_kick_cid() return s32 sched_ext: Add scx_cmask_test() and scx_cmask_for_each_cid() tools/sched_ext: Order single-cid cmask helpers as (cid, mask) sched_ext: Order single-cid cmask helpers as (cid, mask) selftests/sched_ext: Fix dsq_move_to_local check sched_ext: Guard BPF arena helper calls to fix 32-bit build sched_ext: idle: Fix errno loss in scx_idle_init() sched_ext: Convert ops.set_cmask() to arena-resident cmask sched_ext: Sub-allocator over kernel-claimed BPF arena pages sched_ext: Require an arena for cid-form schedulers sched_ext: Add cmask mask ops sched_ext: Track bits[] storage size in struct scx_cmask sched_ext: Rename scx_cmask.nr_bits to nr_cids tools/sched_ext: scx_qmap: Fix qa arena placement sched_ext: Mark !CONFIG_EXT_SUB_SCHED dummy stubs static inline sched_ext: Replace tryget_task_struct() with get_task_struct() sched_ext: Add scx_task_iter_relock() and use it in scx_root_enable_workfn() sched_ext: Fix ops_cid layout assert sched_ext: Use offsetofend on both sides of the ops_cid layout assert ...
12 days	Merge tag 'cgroup-for-7.2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: - Last cycle deferred css teardown on cgroup removal until the cgroup depopulated, so a css is not taken offline while tasks can still reference it. Disabling a controller through cgroup.subtree_control still had the same problem. This reworks the deferral from per-cgroup to per-css so that path is covered too. - New RDMA controller monitoring files: rdma.peak for per-device peak usage and rdma.events / rdma.events.local for resource-limit exhaustion. The max-limit parser was rewritten, fixing two input parsing bugs. - cpuset: fix a sched-domain leak on the domain-rebuild failure path and skip a redundant hardwall ancestor scan on v2. - Misc: pair the remaining lockless cgroup.max.* reads with WRITE_ONCE, assorted selftest robustness fixes, and doc path corrections. * tag 'cgroup-for-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (22 commits) cgroup: Migrate tasks to the root css when a controller is rebound docs: cgroup: Fix stale source file paths cgroup/cpuset: Free sched domains on rebuild guard failure cgroup: pair max limit READ_ONCE() with WRITE_ONCE() selftests/cgroup: enable memory controller in hugetlb memcg test cgroup/rdma: Drop unnecessary READ_ONCE() on event counters cgroup: Defer kill_css_finish() in cgroup_apply_control_disable() cgroup: Add per-subsys-css kill_css_finish deferral cgroup: Move populated counters to cgroup_subsys_state cgroup: Annotate unlocked nr_populated_* accesses with READ_ONCE/WRITE_ONCE cgroup: Inline cgroup_has_tasks() in cgroup.h cgroup/rdma: document rdma.peak, rdma.events and rdma.events.local cgroup/rdma: add rdma.events.local for per-cgroup allocation failure attribution cgroup/rdma: add rdma.events to track resource limit exhaustion cgroup/rdma: add rdma.peak for per-device peak usage tracking selftests/cgroup: check malloc return value in alloc_anon functions cgroup/cpuset: Skip hardwall ancestor scan in cpuset v2 in cpuset_current_node_allowed() selftests/cgroup: fix misleading debug message in test_cgfreezer_time_child selftests/cgroup: fix child process escaping to parent cleanup in test_cpucg_nice selftests/cgroup: Add NULL check after malloc in cgroup_util.c ...
12 days	Merge tag 'wq-for-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq	Linus Torvalds
	Pull workqueue updates from Tejun Heo: - Continued progress toward making alloc_workqueue() unbound by default: more callers converted to WQ_PERCPU / system_percpu_wq / system_dfl_wq, and new warnings for queues that use neither WQ_PERCPU nor WQ_UNBOUND or the legacy system_wq / system_unbound_wq. - Misc: drop the now-trivial apply_wqattrs_lock()/unlock() wrappers, forbid the TEST_WORKQUEUE benchmark from being built-in, and fix a spurious pointer level in the worker debug-dump path. * tag 'wq-for-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: drm/bridge: anx7625: Add WQ_PERCPU add to alloc_workqueue wifi: ath6kl: fix invalid workqueue flags in ath6kl_usb_create() btrfs: Drop WQ_PERCPU from ordered_flags in btrfs_init_workqueues() workqueue: Add warnings and ensure one among WQ_PERCPU or WQ_UNBOUND is present workqueue: Add warnings and fallback if system_{unbound}_wq is used workqueue: drop spurious '*' from print_worker_info() fn declaration workqueue: forbid TEST_WORKQUEUE from being built-in workqueue: drop apply_wqattrs_lock()/unlock() wrappers umh: replace use of system_unbound_wq with system_dfl_wq rapidio: rio: add WQ_PERCPU to alloc_workqueue users media: ddbridge: add WQ_PERCPU to alloc_workqueue users platform: cznic: turris-omnia-mcu: replace use of system_wq with system_percpu_wq media: synopsys: hdmirx: replace use of system_unbound_wq with system_dfl_wq virt: acrn: Add WQ_PERCPU to alloc_workqueue users