summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2026-05-24torture: Add torture_sched_set_normal() for user-specified nice valuesPaul E. McKenney
This new torture_sched_set_normal() function clamps the nice value at the MIN_NICE..MAX_NICE limits, splatting it these limits are exceeded. It then invokes sched_set_normal() to set the new value. This prevents more difficult-to-debug failures within the scheduler. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
2026-05-23Merge tag 'driver-core-7.1-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core Pull driver core fixes from Danilo Krummrich: - Remove the software node on platform device release(); without this, the software node remains registered after the device is gone and a subsequent platform_device_register_full() reusing the same node fails with -EBUSY - In sysfs_update_group(), do not remove a pre-existing directory when create_files() fails; the previous code would silently destroy a sysfs group that the caller did not create - Set fwnode->secondary to NULL in fwnode_init() to avoid dereferencing uninitialized memory (e.g. in dev_to_swnode()) when the firmware node is allocated on the stack or via a non-zeroing allocator * tag 'driver-core-7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: device property: set fwnode->secondary to NULL in fwnode_init() sysfs: don't remove existing directory on update failure driver core: platform: remove software node on release()
2026-05-23PCI: Indicate context lost if L1SS exit is broken during resume from system ↵Manivannan Sadhasivam
suspend Per PCIe v7.0, sec 5.5.3.3.1, when exiting L1.2 due to an endpoint asserting CLKREQ# signal, the REFCLK must be turned on within the latency advertised in the LTR message. This requirement applies to L1.1 as well. On some platforms like Qcom, these requirements are satisfied during OS runtime, but not while resuming from the system suspend. This happens because the PCIe RC driver may remove all resource votes and turn off the PHY analog circuitry during suspend to maximize power savings while keeping the link in L1SS. Consequently, when the endpoint asserts CLKREQ# to wake up, the RC driver must restore the PHY and enable the REFCLK. When this recovery process exceeds the L1SS exit latency time (roughly L10_REFCLK_ON + T_COMMONMODE), the endpoint may treat it as a fatal condition and trigger Link Down (LDn). This results in a reset that destroys the internal device state. So to indicate this platform limitation to the client drivers, introduce a new flag 'pci_host_bridge::broken_l1ss_resume' and check it in pci_suspend_retains_context(). If the flag is set by the RC driver, the API will return 'false' indicating the client drivers that the device context may not be retained and the drivers must be prepared for context loss. Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20260519-l1ss-fix-v2-2-b2c3a4bdeb15@oss.qualcomm.com
2026-05-23char: dtlk: remove driver for ISA speech synthesizer cardEthan Nelson-Moore
The dtlk driver supports the RC Systems DoubleTalk PC ISA speech synthesizer card. It has severe coding style issues and has only received tree-wide fixes and drive-by cleanups in the entire Git history (since Linux 2.6.12-rc2). The same hardware is supported by drivers/accessibility/speakup for screen reader use, but that implementation does not share any code with this driver. Given all of these factors, it is likely the driver is entirely unused. Remove it to reduce future maintenance workload. Note: The removed maintainer is already listed in CREDITS. Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com> Link: https://patch.msgid.link/20260502043341.34324-1-enelsonmoore@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-05-23PCI: Add pci_suspend_retains_context() to check if device state is preserved ↵Manivannan Sadhasivam
during suspend Currently, PCI endpoint drivers (e.g. nvme) use pm_suspend_via_firmware() to check whether device state is preserved during system suspend. If firmware will be invoked at the end of suspend, we don't know whether devices will retain their internal state. But device context might be lost due to platform issues as well. Having those checks in endpoint drivers will not scale and will cause a lot of code duplication. Add pci_suspend_retains_context() as a sole point of truth that the endpoint drivers can rely on to check whether they can expect the device context to be retained or not. If pci_suspend_retains_context() returns 'false', drivers need to prepare for context loss by performing actions such as resetting the device, saving the context, shutting it down etc. If it returns 'true', drivers do not need to perform any special action and can leave the device in active state. Right now, this API only incorporates pm_suspend_via_firmware(), but will be extended in future commits. Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20260519-l1ss-fix-v2-1-b2c3a4bdeb15@oss.qualcomm.com
2026-05-23bpf/arena: Add bpf_arena_map_kern_vm_start() and bpf_prog_arena()Tejun Heo
struct bpf_arena is opaque to callers outside arena.c. Add two helpers for struct_ops subsystems that need to reach into an arena: bpf_arena_map_kern_vm_start(struct bpf_map *map) returns @map's kern_vm_start. A sched_ext follow-up needs this to translate kern_va <-> uaddr. bpf_prog_arena(struct bpf_prog *prog) returns the bpf_map of the arena referenced by @prog (NULL if @prog references no arena). The verifier enforces at most one arena per program. Used by struct_ops callers that auto-discover an arena from a member prog and need to take a map reference. Suggested-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Link: https://lore.kernel.org/r/20260522172219.1423324-6-tj@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-05-23bpf: Add bpf_struct_ops_for_each_prog()Tejun Heo
Add a helper that walks the member progs of the struct_ops map containing a given @kdata vmtable. struct_ops ->reg() callbacks (and similar) sometimes need to inspect the loaded BPF programs, e.g. to discover maps they reference via prog->aux->used_maps. The implementation mirrors bpf_struct_ops_id(): container_of @kdata to recover the bpf_struct_ops_map, then iterate st_map->links[i]->prog for i in [0, funcs_cnt). Same access pattern, no new locking - by the time ->reg() fires st_map is fully populated and stable. A sched_ext follow-up walks the member progs of a cid-form scheduler's struct_ops map, reads prog->aux->arena directly, and requires all member progs to reference exactly one arena, without requiring the BPF program to call a registration kfunc. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Link: https://lore.kernel.org/r/20260522172219.1423324-5-tj@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-05-23bpf: Add sleepable variant of bpf_arena_alloc_pages for kernel callersTejun Heo
The existing kernel-side export of bpf_arena_alloc_pages is _non_sleepable only - it's used by the verifier to inline the kfunc when the call site is non-sleepable. There is no sleepable equivalent for kernel callers. The kfunc bpf_arena_alloc_pages itself is BPF-only. sched_ext needs sleepable kernel-side allocs for its arena pool init/grow paths. Add bpf_arena_alloc_pages_sleepable() mirroring the _non_sleepable wrapper but passing sleepable=true to arena_alloc_pages(). Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Link: https://lore.kernel.org/r/20260522172219.1423324-4-tj@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-05-23bpf: Recover arena kernel faults with scratch pageKumar Kartikeya Dwivedi
BPF arena usage is becoming more prevalent, but kernel <-> BPF communication over arena memory is awkward today. Data has to be staged through a trusted kernel pointer with extra code and copying on the BPF side. While reads through arena pointers can use a fault-safe helper, writes don't have a good solution. The in-line alternative would need instruction emulation or asm fixup labels. Enable direct kernel-side reads and writes within GUARD_SZ / 2 of any handed-in arena pointer, without bounds checking. A per-arena scratch page is installed by the arch fault path into empty arena kernel PTEs - x86 from page_fault_oops() for not-present faults, arm64 from __do_kernel_fault() for translation faults, both after the existing exception-table and KFENCE handling. The faulting instruction retries and the access is also reported through the program's BPF stream, preserving error reporting. bpf_prog_find_from_stack() resolves the current BPF program (and its arena) from the kernel stack - no new bpf_run_ctx state is added. Recovery covers the 4 GiB arena plus the upper half-guard (GUARD_SZ / 2). The lower half-guard is excluded because well-behaved kfuncs only access forward from arena pointers. The kfunc-author contract - access at most GUARD_SZ / 2 past a handed-in pointer - is documented in Documentation/bpf/kfuncs.rst. The install is lock-free via ptep_try_set(). On race-loss the winning installer's PTE is already valid, so the access retry succeeds. The arena clear path uses ptep_get_and_clear() so installer and clearer race through atomic accessors. No flush_tlb_kernel_range() afterwards. Stale "not mapped" entries just cause one extra re-fault, cheaper than a global IPI on every install. Scratch exists only to keep the kernel from oopsing on an in-line arena access. Its presence at a PTE means the BPF program has already malfunctioned, and the violation is reported through the program's BPF stream. The only requirement for behavior on a scratched PTE is that the kernel doesn't crash. In particular, any user-side access through such a PTE may segfault. The shared scratch page is freed once during map destruction. BPF instruction faults continue to use the existing JIT exception-table path. This patch changes only the kernel-text fault path. No UAPI flag is added. The new behavior is the default. v2: Use ptep_get_and_clear() in apply_range_clear_cb(). (David) v3: Stub bpf_arena_handle_page_fault() for !CONFIG_BPF_SYSCALL. (lkp) Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Cc: David Hildenbrand <david@kernel.org> Link: https://lore.kernel.org/r/20260522172219.1423324-3-tj@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-05-23mm: Add ptep_try_set() for lockless empty-slot installsTejun Heo
Add ptep_try_set(ptep, new_pte): atomically set *ptep to new_pte iff it is currently pte_none(). Returns true on success, false if the slot was already populated or the arch has no implementation. The intended caller is the upcoming bpf_arena kernel-side fault recovery path. The install runs from a page fault that can be nested under locks held by the faulting kernel caller (e.g. a BPF program holding raw_res_spin_lock_irqsave on its arena's spinlock), so trylock-and-retry would A-A deadlock. Lock-free cmpxchg is the only viable option, which constrains this helper to special kernel page tables where concurrent writers cooperate via atomic accessors. The generic version in <linux/pgtable.h> returns false. x86 and arm64 override with try_cmpxchg-based implementations on the underlying pteval. Other architectures get the false stub - the callers there already fall through to oops. v2: Rename to ptep_try_set(). Tighten kerneldoc. (David, Alexei) v3: Note that strict-zero cmpxchg is narrower than pte_none(). (Andrea) Suggested-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Andrea Righi <arighi@nvidia.com> Cc: David Hildenbrand <david@kernel.org> Acked-by: David Hildenbrand (arm) <david@kernel.org> Link: https://lore.kernel.org/r/20260522172219.1423324-2-tj@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-05-22dpll: change dpll_netdev_pin_handle_size() to assume DPLL_A_PIN_ID will be usedEric Dumazet
We plan to no longer hold RTNL in "ip link show", and use RCU instead. Assume rtnl_fill_dpll_pin() will have to fill DPLL_A_PIN_ID. It is fine to over-estimate skb size (by 8 bytes) in if_nlmsg_size(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20260521171440.114956-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-22Merge tag 'cgroup-for-7.1-rc4-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "Two rstat fixes: - Out-of-bounds access in the css_rstat_updated() BPF kfunc when called with an unchecked user-supplied cpu - Over-strict NMI guard after the recent switch to try_cmpxchg left sparc and ppc64 unable to queue rstat updates from NMI" * tag 'cgroup-for-7.1-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: rstat: relax NMI guard after switch to try_cmpxchg cgroup/rstat: validate cpu before css_rstat_cpu() access
2026-05-22cpufreq: Avoid redundant target() calls for unchanged limitsViresh Kumar
Drivers setting CPUFREQ_NEED_UPDATE_LIMITS expect target() to be invoked even if the target frequency remains unchanged, so they can update their internal policy limits state. Currently the core invokes target() unconditionally whenever the requested frequency matches policy->cur for such drivers, even if policy->min and policy->max haven't changed since the previous update. Track pending policy limit updates explicitly and skip redundant target() invocations when neither the target frequency nor the effective limits changed. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Lifeng Zheng <zhenglifeng1@huawei.com> Reviewed-by: Zhongqiu Han <zhongqiu.han@oss.qualcomm.com> Link: https://patch.msgid.link/d0107c364b709abca21acf88072220bc05478594.1779423281.git.viresh.kumar@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2026-05-22cpufreq: Fix typo in commentViresh Kumar
Replace "diver" with "driver" in the comment describing CPUFREQ_NEED_UPDATE_LIMITS. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Zhongqiu Han <zhongqiu.han@oss.qualcomm.com> Reviewed-by: Lifeng Zheng <zhenglifeng1@huawei.com> Link: https://patch.msgid.link/396f64411431ffbb5b4f07d1f2e0bbf9763d468f.1779423281.git.viresh.kumar@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2026-05-22vfio/pci: Use a private flag to prevent power state change with VFsRaghavendra Rao Ananta
The current implementation uses pci_num_vf() while holding the memory_lock to prevent changing the power state of a PF when VFs are enabled. This creates a lockdep circular dependency warning because memory_lock is held during device probing. [ 286.997167] ====================================================== [ 287.003363] WARNING: possible circular locking dependency detected [ 287.009562] 7.0.0-dbg-DEV #3 Tainted: G S [ 287.015074] ------------------------------------------------------ [ 287.021270] vfio_pci_sriov_/18636 is trying to acquire lock: [ 287.026942] ff45bea2294d4968 (&vdev->memory_lock){+.+.}-{4:4}, at: vfio_pci_core_runtime_resume+0x1f/0xa0 [ 287.036530] [ 287.036530] but task is already holding lock: [ 287.042383] ff45bea3a96b8230 (&new_dev_set->lock){+.+.}-{4:4}, at: vfio_group_fops_unl_ioctl+0x44d/0x7b0 [ 287.051879] [ 287.051879] which lock already depends on the new lock. [ 287.051879] [ 287.060070] [ 287.060070] the existing dependency chain (in reverse order) is: [ 287.067568] [ 287.067568] -> #2 (&new_dev_set->lock){+.+.}-{4:4}: [ 287.073941] __mutex_lock+0x92/0xb80 [ 287.078058] vfio_assign_device_set+0x66/0x1b0 [ 287.083042] vfio_pci_core_register_device+0xd1/0x2a0 [ 287.088638] vfio_pci_probe+0xd2/0x100 [ 287.092933] local_pci_probe_callback+0x4d/0xa0 [ 287.098001] process_scheduled_works+0x2ca/0x680 [ 287.103158] worker_thread+0x1e8/0x2f0 [ 287.107452] kthread+0x10c/0x140 [ 287.111230] ret_from_fork+0x18e/0x360 [ 287.115519] ret_from_fork_asm+0x1a/0x30 [ 287.119983] [ 287.119983] -> #1 ((work_completion)(&arg.work)){+.+.}-{0:0}: [ 287.127219] __flush_work+0x345/0x490 [ 287.131429] pci_device_probe+0x2e3/0x490 [ 287.135979] really_probe+0x1f9/0x4e0 [ 287.140180] __driver_probe_device+0x77/0x100 [ 287.145079] driver_probe_device+0x1e/0x110 [ 287.149803] __device_attach_driver+0xe3/0x170 [ 287.154789] bus_for_each_drv+0x125/0x150 [ 287.159346] __device_attach+0xca/0x1a0 [ 287.163720] device_initial_probe+0x34/0x50 [ 287.168445] pci_bus_add_device+0x6e/0x90 [ 287.172995] pci_iov_add_virtfn+0x3c9/0x3e0 [ 287.177719] sriov_add_vfs+0x2c/0x60 [ 287.181838] sriov_enable+0x306/0x4a0 [ 287.186038] vfio_pci_core_sriov_configure+0x184/0x220 [ 287.191715] sriov_numvfs_store+0xd9/0x1c0 [ 287.196351] kernfs_fop_write_iter+0x13f/0x1d0 [ 287.201338] vfs_write+0x2be/0x3b0 [ 287.205286] ksys_write+0x73/0x100 [ 287.209233] do_syscall_64+0x14d/0x750 [ 287.213529] entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 287.219120] [ 287.219120] -> #0 (&vdev->memory_lock){+.+.}-{4:4}: [ 287.225491] __lock_acquire+0x14c6/0x2800 [ 287.230048] lock_acquire+0xd3/0x2f0 [ 287.234168] down_write+0x3a/0xc0 [ 287.238019] vfio_pci_core_runtime_resume+0x1f/0xa0 [ 287.243436] __rpm_callback+0x8c/0x310 [ 287.247730] rpm_resume+0x529/0x6f0 [ 287.251765] __pm_runtime_resume+0x68/0x90 [ 287.256402] vfio_pci_core_enable+0x44/0x310 [ 287.261216] vfio_pci_open_device+0x1c/0x80 [ 287.265947] vfio_df_open+0x10f/0x150 [ 287.270148] vfio_group_fops_unl_ioctl+0x4a4/0x7b0 [ 287.275476] __se_sys_ioctl+0x71/0xc0 [ 287.279679] do_syscall_64+0x14d/0x750 [ 287.283975] entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 287.289559] [ 287.289559] other info that might help us debug this: [ 287.289559] [ 287.297582] Chain exists of: [ 287.297582] &vdev->memory_lock --> (work_completion)(&arg.work) --> &new_dev_set->lock [ 287.297582] [ 287.310023] Possible unsafe locking scenario: [ 287.310023] [ 287.315961] CPU0 CPU1 [ 287.320510] ---- ---- [ 287.325059] lock(&new_dev_set->lock); [ 287.328917] lock((work_completion)(&arg.work)); [ 287.336153] lock(&new_dev_set->lock); [ 287.342523] lock(&vdev->memory_lock); [ 287.346382] [ 287.346382] *** DEADLOCK *** [ 287.346382] [ 287.352315] 2 locks held by vfio_pci_sriov_/18636: [ 287.357125] #0: ff45bea208ed3e18 (&group->group_lock){+.+.}-{4:4}, at: vfio_group_fops_unl_ioctl+0x3e3/0x7b0 [ 287.367048] #1: ff45bea3a96b8230 (&new_dev_set->lock){+.+.}-{4:4}, at: vfio_group_fops_unl_ioctl+0x44d/0x7b0 [ 287.376976] [ 287.376976] stack backtrace: [ 287.381353] CPU: 191 UID: 0 PID: 18636 Comm: vfio_pci_sriov_ Tainted: G S 7.0.0-dbg-DEV #3 PREEMPTLAZY [ 287.381355] Tainted: [S]=CPU_OUT_OF_SPEC [ 287.381356] Call Trace: [ 287.381357] <TASK> [ 287.381358] dump_stack_lvl+0x54/0x70 [ 287.381361] print_circular_bug+0x2e1/0x300 [ 287.381363] check_noncircular+0xf9/0x120 [ 287.381364] ? __lock_acquire+0x5b4/0x2800 [ 287.381366] __lock_acquire+0x14c6/0x2800 [ 287.381368] ? pci_mmcfg_read+0x4f/0x220 [ 287.381370] ? pci_mmcfg_write+0x57/0x220 [ 287.381371] ? lock_acquire+0xd3/0x2f0 [ 287.381373] ? pci_mmcfg_write+0x57/0x220 [ 287.381374] ? lock_release+0xef/0x360 [ 287.381376] ? vfio_pci_core_runtime_resume+0x1f/0xa0 [ 287.381377] lock_acquire+0xd3/0x2f0 [ 287.381378] ? vfio_pci_core_runtime_resume+0x1f/0xa0 [ 287.381379] ? lock_is_held_type+0x76/0x100 [ 287.381382] down_write+0x3a/0xc0 [ 287.381382] ? vfio_pci_core_runtime_resume+0x1f/0xa0 [ 287.381383] vfio_pci_core_runtime_resume+0x1f/0xa0 [ 287.381384] ? __pfx_pci_pm_runtime_resume+0x10/0x10 [ 287.381385] __rpm_callback+0x8c/0x310 [ 287.381386] ? ktime_get_mono_fast_ns+0x3d/0xb0 [ 287.381389] ? __pfx_pci_pm_runtime_resume+0x10/0x10 [ 287.381390] rpm_resume+0x529/0x6f0 [ 287.381392] ? lock_is_held_type+0x76/0x100 [ 287.381394] __pm_runtime_resume+0x68/0x90 [ 287.381396] vfio_pci_core_enable+0x44/0x310 [ 287.381398] vfio_pci_open_device+0x1c/0x80 [ 287.381399] vfio_df_open+0x10f/0x150 [ 287.381401] vfio_group_fops_unl_ioctl+0x4a4/0x7b0 [ 287.381402] __se_sys_ioctl+0x71/0xc0 [ 287.381404] do_syscall_64+0x14d/0x750 [ 287.381405] ? entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 287.381406] ? trace_irq_disable+0x25/0xd0 [ 287.381409] entry_SYSCALL_64_after_hwframe+0x77/0x7f Introduce a private flag 'sriov_active' in the vfio_pci_core_device struct. This allows the driver to track the SR-IOV power state requirement without relying on pci_num_vf() while holding the memory_lock. The lock is now only held to set the flag and ensure the device is in D0, after which pci_enable_sriov() can be called without the lock. Fixes: f4162eb1e2fc ("vfio/pci: Change the PF power state to D0 before enabling VFs") Cc: stable@vger.kernel.org Suggested-by: Jason Gunthorpe <jgg@ziepe.ca> Suggested-by: Alex Williamson <alex@shazbot.org> Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Link: https://lore.kernel.org/r/20260514173449.3282188-1-rananta@google.com [promote bitfield to plain bool to avoid storage-unit races] Signed-off-by: Alex Williamson <alex@shazbot.org>
2026-05-22vfio/pci: Fix sparse warning in vfio_pci_core_get_iomap()Matt Evans
Use IOMEM_ERR_PTR() when returning a void __iomem * rather than ERR_PTR(). This fixes a sparse warning, "different address spaces". Fixes: 859dc0f6253b ("vfio/pci: Replace vfio_pci_core_setup_barmap() with vfio_pci_core_get_iomap()") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202605211601.U1OvmuqY-lkp@intel.com/ Signed-off-by: Matt Evans <mattev@meta.com> Link: https://lore.kernel.org/r/20260522124215.3268565-1-mattev@meta.com Signed-off-by: Alex Williamson <alex@shazbot.org>
2026-05-22blk-mq: introduce blk_rq_has_data()Caleb Sander Mateos
Add blk_rq_has_data(), an analogue of bio_has_data() for struct request. This skips one dereference relative to bio_has_data(rq->bio). Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <tom.leiming@gmail.com> Link: https://patch.msgid.link/20260513211846.1956810-2-csander@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-05-22include: Remove unused crypto-ux500.hCosta Shulyupin
The UX500 crypto drivers were removed in commit 453de3eb08c4 ("crypto: ux500/cryp - delete driver") and commit dd7b7972cb89 ("crypto: ux500/hash - delete driver"). No file includes this header. Assisted-by: Claude:claude-opus-4-6 Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Reviewed-by: Linus Walleij <linusw@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2026-05-22driver core: Allow the constification of device attributesThomas Weißschuh
Allow device attribute to reside in read-only memory. Both const and non-const attributes are handled by the utility macros and attributes can be migrated one-by-one. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://patch.msgid.link/20260512-sysfs-const-attr-device_attr-prep-v3-4-cb7c17b34d52@weissschuh.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-05-22driver core: Stop using generic sysfs macros for device attributesThomas Weißschuh
The constification of device attributes will require a transition phase, where 'struct device_attribute' contains a classic non-const and a new const variant of the 'show' and 'store' callbacks. As __ATTR() and friends can not handle this duplication stop using them. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://patch.msgid.link/20260512-sysfs-const-attr-device_attr-prep-v3-3-cb7c17b34d52@weissschuh.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-05-22driver core: Add low-level macros for device attributesThomas Weißschuh
For the upcoming constification of device attributes the generic __ATTR() macros are insufficient. Prepare for a split by introducing new low-level macros specific to device attributes. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://patch.msgid.link/20260512-sysfs-const-attr-device_attr-prep-v3-2-cb7c17b34d52@weissschuh.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-05-22driver core: Delete DEVICE_ATTR_PREALLOC()Thomas Weißschuh
This macro is unused and would create extra work during the upcoming constification of device attributes. Remove it. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://patch.msgid.link/20260512-sysfs-const-attr-device_attr-prep-v3-1-cb7c17b34d52@weissschuh.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-05-22of: dynamic: Fix overlayed devices not probing because of fw_devlinkSaravana Kannan
When an overlay is applied, if the target device has already probed successfully and bound to a device, then some of the fw_devlink logic that ran when the device was probed needs to be rerun. This allows newly created dangling consumers of the overlayed device tree nodes to be moved to become consumers of the target device. [Herve: Add the call to driver_deferred_probe_trigger()] [Herve: Use fwnode_test_flag() to test fwnode flags value] Fixes: 1a50d9403fb9 ("treewide: Fix probing of devices in DT overlays") Reported-by: Herve Codina <herve.codina@bootlin.com> Closes: https://lore.kernel.org/lkml/CAMuHMdXEnSD4rRJ-o90x4OprUacN_rJgyo8x6=9F9rZ+-KzjOg@mail.gmail.com/ Closes: https://lore.kernel.org/all/20240221095137.616d2aaa@bootlin.com/ Closes: https://lore.kernel.org/lkml/20240312151835.29ef62a0@bootlin.com/ Signed-off-by: Saravana Kannan <saravanak@google.com> Link: https://lore.kernel.org/lkml/20240411235623.1260061-3-saravanak@google.com/ [Herve: Rebase on top of recent kernel] Signed-off-by: Herve Codina <herve.codina@bootlin.com> Tested-by: Kalle Niemi <kaleposti@gmail.com> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20260511155755.34428-3-herve.codina@bootlin.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-05-22device core: make struct device_driver groups members constant arraysHeiner Kallweit
Constify the groups arrays, allowing to assign constant arrays. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/42624513-923c-4970-834d-036282e24e24@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-05-22driver core: make struct bus_type groups members constant arraysHeiner Kallweit
Constify the groups arrays, allowing to assign constant arrays. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/265f6584-8edd-48a0-9568-a9d584b9ec3a@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-05-22driver core: delete useless forward declaration of "struct class"Alexey Dobriyan
"struct class" is defined earlier on both cases. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Link: https://patch.msgid.link/6d5937c5-9d41-4cfe-9e42-0946e12dc72d@p183 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-05-22netfilter: nf_conntrack_gre: fix gre keymap list corruptionFlorian Westphal
Quoting reporter: A race between GRE keymap insertion and destruction can corrupt the kernel list or use a freed object. `nf_ct_gre_keymap_add()` publishes a new keymap pointer before the embedded `list_head` is linked, while `nf_ct_gre_keymap_destroy()` can concurrently delete and free that same object. An unprivileged user can reach this through the PPTP conntrack helper by racing PPTP control messages or helper teardown, leading to KASAN-detectable list corruption/UAF in kernel context. ## Root Cause Analysis `exp_gre()` installs GRE expectations for a PPTP control flow and then adds two GRE keymap entries [..] The add path publishes `ct_pptp_info->keymap[dir]` before linking the embedded list node [..] Concurrent teardown deletes that partially initialized object. Make add/destroy symmetric: install both, destroy both while under lock. Furthermore, we should refuse to publish a new mapping in case ct is going away, else we may leak the allocation. The "retrans" detection is strange: existing mapping is checked for key equality with the new mapping, then for "is on the list" via list walk. But I can't see how an existing keymap entry can be NOT on list. Change this to only check if we're asked to map same tuple again -- if so, skip re-install, else signal failure. Last, add a bug trap for the keymap list; it has to be empty when namespace is going away. Reported-by: Leo Lin <leo@depthfirst.com> Signed-off-by: Florian Westphal <fw@strlen.de>
2026-05-22device property: set fwnode->secondary to NULL in fwnode_init()Bartosz Golaszewski
If a firmware node is allocated on the stack (for instance: temporary software node whose life-time we control) or on the heap - but using a non-zeroing allocation function - and initialized using fwnode_init(), its secondary pointer will contain uninitalized memory which likely will be neither NULL nor IS_ERR() and so may end up being dereferenced (for example: in dev_to_swnode()). Set fwnode->secondary to NULL on initialization. Cc: stable <stable@kernel.org> Fixes: 01bb86b380a3 ("driver core: Add fwnode_init()") Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com> Link: https://patch.msgid.link/20260506115701.23035-1-bartosz.golaszewski@oss.qualcomm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-05-22parport: Fix race between port and client registrationBen Hutchings
The parport subsystem registers port devices before they are fully initialised, resulting in a race condition where client drivers such as lp can attach to ports that are not completely initialised or even being torn down. When the port and client drivers are built as modules and loaded around the same time during boot, this occasionally results in a crash. I was able to make this happen reliably in a VM with a PC-style parallel port by patching parport_pc to fail probing: > --- a/drivers/parport/parport_pc.c > +++ b/drivers/parport/parport_pc.c > @@ -2069,7 +2069,7 @@ static struct parport *__parport_pc_probe_port(unsigned long int base, > if (!p) > goto out3; > > - base_res = request_region(base, 3, p->name); > + base_res = NULL; > if (!base_res) > goto out4; > and then running: while true; do modprobe lp & modprobe parport_pc wait rmmod lp parport_pc done for a few seconds. In the long term I think port registration should be changed to put the call to device_add() inside parport_announce_port(), but since the latter currently cannot fail this will require changing all port drivers. For now, add a flag to indicate whether a port has been "announced" and only try to attach client drivers to ports when the flag is set. Fixes: 6fa45a226897 ("parport: add device-model to parport subsystem") Closes: https://bugs.debian.org/1130365 Closes: https://lore.kernel.org/all/6ba903ad-9897-42bb-8c2d-337385cc3746@molgen.mpg.de/ Cc: stable <stable@kernel.org> Signed-off-by: Ben Hutchings <benh@debian.org> Acked-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Link: https://patch.msgid.link/afo6uBv68GDevbMD@decadent.org.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-05-22writeback: use a per-sb counter to drain inode wb switches at umountBaokun Li
Tracking in-flight inode wb switches with a single global counter (isw_nr_in_flight) plus a synchronize_rcu() based wait in cgroup_writeback_umount() forces every umount to take a global hit whenever any other superblock on the system has wb switches in flight, even if the superblock being unmounted has none of its own. Replace the global synchronize_rcu()/flush_workqueue() pair with a per-sb counter, s_isw_nr_in_flight, plus three small helpers: - cgroup_writeback_pin(sb) - increment counter - cgroup_writeback_unpin(sb) - decrement and wake drainer if last - cgroup_writeback_drain(sb) - wait for counter to reach zero The wiring is: - inode_prepare_wbs_switch() pins before checking SB_ACTIVE and grabbing the inode; failure paths unpin before returning. A lockless SB_ACTIVE check at the top of the function lets us skip the atomic_inc/smp_mb dance once SB_ACTIVE has been cleared (it is monotonic and never set back). - process_inode_switch_wbs() unpins after the matching iput(). - cgroup_writeback_umount() drains the per-sb counter via wait_var_event(). The smp_mb() pair between inode_prepare_wbs_switch() and cgroup_writeback_umount() keeps the SB_ACTIVE / counter ordering: either the umounter sees a non-zero counter and waits, or the switcher sees SB_ACTIVE cleared and aborts before grabbing the inode. The global isw_nr_in_flight is left in place, since it is still used to throttle in-flight switches via WB_FRN_MAX_IN_FLIGHT. The rcu_read_lock() extension in inode_switch_wbs() and cleanup_offline_cgwb() that the race fix added is no longer needed and is reverted; the synchronize_rcu() that the race fix added to cgroup_writeback_umount() is dropped as well. The following numbers were measured on a 16 vCPU QEMU guest with 4 background superblocks each churning "create memcg -> write 1 MiB -> rmdir memcg" to keep the global isw_nr_in_flight non-zero. Latencies are wall-clock around umount(8); only the target sb's umount is measured. Target sb runs its own cgwb churn: p50 p95 p99 max global synchronize_rcu() 67.6 ms 88.3 ms 88.3 ms 96.8 ms per-sb counter (this) 7.9 ms 10.0 ms 10.0 ms 10.1 ms Idle target umount latency under cross-sb cgwb-switch pressure: p50 p95 p99 max global synchronize_rcu() 62.7 ms 95.4 ms 108.1 ms 108.6 ms per-sb counter (this) 5.3 ms 6.9 ms 7.4 ms 7.4 ms no-pressure baseline 4.9 ms 5.9 ms 6.3 ms 6.7 ms 8 concurrent umounts of idle sbs under the same pressure: p50 p95 max global synchronize_rcu() 61.3 ms 99.5 ms 113.7 ms per-sb counter (this) 8.1 ms 9.1 ms 9.5 ms In-kernel cgroup_writeback_umount() time across the same run (bpftrace, ~340 calls covering all scenarios): global synchronize_rcu() 12371 ms total (~36 ms / call) per-sb counter (this) 1.37 ms total ( ~4 us / call) Suggested-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/177910456953.488929.2169908940676707307.b4-review@b4 Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Baokun Li <libaokun@linux.alibaba.com> Link: https://patch.msgid.link/20260521095016.2791354-4-libaokun@linux.alibaba.com Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
2026-05-22serial: core: introduce guard(uart_port_lock_check_sysrq_irqsave)Jacques Nilo
uart_handle_break() and uart_prepare_sysrq_char() (in include/linux/serial_core.h) capture a SysRq character into port->sysrq_ch while the port lock is held and rely on the unlock helper -- uart_unlock_and_check_sysrq_irqrestore() -- to dispatch the captured character to handle_sysrq() on scope exit. The existing guard(uart_port_lock_irqsave) cannot be used by IRQ handlers that process RX, because its destructor calls plain uart_port_unlock_irqrestore() and silently drops port->sysrq_ch. Add a dedicated guard(uart_port_lock_check_sysrq_irqsave) variant whose destructor is the sysrq-aware unlock helper. The lock side is identical to uart_port_lock_irqsave -- only the unlock-time behaviour differs. Callers that may capture SysRq characters must use guard(uart_port_lock_check_sysrq_irqsave); the existing guard(uart_port_lock_irqsave) keeps its current plain-unlock semantics for the many callers that do not process RX. The new macro is placed after the CONFIG_MAGIC_SYSRQ_SERIAL block so both definitions of uart_unlock_and_check_sysrq_irqrestore() (sysrq enabled and disabled) are visible at expansion time. When CONFIG_MAGIC_SYSRQ_SERIAL=n the destructor degenerates to plain uart_port_unlock_irqrestore(), so there is no overhead. No functional change on its own; users are converted in the following patches. Cc: stable@vger.kernel.org Signed-off-by: Jacques Nilo <jnilo@free.fr> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://patch.msgid.link/3849af4bc55d5d2a424fa850844e94d641b2f8a6.1778675349.git.jnilo@free.fr Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-05-22vt: merge ucs_is_zero_width()/ucs_is_double_width() into ucs_get_width()Nicolas Pitre
The hot path in vc_process_ucs() asks two independent questions about the same code point -- "is it double-width?" and "is it zero-width?" -- and was answering each with its own bsearch over its own table. For anything past the leading bounds check that meant two scans of the BMP width tables back to back for what is logically a single lookup. Replace both with one ucs_get_width(cp) returning 0, 1, or 2 in a single bsearch, while keeping the total table footprint at the same 2384 B as before. To do so, merge the zero-width and double-width ranges per region into one sorted-by-`first` table. BMP entries stay 4 bytes; per-entry width is hosted in spare bits of the non-BMP table's `last` field. Non-BMP code points use only 20 of 32 bits, so each u32 has 12 unused high bits. Store first/last shifted left by 12 and use the low 12 bits of `last` for metadata: bit 11 is this entry's own width flag, bits 0..7 host an 8-bit chunk of the BMP double-width bitmap. Because the metadata bits sit strictly below the lowest cp-scale bit, the bsearch comparator remains a plain u32 compare on shifted keys with no masking. In vc_process_ucs() the overwhelmingly common single-width path now collapses to a single predicted branch: if (likely(w == 1)) return 1; Note: scripts/checkpatch.pl complains about "Macros with complex values should be enclosed in parentheses" for the BMP_*WIDTH and RANGE_*WIDTH macros. They are deliberately defined to expand to a comma-separated (first, last) pair so they can populate the two adjacent fields of a struct initializer; wrapping them in parentheses would turn that into a comma-expression and defeat the whole construction. Please ignore. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Link: https://patch.msgid.link/20260515034857.2514225-1-nico@fluxnic.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-05-22serial: 8250: Add support for console flow controlJohn Ogness
The kernel documentation specifies that the console option 'r' can be used to enable hardware flow control for console writes. The 8250 driver does include code for hardware flow control on the console if cons_flow is set, but there is no code path that actually sets this. However, that is not the only issue. The problems are: 1. Specifying the console option 'r' does not lead to cons_flow being set. 2. Even if cons_flow would be set, serial8250_register_8250_port() clears it. 3. When the console option 'r' is specified, uart_set_options() attempts to initialize the port for CRTSCTS. However, afterwards it does not set the UPSTAT_CTS_ENABLE status bit and therefore on boot, uart_cts_enabled() is always false. This policy bit is important for console drivers as a criteria if they may poll CTS. 4. Even though uart_set_options() attempts to initialize the port for CRTSCTS, the 8250 set_termios() callback does not enable the RTS signal (TIOCM_RTS) and thus the hardware is not properly initialized for CTS polling. 5. Even if modem control was properly setup for CTS polling (TIOCM_RTS), uart_configure_port() clears TIOCM_RTS, thus breaking CTS polling. 6. wait_for_xmitr() and serial8250_console_write() use cons_flow to decide if CTS polling should occur. However, the condition should also include a check that it is not in RS485 mode and CRTSCTS is actually enabled in the hardware. Address all these issues as conservatively as possible by gating them behind checks focussed on the user specifying console hardware flow control support and the hardware being configured for CTS polling at the time of the write to the UART. Since checking the UPSTAT_CTS_ENABLE status bit is a part of the new condition gate, these changes also support runtime termios updates to disable/enable CRTSCTS. Signed-off-by: John Ogness <john.ogness@linutronix.de> Link: https://patch.msgid.link/20260511152706.151498-4-john.ogness@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-05-21Merge branch '20260227061544.1785978-1-praveen.talari@oss.qualcomm.com' into ↵Bjorn Andersson
drivers-for-7.2 Merge the refactoring and helper functions in the Qualcomm GENI Serial Engine driver through a topic branch. These changes will provide the ability to add support managing power and performance for the GENI instances in platforms where these are controlled as SCMI resources. The patches are merged through a topic branch to avoid conflicts with other changes, while making them available to other subsystems. Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-05-21soc: qcom: geni-se: Introduce helper APIs for performance controlPraveen Talari
The GENI Serial Engine (SE) drivers (I2C, SPI, and SERIAL) currently manage performance levels and operating points directly. This resulting in code duplication across drivers. such as configuring a specific level or find and apply an OPP based on a clock frequency. Introduce two new helper APIs, geni_se_set_perf_level() and geni_se_set_perf_opp(), addresses this issue by providing a streamlined method for the GENI Serial Engine (SE) drivers to find and set the OPP based on the desired performance level, thereby eliminating redundancy. Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Tested-by: Mattijs Korpershoek <mkorpershoek@kernel.org> Link: https://lore.kernel.org/r/20260227061544.1785978-8-praveen.talari@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-05-21soc: qcom: geni-se: Introduce helper API for attaching power domainsPraveen Talari
The GENI Serial Engine drivers (I2C, SPI, and SERIAL) currently handle the attachment of power domains. This often leads to duplicated code logic across different driver probe functions. Introduce a new helper API, geni_se_domain_attach(), to centralize the logic for attaching "power" and "perf" domains to the GENI SE device. Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Tested-by: Mattijs Korpershoek <mkorpershoek@kernel.org> Link: https://lore.kernel.org/r/20260227061544.1785978-7-praveen.talari@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-05-21soc: qcom: geni-se: Add resources activation/deactivation helpersPraveen Talari
The GENI SE protocol drivers (I2C, SPI, UART) implement similar resource activation/deactivation sequences independently, leading to code duplication. Introduce geni_se_resources_activate()/geni_se_resources_deactivate() to power on/off resources.The activate function enables ICC, clocks, and TLMM whereas the deactivate function disables resources in reverse order including OPP rate reset, clocks, ICC and TLMM. Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Tested-by: Mattijs Korpershoek <mkorpershoek@kernel.org> Link: https://lore.kernel.org/r/20260227061544.1785978-6-praveen.talari@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-05-21soc: qcom: geni-se: Introduce helper API for resource initializationPraveen Talari
The GENI Serial Engine drivers (I2C, SPI, and SERIAL) currently duplicate code for initializing shared resources such as clocks and interconnect paths. Introduce a new helper API, geni_se_resources_init(), to centralize this initialization logic, improving modularity and simplifying the probe function. Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com> Tested-by: Mattijs Korpershoek <mkorpershoek@kernel.org> Link: https://lore.kernel.org/r/20260227061544.1785978-4-praveen.talari@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-05-21soc: qcom: geni-se: Add geni_icc_set_bw_ab() functionPraveen Talari
Add a new function geni_icc_set_bw_ab() that allows callers to set average bandwidth values for all ICC (Interconnect) paths in a single call. This function takes separate parameters for core, config, and DDR average bandwidth values and applies them to the respective ICC paths. This provides a more convenient API for drivers that need to configure specific average bandwidth values. Co-developed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com> Tested-by: Mattijs Korpershoek <mkorpershoek@kernel.org> Link: https://lore.kernel.org/r/20260227061544.1785978-3-praveen.talari@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-05-21Revert "mm: introduce a new page type for page pool in page type"Byungchul Park
This reverts commit db359fccf212 ("mm: introduce a new page type for page pool in page type") and a part of 735a309b4bfb9e ("net: add net_iov_init() and use it to initialize ->page_type"). Netpp page_type'ed pages might be used in mapping so as to use @_mapcount. However, since @page_type and @_mapcount are union'ed in struct page, these two can't be used at the same time. Revert the commit introducing page_type for Netpp for now. The patch will be retried once @page_type and @_mapcount get allowed to be used at the same time. The revert also includes removal of @page_type initialization part introduced by commit 735a309b4bfb9e ("net: add net_iov_init() and use it to initialize ->page_type"), which will be restored on the retry. Link: https://lore.kernel.org/20260515034701.17027-1-byungchul@sk.com Fixes: db359fccf212 ("mm: introduce a new page type for page pool in page type") Signed-off-by: Byungchul Park <byungchul@sk.com> Reported-by: Dragos Tatulea <dtatulea@nvidia.com> Closes: https://lore.kernel.org/all/982b9bc1-0a0a-4fc5-8e3a-3672db2b29a1@nvidia.com Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Acked-by: Harry Yoo (Oracle) <harry@kernel.org> Reviewed-by: Lorenzo Stoakes <ljs@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Brendan Jackman <jackmanb@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org> Cc: Jesper Dangaard Brouer <hawk@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam R. Howlett <liam@infradead.org> Cc: Mark Bloch <mbloch@nvidia.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: Saeed Mahameed <saeedm@nvidia.com> Cc: Simon Horman <horms@kernel.org> Cc: Stanislav Fomichev <sdf@fomichev.me> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tariq Toukan <tariqt@nvidia.com> Cc: Toke Hoiland-Jorgensen <toke@redhat.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-05-21Revert "mm/hugetlbfs: update hugetlbfs to use mmap_prepare"Lorenzo Stoakes
This reverts commit ea52cb24cd3f ("mm/hugetlbfs: update hugetlbfs to use mmap_prepare") with conflict resolution to account for changes in commit ea52cb24cd3f ("mm/hugetlbfs: update hugetlbfs to use mmap_prepare"). The patch incorrectly handled hugetlb VMA lock allocation at the mmap_prepare stage, where a failed allocation occurring after mmap_prepare is called might result in the lock leaking. There is no risk of a merge causing a similar issues, as VMA_DONTEXPAND_BIT is set for hugetlb mappings. As a first step in addressing this issue, simply revert the change so we can rework how we do this having corrected the underlying issues. We maintain the VMA flags changes as best we can, accounting for the fact that we were working with a VMA descriptor previously and propagating like-for-like changes for this. Note that we invoke vma_set_flags() and do not call vma_start_write() as vm_flags_set() does. This is OK as it's being done in an .mmap hook where the VMA is not yet linked into the tree so nobody else can be accessing it. Link: https://lore.kernel.org/20260512160643.266960-1-ljs@kernel.org Fixes: ea52cb24cd3f ("mm/hugetlbfs: update hugetlbfs to use mmap_prepare") Signed-off-by: Lorenzo Stoakes <ljs@kernel.org> Reported-by: Mingyu Wang <25181214217@stu.xidian.edu.cn> Closes: https://lore.kernel.org/linux-mm/20260425070700.562229-1-25181214217@stu.xidian.edu.cn/ Acked-by: Muchun Song <muchun.song@linux.dev> Acked-by: Oscar Salvador <osalvador@suse.de> Cc: David Hildenbrand <david@kernel.org> Cc: Liam R. Howlett <liam@infradead.org> Cc: Pedro Falcato <pfalcato@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-05-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-7.1-rc5). No conflicts, adjacent changes: drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c cc199cd1b912 ("net/mlx5e: Reduce branches in napi poll") c326f9c68921 ("net/mlx5e: xsk: Fix unlocked writing to ICOSQ") drivers/net/ethernet/mellanox/mlx5/core/eswitch.c c6df9a65cbb0 ("net/mlx5: Skip disabled vports when setting max TX speed") 1fba57c91416 ("net/mlx5: Add VHCA_ID page management mode support") net/mac80211/mlme.c a6e6ccd5bd07 ("wifi: mac80211: consume only present negotiated TTLM maps") 49e62ec6eb06 ("wifi: mac80211: move frame RX handling to type files") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-21tracefs: Fix typo in a comment of eventfs_callback() kerneldocMartin Kaiser
Fix a typo "evetnfs files" to "eventfs files" in a comment. Cc: Masami Hiramatsu <mhiramat@kernel.org> Link: https://patch.msgid.link/20260507081041.885781-2-martin@kaiser.cx Signed-off-by: Martin Kaiser <martin@kaiser.cx> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2026-05-21tracing: Remove local variable for argument detection from trace_printk()Qian-Yu Lin
The trace_printk() macro uses a local variable _______STR to detect whether variadic arguments are present. This name can shadow outer variables. Replace the local variable with sizeof applied directly to the stringified arguments: if (sizeof __stringify((__VA_ARGS__)) > 3) This eliminates the shadowing risk entirely without introducing any additional includes or local variables. Verified with objdump on samples/trace_printk that all four cases branch correctly: __trace_bputs, __trace_puts, __trace_bprintk, and __trace_printk. Link: https://patch.msgid.link/20260502075535.34997-1-tiffany019230@gmail.com Suggested-by: David Laight <david.laight.linux@gmail.com> Signed-off-by: Qian-Yu Lin <tiffany019230@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2026-05-21tracepoint: Add lockdep rcu_is_watching() check to trace_##name##_enabled()David Carlier
The trace_##name##_enabled() static call branch is used when work needs to be done for a tracepoint. It allows that work to be skipped when the tracepoint is not active and still uses the static_branch() of the tracepoint to keep performance. Tracepoints themselves require being called in "RCU watching" locations otherwise races can occur that corrupts things. In order to make sure lockdep triggers at tracepoint locations, the lockdep checks are added to the tracepoint calling location and trigger even if the tracepoint is not enabled. This is done because a poorly placed tracepoint may never be detected if it is never enabled when lockdep is enabled. As trace_##name##_enabled() also prevents the lockdep checks when the tracepoint is disabled add lockdep checks to that as well so that if one is placed in a location that RCU is not watching, it will trigger a lockdep splat even when the tracepoint is not enabled. Cc: Vineeth Pillai (Google) <vineeth@bitbyteword.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://patch.msgid.link/20260430144159.10985-1-devnexen@gmail.com Signed-off-by: David Carlier <devnexen@gmail.com> [ Updated the change log ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2026-05-21Merge tag 'net-7.1-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from Bluetooth, wireless and netfilter. Craziness continues with no end in sight. Even discounting the driver revert this is a pretty huge PR for standards of the previous era. I'd speculate - we haven't seen the worst of it, yet. Good news, I guess, is that so far we haven't seen many (any?) cases of "AI reported a bug, we fixed it and a real user regressed". Current release - fix to a fix: - Bluetooth: btmtk: accept too short WMT FUNC_CTRL events - vsock/virtio: relax the recently added memory limit a little Current release - regressions: - IB/IPoIB: make sure IB drivers always use async set_rx_mode since some (mlx5) are now required to use it due to locking changes Previous releases - regressions: - udp: fix UDP length on last GSO_PARTIAL segment - af_unix: fix UAF read of tail->len in unix_stream_data_wait() - tcp: fix stale per-CPU tcp_tw_isn leak enabling ISN prediction - mlx5e: fix unlocked writing to ICOSQ, breaking AF_XDP Previous releases - always broken: - tap: fix stack info leak in tap_ioctl() SIOCGIFHWADDR - ipv4: raw: reject IP_HDRINCL packets with ihl < 5 - Bluetooth: a lot of locking and concurrency fixes (as always) - batman-adv (mesh wireless networking): a lot of random fixes for issues reported by security researchers and Sashiko - netfilter: same thing, a lot of small security-ish fixes all over the place, nothing really stands out Misc: - bring back the old 3c509 driver, Maciej wants to maintain it" * tag 'net-7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (187 commits) net: enetc: avoid VF->PF mailbox timeout during SR-IOV teardown net: enetc: fix init and teardown order to prevent use of unsafe resources net: enetc: fix unbounded loop and interrupt handling in VF-to-PF messaging net: enetc: fix DMA write to freed memory in enetc_msg_free_mbx() net: enetc: fix race condition in VF MAC address configuration net: enetc: fix TOCTOU race and validate VF MAC address net: enetc: add ratelimiting to VF mailbox error messages net: enetc: fix missing error code when pf->vf_state allocation fails net: enetc: fix incorrect mailbox message status returned to VFs net: bridge: prevent too big nested attributes in br_fill_linkxstats() l2tp: use list_del_rcu in l2tp_session_unhash net: bcmgenet: keep RBUF EEE/PM disabled ethernet: 3c509: Fix most coding style issues ethernet: 3c509: Update documentation to match MAINTAINERS ethernet: 3c509: Add GPL 2.0 SPDX license identifier ethernet: 3c509: Fix AUI transceiver type selection Revert "drivers: net: 3com: 3c509: Remove this driver" tools: ynl: support listening on all nsids net: gro: don't merge zcopy skbs pds_core: ensure null-termination for firmware version strings ...
2026-05-21Merge tag 'efi-fixes-for-v7.1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI fixes from Ard Biesheuvel: - Permit ACPI PRM runtime firmware calls when acpi_init() runs - Add another Lenovo Ideapad framebuffer quirk - Cosmetic tweak * tag 'efi-fixes-for-v7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: efi: sysfb_efi: Extend quirk to cover IdeaPad Duet 3 10IGL5-LTE efi: efi.h: Remove extra semicolon efi: Allocate runtime workqueue before ACPI init
2026-05-21usb: xhci-pci: add AMD Promontory 21 PCI glueJihong Min
AMD Promontory 21 (PROM21) xHCI PCI functions use the common xhci-pci core for USB operation, but also expose controller-specific sensor data. Add a small PROM21 PCI glue driver for AMD 1022:43fc and 1022:43fd controllers. The glue delegates USB host operation to the common xhci-pci core and publishes a "hwmon" auxiliary device with parent-provided MMIO data. Auxiliary device creation failure is logged but does not fail the xHCI probe. Make the PROM21 glue a hidden Kconfig tristate driven by the user-visible SENSORS_PROM21_XHCI option. If sensor support is disabled, generic xhci-pci binds PROM21 controllers normally. If sensor support is enabled, the glue follows USB_XHCI_PCI. This keeps the auxiliary device available for a modular sensor driver while avoiding a built-in xhci-pci core handing PROM21 controllers to a glue driver that is only available as a module during initramfs. Assisted-by: Codex:gpt-5.5 Signed-off-by: Jihong Min <hurryman2212@gmail.com> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Tested-by: Yaroslav Isakov <yaroslav.isakov@gmail.com> Acked-by: Guenter Roeck <linux@roeck-us.net> Link: https://patch.msgid.link/20260519000732.2334711-2-hurryman2212@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-05-21HID: remove duplicate hid_warn_ratelimited definitionLiu Kai
The hid_warn_ratelimited macro is defined twice in include/linux/hid.h: - first one added by commit 4051ead99888 ("HID: rate-limit hid_warn to prevent log flooding") - second one added by commit 1d64624243af ("HID: core: Add printk_ratelimited variants to hid_warn() etc")). The second definition is correctly grouped with other ratelimited macros. Remove the duplicate definition. Fixes: 1d64624243af ("HID: core: Add printk_ratelimited variants to hid_warn() etc") Signed-off-by: Liu Kai <lukace97@outlook.com> [bentiss: edited commit message] Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
2026-05-21openat2: new OPENAT2_REGULAR flag supportDorjoy Chowdhury
This flag indicates the path should be opened if it's a regular file. This is useful to write secure programs that want to avoid being tricked into opening device nodes with special semantics while thinking they operate on regular files. This is a requested feature from the uapi-group[1]. The previously introduced EFTYPE error code is returned when the path doesn't refer to a regular file. For example, if openat2 is called on path /dev/null with OPENAT2_REGULAR in the flag param, it will return -EFTYPE. When used in combination with O_CREAT, either the regular file is created, or if the path already exists, it is opened if it's a regular file. Otherwise, -EFTYPE is returned. When OPENAT2_REGULAR is combined with O_DIRECTORY, -EINVAL is returned as it doesn't make sense to open a path that is both a directory and a regular file. The UAPI bit lives in the upper 32 bits of open_how::flags (((__u64)1 << 32)) so that open(2) and openat(2) -- whose @flags argument is a C int -- cannot physically express it. This is a structural guarantee, not a runtime mask: the bit is unrepresentable in 32 bits. Because the rest of the VFS open path narrows to 32 bits in several places (op->open_flag, f->f_flags, the unsigned open_flag argument of i_op->atomic_open()), build_open_flags() translates OPENAT2_REGULAR into a kernel-internal lower-32-bit carrier __O_REGULAR (bit 4, unused as an O_* on every architecture) before the assignment to op->open_flag. __O_REGULAR then rides through the existing channels exactly like __FMODE_EXEC. do_dentry_open() strips it so it cannot leak back to userspace via fcntl(F_GETFL). Four BUILD_BUG_ON_MSG() invariants in build_open_flags() prevent any future bit collision or accidental low-32 redefinition: - VALID_OPEN_FLAGS fits in 32 bits. - OPENAT2_REGULAR lives in the upper 32 bits. - OPENAT2_REGULAR does not alias any open()/openat() flag. - __O_REGULAR does not alias any user-visible flag. [1]: https://uapi-group.org/kernel-features/#ability-to-only-open-regular-files Christian Brauner <brauner@kernel.org> says: Move OPENAT2_REGULAR to the upper 32 bits of open_how::flags with a kernel-internal __O_REGULAR carrier so that open(2)/openat(2) cannot encode the flag; add BUILD_BUG_ON_MSG() invariants and register __O_REGULAR in the fcntl_init() allocation-uniqueness BUILD_BUG_ON() (bit count 21 -> 22). Signed-off-by: Dorjoy Chowdhury <dorjoychy111@gmail.com> Link: https://patch.msgid.link/20260328172314.45807-2-dorjoychy111@gmail.com Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Aleksa Sarai <aleksa@amutable.com> Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>