| Age | Commit message | Author |
|
This new torture_sched_set_normal() function clamps the nice value at
the MIN_NICE..MAX_NICE limits, splatting if these limits are exceeded.
It then invokes sched_set_normal() to set the new value. This prevents
more difficult-to-debug failures within the scheduler.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
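The clamping step described above can be sketched in userspace C. This is
only an illustration, not the kernel implementation: clamp_nice_model() and
its splat out-parameter are hypothetical names, and the two constants mirror
the kernel's MIN_NICE and MAX_NICE values.

```c
#include <stdbool.h>

/* Values match the kernel's MIN_NICE and MAX_NICE. */
#define MIN_NICE (-20)
#define MAX_NICE 19

/*
 * Hypothetical userspace model of the clamp: returns the nice value that
 * would be handed to sched_set_normal() and sets *splat when the request
 * was out of range (where the kernel helper would emit a warning).
 */
int clamp_nice_model(int nice, bool *splat)
{
	*splat = nice < MIN_NICE || nice > MAX_NICE;
	if (nice < MIN_NICE)
		return MIN_NICE;
	if (nice > MAX_NICE)
		return MAX_NICE;
	return nice;
}
```

An in-range request passes through unchanged; an out-of-range one is pinned
to the nearest limit and flagged, so the scheduler never sees the bad value.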
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core
Pull driver core fixes from Danilo Krummrich:
- Remove the software node on platform device release(); without this,
the software node remains registered after the device is gone and a
subsequent platform_device_register_full() reusing the same node
fails with -EBUSY
- In sysfs_update_group(), do not remove a pre-existing directory when
create_files() fails; the previous code would silently destroy a
sysfs group that the caller did not create
- Set fwnode->secondary to NULL in fwnode_init() to avoid dereferencing
uninitialized memory (e.g. in dev_to_swnode()) when the firmware node
is allocated on the stack or via a non-zeroing allocator
* tag 'driver-core-7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core:
device property: set fwnode->secondary to NULL in fwnode_init()
sysfs: don't remove existing directory on update failure
driver core: platform: remove software node on release()
|
|
suspend
Per PCIe v7.0, sec 5.5.3.3.1, when exiting L1.2 due to an endpoint
asserting CLKREQ# signal, the REFCLK must be turned on within the latency
advertised in the LTR message. This requirement applies to L1.1 as well.
On some platforms like Qcom, these requirements are satisfied during OS
runtime, but not while resuming from the system suspend. This happens
because the PCIe RC driver may remove all resource votes and turn off the
PHY analog circuitry during suspend to maximize power savings while keeping
the link in L1SS.
Consequently, when the endpoint asserts CLKREQ# to wake up, the RC driver
must restore the PHY and enable the REFCLK. When this recovery process
exceeds the L1SS exit latency time (roughly L10_REFCLK_ON + T_COMMONMODE),
the endpoint may treat it as a fatal condition and trigger Link Down (LDn).
This results in a reset that destroys the internal device state.
So to indicate this platform limitation to the client drivers, introduce a
new flag 'pci_host_bridge::broken_l1ss_resume' and check it in
pci_suspend_retains_context(). If the flag is set by the RC driver, the API
will return 'false', indicating to the client drivers that the device context
may not be retained and the drivers must be prepared for context loss.
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260519-l1ss-fix-v2-2-b2c3a4bdeb15@oss.qualcomm.com
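How the new flag might feed into the check can be modeled in userspace C.
This is a sketch under stated assumptions: the model_* types and functions
are hypothetical stand-ins, the real pci_suspend_retains_context() signature
is not shown in this log, and the way the two conditions combine is inferred
from the two commit messages of this series.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the host bridge state named in the message. */
struct model_host_bridge {
	bool broken_l1ss_resume;	/* set by the RC driver */
};

/* Hypothetical stand-in for what the PM core knows about this suspend. */
struct model_suspend {
	bool via_firmware;	/* models pm_suspend_via_firmware() */
};

/*
 * Model of the decision: context is assumed to be retained only when
 * firmware is not invoked at the end of suspend and the platform can
 * meet the L1SS exit latency on resume.
 */
bool model_suspend_retains_context(const struct model_host_bridge *bridge,
				   const struct model_suspend *suspend)
{
	if (suspend->via_firmware)
		return false;
	if (bridge->broken_l1ss_resume)
		return false;
	return true;
}
```

A client driver would treat a 'false' result as the signal to save its
context or shut the device down before suspend completes.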
|
|
The dtlk driver supports the RC Systems DoubleTalk PC ISA speech
synthesizer card. It has severe coding style issues and has only
received tree-wide fixes and drive-by cleanups in the entire Git
history (since Linux 2.6.12-rc2). The same hardware is supported by
drivers/accessibility/speakup for screen reader use, but that
implementation does not share any code with this driver. Given all of
these factors, it is likely the driver is entirely unused. Remove it to
reduce future maintenance workload.
Note: The removed maintainer is already listed in CREDITS.
Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Link: https://patch.msgid.link/20260502043341.34324-1-enelsonmoore@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
during suspend
Currently, PCI endpoint drivers (e.g. nvme) use pm_suspend_via_firmware()
to check whether device state is preserved during system suspend. If
firmware will be invoked at the end of suspend, we don't know whether
devices will retain their internal state.
But device context might be lost due to platform issues as well. Having
those checks in endpoint drivers will not scale and will cause a lot of
code duplication.
Add pci_suspend_retains_context() as the single point of truth that the
endpoint drivers can rely on to check whether they can expect the device
context to be retained or not.
If pci_suspend_retains_context() returns 'false', drivers need to prepare
for context loss by performing actions such as resetting the device, saving
the context, shutting it down etc. If it returns 'true', drivers do not
need to perform any special action and can leave the device in active
state.
Right now, this API only incorporates pm_suspend_via_firmware(), but will
be extended in future commits.
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260519-l1ss-fix-v2-1-b2c3a4bdeb15@oss.qualcomm.com
|
|
struct bpf_arena is opaque to callers outside arena.c. Add two helpers
for struct_ops subsystems that need to reach into an arena:
bpf_arena_map_kern_vm_start(struct bpf_map *map)
returns @map's kern_vm_start. A sched_ext follow-up needs this
to translate kern_va <-> uaddr.
bpf_prog_arena(struct bpf_prog *prog)
returns the bpf_map of the arena referenced by @prog (NULL if
@prog references no arena). The verifier enforces at most one
arena per program. Used by struct_ops callers that auto-discover
an arena from a member prog and need to take a map reference.
Suggested-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Link: https://lore.kernel.org/r/20260522172219.1423324-6-tj@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add a helper that walks the member progs of the struct_ops map
containing a given @kdata vmtable. struct_ops ->reg() callbacks (and
similar) sometimes need to inspect the loaded BPF programs, e.g. to
discover maps they reference via prog->aux->used_maps.
The implementation mirrors bpf_struct_ops_id(): container_of @kdata
to recover the bpf_struct_ops_map, then iterate st_map->links[i]->prog
for i in [0, funcs_cnt). Same access pattern, no new locking - by the
time ->reg() fires st_map is fully populated and stable.
A sched_ext follow-up walks the member progs of a cid-form scheduler's
struct_ops map, reads prog->aux->arena directly, and requires all member
progs to reference exactly one arena, without requiring the BPF program
to call a registration kfunc.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Link: https://lore.kernel.org/r/20260522172219.1423324-5-tj@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The existing kernel-side export of bpf_arena_alloc_pages is _non_sleepable
only - it's used by the verifier to inline the kfunc when the call site is
non-sleepable. There is no sleepable equivalent for kernel callers. The
kfunc bpf_arena_alloc_pages itself is BPF-only.
sched_ext needs sleepable kernel-side allocs for its arena pool init/grow
paths. Add bpf_arena_alloc_pages_sleepable() mirroring the _non_sleepable
wrapper but passing sleepable=true to arena_alloc_pages().
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Link: https://lore.kernel.org/r/20260522172219.1423324-4-tj@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
BPF arena usage is becoming more prevalent, but kernel <-> BPF communication
over arena memory is awkward today. Data has to be staged through a trusted
kernel pointer with extra code and copying on the BPF side. While reads
through arena pointers can use a fault-safe helper, writes don't have a good
solution. The in-line alternative would need instruction emulation or asm
fixup labels.
Enable direct kernel-side reads and writes within GUARD_SZ / 2 of any
handed-in arena pointer, without bounds checking. A per-arena scratch page
is installed by the arch fault path into empty arena kernel PTEs - x86 from
page_fault_oops() for not-present faults, arm64 from __do_kernel_fault() for
translation faults, both after the existing exception-table and KFENCE
handling. The faulting instruction retries and the access is also reported
through the program's BPF stream, preserving error reporting.
bpf_prog_find_from_stack() resolves the current BPF program (and its arena)
from the kernel stack - no new bpf_run_ctx state is added. Recovery covers
the 4 GiB arena plus the upper half-guard (GUARD_SZ / 2). The lower
half-guard is excluded because well-behaved kfuncs only access forward from
arena pointers. The kfunc-author contract - access at most GUARD_SZ / 2 past
a handed-in pointer - is documented in Documentation/bpf/kfuncs.rst.
The install is lock-free via ptep_try_set(). On race-loss the winning
installer's PTE is already valid, so the access retry succeeds. The arena
clear path uses ptep_get_and_clear() so installer and clearer race through
atomic accessors. No flush_tlb_kernel_range() afterwards. Stale "not mapped"
entries just cause one extra re-fault, cheaper than a global IPI on every
install.
Scratch exists only to keep the kernel from oopsing on an in-line arena
access. Its presence at a PTE means the BPF program has already
malfunctioned, and the violation is reported through the program's BPF
stream. The only requirement for behavior on a scratched PTE is that the
kernel doesn't crash. In particular, any user-side access through such a PTE
may segfault. The shared scratch page is freed once during map destruction.
BPF instruction faults continue to use the existing JIT exception-table
path. This patch changes only the kernel-text fault path. No UAPI flag is
added. The new behavior is the default.
v2: Use ptep_get_and_clear() in apply_range_clear_cb(). (David)
v3: Stub bpf_arena_handle_page_fault() for !CONFIG_BPF_SYSCALL. (lkp)
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Cc: David Hildenbrand <david@kernel.org>
Link: https://lore.kernel.org/r/20260522172219.1423324-3-tj@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add ptep_try_set(ptep, new_pte): atomically set *ptep to new_pte iff it is
currently pte_none(). Returns true on success, false if the slot was already
populated or the arch has no implementation.
The intended caller is the upcoming bpf_arena kernel-side fault recovery
path. The install runs from a page fault that can be nested under locks
held by the faulting kernel caller (e.g. a BPF program holding
raw_res_spin_lock_irqsave on its arena's spinlock), so trylock-and-retry
would A-A deadlock. Lock-free cmpxchg is the only viable option, which
constrains this helper to special kernel page tables where concurrent
writers cooperate via atomic accessors.
The generic version in <linux/pgtable.h> returns false. x86 and arm64
override with try_cmpxchg-based implementations on the underlying pteval.
Other architectures get the false stub - the callers there already fall
through to oops.
v2: Rename to ptep_try_set(). Tighten kerneldoc. (David, Alexei)
v3: Note that strict-zero cmpxchg is narrower than pte_none(). (Andrea)
Suggested-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Cc: David Hildenbrand <david@kernel.org>
Acked-by: David Hildenbrand (arm) <david@kernel.org>
Link: https://lore.kernel.org/r/20260522172219.1423324-2-tj@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
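The "set only if currently empty" semantics can be illustrated with C11
atomics in userspace. This is a hedged sketch, not the kernel code:
model_pte_t and model_ptep_try_set() are hypothetical names, and a plain
zero value stands in for an empty PTE (the strict-zero comparison the v3
note mentions).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a kernel PTE slot; 0 models an empty entry. */
typedef _Atomic uint64_t model_pte_t;

/*
 * Install new_val only if the slot is currently empty. A single
 * compare-and-exchange means there is no lock to take, so a caller that
 * already holds other locks cannot deadlock here.
 */
bool model_ptep_try_set(model_pte_t *ptep, uint64_t new_val)
{
	uint64_t expected = 0;

	return atomic_compare_exchange_strong(ptep, &expected, new_val);
}
```

A caller that loses the race gets false and finds the winner's value already
in the slot, which is why a simple retry of the faulting access is enough.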
|
|
We plan to no longer hold RTNL in "ip link show", and use RCU instead.
Assume rtnl_fill_dpll_pin() will have to fill DPLL_A_PIN_ID.
It is fine to over-estimate skb size (by 8 bytes) in if_nlmsg_size().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20260521171440.114956-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
"Two rstat fixes:
- Out-of-bounds access in the css_rstat_updated() BPF kfunc when
called with an unchecked user-supplied cpu
- Over-strict NMI guard after the recent switch to try_cmpxchg left
sparc and ppc64 unable to queue rstat updates from NMI"
* tag 'cgroup-for-7.1-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: rstat: relax NMI guard after switch to try_cmpxchg
cgroup/rstat: validate cpu before css_rstat_cpu() access
|
|
Drivers setting CPUFREQ_NEED_UPDATE_LIMITS expect target() to be
invoked even if the target frequency remains unchanged, so they can
update their internal policy limits state.
Currently the core invokes target() unconditionally whenever the
requested frequency matches policy->cur for such drivers, even if
policy->min and policy->max haven't changed since the previous update.
Track pending policy limit updates explicitly and skip redundant
target() invocations when neither the target frequency nor the
effective limits changed.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Lifeng Zheng <zhenglifeng1@huawei.com>
Reviewed-by: Zhongqiu Han <zhongqiu.han@oss.qualcomm.com>
Link: https://patch.msgid.link/d0107c364b709abca21acf88072220bc05478594.1779423281.git.viresh.kumar@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
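The skip logic can be modeled in a few lines of userspace C. This is only a
sketch of the idea in the message: struct model_policy, its limits_pending
field and both functions are hypothetical, and the actual cpufreq core may
track pending limit updates differently.

```c
#include <stdbool.h>

/* Hypothetical model of the policy state relevant to the message. */
struct model_policy {
	unsigned int cur;		/* current frequency */
	bool need_update_limits;	/* models CPUFREQ_NEED_UPDATE_LIMITS */
	bool limits_pending;		/* limits changed since last target() */
	unsigned int target_calls;	/* counts modeled target() invocations */
};

/* Record that the effective min/max changed and the driver must be told. */
void model_limits_changed(struct model_policy *p)
{
	p->limits_pending = true;
}

/* Returns true if the modeled driver target() callback was invoked. */
bool model_set_target(struct model_policy *p, unsigned int freq)
{
	if (freq == p->cur &&
	    !(p->need_update_limits && p->limits_pending))
		return false;	/* nothing changed: skip the redundant call */

	p->cur = freq;
	p->limits_pending = false;
	p->target_calls++;
	return true;
}
```

With an unchanged frequency, the callback fires only once per limits change
rather than on every request.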
|
|
Replace "diver" with "driver" in the comment describing
CPUFREQ_NEED_UPDATE_LIMITS.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Zhongqiu Han <zhongqiu.han@oss.qualcomm.com>
Reviewed-by: Lifeng Zheng <zhenglifeng1@huawei.com>
Link: https://patch.msgid.link/396f64411431ffbb5b4f07d1f2e0bbf9763d468f.1779423281.git.viresh.kumar@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The current implementation uses pci_num_vf() while holding the
memory_lock to prevent changing the power state of a PF when
VFs are enabled. This creates a lockdep circular dependency
warning because memory_lock is held during device probing.
[ 286.997167] ======================================================
[ 287.003363] WARNING: possible circular locking dependency detected
[ 287.009562] 7.0.0-dbg-DEV #3 Tainted: G S
[ 287.015074] ------------------------------------------------------
[ 287.021270] vfio_pci_sriov_/18636 is trying to acquire lock:
[ 287.026942] ff45bea2294d4968 (&vdev->memory_lock){+.+.}-{4:4}, at:
vfio_pci_core_runtime_resume+0x1f/0xa0
[ 287.036530]
[ 287.036530] but task is already holding lock:
[ 287.042383] ff45bea3a96b8230 (&new_dev_set->lock){+.+.}-{4:4}, at:
vfio_group_fops_unl_ioctl+0x44d/0x7b0
[ 287.051879]
[ 287.051879] which lock already depends on the new lock.
[ 287.051879]
[ 287.060070]
[ 287.060070] the existing dependency chain (in reverse order) is:
[ 287.067568]
[ 287.067568] -> #2 (&new_dev_set->lock){+.+.}-{4:4}:
[ 287.073941] __mutex_lock+0x92/0xb80
[ 287.078058] vfio_assign_device_set+0x66/0x1b0
[ 287.083042] vfio_pci_core_register_device+0xd1/0x2a0
[ 287.088638] vfio_pci_probe+0xd2/0x100
[ 287.092933] local_pci_probe_callback+0x4d/0xa0
[ 287.098001] process_scheduled_works+0x2ca/0x680
[ 287.103158] worker_thread+0x1e8/0x2f0
[ 287.107452] kthread+0x10c/0x140
[ 287.111230] ret_from_fork+0x18e/0x360
[ 287.115519] ret_from_fork_asm+0x1a/0x30
[ 287.119983]
[ 287.119983] -> #1 ((work_completion)(&arg.work)){+.+.}-{0:0}:
[ 287.127219] __flush_work+0x345/0x490
[ 287.131429] pci_device_probe+0x2e3/0x490
[ 287.135979] really_probe+0x1f9/0x4e0
[ 287.140180] __driver_probe_device+0x77/0x100
[ 287.145079] driver_probe_device+0x1e/0x110
[ 287.149803] __device_attach_driver+0xe3/0x170
[ 287.154789] bus_for_each_drv+0x125/0x150
[ 287.159346] __device_attach+0xca/0x1a0
[ 287.163720] device_initial_probe+0x34/0x50
[ 287.168445] pci_bus_add_device+0x6e/0x90
[ 287.172995] pci_iov_add_virtfn+0x3c9/0x3e0
[ 287.177719] sriov_add_vfs+0x2c/0x60
[ 287.181838] sriov_enable+0x306/0x4a0
[ 287.186038] vfio_pci_core_sriov_configure+0x184/0x220
[ 287.191715] sriov_numvfs_store+0xd9/0x1c0
[ 287.196351] kernfs_fop_write_iter+0x13f/0x1d0
[ 287.201338] vfs_write+0x2be/0x3b0
[ 287.205286] ksys_write+0x73/0x100
[ 287.209233] do_syscall_64+0x14d/0x750
[ 287.213529] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 287.219120]
[ 287.219120] -> #0 (&vdev->memory_lock){+.+.}-{4:4}:
[ 287.225491] __lock_acquire+0x14c6/0x2800
[ 287.230048] lock_acquire+0xd3/0x2f0
[ 287.234168] down_write+0x3a/0xc0
[ 287.238019] vfio_pci_core_runtime_resume+0x1f/0xa0
[ 287.243436] __rpm_callback+0x8c/0x310
[ 287.247730] rpm_resume+0x529/0x6f0
[ 287.251765] __pm_runtime_resume+0x68/0x90
[ 287.256402] vfio_pci_core_enable+0x44/0x310
[ 287.261216] vfio_pci_open_device+0x1c/0x80
[ 287.265947] vfio_df_open+0x10f/0x150
[ 287.270148] vfio_group_fops_unl_ioctl+0x4a4/0x7b0
[ 287.275476] __se_sys_ioctl+0x71/0xc0
[ 287.279679] do_syscall_64+0x14d/0x750
[ 287.283975] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 287.289559]
[ 287.289559] other info that might help us debug this:
[ 287.289559]
[ 287.297582] Chain exists of:
[ 287.297582] &vdev->memory_lock --> (work_completion)(&arg.work)
--> &new_dev_set->lock
[ 287.297582]
[ 287.310023] Possible unsafe locking scenario:
[ 287.310023]
[ 287.315961]        CPU0                    CPU1
[ 287.320510]        ----                    ----
[ 287.325059]   lock(&new_dev_set->lock);
[ 287.328917]                                lock((work_completion)(&arg.work));
[ 287.336153]                                lock(&new_dev_set->lock);
[ 287.342523]   lock(&vdev->memory_lock);
[ 287.346382]
[ 287.346382] *** DEADLOCK ***
[ 287.346382]
[ 287.352315] 2 locks held by vfio_pci_sriov_/18636:
[ 287.357125] #0: ff45bea208ed3e18 (&group->group_lock){+.+.}-{4:4},
at: vfio_group_fops_unl_ioctl+0x3e3/0x7b0
[ 287.367048] #1: ff45bea3a96b8230 (&new_dev_set->lock){+.+.}-{4:4},
at: vfio_group_fops_unl_ioctl+0x44d/0x7b0
[ 287.376976]
[ 287.376976] stack backtrace:
[ 287.381353] CPU: 191 UID: 0 PID: 18636 Comm: vfio_pci_sriov_
Tainted: G S 7.0.0-dbg-DEV #3 PREEMPTLAZY
[ 287.381355] Tainted: [S]=CPU_OUT_OF_SPEC
[ 287.381356] Call Trace:
[ 287.381357] <TASK>
[ 287.381358] dump_stack_lvl+0x54/0x70
[ 287.381361] print_circular_bug+0x2e1/0x300
[ 287.381363] check_noncircular+0xf9/0x120
[ 287.381364] ? __lock_acquire+0x5b4/0x2800
[ 287.381366] __lock_acquire+0x14c6/0x2800
[ 287.381368] ? pci_mmcfg_read+0x4f/0x220
[ 287.381370] ? pci_mmcfg_write+0x57/0x220
[ 287.381371] ? lock_acquire+0xd3/0x2f0
[ 287.381373] ? pci_mmcfg_write+0x57/0x220
[ 287.381374] ? lock_release+0xef/0x360
[ 287.381376] ? vfio_pci_core_runtime_resume+0x1f/0xa0
[ 287.381377] lock_acquire+0xd3/0x2f0
[ 287.381378] ? vfio_pci_core_runtime_resume+0x1f/0xa0
[ 287.381379] ? lock_is_held_type+0x76/0x100
[ 287.381382] down_write+0x3a/0xc0
[ 287.381382] ? vfio_pci_core_runtime_resume+0x1f/0xa0
[ 287.381383] vfio_pci_core_runtime_resume+0x1f/0xa0
[ 287.381384] ? __pfx_pci_pm_runtime_resume+0x10/0x10
[ 287.381385] __rpm_callback+0x8c/0x310
[ 287.381386] ? ktime_get_mono_fast_ns+0x3d/0xb0
[ 287.381389] ? __pfx_pci_pm_runtime_resume+0x10/0x10
[ 287.381390] rpm_resume+0x529/0x6f0
[ 287.381392] ? lock_is_held_type+0x76/0x100
[ 287.381394] __pm_runtime_resume+0x68/0x90
[ 287.381396] vfio_pci_core_enable+0x44/0x310
[ 287.381398] vfio_pci_open_device+0x1c/0x80
[ 287.381399] vfio_df_open+0x10f/0x150
[ 287.381401] vfio_group_fops_unl_ioctl+0x4a4/0x7b0
[ 287.381402] __se_sys_ioctl+0x71/0xc0
[ 287.381404] do_syscall_64+0x14d/0x750
[ 287.381405] ? entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 287.381406] ? trace_irq_disable+0x25/0xd0
[ 287.381409] entry_SYSCALL_64_after_hwframe+0x77/0x7f
Introduce a private flag 'sriov_active' in the vfio_pci_core_device
struct. This allows the driver to track the SR-IOV power state requirement
without relying on pci_num_vf() while holding the memory_lock. The lock is
now only held to set the flag and ensure the device is in D0, after which
pci_enable_sriov() can be called without the lock.
Fixes: f4162eb1e2fc ("vfio/pci: Change the PF power state to D0 before enabling VFs")
Cc: stable@vger.kernel.org
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Suggested-by: Alex Williamson <alex@shazbot.org>
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Link: https://lore.kernel.org/r/20260514173449.3282188-1-rananta@google.com
[promote bitfield to plain bool to avoid storage-unit races]
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
Use IOMEM_ERR_PTR() when returning a void __iomem * rather than
ERR_PTR(). This fixes a sparse warning, "different address spaces".
Fixes: 859dc0f6253b ("vfio/pci: Replace vfio_pci_core_setup_barmap() with vfio_pci_core_get_iomap()")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202605211601.U1OvmuqY-lkp@intel.com/
Signed-off-by: Matt Evans <mattev@meta.com>
Link: https://lore.kernel.org/r/20260522124215.3268565-1-mattev@meta.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
Add blk_rq_has_data(), an analogue of bio_has_data() for struct request.
This skips one dereference relative to bio_has_data(rq->bio).
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <tom.leiming@gmail.com>
Link: https://patch.msgid.link/20260513211846.1956810-2-csander@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The UX500 crypto drivers were removed in commit 453de3eb08c4
("crypto: ux500/cryp - delete driver") and commit dd7b7972cb89
("crypto: ux500/hash - delete driver"). No file includes
this header.
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Allow device attributes to reside in read-only memory.
Both const and non-const attributes are handled by the utility macros
and attributes can be migrated one-by-one.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://patch.msgid.link/20260512-sysfs-const-attr-device_attr-prep-v3-4-cb7c17b34d52@weissschuh.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The constification of device attributes will require a transition phase,
where 'struct device_attribute' contains a classic non-const and a new
const variant of the 'show' and 'store' callbacks.
As __ATTR() and friends cannot handle this duplication, stop using them.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://patch.msgid.link/20260512-sysfs-const-attr-device_attr-prep-v3-3-cb7c17b34d52@weissschuh.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
For the upcoming constification of device attributes the generic
__ATTR() macros are insufficient.
Prepare for a split by introducing new low-level macros specific to
device attributes.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://patch.msgid.link/20260512-sysfs-const-attr-device_attr-prep-v3-2-cb7c17b34d52@weissschuh.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This macro is unused and would create extra work during the upcoming
constification of device attributes. Remove it.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://patch.msgid.link/20260512-sysfs-const-attr-device_attr-prep-v3-1-cb7c17b34d52@weissschuh.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
When an overlay is applied, if the target device has already probed
successfully and bound to a device, then some of the fw_devlink logic
that ran when the device was probed needs to be rerun. This allows newly
created dangling consumers of the overlayed device tree nodes to be
moved to become consumers of the target device.
[Herve: Add the call to driver_deferred_probe_trigger()]
[Herve: Use fwnode_test_flag() to test fwnode flags value]
Fixes: 1a50d9403fb9 ("treewide: Fix probing of devices in DT overlays")
Reported-by: Herve Codina <herve.codina@bootlin.com>
Closes: https://lore.kernel.org/lkml/CAMuHMdXEnSD4rRJ-o90x4OprUacN_rJgyo8x6=9F9rZ+-KzjOg@mail.gmail.com/
Closes: https://lore.kernel.org/all/20240221095137.616d2aaa@bootlin.com/
Closes: https://lore.kernel.org/lkml/20240312151835.29ef62a0@bootlin.com/
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/lkml/20240411235623.1260061-3-saravanak@google.com/
[Herve: Rebase on top of recent kernel]
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Tested-by: Kalle Niemi <kaleposti@gmail.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20260511155755.34428-3-herve.codina@bootlin.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Constify the groups arrays, allowing constant arrays to be assigned.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/42624513-923c-4970-834d-036282e24e24@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Constify the groups arrays, allowing constant arrays to be assigned.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/265f6584-8edd-48a0-9568-a9d584b9ec3a@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
"struct class" is defined earlier on both cases.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Link: https://patch.msgid.link/6d5937c5-9d41-4cfe-9e42-0946e12dc72d@p183
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Quoting reporter:
A race between GRE keymap insertion and destruction can corrupt the
kernel list or use a freed object. `nf_ct_gre_keymap_add()` publishes a
new keymap pointer before the embedded `list_head` is linked, while
`nf_ct_gre_keymap_destroy()` can concurrently delete and free that
same object. An unprivileged user can reach this through the PPTP
conntrack helper by racing PPTP control messages or helper teardown,
leading to KASAN-detectable list corruption/UAF in kernel context.
## Root Cause Analysis
`exp_gre()` installs GRE expectations for a PPTP control flow and then
adds two GRE keymap entries [..]
The add path publishes `ct_pptp_info->keymap[dir]` before linking the
embedded list node [..]
Concurrent teardown deletes that partially initialized object.
Make add/destroy symmetric: install both, destroy both while under lock.
Furthermore, we should refuse to publish a new mapping in case ct is going
away, else we may leak the allocation.
The "retrans" detection is strange: existing mapping is checked for key
equality with the new mapping, then for "is on the list" via list walk.
But I can't see how an existing keymap entry can be NOT on list.
Change this to only check if we're asked to map same tuple again -- if so,
skip re-install, else signal failure.
Last, add a bug trap for the keymap list; it has to be empty when namespace
is going away.
Reported-by: Leo Lin <leo@depthfirst.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
If a firmware node is allocated on the stack (for instance: temporary
software node whose life-time we control) or on the heap - but using a
non-zeroing allocation function - and initialized using fwnode_init(),
its secondary pointer will contain uninitialized memory which likely will
be neither NULL nor IS_ERR() and so may end up being dereferenced (for
example: in dev_to_swnode()). Set fwnode->secondary to NULL on
initialization.
Cc: stable <stable@kernel.org>
Fixes: 01bb86b380a3 ("driver core: Add fwnode_init()")
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Link: https://patch.msgid.link/20260506115701.23035-1-bartosz.golaszewski@oss.qualcomm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The parport subsystem registers port devices before they are fully
initialised, resulting in a race condition where client drivers such
as lp can attach to ports that are not completely initialised or are
even being torn down.
When the port and client drivers are built as modules and loaded
around the same time during boot, this occasionally results in a
crash. I was able to make this happen reliably in a VM with a
PC-style parallel port by patching parport_pc to fail probing:
> --- a/drivers/parport/parport_pc.c
> +++ b/drivers/parport/parport_pc.c
> @@ -2069,7 +2069,7 @@ static struct parport *__parport_pc_probe_port(unsigned long int base,
> if (!p)
> goto out3;
>
> - base_res = request_region(base, 3, p->name);
> + base_res = NULL;
> if (!base_res)
> goto out4;
>
and then running:
while true; do
modprobe lp & modprobe parport_pc
wait
rmmod lp parport_pc
done
for a few seconds.
In the long term I think port registration should be changed to put
the call to device_add() inside parport_announce_port(), but since the
latter currently cannot fail this will require changing all port
drivers.
For now, add a flag to indicate whether a port has been "announced"
and only try to attach client drivers to ports when the flag is set.
Fixes: 6fa45a226897 ("parport: add device-model to parport subsystem")
Closes: https://bugs.debian.org/1130365
Closes: https://lore.kernel.org/all/6ba903ad-9897-42bb-8c2d-337385cc3746@molgen.mpg.de/
Cc: stable <stable@kernel.org>
Signed-off-by: Ben Hutchings <benh@debian.org>
Acked-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Link: https://patch.msgid.link/afo6uBv68GDevbMD@decadent.org.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Tracking in-flight inode wb switches with a single global counter
(isw_nr_in_flight) plus a synchronize_rcu() based wait in
cgroup_writeback_umount() forces every umount to take a global hit
whenever any other superblock on the system has wb switches in flight,
even if the superblock being unmounted has none of its own.
Replace the global synchronize_rcu()/flush_workqueue() pair with a
per-sb counter, s_isw_nr_in_flight, plus three small helpers:
- cgroup_writeback_pin(sb) - increment counter
- cgroup_writeback_unpin(sb) - decrement and wake drainer if last
- cgroup_writeback_drain(sb) - wait for counter to reach zero
The wiring is:
- inode_prepare_wbs_switch() pins before checking SB_ACTIVE and
grabbing the inode; failure paths unpin before returning. A
lockless SB_ACTIVE check at the top of the function lets us skip
the atomic_inc/smp_mb dance once SB_ACTIVE has been cleared (it
is monotonic and never set back).
- process_inode_switch_wbs() unpins after the matching iput().
- cgroup_writeback_umount() drains the per-sb counter via
wait_var_event().
The smp_mb() pair between inode_prepare_wbs_switch() and
cgroup_writeback_umount() keeps the SB_ACTIVE / counter ordering:
either the umounter sees a non-zero counter and waits, or the
switcher sees SB_ACTIVE cleared and aborts before grabbing the
inode.
The global isw_nr_in_flight is left in place, since it is still used
to throttle in-flight switches via WB_FRN_MAX_IN_FLIGHT.
The rcu_read_lock() extension in inode_switch_wbs() and
cleanup_offline_cgwb() that the race fix added is no longer needed
and is reverted; the synchronize_rcu() that the race fix added to
cgroup_writeback_umount() is dropped as well.
The following numbers were measured on a 16 vCPU QEMU guest with 4
background superblocks each churning "create memcg -> write 1 MiB ->
rmdir memcg" to keep the global isw_nr_in_flight non-zero. Latencies
are wall-clock around umount(8); only the target sb's umount is
measured.
Target sb runs its own cgwb churn:
                                   p50       p95       p99       max
  global synchronize_rcu()     67.6 ms   88.3 ms   88.3 ms   96.8 ms
  per-sb counter (this)         7.9 ms   10.0 ms   10.0 ms   10.1 ms
Idle target umount latency under cross-sb cgwb-switch pressure:
                                   p50       p95       p99       max
  global synchronize_rcu()     62.7 ms   95.4 ms  108.1 ms  108.6 ms
  per-sb counter (this)         5.3 ms    6.9 ms    7.4 ms    7.4 ms
  no-pressure baseline          4.9 ms    5.9 ms    6.3 ms    6.7 ms
8 concurrent umounts of idle sbs under the same pressure:
                                   p50       p95       max
  global synchronize_rcu()     61.3 ms   99.5 ms  113.7 ms
  per-sb counter (this)         8.1 ms    9.1 ms    9.5 ms
In-kernel cgroup_writeback_umount() time across the same run
(bpftrace, ~340 calls covering all scenarios):
  global synchronize_rcu()     12371 ms total (~36 ms / call)
  per-sb counter (this)         1.37 ms total ( ~4 us / call)
Suggested-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/177910456953.488929.2169908940676707307.b4-review@b4
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Baokun Li <libaokun@linux.alibaba.com>
Link: https://patch.msgid.link/20260521095016.2791354-4-libaokun@linux.alibaba.com
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
|
|
uart_handle_break() and uart_prepare_sysrq_char() (in
include/linux/serial_core.h) capture a SysRq character into
port->sysrq_ch while the port lock is held and rely on the unlock
helper -- uart_unlock_and_check_sysrq_irqrestore() -- to dispatch the
captured character to handle_sysrq() on scope exit.
The existing guard(uart_port_lock_irqsave) cannot be used by IRQ
handlers that process RX, because its destructor calls plain
uart_port_unlock_irqrestore() and silently drops port->sysrq_ch.
Add a dedicated guard(uart_port_lock_check_sysrq_irqsave) variant
whose destructor is the sysrq-aware unlock helper. The lock side is
identical to uart_port_lock_irqsave -- only the unlock-time behaviour
differs. Callers that may capture SysRq characters must use
guard(uart_port_lock_check_sysrq_irqsave); the existing
guard(uart_port_lock_irqsave) keeps its current plain-unlock semantics
for the many callers that do not process RX.
The new macro is placed after the CONFIG_MAGIC_SYSRQ_SERIAL block so
both definitions of uart_unlock_and_check_sysrq_irqrestore() (sysrq
enabled and disabled) are visible at expansion time. When
CONFIG_MAGIC_SYSRQ_SERIAL=n the destructor degenerates to plain
uart_port_unlock_irqrestore(), so there is no overhead.
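The difference between the two guards can be illustrated in userspace with the compiler's cleanup attribute (a GCC/Clang extension). This is a rough sketch, not the kernel macros: struct port_model, the GUARD_*() macros and the rx_irq_*() functions are hypothetical names, and the "lock" is just a flag.

```c
#include <assert.h>

/* Hypothetical miniature of a UART port; only the fields the sketch needs. */
struct port_model {
	int  locked;
	char sysrq_ch;   /* captured while locked, 0 if none */
	char dispatched; /* last character handed to the "sysrq handler" */
};

static void port_lock(struct port_model *p)
{
	p->locked = 1;
}

static void port_unlock(struct port_model *p)
{
	assert(p->locked);	/* unlocking an unlocked port is a bug */
	p->locked = 0;
}

/* Sysrq-aware unlock: take the captured character, unlock, then dispatch. */
static void port_unlock_check_sysrq(struct port_model *p)
{
	char ch = p->sysrq_ch;

	p->sysrq_ch = 0;
	port_unlock(p);
	if (ch)
		p->dispatched = ch;
}

/* Destructors run by the compiler when the guard variable leaves scope. */
static void guard_plain_exit(struct port_model **pp)
{
	port_unlock(*pp);
}

static void guard_sysrq_exit(struct port_model **pp)
{
	port_unlock_check_sysrq(*pp);
}

/* Identical lock side; only the destructor differs. */
#define GUARD_PLAIN(p) \
	struct port_model *guard_ \
	__attribute__((cleanup(guard_plain_exit), unused)) = (port_lock(p), (p))
#define GUARD_SYSRQ(p) \
	struct port_model *guard_ \
	__attribute__((cleanup(guard_sysrq_exit), unused)) = (port_lock(p), (p))

static void rx_irq_plain(struct port_model *p, char ch)
{
	GUARD_PLAIN(p);
	p->sysrq_ch = ch;	/* captured, but the plain destructor never dispatches it */
}

static void rx_irq_sysrq(struct port_model *p, char ch)
{
	GUARD_SYSRQ(p);
	p->sysrq_ch = ch;	/* dispatched by the destructor on scope exit */
}
```

Both handlers leave the port unlocked, but only the sysrq-aware variant hands the captured character on, which is the behaviour RX-processing IRQ handlers depend on.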
No functional change on its own; users are converted in the following
patches.
Cc: stable@vger.kernel.org
Signed-off-by: Jacques Nilo <jnilo@free.fr>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://patch.msgid.link/3849af4bc55d5d2a424fa850844e94d641b2f8a6.1778675349.git.jnilo@free.fr
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The hot path in vc_process_ucs() asks two independent questions about the
same code point -- "is it double-width?" and "is it zero-width?" -- and
was answering each with its own bsearch over its own table. For anything
past the leading bounds check that meant two scans of the BMP width
tables back to back for what is logically a single lookup.
Replace both with one ucs_get_width(cp) returning 0, 1, or 2 in a single
bsearch, while keeping the total table footprint at the same 2384 B as
before.
To do so, merge the zero-width and double-width ranges per region into
one sorted-by-`first` table. BMP entries stay 4 bytes; per-entry width
is hosted in spare bits of the non-BMP table's `last` field. Non-BMP
code points use only 20 of 32 bits, so each u32 has 12 unused high bits.
Store first/last shifted left by 12 and use the low 12 bits of `last`
for metadata: bit 11 is this entry's own width flag, bits 0..7 host an
8-bit chunk of the BMP double-width bitmap. Because the metadata bits
sit strictly below the lowest cp-scale bit, the bsearch comparator
remains a plain u32 compare on shifted keys with no masking.
In vc_process_ucs() the overwhelmingly common single-width path now
collapses to a single predicted branch:
	if (likely(w == 1))
		return 1;
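The packed non-BMP encoding can be sketched as follows. Everything here is illustrative rather than the kernel's code: the names are hypothetical, the three ranges are examples and not the real table, bit 11 is assumed to mean "double-width" when set, the bitmap chunk in bits 0..7 is ignored, and keys are assumed to be stored relative to U+10000 so that they fit in 20 bits.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define META_BITS   12
#define META_DOUBLE (1u << 11)	/* assumed: set = double-width, clear = zero-width */
#define NONBMP_BASE 0x10000u	/* assumed: keys are stored relative to U+10000 */

struct range32 {
	uint32_t first;	/* (cp - NONBMP_BASE) << META_BITS */
	uint32_t last;	/* same scaling, metadata in the low META_BITS bits */
};

#define KEY(cp) ((uint32_t)((cp) - NONBMP_BASE) << META_BITS)

/* Illustrative ranges only, sorted by 'first'. Not the kernel's table. */
static const struct range32 demo_table[] = {
	{ KEY(0x1F600), KEY(0x1F64F) | META_DOUBLE },	/* emoticons: double-width */
	{ KEY(0x20000), KEY(0x2FFFD) | META_DOUBLE },	/* CJK plane 2: double-width */
	{ KEY(0xE0100), KEY(0xE01EF) },			/* variation selectors: zero-width */
};

/* Plain u32 compare; metadata bits sit below every key bit, so no masking. */
static int range_cmp(const void *key, const void *elem)
{
	uint32_t k = *(const uint32_t *)key;
	const struct range32 *r = elem;

	if (k < r->first)
		return -1;
	if (k > r->last)
		return 1;
	return 0;
}

/* Returns 0, 1 or 2 for a non-BMP code point using a single bsearch. */
static int demo_width(uint32_t cp)
{
	const struct range32 *r;
	uint32_t key;

	assert(cp >= NONBMP_BASE && cp <= 0x10FFFF);
	key = KEY(cp);
	r = bsearch(&key, demo_table,
		    sizeof(demo_table) / sizeof(demo_table[0]),
		    sizeof(demo_table[0]), range_cmp);
	if (!r)
		return 1;	/* the common case: not in any range */
	return (r->last & META_DOUBLE) ? 2 : 0;
}
```

Since a shifted key always has zeros in its low 12 bits, comparing it against a `last` value that carries metadata gives the same answer as comparing the unshifted code points.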
Note: scripts/checkpatch.pl complains about "Macros with complex values
should be enclosed in parentheses" for the BMP_*WIDTH and
RANGE_*WIDTH macros. They are deliberately defined to expand to a
comma-separated (first, last) pair so they can populate the two
adjacent fields of a struct initializer; wrapping them in
parentheses would turn that into a comma-expression and defeat
the whole construction. Please ignore.
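A minimal illustration of why such macros cannot be parenthesized, using a hypothetical DEMO_RANGE() macro and struct demo_range rather than the actual BMP_*WIDTH and RANGE_*WIDTH definitions:

```c
#include <assert.h>

/* Hypothetical 4-byte entry: two adjacent 16-bit fields. */
struct demo_range {
	unsigned short first;
	unsigned short last;
};

static_assert(sizeof(struct demo_range) == 4, "two 16-bit fields, 4 bytes per entry");

/*
 * Expands to "first, last": two initializers, not one expression.
 * Wrapping the expansion in parentheses would turn it into a single
 * comma expression, which cannot populate both fields.
 */
#define DEMO_RANGE(a, b)	(a), (b)

static const struct demo_range demo_ranges[] = {
	{ DEMO_RANGE(0x0300, 0x036F) },	/* combining diacritical marks */
	{ DEMO_RANGE(0x1100, 0x115F) },	/* Hangul Jamo initial consonants */
};
```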
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Link: https://patch.msgid.link/20260515034857.2514225-1-nico@fluxnic.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The kernel documentation specifies that the console option 'r' can
be used to enable hardware flow control for console writes. The 8250
driver does include code for hardware flow control on the console if
cons_flow is set, but there is no code path that actually sets this.
However, that is not the only issue. The problems are:
1. Specifying the console option 'r' does not lead to cons_flow being
set.
2. Even if cons_flow would be set, serial8250_register_8250_port()
clears it.
3. When the console option 'r' is specified, uart_set_options()
attempts to initialize the port for CRTSCTS. However, afterwards
it does not set the UPSTAT_CTS_ENABLE status bit and therefore on
boot, uart_cts_enabled() is always false. This policy bit is
important for console drivers as the criterion for whether they may
poll CTS.
4. Even though uart_set_options() attempts to initialize the port
for CRTSCTS, the 8250 set_termios() callback does not enable the
RTS signal (TIOCM_RTS) and thus the hardware is not properly
initialized for CTS polling.
5. Even if modem control was properly set up for CTS polling
(TIOCM_RTS), uart_configure_port() clears TIOCM_RTS, thus
breaking CTS polling.
6. wait_for_xmitr() and serial8250_console_write() use cons_flow
to decide if CTS polling should occur. However, the condition
should also include a check that it is not in RS485 mode and
CRTSCTS is actually enabled in the hardware.
Address all these issues as conservatively as possible by gating them
behind checks focussed on the user specifying console hardware flow
control support and the hardware being configured for CTS polling
at the time of the write to the UART.
Since checking the UPSTAT_CTS_ENABLE status bit is a part of the new
condition gate, these changes also support runtime termios updates to
disable/enable CRTSCTS.
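The resulting gate can be pictured as a single predicate evaluated at write time. The sketch below is an assumption about its shape, not the driver code: struct console_state and console_may_poll_cts() are hypothetical, and the three booleans stand in for cons_flow, the UPSTAT_CTS_ENABLE status bit and the RS485 mode check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the state consulted at write time. */
struct console_state {
	bool cons_flow;     /* user asked for flow control (console option 'r') */
	bool cts_enabled;   /* models the UPSTAT_CTS_ENABLE status bit */
	bool rs485_enabled; /* port currently in RS485 mode */
};

/* CTS is polled only when all three conditions line up. */
static bool console_may_poll_cts(const struct console_state *s)
{
	return s->cons_flow && s->cts_enabled && !s->rs485_enabled;
}

/* Walks through the cases the description calls out. */
static void console_gate_examples(void)
{
	struct console_state s = { .cons_flow = true, .cts_enabled = true };

	assert(console_may_poll_cts(&s));

	s.rs485_enabled = true;		/* RS485 mode: never poll CTS */
	assert(!console_may_poll_cts(&s));

	s.rs485_enabled = false;
	s.cts_enabled = false;		/* CRTSCTS disabled at runtime via termios */
	assert(!console_may_poll_cts(&s));
}
```

Because the status bit is re-read on every write, turning CRTSCTS off and on again at runtime changes the result without touching cons_flow.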
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Link: https://patch.msgid.link/20260511152706.151498-4-john.ogness@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
drivers-for-7.2
Merge the refactoring and helper functions in the Qualcomm GENI Serial
Engine driver through a topic branch.
These changes will provide the ability to add support for managing power and
performance for the GENI instances in platforms where these are
controlled as SCMI resources.
The patches are merged through a topic branch to avoid conflicts with other
changes, while making them available to other subsystems.
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
|
|
The GENI Serial Engine (SE) drivers (I2C, SPI, and SERIAL) currently
manage performance levels and operating points directly. This results
in code duplication across drivers, such as configuring a specific level
or finding and applying an OPP based on a clock frequency.
Introduce two new helper APIs, geni_se_set_perf_level() and
geni_se_set_perf_opp(), to address this issue by providing a streamlined
method for the GENI Serial Engine (SE) drivers to find and set the OPP
based on the desired performance level, thereby eliminating redundancy.
Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Tested-by: Mattijs Korpershoek <mkorpershoek@kernel.org>
Link: https://lore.kernel.org/r/20260227061544.1785978-8-praveen.talari@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
|
|
The GENI Serial Engine drivers (I2C, SPI, and SERIAL) currently handle
the attachment of power domains. This often leads to duplicated code
logic across different driver probe functions.
Introduce a new helper API, geni_se_domain_attach(), to centralize
the logic for attaching "power" and "perf" domains to the GENI SE
device.
Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Tested-by: Mattijs Korpershoek <mkorpershoek@kernel.org>
Link: https://lore.kernel.org/r/20260227061544.1785978-7-praveen.talari@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
|
|
The GENI SE protocol drivers (I2C, SPI, UART) implement similar resource
activation/deactivation sequences independently, leading to code
duplication.
Introduce geni_se_resources_activate()/geni_se_resources_deactivate() to
power on/off resources. The activate function enables ICC, clocks, and TLMM
whereas the deactivate function disables resources in reverse order
including OPP rate reset, clocks, ICC and TLMM.
Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Tested-by: Mattijs Korpershoek <mkorpershoek@kernel.org>
Link: https://lore.kernel.org/r/20260227061544.1785978-6-praveen.talari@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
|
|
The GENI Serial Engine drivers (I2C, SPI, and SERIAL) currently duplicate
code for initializing shared resources such as clocks and interconnect
paths.
Introduce a new helper API, geni_se_resources_init(), to centralize this
initialization logic, improving modularity and simplifying the probe
function.
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
Tested-by: Mattijs Korpershoek <mkorpershoek@kernel.org>
Link: https://lore.kernel.org/r/20260227061544.1785978-4-praveen.talari@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
|
|
Add a new function geni_icc_set_bw_ab() that allows callers to set
average bandwidth values for all ICC (Interconnect) paths in a single
call. This function takes separate parameters for core, config, and DDR
average bandwidth values and applies them to the respective ICC paths.
This provides a more convenient API for drivers that need to configure
specific average bandwidth values.
Co-developed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
Tested-by: Mattijs Korpershoek <mkorpershoek@kernel.org>
Link: https://lore.kernel.org/r/20260227061544.1785978-3-praveen.talari@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
|
|
This reverts commit db359fccf212 ("mm: introduce a new page type for page
pool in page type") and a part of 735a309b4bfb9e ("net: add net_iov_init()
and use it to initialize ->page_type").
Netpp page_type'ed pages might be used in mapping so as to use @_mapcount.
However, since @page_type and @_mapcount are union'ed in struct page,
these two can't be used at the same time. Revert the commit introducing
page_type for Netpp for now.
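The conflict is a plain consequence of union layout, which a toy model can show. struct page_model below is hypothetical and far smaller than struct page; it only mirrors the fact that the two members share storage.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature; the real struct page has many more members. */
struct page_model {
	unsigned long flags;
	union {
		int mapcount;		/* models @_mapcount */
		unsigned int page_type;	/* models @page_type */
	};
};

static_assert(offsetof(struct page_model, mapcount) ==
	      offsetof(struct page_model, page_type),
	      "both members occupy the same storage");

/* Stores a type value and returns what the mapcount member now reads as. */
static int mapcount_after_type_store(unsigned int type)
{
	struct page_model p = { .flags = 0, .mapcount = -1 };

	p.page_type = type;	/* overwrites the bytes that held mapcount */
	return p.mapcount;
}
```

Whatever is written through one member is what the other reads back, so a page cannot carry a type tag and a live map count at once.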
The patch will be retried once @page_type and @_mapcount can be used at
the same time.
The revert also includes removal of @page_type initialization part
introduced by commit 735a309b4bfb9e ("net: add net_iov_init() and use it
to initialize ->page_type"), which will be restored on the retry.
Link: https://lore.kernel.org/20260515034701.17027-1-byungchul@sk.com
Fixes: db359fccf212 ("mm: introduce a new page type for page pool in page type")
Signed-off-by: Byungchul Park <byungchul@sk.com>
Reported-by: Dragos Tatulea <dtatulea@nvidia.com>
Closes: https://lore.kernel.org/all/982b9bc1-0a0a-4fc5-8e3a-3672db2b29a1@nvidia.com
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Acked-by: Harry Yoo (Oracle) <harry@kernel.org>
Reviewed-by: Lorenzo Stoakes <ljs@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Cc: Jesper Dangaard Brouer <hawk@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Mark Bloch <mbloch@nvidia.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Simon Horman <horms@kernel.org>
Cc: Stanislav Fomichev <sdf@fomichev.me>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tariq Toukan <tariqt@nvidia.com>
Cc: Toke Hoiland-Jorgensen <toke@redhat.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This reverts commit ea52cb24cd3f ("mm/hugetlbfs: update hugetlbfs to use
mmap_prepare") with conflict resolution to account for changes in commit
ea52cb24cd3f ("mm/hugetlbfs: update hugetlbfs to use mmap_prepare").
The patch incorrectly handled hugetlb VMA lock allocation at the
mmap_prepare stage, where a failed allocation occurring after mmap_prepare
is called might result in the lock leaking.
There is no risk of a merge causing a similar issue, as
VMA_DONTEXPAND_BIT is set for hugetlb mappings.
As a first step in addressing this issue, simply revert the change so we
can rework how we do this having corrected the underlying issues.
We maintain the VMA flags changes as best we can, accounting for the fact
that we were working with a VMA descriptor previously and propagating
like-for-like changes for this.
Note that we invoke vma_set_flags() and do not call vma_start_write() as
vm_flags_set() does. This is OK as it's being done in an .mmap hook where
the VMA is not yet linked into the tree so nobody else can be accessing
it.
Link: https://lore.kernel.org/20260512160643.266960-1-ljs@kernel.org
Fixes: ea52cb24cd3f ("mm/hugetlbfs: update hugetlbfs to use mmap_prepare")
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
Reported-by: Mingyu Wang <25181214217@stu.xidian.edu.cn>
Closes: https://lore.kernel.org/linux-mm/20260425070700.562229-1-25181214217@stu.xidian.edu.cn/
Acked-by: Muchun Song <muchun.song@linux.dev>
Acked-by: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@kernel.org>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Cross-merge networking fixes after downstream PR (net-7.1-rc5).
No conflicts, adjacent changes:
drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
cc199cd1b912 ("net/mlx5e: Reduce branches in napi poll")
c326f9c68921 ("net/mlx5e: xsk: Fix unlocked writing to ICOSQ")
drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
c6df9a65cbb0 ("net/mlx5: Skip disabled vports when setting max TX speed")
1fba57c91416 ("net/mlx5: Add VHCA_ID page management mode support")
net/mac80211/mlme.c
a6e6ccd5bd07 ("wifi: mac80211: consume only present negotiated TTLM maps")
49e62ec6eb06 ("wifi: mac80211: move frame RX handling to type files")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix a typo "evetnfs files" to "eventfs files" in a comment.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://patch.msgid.link/20260507081041.885781-2-martin@kaiser.cx
Signed-off-by: Martin Kaiser <martin@kaiser.cx>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
The trace_printk() macro uses a local variable _______STR to detect
whether variadic arguments are present. This name can shadow outer
variables.
Replace the local variable with sizeof applied directly to the
stringified arguments:
	if (sizeof __stringify((__VA_ARGS__)) > 3)
This eliminates the shadowing risk entirely without introducing
any additional includes or local variables.
Verified with objdump on samples/trace_printk that all four cases
branch correctly: __trace_bputs, __trace_puts, __trace_bprintk,
and __trace_printk.
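The size test works because "()" stringifies to a three-byte array (two parentheses plus the terminating NUL). A standalone sketch, with hypothetical DEMO_* macros standing in for __stringify() and for the argument-detection part of trace_printk():

```c
#include <assert.h>

/* Two-level stringification so that macro arguments are expanded first. */
#define DEMO_STRINGIFY_1(...)	#__VA_ARGS__
#define DEMO_STRINGIFY(...)	DEMO_STRINGIFY_1(__VA_ARGS__)

/*
 * With no arguments the operand is the literal "()", whose size is 3.
 * Any token between the parentheses makes the array larger.
 */
#define DEMO_HAS_ARGS(...)	(sizeof DEMO_STRINGIFY((__VA_ARGS__)) > 3)

/* The result is an integer constant expression, so no variable is needed. */
static_assert(!DEMO_HAS_ARGS(), "no arguments");
static_assert(DEMO_HAS_ARGS(1), "one argument");
static_assert(DEMO_HAS_ARGS(1, "two"), "two arguments");
```

Since sizeof is evaluated at compile time, the untaken branch can be discarded by the compiler just as it was with the local array.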
Link: https://patch.msgid.link/20260502075535.34997-1-tiffany019230@gmail.com
Suggested-by: David Laight <david.laight.linux@gmail.com>
Signed-off-by: Qian-Yu Lin <tiffany019230@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
The trace_##name##_enabled() static call branch is used when work needs to
be done for a tracepoint. It allows that work to be skipped when the
tracepoint is not active and still uses the static_branch() of the
tracepoint to keep performance.
Tracepoints themselves must be called in "RCU watching" locations;
otherwise races can occur that corrupt things. In order to make sure
lockdep triggers at tracepoint locations, the lockdep checks are added to
the tracepoint calling location and trigger even if the tracepoint is not
enabled. This is done because a poorly placed tracepoint may never be
detected if it is never enabled when lockdep is enabled.
As trace_##name##_enabled() also prevents the lockdep checks when the
tracepoint is disabled, add lockdep checks to that as well so that if one
is placed in a location that RCU is not watching, it will trigger a
lockdep splat even when the tracepoint is not enabled.
Cc: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://patch.msgid.link/20260430144159.10985-1-devnexen@gmail.com
Signed-off-by: David Carlier <devnexen@gmail.com>
[ Updated the change log ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from Bluetooth, wireless and netfilter.
Craziness continues with no end in sight. Even discounting the driver
revert this is a pretty huge PR for standards of the previous era. I'd
speculate - we haven't seen the worst of it, yet. Good news, I guess,
is that so far we haven't seen many (any?) cases of "AI reported a
bug, we fixed it and a real user regressed".
Current release - fix to a fix:
- Bluetooth: btmtk: accept too short WMT FUNC_CTRL events
- vsock/virtio: relax the recently added memory limit a little
Current release - regressions:
- IB/IPoIB: make sure IB drivers always use async set_rx_mode since
some (mlx5) are now required to use it due to locking changes
Previous releases - regressions:
- udp: fix UDP length on last GSO_PARTIAL segment
- af_unix: fix UAF read of tail->len in unix_stream_data_wait()
- tcp: fix stale per-CPU tcp_tw_isn leak enabling ISN prediction
- mlx5e: fix unlocked writing to ICOSQ, breaking AF_XDP
Previous releases - always broken:
- tap: fix stack info leak in tap_ioctl() SIOCGIFHWADDR
- ipv4: raw: reject IP_HDRINCL packets with ihl < 5
- Bluetooth: a lot of locking and concurrency fixes (as always)
- batman-adv (mesh wireless networking): a lot of random fixes for
issues reported by security researchers and Sashiko
- netfilter: same thing, a lot of small security-ish fixes all over
the place, nothing really stands out
Misc:
- bring back the old 3c509 driver, Maciej wants to maintain it"
* tag 'net-7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (187 commits)
net: enetc: avoid VF->PF mailbox timeout during SR-IOV teardown
net: enetc: fix init and teardown order to prevent use of unsafe resources
net: enetc: fix unbounded loop and interrupt handling in VF-to-PF messaging
net: enetc: fix DMA write to freed memory in enetc_msg_free_mbx()
net: enetc: fix race condition in VF MAC address configuration
net: enetc: fix TOCTOU race and validate VF MAC address
net: enetc: add ratelimiting to VF mailbox error messages
net: enetc: fix missing error code when pf->vf_state allocation fails
net: enetc: fix incorrect mailbox message status returned to VFs
net: bridge: prevent too big nested attributes in br_fill_linkxstats()
l2tp: use list_del_rcu in l2tp_session_unhash
net: bcmgenet: keep RBUF EEE/PM disabled
ethernet: 3c509: Fix most coding style issues
ethernet: 3c509: Update documentation to match MAINTAINERS
ethernet: 3c509: Add GPL 2.0 SPDX license identifier
ethernet: 3c509: Fix AUI transceiver type selection
Revert "drivers: net: 3com: 3c509: Remove this driver"
tools: ynl: support listening on all nsids
net: gro: don't merge zcopy skbs
pds_core: ensure null-termination for firmware version strings
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI fixes from Ard Biesheuvel:
- Permit ACPI PRM runtime firmware calls when acpi_init() runs
- Add another Lenovo Ideapad framebuffer quirk
- Cosmetic tweak
* tag 'efi-fixes-for-v7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi: sysfb_efi: Extend quirk to cover IdeaPad Duet 3 10IGL5-LTE
efi: efi.h: Remove extra semicolon
efi: Allocate runtime workqueue before ACPI init
|
|
AMD Promontory 21 (PROM21) xHCI PCI functions use the common xhci-pci
core for USB operation, but also expose controller-specific sensor data.
Add a small PROM21 PCI glue driver for AMD 1022:43fc and 1022:43fd
controllers.
The glue delegates USB host operation to the common xhci-pci core and
publishes a "hwmon" auxiliary device with parent-provided MMIO data.
Auxiliary device creation failure is logged but does not fail the xHCI
probe.
Make the PROM21 glue a hidden Kconfig tristate driven by the user-visible
SENSORS_PROM21_XHCI option. If sensor support is disabled, generic
xhci-pci binds PROM21 controllers normally. If sensor support is enabled,
the glue follows USB_XHCI_PCI.
This keeps the auxiliary device available for a modular sensor driver while
avoiding a built-in xhci-pci core handing PROM21 controllers to a glue
driver that is only available as a module during initramfs.
Assisted-by: Codex:gpt-5.5
Signed-off-by: Jihong Min <hurryman2212@gmail.com>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Tested-by: Yaroslav Isakov <yaroslav.isakov@gmail.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Link: https://patch.msgid.link/20260519000732.2334711-2-hurryman2212@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The hid_warn_ratelimited macro is defined twice in include/linux/hid.h:
- first one added by commit 4051ead99888 ("HID: rate-limit hid_warn to
prevent log flooding")
- second one added by commit 1d64624243af ("HID: core: Add
printk_ratelimited variants to hid_warn() etc").
The second definition is correctly grouped with other ratelimited macros.
Remove the duplicate definition.
Fixes: 1d64624243af ("HID: core: Add printk_ratelimited variants to hid_warn() etc")
Signed-off-by: Liu Kai <lukace97@outlook.com>
[bentiss: edited commit message]
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
|
|
This flag indicates the path should be opened if it's a regular file.
This is useful to write secure programs that want to avoid being
tricked into opening device nodes with special semantics while thinking
they operate on regular files. This is a requested feature from the
uapi-group[1].
The previously introduced EFTYPE error code is returned when the path
doesn't refer to a regular file. For example, if openat2 is called on
path /dev/null with OPENAT2_REGULAR in the flag param, it will return
-EFTYPE.
When used in combination with O_CREAT, either the regular file is
created, or if the path already exists, it is opened if it's a regular
file. Otherwise, -EFTYPE is returned.
When OPENAT2_REGULAR is combined with O_DIRECTORY, -EINVAL is returned
as it doesn't make sense to open a path that is both a directory and a
regular file.
The UAPI bit lives in the upper 32 bits of open_how::flags
(((__u64)1 << 32)) so that open(2) and openat(2) -- whose @flags
argument is a C int -- cannot physically express it. This is a
structural guarantee, not a runtime mask: the bit is unrepresentable in
32 bits.
Because the rest of the VFS open path narrows to 32 bits in several
places (op->open_flag, f->f_flags, the unsigned open_flag argument of
i_op->atomic_open()), build_open_flags() translates OPENAT2_REGULAR
into a kernel-internal lower-32-bit carrier __O_REGULAR (bit 4, unused
as an O_* on every architecture) before the assignment to op->open_flag.
__O_REGULAR then rides through the existing channels exactly like
__FMODE_EXEC. do_dentry_open() strips it so it cannot leak back to
userspace via fcntl(F_GETFL).
Four BUILD_BUG_ON_MSG() invariants in build_open_flags() prevent any
future bit collision or accidental low-32 redefinition:
- VALID_OPEN_FLAGS fits in 32 bits.
- OPENAT2_REGULAR lives in the upper 32 bits.
- OPENAT2_REGULAR does not alias any open()/openat() flag.
- __O_REGULAR does not alias any user-visible flag.
[1]: https://uapi-group.org/kernel-features/#ability-to-only-open-regular-files
Christian Brauner <brauner@kernel.org> says:
Move OPENAT2_REGULAR to the upper 32 bits of open_how::flags with a
kernel-internal __O_REGULAR carrier so that open(2)/openat(2) cannot
encode the flag; add BUILD_BUG_ON_MSG() invariants and register
__O_REGULAR in the fcntl_init() allocation-uniqueness BUILD_BUG_ON()
(bit count 21 -> 22).
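The flag layout and translation step can be sketched in userspace. The DEMO_* names and both functions are hypothetical; the bit positions for the UAPI flag and the carrier follow the description above, while DEMO_O_DIRECTORY uses the x86 value of O_DIRECTORY purely for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DEMO_OPENAT2_REGULAR	((uint64_t)1 << 32)	/* UAPI bit in open_how::flags */
#define DEMO_O_REGULAR		(1u << 4)		/* kernel-internal carrier bit */
#define DEMO_O_DIRECTORY	0200000u		/* x86 O_DIRECTORY, illustration only */

/* Compile-time invariants in the spirit of the BUILD_BUG_ON_MSG() checks. */
static_assert((uint32_t)DEMO_OPENAT2_REGULAR == 0,
	      "the UAPI bit is unrepresentable in a 32-bit flags argument");
static_assert((DEMO_O_REGULAR & DEMO_O_DIRECTORY) == 0,
	      "the carrier must not alias a user-visible flag");

/* Narrows 64-bit open_how flags to 32 bits, translating the upper bit first. */
static int demo_build_open_flags(uint64_t how_flags, uint32_t *open_flag)
{
	uint32_t low = (uint32_t)how_flags;	/* all that open()/openat() can express */

	if (how_flags & DEMO_OPENAT2_REGULAR) {
		if (low & DEMO_O_DIRECTORY)
			return -EINVAL;	/* a path cannot be both kinds of file */
		low |= DEMO_O_REGULAR;
	}
	*open_flag = low;
	return 0;
}

/* Models the strip at open time so the carrier never shows up in F_GETFL. */
static uint32_t demo_visible_flags(uint32_t open_flag)
{
	return open_flag & ~DEMO_O_REGULAR;
}
```

A caller limited to a 32-bit flags argument has no way to set bit 32, so the carrier can only ever be introduced by the translation step.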
Signed-off-by: Dorjoy Chowdhury <dorjoychy111@gmail.com>
Link: https://patch.msgid.link/20260328172314.45807-2-dorjoychy111@gmail.com
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Aleksa Sarai <aleksa@amutable.com>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
|