| Age | Commit message (Collapse) | Author |
|
Cache aware scheduling needs to know the LLC size that a process
can use, so as to avoid memory-intensive tasks from being
over-aggregated on a single LLC.
Introduce a preparation patch to add get_effective_llc_bytes() to
get the LLC size that a CPU can use. The function can be further
enhanced by subtracting the LLC cache ways reserved by resctrl
(CAT in Intel RDT, etc).
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Co-developed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Tingyin Duan <tingyin.duan@gmail.com>
Link: https://patch.msgid.link/37afee09ff608034da0ce149e72d33b6f4698edf.1778703694.git.tim.c.chen@linux.intel.com
|
|
counts
A performance regression was observed by Prateek when running hackbench
with many threads per process (high fd count). To avoid this, processes
with a large number of active threads are excluded from cache-aware
scheduling.
With sched_cache enabled, record the number of active threads in each
process during the periodic task_cache_work(). While iterating over
CPUs, if the currently running task belongs to the same process as the
task that launched task_cache_work(), increment the active thread count.
If the number of active threads within the process exceeds the number
of Cores (divided by the SMT number) in the LLC, do not enable
cache-aware scheduling. However, on systems with a smaller number of
CPUs within 1 LLC, like Power10/Power11 with SMT4 and an LLC size of 4,
this check effectively disables cache-aware scheduling for any process.
One possible solution suggested by Peter is to use an LLC-mask instead
of a single LLC value for preference. Once there are a 'few' LLCs as
preference, this constraint becomes a little easier. It could be an
enhancement in the future.
For users who wish to perform task aggregation regardless, a debugfs knob
is provided for tuning in a subsequent change.
Suggested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Suggested-by: Aaron Lu <ziqianlu@bytedance.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Co-developed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Tingyin Duan <tingyin.duan@gmail.com>
Link: https://patch.msgid.link/d076cd21a8e6c6341d1e2d927e118db770ebb650.1778703694.git.tim.c.chen@linux.intel.com
|
|
Scanning online CPUs to calculate the occupancy might be
time-consuming. Only allow 1 thread of the process to scan
the CPUs at the same time, which is similar to what
NUMA balance does in task_numa_work().
Signed-off-by: Jianyong Wu <wujianyong@hygon.cn>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/5672b52e588b855b01e5a1a17822f7c6c7237a3d.1778703694.git.tim.c.chen@linux.intel.com
|
|
css_rstat_updated() is exposed as a BPF kfunc and accepts a
caller-provided cpu argument. The function uses cpu for per-cpu rstat
lookups without checking whether it refers to a valid possible CPU.
A BPF iter/cgroup program with CAP_BPF and CAP_PERFMON can pass an
invalid cpu value. On an unfixed UBSCAN_BOUNDS test kernel, cpu ==
0x7fffffff triggers:
UBSAN: array-index-out-of-bounds in kernel/cgroup/rstat.c:31:9
index 2147483647 is out of range for type 'long unsigned int [64]'
Call Trace:
css_rstat_updated
bpf_iter_run_prog
cgroup_iter_seq_show
bpf_seq_read
Add cpu validation to the BPF-facing css_rstat_updated() kfunc and
move the common implementation to __css_rstat_updated() for in-kernel
callers.
Fixes: a319185be9f5 ("cgroup: bpf: enable bpf programs to integrate with rstat")
Signed-off-by: Qing Ming <a0yami@mailbox.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
When the mailbox controller failed transmitting message, the error code
was only passed to the client's tx done handler and not to
mbox_send_message() in blocking mode. For this reason, the function could
return a false success. This commit resolves the issue by introducing the
tx status and checking it before mbox_send_message() returns.
This commit works with the premise that the multi-threads' access to a
channel in blocking mode is serialized by clients, not by the mailbox
APIs, since the current mbox_send_message() in blocking mode does not
support multi-threads.
Signed-off-by: Joonwon Kang <joonwonkang@google.com>
Reviewed-by: Sudeep Holla <sudeep.holla@kernel.org>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
|
|
Herve Codina <herve.codina@bootlin.com> says:
On some embedded system boards, audio amplifiers are designed using
discrete components such as op-amp, several resistors and switches to
either adjust the gain (switching resistors) or fully switch the
audio signal path (mute and/or bypass features).
Those switches are usually driven by simple GPIOs.
This kind of amplifiers are not handled in ASoC and the fallback is to
let the user-space handle those GPIOs out of the ALSA world.
In order to have those kind of amplifiers fully integrated in the audio
stack, this series introduces the audio-gpio-amp to handle them.
This new ASoC component allows to have the amplifiers seen as ASoC
auxiliarty devices and so it allows to control them through audio mixer
controls.
In order to ease the review, I choose to split modifications related
to the merge of the gpio-audio-amp part into the simple-amplfier driver
in several commits.
Link: https://patch.msgid.link/20260513081702.317117-1-herve.codina@bootlin.com
|
|
Signed integers can be read from single value properties using
of_property_read_s32() but nothing exist to read signed integers
from multi-value properties.
Fix this lack adding of_property_read_s32_index().
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20260513081702.317117-2-herve.codina@bootlin.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/chleroy/linux into soc/drivers
FSL SOC Changes for 7.1
Freescale QUICC Engine:
- Add missing cleanup on device removal and switch to irq_domain_create_linear()
in interrupt controller for IO Ports
- Panic on ioremap() failure in qe_reset()
Freescale Management Complex:
- Move fsl-mc over to device MSI infrastructure
- Wait for the MC firmware to complete its boot
Freescale Hypervisor:
- Fix header kernel-doc warnings
* tag 'soc_fsl-7.1-2' of https://git.kernel.org/pub/scm/linux/kernel/git/chleroy/linux:
bus: fsl-mc: wait for the MC firmware to complete its boot
soc: fsl: qe: panic on ioremap() failure in qe_reset()
soc: fsl: qe_ports_ic: switch to irq_domain_create_linear()
soc: fsl: qe_ports_ic: Add missing cleanup on device removal
virt: fsl_hypervisor: fix header kernel-doc warnings
platform-msi: Remove stale comment
fsl-mc: Remove legacy MSI implementation
fsl-mc: Switch over to per-device platform MSI
irqchip/gic-v3-its: Add fsl_mc device plumbing to the msi-parent handling
fsl-mc: Add minimal infrastructure to use platform MSI
fsl-mc: Remove MSI domain propagation to sub-devices
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
"This contains a fixes for the current development cycle. Note that AI
related review sometimes delays fixes a bit because we find more fixes
for the fixes. I might try and send smaller but more fixes PRs if this
trend keeps up.
- Fix various netfslib bugs
- Fix an out-of-bounds write when listing idmappings
- Fix the return values in jfs_mkdir() and orangefs_mkdir()
- Fix a writeback writeback array overflow in fuse
- Fix a forced iversion increment on lazytime timestamp updates
- Reject a negative timeval component in kern_select()
- Fix error return when vfs_mkdir() fails in the cachefiles code
- Fix wrong error code returned for pidns ioctls"
* tag 'vfs-7.1-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (31 commits)
cachefiles: Fix error return when vfs_mkdir() fails
afs: Fix the locking used by afs_get_link()
netfs, afs: Fix write skipping in dir/link writepages
netfs: Fix netfs_read_folio() to wait on writeback
netfs: Fix folio->private handling in netfs_perform_write()
netfs: Fix partial invalidation of streaming-write folio
netfs: Fix potential UAF in netfs_unlock_abandoned_read_pages()
netfs: Fix leak of request in netfs_write_begin() error handling
netfs: Fix early put of sink folio in netfs_read_gaps()
netfs: Fix write streaming disablement if fd open O_RDWR
netfs: Fix read-gaps to remove netfs_folio from filled folio
netfs: Fix potential deadlock in write-through mode
netfs: Fix streaming write being overwritten
netfs: Defer the emission of trace_netfs_folio()
netfs: Fix netfs_invalidate_folio() to clear dirty bit if all changes gone
netfs: Fix overrun check in netfs_extract_user_iter()
netfs: fix error handling in netfs_extract_user_iter()
netfs: Fix potential uninitialised var in netfs_extract_user_iter()
netfs: fix VM_BUG_ON_FOLIO() issue in netfs_write_begin() call
netfs: Fix zeropoint update where i_size > remote_i_size
...
|
|
As described in commit 2bc057692599 ("block: don't make REQ_POLLED imply
REQ_NOWAIT"), which fixed the same issue for the block device node, there
are valid cases to poll for I/O completion without REQ_NOWAIT.
Additionally, sing REQ_NOWAIT for file system writes is currently not
supported as file systems writes are not idempotent and would need a
retry of just the bio and not the entire operation to be fully supported.
Switch iomap to set REQ_POLLED and remove the now unused bio_set_polled
helper.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20260518062917.506483-1-hch@lst.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
This commit moves CPU hotplug callbacks from ETMv4 driver to core layer.
The motivation is the core layer can control all components on an
activated path rather but not only managing tracer in ETMv4 driver.
The perf event layer will disable CoreSight PMU event 'cs_etm' when
hotplug off a CPU. That means a perf mode will be always converted to
disabled mode in CPU hotplug. Arm CoreSight CPU hotplug callbacks only
need to handle the Sysfs mode and ignore the perf mode.
Add a 'mode' argument to coresight_pm_get_active_path() so it only
returns active paths for the relevant mode. Define the enum with bit
flags so it is safe for bitwise operations.
Change CPUHP_AP_ARM_CORESIGHT_STARTING to CPUHP_AP_ARM_CORESIGHT_ONLINE
so that the CPU hotplug callback runs in the online state and thread
context, allowing coresight_disable_sysfs() to be called directly to
disable the path.
Tested-by: James Clark <james.clark@linaro.org>
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Tested-by: Jie Gan <jie.gan@oss.qualcomm.com>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20260515-arm_coresight_path_power_management_improvement-v14-27-f88c4a3ecfe9@arm.com
|
|
Except for software sources (e.g. STM), other sources treat multiple
enables as equivalent to a single enable. The device mode already
tracks the binary state, so it is redundant to operate refcount.
Introduce a helper coresight_is_software_source() for check software
source. Refactor to maintain the refcount only for software sources.
This simplifies future CPU PM handling without refcount logic.
Tested-by: James Clark <james.clark@linaro.org>
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Tested-by: Jie Gan <jie.gan@oss.qualcomm.com>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20260515-arm_coresight_path_power_management_improvement-v14-26-f88c4a3ecfe9@arm.com
|
|
The ACS specification does not allow a non-NCQ command to be issued while
an NCQ command is outstanding.
Commit 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ command starvation")
introduced a feature where a deferred non-NCQ command gets issued from a
workqueue. The design stores a single non-NCQ command per port.
However, when using Port Multipliers (PMPs), specifically PMPs that
support FIS-Based Switching (FBS), non-NCQ and NCQ commands can be mixed
on the same port, just not for the same link, see e.g. ata_std_qc_defer()
which is, and always has operated on a per-link basis.
Therefore, move the deferred_qc from struct ata_port to struct ata_link.
This way, when using a PMP with FBS, we will not needlessly defer commands
to all other links, just because one link issued a non-NCQ command while
having an NCQ command outstanding. Only commands for that specific link
will be deferred. This is in line with how PMPs with FBS worked before
commit 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ command starvation").
Fixes: 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ command starvation")
Tested-by: Tommy Kelly <linux@tkel.ly>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
|
|
When using Port Multipliers (PMPs) with Command-Based Switching (CBS), you
can only issue commands to one link at a time. For PMPs with CBS, there is
already code to handle commands being sent to different links in
sata_pmp_qc_defer_cmd_switch() using ap->excl_link. sata_sil24 also makes
use of ap->excl_link.
A user on the list reported that commit 0ea84089dbf6 ("ata: libata-scsi:
avoid Non-NCQ command starvation") broke PMPs with CBS. The commit
introduced code that stores a deferred qc in ap->deferred_qc, to later be
issued via a workqueue. It turns out that this change is incompatible with
the existing ap->excl_link handling used by PMPs with CBS.
Thus, modify sata_pmp_qc_defer_cmd_switch() and sil24_qc_defer() to return
ATA_DEFER_LINK_EXCL, and make sure that the deferred QC handling via
workqueue is not used for this return value.
This way, PMPs with CBS will work once again. Note that the starvation
referenced in commit 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ
command starvation") can only happen on libsas ports, and libsas does not
support Port Multipliers, thus there is no harm of reverting back to the
previous way of deferring commands for PMPs with CBS.
Non-libsas ports connected to anything but a PMP with CBS (e.g. a normal
drive or a PMP with FBS) will continue using the deferred workqueue, since
it does result in lower completion latencies for non-NCQ commands, even
though the workqueue is not strictly needed to avoid starvation for
non-libsas ports.
If we want to modify the scope of the workqueue issuing to also handle
PMPs with CBS, then we should ensure that we can save both NCQ and non-NCQ
commands in ap->deferred_qc, while also removing the existing PMP CBS
handling using ap->excl_link, such that we don't duplicate features.
While at it, also add a comment explaining how the ap->excl_link mechanism
works.
Fixes: 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ command starvation")
Tested-by: Tommy Kelly <linux@tkel.ly>
Reported-by: Tommy Kelly <linux@tkel.ly>
Closes: https://lore.kernel.org/linux-ide/ce09cc21-a8e9-4845-b205-35411e22fba9@tkel.ly/
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
|
|
Johan Hovold <johan@kernel.org> says:
In preparation for fixing the SPI controller API so that it no longer
drops a reference when deregistering (non-managed) controllers (cf.
[1]), this series converts drivers using managed registration to also
use managed allocation.
Included is also a related cleanup of a lp8841-rtc.
This leaves us with 18 drivers using non-managed allocation, which is
few enough to be able to fix the API in tree-wide change.
Johan
[1] https://lore.kernel.org/lkml/20260325145319.1132072-1-johan@kernel.org/
Link: https://patch.msgid.link/20260511150408.796155-1-johan@kernel.org
|
|
This commit only set the path pointer for system tracers (e.g. STM) in
coresight_{enable|disable}_source().
Later changes will set the path pointer locally for per-CPU sources.
This is because the mode and path pointer must be set together, so that
they are observed atomically by the CPU PM notifier.
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Tested-by: James Clark <james.clark@linaro.org>
Tested-by: Jie Gan <jie.gan@oss.qualcomm.com>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20260515-arm_coresight_path_power_management_improvement-v14-19-f88c4a3ecfe9@arm.com
|
|
The current implementation only saves and restores the context for ETM
sources while ignoring the context of links. However, if funnels or
replicators on a linked path resides in a CPU or cluster power domain,
the hardware context for the link will be lost after resuming from low
power states.
To support context management for links during CPU low power modes, a
better way is to implement CPU PM callbacks in the Arm CoreSight core
layer. As the core layer has sufficient information for linked paths,
from tracers to links, which can be used for power management.
As a first step, this patch registers CPU PM notifier in the core layer.
If a source device provides callbacks for saving and restoring context,
these callbacks will be invoked in CPU suspend and resume.
Reviewed-by: James Clark <james.clark@linaro.org>
Tested-by: James Clark <james.clark@linaro.org>
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
Tested-by: Jie Gan <jie.gan@oss.qualcomm.com>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20260515-arm_coresight_path_power_management_improvement-v14-11-f88c4a3ecfe9@arm.com
|
|
The CPU ID can be fetched directly from the coresight_device structure,
so the .cpu_id() callback is no longer needed.
Remove the .cpu_id() callback from source ops and update callers
accordingly.
Tested-by: Jie Gan <jie.gan@oss.qualcomm.com>
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20260515-arm_coresight_path_power_management_improvement-v14-5-f88c4a3ecfe9@arm.com
|
|
Add a new flag CORESIGHT_DESC_CPU_BOUND to indicate components that
are CPU bound. Populate CPU ID into the coresight_device structure;
otherwise, set CPU ID to -1 for non CPU bound devices.
Use the {0} initializer to clear coresight_desc structures to avoid
uninitialized values.
Tested-by: Jie Gan <jie.gan@oss.qualcomm.com>
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20260515-arm_coresight_path_power_management_improvement-v14-4-f88c4a3ecfe9@arm.com
|
|
DIO-II support was added in 2004, update a comment to reflect this.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://patch.msgid.link/5aa3901baaa5d145804e1a836dd8ee3fb07ea144.1777897387.git.geert@linux-m68k.org
|
|
Introduce device-managed helpers for clocksource registration.
The clocksource framework currently provides __clocksource_register_scale()
along with convenience wrappers for Hz and kHz registration. However,
drivers must handle error paths and cleanup manually, typically by pairing
registration with an explicit clocksource_unregister() call.
Add a devm-based variant, __devm_clocksource_register_scale(), along with
devm_clocksource_register_hz() and devm_clocksource_register_khz() helpers.
These helpers register the clocksource and attach a devres action to
automatically unregister it on driver detach or probe failure.
This simplifies driver code by:
* removing explicit cleanup paths
* ensuring correct teardown ordering
* aligning with the devm-based resource management model widely used
across the kernel
While drivers can open-code devm_add_action_or_reset(), providing a
dedicated helper avoids duplication, reduces boilerplate, and ensures
consistent usage across drivers, following patterns used in other
subsystems.
This is also particularly useful for drivers built as modules, where
device-managed resource handling avoids manual cleanup in remove paths and
ensures correct teardown on module unload.
This helper is self-contained and can be adopted progressively by drivers.
No functional change.
Signed-off-by: Daniel Lezcano <daniel.lezcano@oss.qualcomm.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260506153831.605159-1-daniel.lezcano@oss.qualcomm.com
|
|
When a main program with exception_boundary has outgoing stack
arguments (e.g. from calling subprogs with >5 args), bpf_throw() fails
to correctly restore callee-saved registers, causing a kernel crash.
The x86 JIT allocates the outgoing stack arg area below the
callee-saved registers via 'sub rsp, outgoing_rsp' in the prologue.
When bpf_throw() unwinds, it captures the main program's sp (which
includes this outgoing area) and passes it to the exception callback.
The callback gets rsp and rbp, followed by pop_callee_regs, but rsp
points into the outgoing arg area rather than the callee-saved
registers, so the pops restore garbage values. Returning to the
kernel with corrupted callee-saved registers causes a crash.
Fix this by adjusting the sp (adding stack_arg_sp_adjust) passed to
the exception callback, so it points to the bottom of the callee-saved
registers instead of the outgoing arg area. When stack_arg_sp_adjust
is 0 (the common case), this is a no-op.
Fixes: 324c3ca6eed6 ("bpf,x86: Implement JIT support for stack arguments")
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260517150702.288031-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Global subprogs are verified independently and are not descended into
when their callers are symbolically executed. This means a caller can
hold references or locks across a global subprog call that may throw,
while the verifier only checks the non-exceptional return path at the
call site.
Record whether a subprog might throw in the CFG summary pass, alongside
the existing might_sleep and packet-data-changing summaries, and
propagate that effect through reachable callees.
When a global subprog is marked as possibly throwing, push the normal
continuation and validate the exceptional path immediately at the call
site, avoiding a synthetic exception state and associated special case
in the pruning checks.
Fixes: f18b03fabaa9 ("bpf: Implement BPF exceptions")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260517075530.3461166-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
- Fix ARM64-specific rseq regressions (Mark Rutland)
* tag 'sched-urgent-2026-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
arm64/entry: Fix arm64-specific rseq brokenness
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull IRQ fixes from Ingo Molnar:
- Fix use-after-free in irq_work_single() on PREEMPT_RT (Jiayuan Chen)
- Don't call add_interrupt_randomness() for NMIs in
handle_percpu_devid_irq() (Mark Rutland)
- Remove unused function in the ath79-cpu irqchip driver causing LKP
CI build warnings (Rosen Penev)
- Fix IRQ allocation/teardown leakage regressions in the GICv5 irqchip
driver (Sascha Bischoff)
- Fix an IRQ trigger type regression in the Meson S4 SoC irqchip driver
(Xianwei Zhao)
- Fix CPU offlining regression in the RiscV IMSIC irqchip driver
(Yong-Xuan Wang)
* tag 'irq-urgent-2026-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irq_work: Fix use-after-free in irq_work_single() on PREEMPT_RT
irqchip/riscv-imsic: Clear interrupt move state during CPU offlining
irqchip/meson-gpio: Use the correct register in meson_s4_gpio_irq_set_type()
irqchip/ath79-cpu: Remove unused function
genirq/chip: Don't call add_interrupt_randomness() for NMIs
irqchip/gic-v5: Allocate ITS parent LPIs as a range
irqchip/gic-v5: Support range allocation for LPIs
irqchip/gic-v5: Move LPI allocation into the LPI domain
|
|
Pass a parent device into ffa_device_register() and use the synthetic
arm-ffa platform device as the parent for each registered FF-A device.
This keeps the enumerated FF-A partition devices anchored below the FF-A
core device in the driver model, matching the platform-driver conversion
of the core transport.
Suggested-by: Yeoreum Yun <yeoreum.yun@arm.com>
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
Link: https://patch.msgid.link/20260508-b4-ffa_plat_dev-v1-3-c5a30f8cf7b8@kernel.org
Signed-off-by: Sudeep Holla <sudeep.holla@kernel.org>
|
|
This was causing a build break when CONFIG_FILE_LOCKING was disabled.
Move the LEASE_BREAK_* flags into the non-#ifdef'ed part of the file.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202605161232.1lY6pZoM-lkp@intel.com/
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://patch.msgid.link/20260516-dir-deleg-fix-v1-1-1b68f0aa990a@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Pull VFIO fixes from Alex Williamson:
- Convert vfio-pci BAR resource requests and iomaps initialization
from a lazy, on-demand model to an eager pre-allocation model to
avoid races while preserving legacy error behavior. Fix unchecked
barmap access in dma-buf export path (Matt Evans)
- Introduce an implicit unsigned cast in converting vfio-pci device
offsets to region indexes, closing a potential out-of-bounds
access through the vfio_pci_ioeventfd() interface (Matt Evans)
- Fix a dma-buf kref underflow and stuck wait_for_completion() when
closing a previously revoked dma-buf (Alex Williamson)
* tag 'vfio-v7.1-rc4' of https://github.com/awilliam/linux-vfio:
vfio/pci: Check BAR resources before exporting a DMABUF
vfio/pci: Set up BAR resources and maps in vfio_pci_core_enable()
vfio/pci: Make VFIO_PCI_OFFSET_TO_INDEX() return unsigned
vfio/pci: fix dma-buf kref underflow after revoke
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block fixes from Jens Axboe:
- NVMe merge request via Keith:
- Fix memory leak on a passthrough integrity mapping failure (Keith)
- Hide secrets behind debug option (Hannes)
- Fix pci use-after-free for host memory buffer (Chia-Lin Kao)
- Fix tcp taregt use-after-free for data digest (Sagi)
- Revert a mistaken quirk (Alan Cui)
- Fix uevent and controller state race condition (Maurizio)
- Fix apple submission queue re-initialization (Nick Chan)
- Three fixes for blk-integrity, fixing an issue with the user data
mapping and two problems with recomputing number of segments
- Two fixes for the iov_iter bounce buffering
- Fix for the handling of dead zoned write plugs
- ublk max_sectors validation fix, with associated selftest addition
* tag 'block-7.1-20260515' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
nvme-apple: Reset q->sq_tail during queue init
block: align down bounces bios
block: pass a minsize argument to bio_iov_iter_bounce
selftests: ublk: cap nthreads to kernel's actual nr_hw_queues
block: fix handling of dead zone write plugs
block: bio-integrity: Fix null-ptr-deref in bio_integrity_map_user()
block: recompute nr_integrity_segments in blk_insert_cloned_request
block: don't overwrite bip_vcnt in bio_integrity_copy_user()
nvme: fix race condition between connected uevent and STARTED_ONCE flag
Revert "nvme: add quirk NVME_QUIRK_IGNORE_DEV_SUBNQN for 144d:a808"
nvmet-tcp: Fix potential UAF when ddgst mismatch
nvme-pci: fix use-after-free in nvme_free_host_mem()
nvmet-auth: Do not print DH-HMAC-CHAP secrets
nvme: fix bio leak on mapping failure
nvme: make prp passthrough usage less scary
ublk: reject max_sectors smaller than PAGE_SECTORS in parameter validation
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Ilpo Järvinen:
- asus-nb-wmi:
- Use existing keyboard quirk for ASUS Zenbook Duo UX8407AA
- hp-wmi:
- Add support for Victus 16-r0xxx (8BC2)
- intel/vsec_tpmi:
- Move debugfs register before creating devices
- Prevent fault during unbind
- lenovo-wmi-*:
- Fix memory leak in lwmi_dev_evaluate_int()
- Balance IDA id allocation and free
- Balance component bind and unbind
- Prevent sending uninitialized WMI arguments to the device
- Decouple lenovo-wmi-gamezone and lenovo-wmi-other to simplify
module dependency graph
- Limit adding attributes to supported devices
- samsung-galaxybook:
- Handle kbd backlight, mic mute and camera block hotkeys
* tag 'platform-drivers-x86-v7.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: asus-nb-wmi: add DMI quirk for ASUS Zenbook Duo UX8407AA
platform/x86: lenovo-wmi-other: Limit adding attributes to supported devices
platform/x86: lenovo-wmi-other: Add Attribute ID helper functions
platform/x86: lenovo-wmi-helpers: Move gamezone enums to wmi-helpers
platform/x86: lenovo: Decouple lenovo-wmi-gamezone and lenovo-wmi-other
platform/x86: lenovo-wmi-other: Fix tunable_attr_01 struct members
platform/x86: lenovo-wmi-other: Zero initialize WMI arguments
platform/x86: lenovo-wmi-other: Balance component bind and unbind
platform/x86: lenovo-wmi-other: Balance IDA id allocation and free
platform/x86: lenovo-wmi-helpers: Fix memory leak in lwmi_dev_evaluate_int()
platform/x86: hp-wmi: Add support for Victus 16-r0xxx (8BC2)
platform/x86/intel/tpmi/plr: Prevent fault during unbind
platform/x86: intel: Add notifiers support
platform/x86: intel: Move debugfs register before creating devices
platform/x86: samsung-galaxybook: Handle ACPI hotkey notifications
platform/x86: samsung-galaxybook: Refactor camera lens cover input device
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
- Fix potential dead-lock in rhashtable when used by xattr
- Avoid calling kvfree on atomic path in rhashtable
* tag 'v7.1-p4' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
rhashtable: Add bucket_table_free_atomic() helper
mm/slab: Add kvfree_atomic() helper
rhashtable: drop ht->mutex in rhashtable_free_and_destroy()
|
|
Add a new fsnotify_rename_data struct and FSNOTIFY_EVENT_RENAME data
type that carries both the moved dentry and the inode that was
overwritten by the rename (if any).
Update fsnotify_data_inode(), fsnotify_data_dentry(), and
fsnotify_data_sb() to handle the new type, and add a new
fsnotify_data_rename_target() helper for extracting the overwritten
target inode.
Update fsnotify_move() to use the new data type for FS_RENAME and
FS_MOVED_TO events, passing the overwritten target inode through the
event data. FS_MOVED_FROM is unchanged since the source directory
doesn't need overwrite information.
This is done so that fsnotify consumers like nfsd can atomically
observe the overwritten file when a rename replaces an existing entry,
without needing a separate FS_DELETE event.
Assisted-by: Claude (Anthropic Claude Code)
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://patch.msgid.link/20260428-dir-deleg-v3-7-5a0780ba9def@kernel.org
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
nfsd needs to be able to modify the mask on an existing mark when new
directory delegations are set or unset. Add an exported function that
allows the caller to set and clear bits in the mark->mask, and does
the recalculation if something changed.
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://patch.msgid.link/20260428-dir-deleg-v3-6-5a0780ba9def@kernel.org
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Add a new routine that returns a mask of all dir change events that are
currently ignored by any leases. nfsd will use this to determine how to
configure the fsnotify_mark mask.
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://patch.msgid.link/20260428-dir-deleg-v3-4-5a0780ba9def@kernel.org
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
If a NFS client requests a directory delegation with a notification
bitmask covering directory change events, the server shouldn't recall
the delegation. Instead the client will be notified of the change after
the fact.
Add support for ignoring lease breaks on directory changes. Add a new
flags parameter to try_break_deleg() and teach __break_lease how to
ignore certain types of delegation break events.
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://patch.msgid.link/20260428-dir-deleg-v3-2-5a0780ba9def@kernel.org
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
93618edf7538 ("cgroup: Defer css percpu_ref kill on rmdir until cgroup is
depopulated") deferred kill_css_finish() at the cgroup level: rmdir waits
for the entire cgroup's populated count to drop to zero, then fires
kill_css_finish() on every subsystem css at once. Replace that with
per-subsys-css deferral. Each subsystem css now tracks its own hierarchical
populated count and independently defers its kill_css_finish() until its own
subtree drains.
The rmdir-race fix carries through unchanged in shape. The dying css's
->css_offline() still waits until no PF_EXITING task references it, and v2's
cgroup-level machinery goes away.
cgroup_apply_control_disable() has the same race shape (PF_EXITING tasks
pinning a css whose ->css_offline() is about to run) and stays synchronous
here. This patch lays the groundwork for fixing it - per-cgroup waiting
can't gate one subsys css being killed while the rest of the cgroup stays
live, but per-css can.
Subtree-wide invariant preserved: a dying ancestor css stays populated
through nr_populated_children until every dying descendant's task drains, so
the walker fires the ancestor's kill_finish_work only after all descendants
have drained.
Add paired smp_mb()s in kill_css_sync() and css_update_populated() to fence
the StoreLoad on (CSS_DYING, populated counter), guaranteeing that either
the walker queues kill_finish_work or the caller fires synchronously.
cgroup_destroy_locked() was implicitly fenced by an unrelated css_set_lock
pair; cgroup_apply_control_disable() in the next patch is not.
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Later patches replace the cgroup-level finish_destroy_work deferral added
by 93618edf7538 ("cgroup: Defer css percpu_ref kill on rmdir until cgroup
is depopulated") with a per-subsys-css deferral. That needs each subsystem
css to track its own populated count. Move the populated counters from
cgroup onto cgroup_subsys_state. cgroup->self is itself a
cgroup_subsys_state and self.parent walks the same chain as cgroup_parent(),
so cgroup_update_populated() generalizes to a single css_update_populated()
taking a css. The cgroup-side bookkeeping runs only when the walk started
from a self css.
Keep nr_populated_{domain,threaded}_children on cgroup. Both sum to
self.nr_populated_children, but staying as dedicated fields to allow readers
like cgroup_can_be_thread_root() unlocked access.
css_set_update_populated() also walks the per-subsys-css chain so each
subsystem css's hierarchical populated count is maintained. No reader
consumes those counts yet.
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
cgroup_update_populated() updates nr_populated_csets,
nr_populated_domain_children, and nr_populated_threaded_children under
css_set_lock, but cgroup_has_tasks(), cgroup_is_populated(), and
cgroup_can_be_thread_root() read them without holding it. Use
READ_ONCE/WRITE_ONCE.
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
cpuset reads cs->css.cgroup->nr_populated_csets directly in two places to
test whether a cgroup has tasks. cgroup.c already has a matching helper,
cgroup_has_tasks(). Move it to cgroup.h as static inline and use that
instead. This is to prepare for relocation of cgroup->nr_populated_csets. No
semantic change.
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Jens Axboe <axboe@kernel.dk> says:
One of the nastier things about epoll is how it allows nesting contexts
inside each other, leading to the necessity of loop detection and the
issues that have come with that.
I don't believe there's any reason to support nesting on the io_uring
side, in fact IORING_OP_EPOLL_CTL is a historical mistake, imho. But
let's at least try and contain the damage and disallow nested contexts
from our side.
Christian Brauner <brauner@kernel.org> says:
Bring in the eventpoll specific io_uring changes together with the
eventpoll cleanup I did this cycle. The io_uring changes can go on top
of both through the block tree.
* patches from https://patch.msgid.link/20260514140817.623026-1-axboe@kernel.dk:
eventpoll: rename struct epoll_filefd to epoll_key
eventpoll: add file based control interface
eventpoll: export is_file_epoll()
eventpoll: pass struct epoll_filefd through ep_find() and ep_insert()
Link: https://patch.msgid.link/20260514140817.623026-1-axboe@kernel.dk
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
|
|
This more accurately describes what purpose this structure serves, as
a lookup key.
Suggested-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Link: https://patch.msgid.link/20260514140817.623026-5-axboe@kernel.dk
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Add do_epoll_ctl_file(), which takes a pre-resolved epoll file and a
struct epoll_filefd for the target rather than two integer file
descriptors. do_epoll_ctl() remains as a thin wrapper.
In preparation for using the file based interface from io_uring.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Link: https://patch.msgid.link/20260514140817.623026-4-axboe@kernel.dk
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Make is_file_epoll() available outside of epoll. This is in preparation
from using it from io_uring.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Link: https://patch.msgid.link/20260514140817.623026-3-axboe@kernel.dk
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Only used in core block code, so unexport and move the prototype to
blk.h.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20260515045547.3790129-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Only used by bio_copy_data, so implement that directly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20260515045547.3790129-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Only used to implement zero_fill_bio, so directly implement that.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20260515045547.3790129-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
This change should cause no difference in behavior; it just cleans up some
hazardous code that could have become a problem in the future.
MSG_NO_SHARED_FRAGS is a kernel-internal flag that cancels the effect of
MSG_SPLICE_PAGES, another kernel-internal flag that influences the
data-sharing semantics of SKBs.
Prevent passing this flag in from userspace via sendmsg() by adding it to
MSG_INTERNAL_SENDMSG_FLAGS.
This is not currently an observable problem because MSG_NO_SHARED_FRAGS
only has an effect if kernel code adds MSG_SPLICE_PAGES to it.
The only codepath that adds MSG_SPLICE_PAGES to user-supplied flags from
which MSG_NO_SHARED_FRAGS hasn't been cleared is the path
tcp_bpf_sendmsg -> tcp_bpf_send_verdict -> tcp_bpf_push, and that is not a
problem because tcp_bpf_sendmsg always intentionally sets
MSG_NO_SHARED_FRAGS anyway.
Signed-off-by: Jann Horn <jannh@google.com>
Link: https://patch.msgid.link/20260512-msg_no_shared_frags-v1-1-55ea46760331@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Jiri Kosina:
- fixes for a few OOB/UAF in several HID drivers (Florian Pradines, Lee
Jones, Michael Zaidman, Rosalie Wanders, Sangyun Kim and Tomasz
Pakuła)
- more general sanitation of input data, dealing with potentially
malicious hardware in hid-core (Benjamin Tissoires)
- a few device-specific quirks and fixups
* tag 'hid-for-linus-2026051401' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: (22 commits)
HID: logitech-hidpp: Add support for newer Bluetooth keyboards
HID: pidff: Fix integer overflow in pidff_rescale
HID: i2c-hid: add reset quirk for BLTP7853 touchpad
HID: core: introduce hid_safe_input_report()
HID: pass the buffer size to hid_report_raw_event
HID: google: hammer: stop hardware on devres action failure
HID: appletb-kbd: run inactivity autodim from workqueues
HID: appletb-kbd: fix UAF in inactivity-timer cleanup path
HID: playstation: Clamp num_touch_reports
HID: magicmouse: Prevent out-of-bounds (OOB) read during DOUBLE_REPORT_ID
HID: mcp2221: fix OOB write in mcp2221_raw_event()
HID: quirks: really enable the intended work around for appledisplay
HID: hid-sjoy: race between init and usage
HID: uclogic: Fix regression of input name assignment
HID: intel-thc-hid: Intel-quickspi: Fix some error codes
HID: hid-lenovo-go-s: restore OS_TYPE after resume from s2idle
HID: elan: Add support for ELAN SB974D touchpad
HID: sony: add missing size validation for Rock Band 3 Pro instruments
HID: sony: add missing size validation for SMK-Link remotes
HID: sony: remove unneeded WARN_ON() in sony_leds_init()
...
|
|
Add per-cgroup local event counters to track RDMA resource limit
exhaustion from the perspective of individual cgroups. The
rdma.events.local file reports two per-resource counters:
- max: number of times this cgroup's limit was the one that blocked
an allocation in the subtree
- alloc_fail: number of allocation attempts originating from this
cgroup that failed due to an ancestor's limit
This mirrors the design of pids.events.local, where events are
attributed to the cgroup that imposed the limit, not necessarily the
cgroup where the allocation was attempted.
Also extend rdma.events with a hierarchical alloc_fail counter that
tracks allocation failures propagating upward from the requesting
cgroup, complementing the existing max counter, so that rdma.events
and rdma.events.local share the same output format.
Signed-off-by: Tao Cui <cuitao@kylinos.cn>
Signed-off-by: Tejun Heo <tj@kernel.org>
|