| Age | Commit message (Collapse) | Author |
|
By default, the Linux TCP implementation does not shrink the
advertised window (RFC 7323 calls this "window retraction") with the
following exceptions:
- When an incoming segment cannot be added due to the receive buffer
running out of memory. Since commit 8c670bdfa58e ("tcp: correct
handling of extreme memory squeeze") a zero window will be
advertised in this case. It turns out that reaching the required
memory pressure is easy when window scaling is in use. In the
simplest case, sending a sufficient number of segments smaller than
the scale factor to a receiver that does not read data is enough.
- Commit b650d953cd39 ("tcp: enforce receive buffer memory limits by
allowing the tcp window to shrink") addressed the "eating memory"
problem by introducing a sysctl knob that allows shrinking the
window before running out of memory.
However, RFC 7323 does not only state that shrinking the window is
necessary in some cases, it also formulates requirements for TCP
implementations when doing so (Section 2.4).
This commit addresses the receiver-side requirements: After retracting
the window, the peer may have a snd_nxt that lies within a previously
advertised window but is now beyond the retracted window. This means
that all incoming segments (including pure ACKs) will be rejected
until the application happens to read enough data to let the peer's
snd_nxt be in window again (which may be never).
To comply with RFC 7323, the receiver MUST honor any segment that
would have been in window for any ACK sent by the receiver and, when
window scaling is in effect, SHOULD track the maximum window sequence
number it has advertised. This patch tracks that maximum window
sequence number rcv_mwnd_seq throughout the connection and uses it in
tcp_sequence() when deciding whether a segment is acceptable.
rcv_mwnd_seq is updated together with rcv_wup and rcv_wnd in
tcp_select_window(). If we count tcp_sequence() as fast path, it is
read in the fast path. Therefore, rcv_mwnd_seq is put into rcv_wnd's
cacheline group.
The logic for handling received data in tcp_data_queue() is already
sufficient and does not need to be updated.
Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260309-tcp_rfc7323_retract_wnd_rfc-v3-1-4c7f96b1ec69@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Merge upstream io_uring fixes to avoid conflicts in later patches.
* io_uring-7.0:
io_uring/kbuf: check if target buffer list is still legacy on recycle
io_uring: fix physical SQE bounds check for SQE_MIXED 128-byte ops
io_uring/eventfd: use ctx->rings_rcu for flags checking
io_uring: ensure ctx->rings is stable for task work flags manipulation
io_uring/bpf_filter: use bpf_prog_run_pin_on_cpu() to prevent migration
io_uring/register: fix comment about task_no_new_privs
|
|
A bio segment may have partial interval block data with the rest
continuing into the next segments because direct-io data payloads only
need to align in memory to the device's DMA limits.
At the same time, the protection information may also be split in
multiple segments. The most likely way that may happen is if two
requests merge, or if we're directly using the io_uring user metadata.
The generate/verify, however, only ever accessed the first bip_vec.
Further, it may be possible to unalign the protection fields from the
user space buffer, or if there are odd additional opaque bytes in front
or in back of the protection information metadata region.
Change up the iteration to allow spanning multiple segments. This patch
is mostly a re-write of the protection information handling to allow any
arbitrary alignments, so it's probably easier to review the end result
rather than the diff.
Many controllers are not able to handle interval data composed of
multiple segments when PI is used, so this patch introduces a new
integrity limit that a low level driver can set to notify that it is
capable, default to false. The nvme driver is the first one to enable it
in this patch. Everyone else will force DMA alignment to the logical
block size as before to ensure interval data is always aligned within a
single segment.
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Link: https://patch.msgid.link/20260313144701.1221652-2-kbusch@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
DEFINE_GUARD_COND() for device_lock_interruptible()
Introduce conditional guard version of device_lock() for scenarios that
require conditional device lock holding.
This is a stable tag for other trees to merge.
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
UDP-Lite supports variable-length checksum and has two socket
options, UDPLITE_SEND_CSCOV and UDPLITE_RECV_CSCOV, to control
the checksum coverage.
Let's remove the support.
setsockopt(UDPLITE_SEND_CSCOV / UDPLITE_RECV_CSCOV) was only
available for UDP-Lite and returned -ENOPROTOOPT for UDP.
Now, the options are handled in ip_setsockopt() and
ipv6_setsockopt(), which still return the same error.
getsockopt(UDPLITE_SEND_CSCOV / UDPLITE_RECV_CSCOV) was available
for UDP and always returned 0, meaning full checksum, but now
-ENOPROTOOPT is returned.
Given that getsockopt() is meaningless for UDP and even the options
are not defined under include/uapi/, this should not be a problem.
$ man 7 udplite
...
BUGS
Where glibc support is missing, the following definitions
are needed:
#define IPPROTO_UDPLITE 136
#define UDPLITE_SEND_CSCOV 10
#define UDPLITE_RECV_CSCOV 11
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260311052020.1213705-10-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently, arch_sync_dma_for_cpu and arch_sync_dma_for_device
always wait for the completion of each DMA buffer. That is,
issuing the DMA sync and waiting for completion is done in a
single API call.
For scatter-gather lists with multiple entries, this means
issuing and waiting is repeated for each entry, which can hurt
performance. Architectures like ARM64 may be able to issue all
DMA sync operations for all entries first and then wait for
completion together.
To address this, arch_sync_dma_for_* now batches DMA operations
and performs a flush afterward. On ARM64, the flush is implemented
with a dsb instruction in arch_sync_dma_flush(). On other
architectures, arch_sync_dma_flush() is currently a nop.
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Cc: Tangquan Zheng <zhengtangquan@oppo.com>
Reviewed-by: Juergen Gross <jgross@suse.com> # drivers/xen/swiotlb-xen.c
Tested-by: Xueyuan Chen <xueyuan.chen21@gmail.com>
Signed-off-by: Barry Song <baohua@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20260228221316.59934-1-21cnbao@gmail.com
|
|
Currently, there are two helpers to match the root compatible value
against an of_device_id array:
- of_machine_device_match() returns true if a match is found,
- of_machine_get_match_data() returns the match data if a match is
found.
However, there is no helper that returns the actual of_device_id
structure corresponding to the match, leading to code duplication in
various drivers.
Fix this by reworking of_machine_device_match() to return the actual
match structure, and renaming it to of_machine_get_match().
Retain the old of_machine_device_match() functionality using a cheap
static inline wrapper around the new of_machine_get_match() helper.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://patch.msgid.link/14e1c03d443b1a5f210609ec3a1ebbaeab8fb3d9.1772468323.git.geert+renesas@glider.be
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
|
|
Add SCX_ENQ_IMMED enqueue flag for local DSQ insertions. Once a task is
dispatched with IMMED, it either gets on the CPU immediately and stays on it,
or gets reenqueued back to the BPF scheduler. It will never linger on a local
DSQ behind other tasks or on a CPU taken by a higher-priority class.
rq_is_open() uses rq->next_class to determine whether the rq is available,
and wakeup_preempt_scx() triggers reenqueue when a higher-priority class task
arrives. These capture all higher class preemptions. Combined with reenqueue
points in the dispatch path, all cases where an IMMED task would not execute
immediately are covered.
SCX_TASK_IMMED persists in p->scx.flags until the next fresh enqueue, so the
guarantee survives SAVE/RESTORE cycles. If preempted while running,
put_prev_task_scx() reenqueues through ops.enqueue() with
SCX_TASK_REENQ_PREEMPTED instead of silently placing the task back on the
local DSQ.
This enables tighter scheduling latency control by preventing tasks from
piling up on local DSQs. It also enables opportunistic CPU sharing across
sub-schedulers - without this, a sub-scheduler can stuff the local DSQ of a
shared CPU, making it difficult for others to use.
v2: - Rewrite is_curr_done() as rq_is_open() using rq->next_class and
implement wakeup_preempt_scx() to achieve complete coverage of all
cases where IMMED tasks could get stranded.
- Track IMMED persistently in p->scx.flags and reenqueue
preempted-while-running tasks through ops.enqueue().
- Bound deferred reenq cycles (SCX_REENQ_LOCAL_MAX_REPEAT).
- Misc renames, documentation.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block fixes from Jens Axboe:
- NVMe pull request via Keith:
- Fix nvme-pci IRQ race and slab-out-of-bounds access
- Fix recursive workqueue locking for target async events
- Various cleanups
- Fix a potential NULL pointer dereference in ublk on size setting
- ublk automatic partition scanning fix
- Two s390 dasd fixes
* tag 'block-7.0-20260312' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
nvme: Annotate struct nvme_dhchap_key with __counted_by
nvme-core: do not pass empty queue_limits to blk_mq_alloc_queue()
nvme-pci: Fix race bug in nvme_poll_irqdisable()
nvmet: move async event work off nvmet-wq
nvme-pci: Fix slab-out-of-bounds in nvme_dbbuf_set
s390/dasd: Copy detected format information to secondary device
s390/dasd: Move quiesce state with pprc swap
ublk: don't clear GD_SUPPRESS_PART_SCAN for unprivileged daemons
ublk: fix NULL pointer dereference in ublk_ctrl_set_size()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull io_uring fixes from Jens Axboe:
- Fix an inverted true/false comment on task_no_new_privs, from the
BPF filtering changes merged in this release
- Use the migration disabling way of running the BPF filters, as the
io_uring side doesn't do that already
- Fix an issue with ->rings stability under resize, both for local
task_work additions and for eventfd signaling
- Fix an issue with SQE mixed mode, where a bounds check wasn't correct
for having a 128b SQE
- Fix an issue where a legacy provided buffer group is changed to to
ring mapped one while legacy buffers from that group are in flight
* tag 'io_uring-7.0-20260312' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
io_uring/kbuf: check if target buffer list is still legacy on recycle
io_uring: fix physical SQE bounds check for SQE_MIXED 128-byte ops
io_uring/eventfd: use ctx->rings_rcu for flags checking
io_uring: ensure ctx->rings is stable for task work flags manipulation
io_uring/bpf_filter: use bpf_prog_run_pin_on_cpu() to prevent migration
io_uring/register: fix comment about task_no_new_privs
|
|
Introduce dev_is_auxiliary() in analogy with dev_is_platform() to
facilitate subsequent changes.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Danilo Krummrich <dakr@kernel.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://patch.msgid.link/5079467.GXAFRqVoOG@rafael.j.wysocki
|
|
security/integrity/secure_boot.c contains a single __weak function,
which breaks recordmcount when building with clang:
$ make -skj"$(nproc)" ARCH=powerpc LLVM=1 ppc64_defconfig security/integrity/secure_boot.o
Cannot find symbol for section 2: .text.
security/integrity/secure_boot.o: failed
Introduce a Kconfig symbol, CONFIG_HAVE_ARCH_GET_SECUREBOOT, to indicate
that an architecture provides a definition of arch_get_secureboot().
Provide a static inline stub when this symbol is not defined to achieve
the same effect as the __weak function, allowing secure_boot.c to be
removed altogether. Move the s390 definition of arch_get_secureboot()
out of the CONFIG_KEXEC_FILE block to ensure it is always available, as
it does not actually depend on KEXEC_FILE.
Reported-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 31a6a07eefeb ("integrity: Make arch_ima_get_secureboot integrity-wide")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
|
|
According to the definition in IEEE Std 802.11be-2024, Table 9-417r, each
bit indicates support for the transmission and reception of EHT-MCS 15 in:
- B0: 52+26-tone and 106+26-tone MRUs.
- B1: a 484+242-tone MRU if 80 MHz is supported.
- B2: a 996+484-tone MRU and a 996+484+242-tone MRU if 160 MHz is
supported.
- B3: a 3×996-tone MRU if 320 MHz is supported.
Fixes: 6239da18d2f9 ("wifi: mac80211: adjust EHT capa when lowering bandwidth")
Signed-off-by: Shayne Chen <shayne.chen@mediatek.com>
Link: https://patch.msgid.link/20260313062150.3165433-1-shayne.chen@mediatek.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Constify the groups arrays, allowing to assign constant arrays.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/7ff56b07-09ca-4948-b98f-5bd37ceef21e@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
device_remove_groups
Now that sysfs_create_groups() and sysfs_remove_groups() allow to
pass constant groups arrays, we can constify the groups array argument
also here.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/8ea2d6d1-0adb-4d7f-92bc-751e93ce08d6@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Constify the groups array argument where applicable. This allows to
pass constant arrays as arguments.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/17035265-8882-4101-b7a7-16b3eb94f8b5@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The alternate screen support added by commit 23743ba64709 ("vt: add
support for smput/rmput escape codes") only saves and restores the
regular screen buffer (vc_origin), but completely ignores the corresponding
unicode screen buffer (vc_uni_lines) creating a messed-up display.
Add vc_saved_uni_lines to save the unicode screen buffer when entering
the alternate screen, and restore it when leaving. Also ensure proper
cleanup in reset_terminal() and vc_deallocate().
Fixes: 23743ba64709 ("vt: add support for smput/rmput escape codes")
Cc: stable <stable@kernel.org>
Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
Link: https://patch.msgid.link/5o2p6qp3-91pq-0p17-or02-1oors4417ns7@onlyvoer.pbz
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Use the correct struct member name and function parameter name in
kernel-doc comments.
Move a macro that was between a struct's documentation and its
declaration.
These eliminate the following kernel-doc warnings:
Warning: include/linux/tee_core.h:73 struct member 'c_no_users' not
described in 'tee_device'
Warning: include/linux/tee_core.h:132 #define TEE_DESC_PRIVILEGED
0x1; error: Cannot parse struct or union!
Warning: include/linux/tee_core.h:257 function parameter 'connection_data'
not described in 'tee_session_calc_client_uuid'
Warning: include/linux/tee_core.h:320 function parameter 'teedev'
not described in 'tee_get_drvdata'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Sumit Garg <sumit.garg@oss.qualcomm.com>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
|
|
Blamed commits forgot that vxlan/geneve use udp_tunnel[6]_xmit_skb() which
call iptunnel_xmit_stats().
iptunnel_xmit_stats() was assuming tunnels were only using
NETDEV_PCPU_STAT_TSTATS.
@syncp offset in pcpu_sw_netstats and pcpu_dstats is different.
32bit kernels would either have corruptions or freezes if the syncp
sequence was overwritten.
This patch also moves pcpu_stat_type closer to dev->{t,d}stats to avoid
a potential cache line miss since iptunnel_xmit_stats() needs to read it.
Fixes: 6fa6de302246 ("geneve: Handle stats using NETDEV_PCPU_STAT_DSTATS.")
Fixes: be226352e8dc ("vxlan: Handle stats using NETDEV_PCPU_STAT_DSTATS.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20260311123110.1471930-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Introduce conditional guard version of device_lock() for scenarios that
require conditional device lock holding.
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Li Ming <ming.li@zohomail.com>
Link: https://patch.msgid.link/20260310-fix_access_endpoint_without_drv_check-v1-1-94fe919a0b87@zohomail.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
Cross-merge networking fixes after downstream PR (net-7.0-rc4).
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
db25c42c2e1f9 ("net/mlx5e: RX, Fix XDP multi-buf frag counting for striding RQ")
dff1c3164a692 ("net/mlx5e: SHAMPO, Always calculate page size")
https://lore.kernel.org/aa7ORohmf67EKihj@sirena.org.uk
drivers/net/ethernet/ti/am65-cpsw-nuss.c
840c9d13cb1ca ("net: ethernet: ti: am65-cpsw-nuss: Fix rx_filter value for PTP support")
a23c657e332f2 ("net: ethernet: ti: am65-cpsw: Use also port number to identify timestamps")
https://lore.kernel.org/abK3EkIXuVgMyGI7@sirena.org.uk
No adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from CAN and netfilter.
Current release - regressions:
- eth: mana: Null service_wq on setup error to prevent double destroy
Previous releases - regressions:
- nexthop: fix percpu use-after-free in remove_nh_grp_entry
- sched: teql: fix NULL pointer dereference in iptunnel_xmit on TEQL slave xmit
- bpf: fix nd_tbl NULL dereference when IPv6 is disabled
- neighbour: restore protocol != 0 check in pneigh update
- tipc: fix divide-by-zero in tipc_sk_filter_connect()
- eth:
- mlx5:
- fix crash when moving to switchdev mode
- fix DMA FIFO desync on error CQE SQ recovery
- iavf: fix PTP use-after-free during reset
- bonding: fix type confusion in bond_setup_by_slave()
- lan78xx: fix WARN in __netif_napi_del_locked on disconnect
Previous releases - always broken:
- core: add xmit recursion limit to tunnel xmit functions
- net-shapers: don't free reply skb after genlmsg_reply()
- netfilter:
- fix stack out-of-bounds read in pipapo_drop()
- fix OOB read in nfnl_cthelper_dump_table()
- mctp:
- fix device leak on probe failure
- i2c: fix skb memory leak in receive path
- can: keep the max bitrate error at 5%
- eth:
- bonding: fix nd_tbl NULL dereference when IPv6 is disabled
- bnxt_en: fix RSS table size check when changing ethtool channels
- amd-xgbe: prevent CRC errors during RX adaptation with AN disabled
- octeontx2-af: devlink: fix NIX RAS reporter recovery condition"
* tag 'net-7.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (71 commits)
net: prevent NULL deref in ip[6]tunnel_xmit()
octeontx2-af: devlink: fix NIX RAS reporter to use RAS interrupt status
octeontx2-af: devlink: fix NIX RAS reporter recovery condition
net: ethernet: ti: am65-cpsw-nuss: Fix rx_filter value for PTP support
net/mana: Null service_wq on setup error to prevent double destroy
selftests: rtnetlink: add neighbour update test
neighbour: restore protocol != 0 check in pneigh update
net: dsa: realtek: Fix LED group port bit for non-zero LED group
tipc: fix divide-by-zero in tipc_sk_filter_connect()
net: dsa: microchip: Fix error path in PTP IRQ setup
bpf: bpf_out_neigh_v6: Fix nd_tbl NULL dereference when IPv6 is disabled
bpf: bpf_out_neigh_v4: Fix nd_tbl NULL dereference when IPv6 is disabled
net: bonding: Fix nd_tbl NULL dereference when IPv6 is disabled
ipv6: move the disable_ipv6_mod knob to core code
net: bcmgenet: fix broken EEE by converting to phylib-managed state
net-shapers: don't free reply skb after genlmsg_reply()
net: dsa: mxl862xx: don't set user_mii_bus
net: ethernet: arc: emac: quiesce interrupts before requesting IRQ
page_pool: store detach_time as ktime_t to avoid false-negatives
net: macb: Shuffle the tx ring before enabling tx
...
|
|
Sync up with the mainline to brig up the latest changes, specifically
changes to ALPS driver.
|
|
Some SoC drivers reimplement the functionality of
soc_device_get_machine(). Make this function accessible through the
sys_soc.h header and rename it to a more descriptive name.
Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20260223-soc-of-root-v2-4-b45da45903c8@oss.qualcomm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Provide a helper function allowing users to read the model string of the
machine, hiding the access to the root node.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20260223-soc-of-root-v2-2-b45da45903c8@oss.qualcomm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Provide a helper function allowing users to read the compatible string
of the machine, hiding the access to the root node.
Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20260223-soc-of-root-v2-1-b45da45903c8@oss.qualcomm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
8250_port exports serial8250_handle_irq() to HW specific 8250 drivers.
It takes port's lock within but a HW specific 8250 driver may want to
take port's lock itself, do something, and then call the generic
handler in 8250_port but to do that, the caller has to release port's
lock for no good reason.
Introduce serial8250_handle_irq_locked() which a HW specific driver can
call while already holding port's lock.
As this is new export, put it straight into a namespace (where all 8250
exports should eventually be moved).
Tested-by: Bandal, Shankar <shankar.bandal@intel.com>
Tested-by: Murthy, Shanth <shanth.murthy@intel.com>
Cc: stable <stable@kernel.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://patch.msgid.link/20260203171049.4353-4-ilpo.jarvinen@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
On the embedded platform, certain critical data, such as IMU data, is
transmitted through UART. The tty_flip_buffer_push() interface in the TTY
layer uses system_dfl_wq to handle the flipping of the TTY buffer.
Although the unbound workqueue can create new threads on demand and wake
up the kworker thread on an idle CPU, it may be preempted by real-time
tasks or other high-prio tasks.
flush_to_ldisc() needs to wake up the relevant data handle thread. When
executing __wake_up_common_lock(), it calls spin_lock_irqsave(), which
does not disable preemption but disables migration in RT-Linux. This
prevents the kworker thread from being migrated to other cores by CPU's
balancing logic, resulting in long delays. The call trace is as follows:
__wake_up_common_lock
__wake_up
ep_poll_callback
__wake_up_common
__wake_up_common_lock
__wake_up
n_tty_receive_buf_common
n_tty_receive_buf2
tty_ldisc_receive_buf
tty_port_default_receive_buf
flush_to_ldisc
In our system, the processing interval for each frame of IMU data
transmitted via UART can experience significant jitter due to this issue.
Instead of the expected 10 to 15 ms frame processing interval, we see
spikes up to 30 to 35 ms. Moreover, in just one or two hours, there can
be 2 to 3 occurrences of such high jitter, which is quite frequent. This
jitter exceeds the software's tolerable limit of 20 ms.
Introduce flip_wq in tty_port which can be set by tty_port_link_wq() or as
default linked to default workqueue allocated when tty_register_driver().
The default workqueue is allocated with flag WQ_SYSFS, so that cpumask and
nice can be set dynamically. The execution timing of tty_port_link_wq() is
not clearly restricted. The newly added function tty_port_link_driver_wq()
checks whether the flip_wq of the tty_port has already been assigned when
linking the default tty_driver's workqueue to the port. After the user has
set a custom workqueue for a certain tty_port using tty_port_link_wq(), the
system will only use this custom workqueue, even if tty_driver does not
have %TTY_DRIVER_NO_WORKQUEUE flag. When tty_port register device, flip_wq
link operation is done by tty_port_link_driver_wq(), but for in-memory
devices the link operation cannot cover all the cases. Although
tty_port_install() is dedicated for in-memory devices lik PTY to link port
allocated on demand, the logic of tty_port_install() is so simple that
people may not call it, vc_cons[0].d->port is one such case. We check the
buf.flip_wq when flip TTY buffer, if buf.flip_wq of TTY port is NULL, use
system_dfl_wq as a backup.
To avoid naming conflict of the default tty_driver's workqueue, using
'"%s-%s", driver->name, driver->driver_name' as the workqueue name. In
cases where driver_name is not specified and therefore is NULL, the
workqueue is not created. Drivers that do not define driver_name are
potentially in-memory devices like vty, which generally do not require
special workqueue settings. Even with the combination of name and
driver_name, the workqueue names can still be duplicated, as many tty
serial drivers use "ttyS" as dev_name and "serial" as driver_name. I
modified the conflicting driver_name of these drivers by appending a
suffix of _xx based on the corresponding .c file. If this modification is
not made, it could not only lead to duplicate workqueue names but also
result in duplicate entries for the /proc/tty/driver/<driver_name> nodes.
Introduce %TTY_DRIVER_NO_WORKQUEUE flag meaning not to create the
default single tty_driver workqueue. Two reasons why need to introduce the
%TTY_DRIVER_NO_WORKQUEUE flag:
1. If the WQ_SYSFS parameter is enabled, workqueue_sysfs_register() will
fail when trying to create a workqueue with the same name. The pty is an
example of this; if both CONFIG_LEGACY_PTYS and CONFIG_UNIX98_PTYS are
enabled, the call to tty_register_driver() in unix98_pty_init() will fail.
2. Different TTY ports may be used for different tasks, which may require
separate core binding control via workqueues. In this case, the workqueue
created by default in the TTY driver is unnecessary. Enabling this flag
prevents the creation of this redundant workqueue.
After applying this patch, we can set the related UART TTY flip buffer
workqueue by sysfs. We set the cpumask to CPU cores associated with the
IMU tasks, and set the nice to -20. Testing has shown significant
improvement in the previously described issue, with almost no stuttering
occurring anymore.
Tested-by: Tommaso Merciai <tommaso.merciai.xr@bp.renesas.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Xin Zhao <jackzxcui1989@163.com>
Link: https://patch.msgid.link/20260213085039.3274704-1-jackzxcui1989@163.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Backmerging to bring in 7.00-rc3. Important ahead GPU SVM merging THP
support.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
|
|
Correct kernel-doc comment format and add a missing to avoid
kernel-doc warnings:
Warning: include/linux/serdev.h:49 struct member 'write_comp' not
described in 'serdev_device'
Warning: include/linux/serdev.h:49 struct member 'write_lock' not
described in 'serdev_device'
Warning: include/linux/serdev.h:68 struct member 'shutdown' not described
in 'serdev_device_driver'
Warning: include/linux/serdev.h:134 function parameter 'serdev' not
described in 'serdev_device_put'
Warning: include/linux/serdev.h:162 function parameter 'ctrl' not
described in 'serdev_controller_put'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://patch.msgid.link/20260311052347.305612-1-rdunlap@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
tty_ldisc_ops is not modified once registered, so make it const.
Signed-off-by: Qingfang Deng <dqfext@gmail.com>
Link: https://patch.msgid.link/20260206062004.1273890-1-dqfext@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Majority of tcp_release_cb() calls do nothing at all.
Provide tcp_release_cb_cond() helper so that release_sock()
can avoid these calls.
Also hint the compiler that __release_sock() and wake_up()
are rarely called.
$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-77 (-77)
Function old new delta
release_sock 258 181 -77
Total: Before=25235790, After=25235713, chg -0.00%
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260310124451.2280968-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
HRTIMER_MAX_CLOCK_BASES is required to stay the last value of the enum.
Drop the trailing comma so no new members are added after it by mistake.
Signed-off-by: Thomas Weißschuh (Schneider Electric) <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260311-hrtimer-cleanups-v1-11-095357392669@linutronix.de
|
|
These fields are initialized once and are never supposed to change.
Mark them as const to make this explicit.
Signed-off-by: Thomas Weißschuh (Schneider Electric) <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260311-hrtimer-cleanups-v1-10-095357392669@linutronix.de
|
|
This spurious space makes grepping for the enum definition annoying.
Remove it.
Signed-off-by: Thomas Weißschuh (Schneider Electric) <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260311-hrtimer-cleanups-v1-8-095357392669@linutronix.de
|
|
There are no users left.
Signed-off-by: Thomas Weißschuh (Schneider Electric) <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260311-hrtimer-cleanups-v1-6-095357392669@linutronix.de
|
|
The sentinel value added by the wrapper macros __print_symbolic() et al
prevents the callers from adding their own trailing comma. This makes
constructing symbol list dynamically based on kconfig values tedious.
Drop the sentinel elements, so callers can either specify the trailing
comma or not, just like in regular array initializers.
Signed-off-by: Thomas Weißschuh (Schneider Electric) <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260311-hrtimer-cleanups-v1-2-095357392669@linutronix.de
|
|
with sparse
There are two versions of the __this_cpu_local_lock() definitions in
include/linux/local_lock_internal.h: one version that relies on the
Clang overloading functionality and another version that does not.
Select the latter version when using sparse. This patch fixes the
following errors reported by sparse:
include/linux/local_lock_internal.h:331:40: sparse: sparse: multiple definitions for function '__this_cpu_local_lock'
include/linux/local_lock_internal.h:325:37: sparse: the previous one is here
Closes: https://lore.kernel.org/oe-kbuild-all/202603062334.wgI5htP0-lkp@intel.com/
Fixes: d3febf16dee2 ("locking/local_lock: Support Clang's context analysis")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Marco Elver <elver@google.com>
Link: https://patch.msgid.link/20260311231455.1961413-1-bvanassche@acm.org
|
|
Biju Das needs a patch for rz-du merged in 7.0-rc3
Signed-off-by: Maxime Ripard <mripard@kernel.org>
|
|
Use the correct function name and function parameter name to avoid
these kernel-doc warnings:
Warning: include/linux/wait_bit.h:424 expecting prototype for
wait_var_event_killable(). Prototype was for
wait_var_event_interruptible() instead
Warning: include/linux/wait_bit.h:508 function parameter 'lock' not
described in 'wait_var_event_mutex'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260302005237.3473095-1-rdunlap@infradead.org
|
|
From: Jakub Kicinski <kuba@kernel.org>
Make sure disable_ipv6_mod itself is not part of the IPv6 module,
in case core code wants to refer to it. We will remove support
for IPv6=m soon, this change helps make fixes we commit before
that less messy.
Link: https://patch.msgid.link/20260307-net-nd_tbl_fixes-v4-1-e2677e85628c@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Replace the global cgroup_file_kn_lock with a per-cgroup_file spinlock
to eliminate cross-cgroup contention as it is not really protecting
data shared between different cgroups.
The lock is initialized in cgroup_add_file() alongside timer_setup().
No lock acquisition is needed during initialization since the cgroup
directory is being populated under cgroup_mutex and no concurrent
accessors exist at that point.
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Add a new clone3() flag CLONE_AUTOREAP that makes a child process
auto-reap on exit without ever becoming a zombie. This is a per-process
property in contrast to the existing auto-reap mechanism via
SA_NOCLDWAIT or SIG_IGN for SIGCHLD which applies to all children of a
given parent.
Currently the only way to automatically reap children is to set
SA_NOCLDWAIT or SIG_IGN on SIGCHLD. This is a parent-scoped property
affecting all children which makes it unsuitable for libraries or
applications that need selective auto-reaping of specific children while
still being able to wait() on others.
CLONE_AUTOREAP stores an autoreap flag in the child's signal_struct.
When the child exits do_notify_parent() checks this flag and causes
exit_notify() to transition the task directly to EXIT_DEAD. Since the
flag lives on the child it survives reparenting: if the original parent
exits and the child is reparented to a subreaper or init the child still
auto-reaps when it eventually exits.
CLONE_AUTOREAP can be combined with CLONE_PIDFD to allow the parent to
monitor the child's exit via poll() and retrieve exit status via
PIDFD_GET_INFO. Without CLONE_PIDFD it provides a fire-and-forget
pattern where the parent simply doesn't care about the child's exit
status. No exit signal is delivered so exit_signal must be zero.
CLONE_AUTOREAP is rejected in combination with CLONE_PARENT. If a
CLONE_AUTOREAP child were to clone(CLONE_PARENT) the new grandchild
would inherit exit_signal == 0 from the autoreap parent's group leader
but without signal->autoreap. This grandchild would become a zombie that
never sends a signal and is never autoreaped - confusing and arguably
broken behavior.
The flag is not inherited by the autoreap process's own children. Each
child that should be autoreaped must be explicitly created with
CLONE_AUTOREAP.
Link: https://github.com/uapi-group/kernel-features/issues/45
Link: https://patch.msgid.link/20260226-work-pidfs-autoreap-v5-1-d148b984a989@kernel.org
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
If DEFER_TASKRUN | SETUP_TASKRUN is used and task work is added while
the ring is being resized, it's possible for the OR'ing of
IORING_SQ_TASKRUN to happen in the small window of swapping into the
new rings and the old rings being freed.
Prevent this by adding a 2nd ->rings pointer, ->rings_rcu, which is
protected by RCU. The task work flags manipulation is inside RCU
already, and if the resize ring freeing is done post an RCU synchronize,
then there's no need to add locking to the fast path of task work
additions.
Note: this is only done for DEFER_TASKRUN, as that's the only setup mode
that supports ring resizing. If this ever changes, then they too need to
use the io_ctx_mark_taskrun() helper.
Link: https://lore.kernel.org/io-uring/20260309062759.482210-1-naup96721@gmail.com/
Cc: stable@vger.kernel.org
Fixes: 79cfe9e59c2a ("io_uring/register: add IORING_REGISTER_RESIZE_RINGS")
Reported-by: Hao-Yu Yang <naup96721@gmail.com>
Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pick up the hrtick related hrtimer changes so other unrelated changes can
be queued on top.
|
|
efi_range_is_wc() has no callers, so remove it.
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
HEAD
KVM generic changes for 7.0
- Remove a subtle pseudo-overlay of kvm_stats_desc, which, aside from being
unnecessary and confusing, triggered compiler warnings due to
-Wflex-array-member-not-at-end.
- Document that vcpu->mutex is take outside of kvm->slots_lock and
kvm->slots_arch_lock, which is intentional and desirable despite being
rather unintuitive.
|
|
A flex array can be used to reduce the allocation to 1.
And actually mtdconcat was using the pointer + 1 trick to point to the
overallocated area. Better alternatives exist.
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
Some USB devices incorrectly report bNumConfigurations as 0 in their
device descriptor, which causes the USB core to reject them during
enumeration.
logs:
usb 1-2: device descriptor read/64, error -71
usb 1-2: no configurations
usb 1-2: can't read configurations, error -22
However, these devices actually work correctly when
treated as having a single configuration.
Add a new quirk USB_QUIRK_FORCE_ONE_CONFIG to handle such devices.
When this quirk is set, assume the device has 1 configuration instead
of failing with -EINVAL.
This quirk is applied to the device with VID:PID 5131:2007 which
exhibits this behavior.
Signed-off-by: Jie Deng <dengjie03@kylinos.cn>
Link: https://patch.msgid.link/20260227084931.1527461-1-dengjie03@kylinos.cn
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The usb_control_msg(), usb_bulk_msg(), and usb_interrupt_msg() APIs in
usbcore allow unlimited timeout durations. And since they use
uninterruptible waits, this leaves open the possibility of hanging a
task for an indefinitely long time, with no way to kill it short of
unplugging the target device.
To prevent this sort of problem, enforce a maximum limit on the length
of these unkillable timeouts. The limit chosen here, somewhat
arbitrarily, is 60 seconds. On many systems (although not all) this
is short enough to avoid triggering the kernel's hung-task detector.
In addition, clear up the ambiguity of negative timeout values by
treating them the same as 0, i.e., using the maximum allowed timeout.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Link: https://lore.kernel.org/linux-usb/3acfe838-6334-4f6d-be7c-4bb01704b33d@rowland.harvard.edu/
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
CC: stable@vger.kernel.org
Link: https://patch.msgid.link/15fc9773-a007-47b0-a703-df89a8cf83dd@rowland.harvard.edu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|