summaryrefslogtreecommitdiff
path: root/net/mptcp
AgeCommit message (Collapse)Author
8 daysmptcp: add-addr: always drop other suboptionsMatthieu Baerts (NGI0)
When an ADD_ADDR needs to be sent, it could be prepared if there is enough remaining space and even if the packet is not a pure ACK. But it would be dropped soon after. Indeed, in mptcp_pm_add_addr_signal(), there is enough space to fit a DSS of 20 octets and an ADD_ADDR echo containing an IPv4 address on 8 octets for example. In this case, the packet would be prepared, the MPTCP_ADD_ADDR_ECHO bit would be removed from pm->addr_signal, but the option would be silently dropped in mptcp_established_options_add_addr() not to override DSS info in the union from 'struct mptcp_out_options', and also because mptcp_write_options() will enforce mutually exclusion with DSS. Instead, don't even try to send an ADD_ADDR if it is not a pure ACK. Retry for each new packet until a pure-ACK is emitted. That's fine to do that, because each time an ADD_ADDR (echo) is scheduled, a pure ACK is queued. This also simplifies the code, and the skb checks can be done earlier, before the lock. Note: also, since commit 6d0060f600ad ("mptcp: Write MPTCP DSS headers to outgoing data packets"), opts->ahmac would not have been set to 0 when other suboptions were not dropped, and when sending an ADD_ADDR echo. That would have resulted in sending an ADD_ADDR using garbage info, where there was not enough space, instead of an echo one without the ADD_ADDR HMAC. Fixes: 1bff1e43a30e ("mptcp: optimize out option generation") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260602-net-mptcp-misc-fixes-7-1-rc7-v2-11-856831229976@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 daysmptcp: fix uninit-value in mptcp_established_optionsPaolo Abeni
syzbot reported the following uninit splat: BUG: KMSAN: uninit-value in mptcp_write_data_fin net/mptcp/options.c:542 [inline] BUG: KMSAN: uninit-value in mptcp_established_options_dss net/mptcp/options.c:590 [inline] BUG: KMSAN: uninit-value in mptcp_established_options+0x112f/0x3530 net/mptcp/options.c:874 mptcp_write_data_fin net/mptcp/options.c:542 [inline] mptcp_established_options_dss net/mptcp/options.c:590 [inline] mptcp_established_options+0x112f/0x3530 net/mptcp/options.c:874 tcp_established_options+0x312/0xcc0 net/ipv4/tcp_output.c:1192 __tcp_transmit_skb+0x5dc/0x5fe0 net/ipv4/tcp_output.c:1575 __tcp_send_ack+0x967/0xad0 net/ipv4/tcp_output.c:4499 tcp_send_ack+0x3d/0x60 net/ipv4/tcp_output.c:4505 mptcp_subflow_shutdown+0x164/0x690 net/mptcp/protocol.c:3137 mptcp_check_send_data_fin+0x31b/0x3d0 net/mptcp/protocol.c:3218 __mptcp_wr_shutdown net/mptcp/protocol.c:3234 [inline] __mptcp_close+0x860/0x1360 net/mptcp/protocol.c:3313 mptcp_close+0x42/0x260 net/mptcp/protocol.c:3367 inet_release+0x1ee/0x2a0 net/ipv4/af_inet.c:442 __sock_release net/socket.c:722 [inline] sock_close+0xd6/0x2f0 net/socket.c:1514 __fput+0x60e/0x1010 fs/file_table.c:510 ____fput+0x25/0x30 fs/file_table.c:538 task_work_run+0x208/0x2b0 kernel/task_work.c:233 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline] __exit_to_user_mode_loop kernel/entry/common.c:67 [inline] exit_to_user_mode_loop+0x306/0x1b60 kernel/entry/common.c:98 __exit_to_user_mode_prepare include/linux/irq-entry-common.h:207 [inline] syscall_exit_to_user_mode_prepare include/linux/irq-entry-common.h:238 [inline] syscall_exit_to_user_mode include/linux/entry-common.h:318 [inline] __do_fast_syscall_32+0x2c7/0x460 arch/x86/entry/syscall_32.c:310 do_fast_syscall_32+0x37/0x80 arch/x86/entry/syscall_32.c:332 do_SYSENTER_32+0x1f/0x30 arch/x86/entry/syscall_32.c:370 entry_SYSENTER_compat_after_hwframe+0x84/0x8e Local variable opts created at: __tcp_transmit_skb+0x4d/0x5fe0 net/ipv4/tcp_output.c:1536 __tcp_send_ack+0x967/0xad0 net/ipv4/tcp_output.c:4499 The output path currently omits initializing the mptcp extension `use_map` flag in a few corner cases. Address the issue always zeroing all the extensions flags before eventually initializing the individual bits. To that extent, introduce and use a struct_group to avoid multiple bitwise operations. Fixes: cfcceb7a39fc ("tcp: shrink per-packet memset in __tcp_transmit_skb()") Cc: stable@vger.kernel.org Reported-by: syzbot+ff020673c5e3d94d9478@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=ff020673c5e3d94d9478 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260602-net-mptcp-misc-fixes-7-1-rc7-v2-10-856831229976@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 daysmptcp: check desc->count in read_sockGang Yan
__tcp_read_sock() checks desc->count after each skb is consumed and breaks the loop when it reaches 0. The MPTCP variant lacks this check. This is a functional bug, other subsystems also rely on this check: TLS strparser sets desc->count to 0 once a full TLS record is assembled and depends on this break to stop reading. Add the same desc->count check to __mptcp_read_sock(), mirroring __tcp_read_sock(). Fixes: 250d9766a984 ("mptcp: implement .read_sock") Cc: stable@vger.kernel.org Co-developed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Gang Yan <yangang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260602-net-mptcp-misc-fixes-7-1-rc7-v2-9-856831229976@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 daysmptcp: sockopt: set sockopt on all subflowsMatthieu Baerts (NGI0)
The mptcp_setsockopt_all_sf(), currently used only with TCP_MAXSEG, stopped when one subflow returned an error. Even if it is not wrong, this is different from the other helpers trying to set the option on all subflows, and then returning an error if at least one of them had an issue. Follow this behaviour, for a question of uniformity. Fixes: 51c5fd09e1b4 ("mptcp: add TCP_MAXSEG sockopt support") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260602-net-mptcp-misc-fixes-7-1-rc7-v2-8-856831229976@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 daysmptcp: sockopt: check timestamping ret valueMatthieu Baerts (NGI0)
sock_set_timestamping() can fail for different reasons. The returned value should then be checked. If sock_set_timestamping() fails for at least one subflow, the first error is now reported to the userspace, similar to what is done with other socket options. Fixes: 9061f24bf82e ("mptcp: sockopt: propagate timestamp request to subflows") Cc: stable@vger.kernel.org Reported-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Closes: https://lore.kernel.org/willemdebruijn.kernel.178a41a53d041@gmail.com Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260602-net-mptcp-misc-fixes-7-1-rc7-v2-7-856831229976@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 daysmptcp: pm: fix extra_subflows underflow on userspace PM subflow creationTao Cui
The userspace PM increments extra_subflows after __mptcp_subflow_connect() succeeds, but __mptcp_subflow_connect() calls mptcp_pm_close_subflow() on failure to roll back the pre-increment done by the kernel PM's fill_*() helpers. Because the userspace PM hasn't incremented yet at that point, this decrement is spurious and causes extra_subflows to underflow. Fix it by aligning the userspace PM with the kernel PM: increment extra_subflows before calling __mptcp_subflow_connect(), so the existing error path in subflow.c correctly rolls it back on failure. Also simplify the error handling by taking pm.lock only when needed for cleanup. Fixes: 77e4b94a3de6 ("mptcp: update userspace pm infos") Cc: stable@vger.kernel.org Signed-off-by: Tao Cui <cuitao@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260602-net-mptcp-misc-fixes-7-1-rc7-v2-5-856831229976@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 daysmptcp: allow subflow rcv wnd to shrinkPaolo Abeni
In MPTCP connection, the `window` field in the TCP header refers to the MPTCP-level rcv_nxt and it's right edge should not move backward. Such constraint is enforced at DSS option generation time. At the same time, the TCP stack ensures independently that the TCP-level rcv wnd right's edge does not move backward. That in turn causes artificial inflating of the MPTCP rcv window when the incoming data is acked at the TCP level and is OoO in the MPTCP sequence space (or lands in the backlog). As a consequence, the incoming traffic can exceed the receiver rcvbuf size even when the sender is not misbehaving. Prevent such scenario forcibly allowing the TCP subflow to shrink the TCP-level rcv wnd regardless of the current netns setting. Fixes: f3589be0c420 ("mptcp: never shrink offered window") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260602-net-mptcp-misc-fixes-7-1-rc7-v2-4-856831229976@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 daysmptcp: close TOCTOU race while computing rcv_wndPaolo Abeni
The MPTCP output path access locklessly the MPTCP-level ack_seq in multiple times, using possibly different values for the data_ack in the DSS option and to compute the announced rcv wnd for the same packet. Refactor the cote to avoid inconsistencies which may confuse the peer. Also ensure that the MPTCP level rcv wnd is updated only when the egress packet actually contains a DSS ack. Fixes: fa3fe2b15031 ("mptcp: track window announced to peer") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260602-net-mptcp-misc-fixes-7-1-rc7-v2-3-856831229976@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 daysmptcp: fix retransmission loop when csum is enabledPaolo Abeni
Sashiko noted that retransmission with csum enabled can actually transmit new data, but currently the relevant code does not update accordingly snd_nxt. The may cause incoming ack drop and an endless retransmission loop. Address the issue incrementing snd_nxt as needed. Fixes: 4e14867d5e91 ("mptcp: tune re-injections for csum enabled mode") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260602-net-mptcp-misc-fixes-7-1-rc7-v2-2-856831229976@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 daysmptcp: fix missing wakeups in edge scenariosPaolo Abeni
The mptcp_recvmsg() can fill MPTCP socket receive queue via mptcp_move_skbs(), but currently does not try to wakeup any listener, because the same process is going to check the receive queue soon. When multiple threads are reading from the same fd, the above can cause stall. Add the missing wakeup. Fixes: 6771bfd9ee24 ("mptcp: update mptcp ack sequence from work queue") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260602-net-mptcp-misc-fixes-7-1-rc7-v2-1-856831229976@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-19mptcp: update window_clamp on subflows when SO_RCVBUF is setGang Yan
Add __mptcp_subflow_set_rcvbuf() helper to write the subflow sk_rcvbuf, but also to call the recently added tcp_set_rcvbuf() helper to update window_clamp. This is needed because the window clap is updated when scaling_ratio changes, in tcp_measure_rcv_mss(). Until scaling_ratio changes, the subflow is stuck with the old window clamp which may be based on a small initial buffer. Use this new helper in both mptcp_sol_socket_sync_intval() (setsockopt path) and sync_socket_options() (new subflow creation path). Note that this patch depends on commit b025461303d8 ("tcp: update window_clamp when SO_RCVBUF is set"): it fixes the issue on TCP side, but the same fix is needed on MPTCP side as well. Fixes: a2cbb1603943 ("tcp: Update window clamping condition") Cc: stable@vger.kernel.org Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/619 Signed-off-by: Gang Yan <yangang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260515-net-mptcp-misc-fixes-7-1-rc4-v2-5-701e96419f2f@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-05-19mptcp: reset rcv wnd on disconnectPaolo Abeni
If the MPTCP socket fallback to TCP before the MP handshake completion, the IASN remain 0, and the rcv_wnd_sent field is not explicitly initialized, just incremented over time with the data transfer. At disconnect time such value is not cleared. If the next connection falls back to TCP before the MP handshake completion, the data transfer will keep incrementing the receive window end sequence starting from the last value used in the previous connection: the announced window will be unrelated from the actual receiver buffer size and likely too big. Address the issue zeroing the field at disconnect time. Fixes: b29fcfb54cd7 ("mptcp: full disconnect implementation") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260515-net-mptcp-misc-fixes-7-1-rc4-v2-4-701e96419f2f@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-05-19mptcp: pm: fix ADD_ADDR timer infinite retry on option space insufficientLi Xiasong
When TCP option space is insufficient (e.g., when sending ADD_ADDR with an IPv6 address and port while tcp_timestamps is enabled), the original code jumped to out_unlock without clearing the addr_signal flag. This caused mptcp_pm_add_timer to keep rescheduling indefinitely, not sending ADD_ADDR, preventing subsequent addresses in the endpoint list from being announced. Handle this case by clearing the ADD_ADDR signal and skipping the matching ADD_ADDR retransmission entry. The skip path cancels the matching timer (with id check) and advances PM state progression, preserving forward progress to subsequent PM work. This cancellation is inherently best-effort. A concurrent add_timer callback may already be running and may acquire pm.lock before the cancel path updates entry state. In that case, one final ADD_ADDR transmit attempt can still be executed. Once the cancel path sets entry->retrans_times to ADD_ADDR_RETRANS_MAX, the callback-side retrans_times check suppresses further ADD_ADDR retransmissions. Note that when an ADD_ADDR is being prepared, a pure-ACK is queued. On the output side, it means that it is fine to skip non-pure-ACK packets, when drop_other_suboptions is set: a pure-ACK will be processed soon after. Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout") Cc: stable@vger.kernel.org Signed-off-by: Li Xiasong <lixiasong1@huawei.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260515-net-mptcp-misc-fixes-7-1-rc4-v2-2-701e96419f2f@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-05-19mptcp: do not drop partial packetsShardul Bankar
When a packet arrives with map_seq < ack_seq < end_seq, the beginning of the packet has already been acknowledged but the end contains new data. Currently the entire packet is dropped as "old data," forcing the sender to retransmit. Instead, skip the already-acked bytes by adjusting the skb offset and enqueue only the new portion. Update bytes_received and ack_seq to reflect the new data consumed. A previous attempt at this fix has been sent by Paolo Abeni [1], but had issues [2]: it also added a zero-window check and changed rcv_wnd_sent initialization, which caused test regressions. This version addresses only the partial packet handling without modifying receive window accounting. Fixes: ab174ad8ef76 ("mptcp: move ooo skbs into msk out of order queue.") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/c9b426a4e163aa3c4fe8b80c79f1a610f47ae7d8.1763075056.git.pabeni@redhat.com [1] Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/600 [2] Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com> [pabeni@redhat.com: update map] Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260515-net-mptcp-misc-fixes-7-1-rc4-v2-1-701e96419f2f@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-05-09Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfLinus Torvalds
Pull bpf fixes from Alexei Starovoitov: - Fix sk_local_storage diag dump via netlink (Amery Hung) - Fix off-by-one in arena direct-value access (Junyoung Jang) - Reject TCP_NODELAY in bpf-tcp congestion control (KaFai Wan) - Fix type confusion in bpf_*_sock() (Kuniyuki Iwashima) - Reject TX-only AF_XDP sockets (Linpu Yu) - Don't run arg-tracking analysis twice on main subprog (Paul Chaignon) - Fix NULL pointer dereference in bpf_sk_storage_clone and fib lookup (Weiming Shi) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Fix off-by-one boundary validation in arena direct-value access xskmap: reject TX-only AF_XDP sockets bpf: Don't run arg-tracking analysis twice on main subprog bpf: Free reuseport cBPF prog after RCU grace period. bpf: tcp: Fix type confusion in sol_tcp_sockopt(). bpf: tcp: Fix type confusion in bpf_skc_to_tcp6_sock(). bpf: tcp: Fix type confusion in bpf_skc_to_tcp_sock(). mptcp: bpf: Fix type confusion in bpf_mptcp_sock_from_subflow() selftest: bpf: Add test for bpf_tcp_sock() and RAW socket. bpf: tcp: Fix type confusion in bpf_tcp_sock(). tools/headers: Regenerate stddef.h to fix BPF selftests bpf: Fix sk_local_storage diag dumping uninitialized special fields bpf: Fix NULL pointer dereference in bpf_skb_fib_lookup() sockmap: Fix sk_psock_drop() race vs sock_map_{unhash,close,destroy}(). bpf: Fix NULL pointer dereference in bpf_sk_storage_clone and diag paths selftests/bpf: Verify bpf-tcp-cc rejects TCP_NODELAY selftests/bpf: Test TCP_NODELAY in TCP hdr opt callbacks bpf: Reject TCP_NODELAY in bpf-tcp-cc bpf: Reject TCP_NODELAY in TCP header option callbacks
2026-05-08mptcp: bpf: Fix type confusion in bpf_mptcp_sock_from_subflow()Matthieu Baerts (NGI0)
bpf_mptcp_sock_from_subflow() only checks if sk->sk_protocol is IPPROTO_TCP, but RAW socket can bypass it: socket(AF_INET, SOCK_RAW, IPPROTO_TCP) In this case, it would NOT be valid to call sk_is_mptcp() which will assume sk is a pointer to a struct tcp_sock, and wrongly checks for: tcp_sk(sk)->is_mptcp. Fixes: 3bc253c2e652 ("bpf: Add bpf_skc_to_mptcp_sock_proto") Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260504210610.180150-4-kuniyu@google.com
2026-05-06mptcp: pm: prio: skip closed subflowsMatthieu Baerts (NGI0)
When sending an MP_PRIO, closed subflows need to be skipped. This fixes the case where the initial subflow got closed, re-opened later, then an MP_PRIO is needed for the same local address. Note that explicit MP_PRIO cannot be sent during the 3WHS, so it is fine to use __mptcp_subflow_active(). Fixes: 067065422fcd ("mptcp: add the outgoing MP_PRIO support") Cc: stable@vger.kernel.org Fixes: b29fcfb54cd7 ("mptcp: full disconnect implementation") Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-9-fca8091060a4@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-06mptcp: pm: ADD_ADDR rtx: return early if no retransMatthieu Baerts (NGI0)
No need to iterate over all subflows if there is no retransmission needed. Exit early in this case then. Fixes: 30549eebc4d8 ("mptcp: make ADD_ADDR retransmission timeout adaptive") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-8-fca8091060a4@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-06mptcp: pm: ADD_ADDR rtx: skip inactive subflowsMatthieu Baerts (NGI0)
When looking at the maximum RTO amongst the subflows, inactive subflows were taken into account: that includes stale ones, and the initial one if it has been already been closed. Unusable subflows are now simply skipped. Stale ones are used as an alternative: if there are only stale ones, to take their maximum RTO and avoid to eventually fallback to net.mptcp.add_addr_timeout, which is set to 2 minutes by default. Fixes: 30549eebc4d8 ("mptcp: make ADD_ADDR retransmission timeout adaptive") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-7-fca8091060a4@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-06mptcp: pm: ADD_ADDR rtx: resched blocked ADD_ADDR quickerMatthieu Baerts (NGI0)
When an ADD_ADDR needs to be retransmitted and another one has already been prepared -- e.g. multiple ADD_ADDRs have been sent in a row and need to be retransmitted later -- this additional retransmission will need to wait. In this case, the timer was reset to TCP_RTO_MAX / 8, which is ~15 seconds. This delay is unnecessary long: it should just be rescheduled at the next opportunity, e.g. after the retransmission timeout. Without this modification, some issues can be seen from time to time in the selftests when multiple ADD_ADDRs are sent, and the host takes time to process them, e.g. the "signal addresses, ADD_ADDR timeout" MPTCP Join selftest, especially with a debug kernel config. Note that on older kernels, 'timeout' is not available. It should be enough to replace it by one second (HZ). Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-6-fca8091060a4@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-06mptcp: pm: ADD_ADDR rtx: free sk if lastMatthieu Baerts (NGI0)
When an ADD_ADDR is retransmitted, the sk is held in sk_reset_timer(), and released at the end. If at that moment, it was the last reference being held, the sk would not be freed. sock_put() should then be called instead of __sock_put(). But that's not enough: if it is the last reference, sock_put() will call sk_free(), which will end up calling sk_stop_timer_sync() on the same timer, and waiting indefinitely to finish. So it is needed to mark that the timer is done at the end of the timer handler when it has not been rescheduled, not to call sk_stop_timer_sync() on "itself". Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-5-fca8091060a4@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-06mptcp: pm: ADD_ADDR rtx: always decrease sk refcountMatthieu Baerts (NGI0)
When an ADD_ADDR is retransmitted, the sk is held in sk_reset_timer(). It should then be released in all cases at the end. Some (unlikely) checks were returning directly instead of calling sock_put() to decrease the refcount. Jump to a new 'exit' label to call __sock_put() (which will become sock_put() in the next commit) to fix this potential leak. While at it, drop the '!msk' check which cannot happen because it is never reset, and explicitly mark the remaining one as "unlikely". Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-4-fca8091060a4@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-06mptcp: pm: ADD_ADDR rtx: fix potential data-raceMatthieu Baerts (NGI0)
This mptcp_pm_add_timer() helper is executed as a timer callback in softirq context. To avoid any data races, the socket lock needs to be held with bh_lock_sock(). If the socket is in use, retry again soon after, similar to what is done with the keepalive timer. Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-3-fca8091060a4@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-06mptcp: pm: ADD_ADDR rtx: allow ID 0Matthieu Baerts (NGI0)
ADD_ADDR can be sent for the ID 0, which corresponds to the local address and port linked to the initial subflow. Indeed, this address could be removed, and re-added later on, e.g. what is done in the "delete re-add signal" MPTCP Join selftests. So no reason to ignore it. Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-2-fca8091060a4@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-06mptcp: pm: kernel: correctly retransmit ADD_ADDR ID 0Matthieu Baerts (NGI0)
When adding the ADD_ADDR to the list, the address including the IP, port and ID are copied. On the other hand, when the endpoint corresponds to the one from the initial subflow, the ID is set to 0, as specified by the MPTCP protocol. The issue is that the ID was reset after having copied the ID in the ADD_ADDR entry. So the retransmission was done, but using a different ID than the initial one. Fixes: 8b8ed1b429f8 ("mptcp: pm: reuse ID 0 after delete and re-add") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260505-net-mptcp-pm-fixes-7-1-rc3-v1-1-fca8091060a4@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-04mptcp: sockopt: increase seq in mptcp_setsockopt_all_sfMatthieu Baerts (NGI0)
mptcp_setsockopt_all_sf() was missing a call to sockopt_seq_inc(). This is required not to cause missing synchronization for newer subflows created later on. This helper is called each time a socket option is set on subflows, and future ones will need to inherit this option after their creation. Fixes: 51c5fd09e1b4 ("mptcp: add TCP_MAXSEG sockopt support") Cc: stable@vger.kernel.org Suggested-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260501-net-mptcp-misc-fixes-7-1-rc3-v1-4-b70118df778e@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-04mptcp: fix rx timestamp corruption on fastopenPaolo Abeni
The skb cb offset containing the timestamp presence flag is cleared before loading such information. Cache such value before MPTCP CB initialization. Fixes: 36b122baf6a8 ("mptcp: add subflow_v(4,6)_send_synack()") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260501-net-mptcp-misc-fixes-7-1-rc3-v1-3-b70118df778e@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-04mptcp: use MPTCP_RST_EMPTCP for ACK HMAC validation failureShardul Bankar
When HMAC validation fails on a received ACK + MP_JOIN in subflow_syn_recv_sock(), the subflow is reset with reason MPTCP_RST_EPROHIBIT ("Administratively prohibited"). This is incorrect: HMAC validation failure is an MPTCP protocol-level error, not an administrative policy denial. The mirror site on the client, in subflow_finish_connect(), already uses MPTCP_RST_EMPTCP ("MPTCP-specific error") for the same kind of HMAC failure on the SYN/ACK + MP_JOIN. Use the same reason on the server side for symmetry and accuracy. Suggested-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Fixes: 443041deb5ef ("mptcp: fix NULL pointer in can_accept_new_subflow") Cc: stable@vger.kernel.org Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260501-net-mptcp-misc-fixes-7-1-rc3-v1-2-b70118df778e@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-04mptcp: use MPJoinSynAckHMacFailure for SynAck HMAC failureShardul Bankar
In subflow_finish_connect(), HMAC validation of the server's HMAC in SYN/ACK + MP_JOIN increments MPTCP_MIB_JOINACKMAC ("HMAC was wrong on ACK + MP_JOIN") on failure. The function processes the SYN/ACK, not the ACK; the matching MPTCP_MIB_JOINSYNACKMAC counter ("HMAC was wrong on SYN/ACK + MP_JOIN") exists but is not incremented anywhere in the tree. The mirror site on the server, subflow_syn_recv_sock(), already uses JOINACKMAC correctly for ACK HMAC failure. Use JOINSYNACKMAC at the SYN/ACK validation site so each counter reflects the packet whose HMAC actually failed. Suggested-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Fixes: fc518953bc9c ("mptcp: add and use MIB counter infrastructure") Cc: stable@vger.kernel.org Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260501-net-mptcp-misc-fixes-7-1-rc3-v1-1-b70118df778e@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-28mptcp: pm: kernel: reset fullmesh counter after flushMatthieu Baerts (NGI0)
This variable counts how many MPTCP endpoints have a 'fullmesh' flag set. After having flushed all MPTCP endpoints, it is then needed to reset this counter. Without this reset, this counter exposed to the userspace is wrong, but also non-fullmesh endpoints added after the flush will not be taken into account to create subflows in reaction to ADD_ADDRs. Fixes: f88191c7f361 ("mptcp: pm: in-kernel: record fullmesh endp nb") Cc: stable@vger.kernel.org Reported-by: Sashiko <sashiko-bot@kernel.org> Closes: https://sashiko.dev/#/patchset/20260422-mptcp-inc-limits-v6-0-903181771530%40kernel.org?part=15 Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260427-net-mptcp-misc-fixes-7-1-rc2-v1-4-7432b7f279fa@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-28mptcp: fastclose msk when linger time is 0Matthieu Baerts (NGI0)
The SO_LINGER socket option has been supported for a while with MPTCP sockets [1], but it didn't cause the equivalent of a TCP reset as expected when enabled and its time was set to 0. This was causing some behavioural differences with TCP where some connections were not promptly stopped as expected. To fix that, an extra condition is checked at close() time before sending an MP_FASTCLOSE, the MPTCP equivalent of a TCP reset. Note that backporting up to [1] will be difficult as more changes are needed to be able to send MP_FASTCLOSE. It seems better to stop at [2], which was supposed to already imitate TCP. Validated with MPTCP packetdrill tests [3]. Fixes: 268b12387460 ("mptcp: setsockopt: support SO_LINGER") [1] Fixes: d21f83485518 ("mptcp: use fastclose on more edge scenarios") [2] Cc: stable@vger.kernel.org Reported-by: Lance Tuller <lance@lance0.com> Closes: https://github.com/lance0/xfr/pull/67 Link: https://github.com/multipath-tcp/packetdrill/pull/196 [3] Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260427-net-mptcp-misc-fixes-7-1-rc2-v1-3-7432b7f279fa@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-28mptcp: fix scheduling with atomic in timestamp sockoptGang Yan
Using lock_sock_fast() (atomic context) around sock_set_timestamp() and sock_set_timestamping() is unsafe, as both helpers can sleep. Replace lock_sock_fast() with sleepable lock_sock()/release_sock() to avoid scheduling while atomic panic. Fixes: 9061f24bf82e ("mptcp: sockopt: propagate timestamp request to subflows") Cc: stable@vger.kernel.org Reported-by: Sashiko <sashiko-bot@kernel.org> Closes: https://sashiko.dev/#/patchset/20260420093343.16443-1-gang.yan@linux.dev Signed-off-by: Gang Yan <yangang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260427-net-mptcp-misc-fixes-7-1-rc2-v1-2-7432b7f279fa@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-28mptcp: sockopt: set timestamp flags on subflow socket, not mskGang Yan
Both mptcp_setsockopt_sol_socket_tstamp() and mptcp_setsockopt_sol_socket_timestamping() iterate over subflows, acquire the subflow socket lock, but then erroneously pass the MPTCP msk socket to sock_set_timestamp() / sock_set_timestamping() instead of the subflow ssk. As a result, the timestamp flags are set on the wrong socket and have no effect on the actual subflows. Pass ssk instead of sk to both helpers. Fixes: 9061f24bf82e ("mptcp: sockopt: propagate timestamp request to subflows") Cc: stable@vger.kernel.org Signed-off-by: Gang Yan <yangang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260427-net-mptcp-misc-fixes-7-1-rc2-v1-1-7432b7f279fa@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-23mptcp: sync the msk->sndbuf at accept() timeGang Yan
On passive MPTCP connections, the msk sndbuf is not updated correctly. The root cause is an order issue in the accept path: - tcp_check_req() -> subflow_syn_recv_sock() -> mptcp_sk_clone_init() calls __mptcp_propagate_sndbuf() to copy the ssk sndbuf into msk - Later, tcp_child_process() -> tcp_init_transfer() -> tcp_sndbuf_expand() grows the ssk sndbuf. So __mptcp_propagate_sndbuf() runs before the ssk sndbuf has been expanded and the msk ends up with a much smaller sndbuf than the subflow: MPTCP: msk->sndbuf:20480, msk->first->sndbuf:2626560 Fix this by moving the __mptcp_propagate_sndbuf() call from mptcp_sk_clone_init() -- the ssk sndbuf is not yet finalized there -- to __mptcp_propagate_sndbuf() at accept() time, when the ssk sndbuf has been fully expanded by tcp_sndbuf_expand(). Fixes: 8005184fd1ca ("mptcp: refactor sndbuf auto-tuning") Cc: stable@vger.kernel.org Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/602 Signed-off-by: Gang Yan <yangang@kylinos.cn> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260420-net-mptcp-sync-sndbuf-accept-v1-1-e3523e3aeb44@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-04-09net: use get_random_u{16,32,64}() where appropriateDavid Carlier
Use the typed random integer helpers instead of get_random_bytes() when filling a single integer variable. The helpers return the value directly, require no pointer or size argument, and better express intent. Skipped sites writing into __be16 (netdevsim) and __le64 (ceph) fields where a direct assignment would trigger sparse endianness warnings. Signed-off-by: David Carlier <devnexen@gmail.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260407150758.5889-1-devnexen@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-7.0-rc8). Conflicts: net/ipv6/seg6_iptunnel.c c3812651b522f ("seg6: separate dst_cache for input and output paths in seg6 lwtunnel") 78723a62b969a ("seg6: add per-route tunnel source address") https://lore.kernel.org/adZhwtOYfo-0ImSa@sirena.org.uk net/ipv4/icmp.c fde29fd934932 ("ipv4: icmp: fix null-ptr-deref in icmp_build_probe()") d98adfbdd5c01 ("ipv4: drop ipv6_stub usage and use direct function calls") https://lore.kernel.org/adO3dccqnr6j-BL9@sirena.org.uk Adjacent changes: drivers/net/ethernet/stmicro/stmmac/chain_mode.c 51f4e090b9f8 ("net: stmmac: fix integer underflow in chain mode") 6b4286e05508 ("net: stmmac: rename STMMAC_GET_ENTRY() -> STMMAC_NEXT_ENTRY()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-08mptcp: add receive queue awareness in tcp_rcv_space_adjust()Paolo Abeni
This is the MPTCP counter-part of commit ea33537d8292 ("tcp: add receive queue awareness in tcp_rcv_space_adjust()"). Prior to this commit: ESTAB 33165568 0 192.168.255.2:5201 192.168.255.1:53380 \ skmem:(r33076416,rb33554432,t0,tb91136,f448,w0,o0,bl0,d0) After: ESTAB 3279168 0 192.168.255.2:5201 192.168.255.1]:53042 \ skmem:(r3190912,rb3719956,t0,tb91136,f1536,w0,o0,bl0,d0) Same throughput. Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260407-net-next-mptcp-reduce-rbuf-v2-2-0d1d135bf6f6@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-08mptcp: better mptcp-level RTT estimatorPaolo Abeni
The current MPTCP-level RTT estimator has several issues. On high speed links, the MPTCP-level receive buffer auto-tuning happens with a frequency well above the TCP-level's one. That in turn can cause excessive/unneeded receive buffer increase. On such links, the initial rtt_us value is considerably higher than the actual delay, and the current mptcp_rcv_space_adjust() updates msk->rcvq_space.rtt_us with a period equal to the such field previous value. If the initial rtt_us is 40ms, its first update will happen after 40ms, even if the subflows see actual RTT orders of magnitude lower. Additionally: - setting the msk RTT to the maximum among all the subflows RTTs makes DRS constantly overshooting the rcvbuf size when a subflow has considerable higher latency than the other(s). - during unidirectional bulk transfers with multiple active subflows, the TCP-level RTT estimator occasionally sees considerably higher value than the real link delay, i.e. when the packet scheduler reacts to an incoming ACK on given subflow pushing data on a different subflow. - currently inactive but still open subflows (i.e. switched to backup mode) are always considered when computing the msk-level RTT. Address the all the issues above with a more accurate RTT estimation strategy: the MPTCP-level RTT is set to the minimum of all the subflows actually feeding data into the MPTCP receive buffer, using a small sliding window. While at it, also use EWMA to compute the msk-level scaling_ratio, to that MPTCP can avoid traversing the subflow list is mptcp_rcv_space_adjust(). Use some care to avoid updating msk and ssk level fields too often. Fixes: a6b118febbab ("mptcp: add receive buffer auto-tuning") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260407-net-next-mptcp-reduce-rbuf-v2-1-0d1d135bf6f6@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-08Revert "mptcp: add needs_id for netlink appending addr"Matthieu Baerts (NGI0)
This commit was originally adding the ability to add MPTCP endpoints with ID 0 by accident. The in-kernel PM, handling MPTCP endpoints at the net namespace level, is not supposed to handle endpoints with such ID, because this ID 0 is reserved to the initial subflow, as mentioned in the MPTCPv1 protocol [1], a per-connection setting. Note that 'ip mptcp endpoint add id 0' stops early with an error, but other tools might still request the in-kernel PM to create MPTCP endpoints with this restricted ID 0. In other words, it was wrong to call the mptcp_pm_has_addr_attr_id helper to check whether the address ID attribute is set: if it was set to 0, a new MPTCP endpoint would be created with ID 0, which is not expected, and might cause various issues later. Fixes: 584f38942626 ("mptcp: add needs_id for netlink appending addr") Cc: stable@vger.kernel.org Link: https://datatracker.ietf.org/doc/html/rfc8684#section-3.2-9 [1] Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260407-net-mptcp-revert-pm-needs-id-v2-1-7a25cbc324f8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-08mptcp: fix slab-use-after-free in __inet_lookup_establishedJiayuan Chen
The ehash table lookups are lockless and rely on SLAB_TYPESAFE_BY_RCU to guarantee socket memory stability during RCU read-side critical sections. Both tcp_prot and tcpv6_prot have their slab caches created with this flag via proto_register(). However, MPTCP's mptcp_subflow_init() copies tcpv6_prot into tcpv6_prot_override during inet_init() (fs_initcall, level 5), before inet6_init() (module_init/device_initcall, level 6) has called proto_register(&tcpv6_prot). At that point, tcpv6_prot.slab is still NULL, so tcpv6_prot_override.slab remains NULL permanently. This causes MPTCP v6 subflow child sockets to be allocated via kmalloc (falling into kmalloc-4k) instead of the TCPv6 slab cache. The kmalloc-4k cache lacks SLAB_TYPESAFE_BY_RCU, so when these sockets are freed without SOCK_RCU_FREE (which is cleared for child sockets by design), the memory can be immediately reused. Concurrent ehash lookups under rcu_read_lock can then access freed memory, triggering a slab-use-after-free in __inet_lookup_established. Fix this by splitting the IPv6-specific initialization out of mptcp_subflow_init() into a new mptcp_subflow_v6_init(), called from mptcp_proto_v6_init() before protocol registration. This ensures tcpv6_prot_override.slab correctly inherits the SLAB_TYPESAFE_BY_RCU slab cache. Fixes: b19bc2945b40 ("mptcp: implement delegated actions") Cc: stable@vger.kernel.org Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260406031512.189159-1-jiayuan.chen@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-06tcp: add recv_should_stop helperGeliang Tang
Factor out a new helper tcp_recv_should_stop() from tcp_recvmsg_locked() and tcp_splice_read() to check whether to stop receiving. And use this helper in mptcp_recvmsg() and mptcp_splice_read() to reduce redundant code. Suggested-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260403-net-next-mptcp-msg_eor-misc-v1-3-b0b33bea3fed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-06mptcp: preserve MSG_EOR semantics in sendmsg pathGang Yan
Extend MPTCP's sendmsg handling to recognize and honor the MSG_EOR flag, which marks the end of a record for application-level message boundaries. Data fragments tagged with MSG_EOR are explicitly marked in the mptcp_data_frag structure and skb context to prevent unintended coalescing with subsequent data chunks. This ensures the intent of applications using MSG_EOR is preserved across MPTCP subflows, maintaining consistent message segmentation behavior. Signed-off-by: Gang Yan <yangang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260403-net-next-mptcp-msg_eor-misc-v1-2-b0b33bea3fed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-06mptcp: reduce 'overhead' from u16 to u8Gang Yan
The 'overhead' in struct mptcp_data_frag can safely use u8, as it represents 'alignment + sizeof(mptcp_data_frag)'. With a maximum alignment of 7('ALIGN(1, sizeof(long)) - 1'), the overhead is at most 47, well below U8_MAX and validated with BUILD_BUG_ON(). This patch also adds a field named 'unused' for further extensions. Signed-off-by: Gang Yan <yangang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260403-net-next-mptcp-msg_eor-misc-v1-1-b0b33bea3fed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-7.0-rc7). Conflicts: net/vmw_vsock/af_vsock.c b18c83388874 ("vsock: initialize child_ns_mode_locked in vsock_net_init()") 0de607dc4fd8 ("vsock: add G2H fallback for CIDs not owned by H2G transport") Adjacent changes: drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c ceee35e5674a ("bnxt_en: Refactor some basic ring setup and adjustment logic") 57cdfe0dc70b ("bnxt_en: Resize RSS contexts on channel count change") drivers/net/wireless/intel/iwlwifi/mld/mac80211.c 4d56037a02bd ("wifi: iwlwifi: mld: block EMLSR during TDLS connections") 687a95d204e7 ("wifi: iwlwifi: mld: correctly set wifi generation data") drivers/net/wireless/intel/iwlwifi/mld/scan.h b6045c899e37 ("wifi: iwlwifi: mld: Refactor scan command handling") ec66ec6a5a8f ("wifi: iwlwifi: mld: Fix MLO scan timing") drivers/net/wireless/intel/iwlwifi/mvm/fw.c 078df640ef05 ("wifi: iwlwifi: mld: add support for iwl_mcc_allowed_ap_type_cmd v 2") 323156c3541e ("wifi: iwlwifi: mvm: don't send a 6E related command when not supported") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-31mptcp: fix soft lockup in mptcp_recvmsg()Li Xiasong
syzbot reported a soft lockup in mptcp_recvmsg() [0]. When receiving data with MSG_PEEK | MSG_WAITALL flags, the skb is not removed from the sk_receive_queue. This causes sk_wait_data() to always find available data and never perform actual waiting, leading to a soft lockup. Fix this by adding a 'last' parameter to track the last peeked skb. This allows sk_wait_data() to make informed waiting decisions and prevent infinite loops when MSG_PEEK is used. [0]: watchdog: BUG: soft lockup - CPU#2 stuck for 156s! [server:1963] Modules linked in: CPU: 2 UID: 0 PID: 1963 Comm: server Not tainted 6.19.0-rc8 #61 PREEMPT(none) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 RIP: 0010:sk_wait_data+0x15/0x190 Code: 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 41 56 41 55 41 54 49 89 f4 55 48 89 d5 53 48 89 fb <48> 83 ec 30 65 48 8b 05 17 a4 6b 01 48 89 44 24 28 31 c0 65 48 8b RSP: 0018:ffffc90000603ca0 EFLAGS: 00000246 RAX: 0000000000000000 RBX: ffff888102bf0800 RCX: 0000000000000001 RDX: 0000000000000000 RSI: ffffc90000603d18 RDI: ffff888102bf0800 RBP: 0000000000000000 R08: 0000000000000002 R09: 0000000000000101 R10: 0000000000000000 R11: 0000000000000075 R12: ffffc90000603d18 R13: ffff888102bf0800 R14: ffff888102bf0800 R15: 0000000000000000 FS: 00007f6e38b8c4c0(0000) GS:ffff8881b877e000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055aa7bff1680 CR3: 0000000105cbe000 CR4: 00000000000006f0 Call Trace: <TASK> mptcp_recvmsg+0x547/0x8c0 net/mptcp/protocol.c:2329 inet_recvmsg+0x11f/0x130 net/ipv4/af_inet.c:891 sock_recvmsg+0x94/0xc0 net/socket.c:1100 __sys_recvfrom+0xb2/0x130 net/socket.c:2256 __x64_sys_recvfrom+0x1f/0x30 net/socket.c:2267 do_syscall_64+0x59/0x2d0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x76/0x7e arch/x86/entry/entry_64.S:131 RIP: 0033:0x7f6e386a4a1d Code: 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8d 05 f1 de 2c 00 41 89 ca 8b 00 85 c0 75 20 45 31 c9 45 31 c0 b8 2d 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 6b f3 c3 66 0f 1f 84 00 00 00 00 00 41 56 41 RSP: 002b:00007ffc3c4bb078 EFLAGS: 00000246 ORIG_RAX: 000000000000002d RAX: ffffffffffffffda RBX: 000000000000861e RCX: 00007f6e386a4a1d RDX: 00000000000003ff RSI: 00007ffc3c4bb150 RDI: 0000000000000004 RBP: 00007ffc3c4bb570 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000103 R11: 0000000000000246 R12: 00005605dbc00be0 R13: 00007ffc3c4bb650 R14: 0000000000000000 R15: 0000000000000000 </TASK> Fixes: 8e04ce45a8db ("mptcp: fix MSG_PEEK stream corruption") Signed-off-by: Li Xiasong <lixiasong1@huawei.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260330120335.659027-1-lixiasong1@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-7.0-rc5). net/netfilter/nft_set_rbtree.c 598adea720b97 ("netfilter: revert nft_set_rbtree: validate open interval overlap") 3aea466a43998 ("netfilter: nft_set_rbtree: don't disable bh when acquiring tree lock") https://lore.kernel.org/abgaQBpeGstdN4oq@sirena.org.uk No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-19MPTCP: fix lock class name family in pm_nl_create_listen_socketLi Xiasong
In mptcp_pm_nl_create_listen_socket(), use entry->addr.family instead of sk->sk_family for lock class setup. The 'sk' parameter is a netlink socket, not the MPTCP subflow socket being created. Fixes: cee4034a3db1 ("mptcp: fix lockdep false positive in mptcp_pm_nl_create_listen_socket()") Signed-off-by: Li Xiasong <lixiasong1@huawei.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260319112159.3118874-1-lixiasong1@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-14mptcp: keep rcv_mwnd_seq in sync with subflow rcv_wndSimon Baatz
MPTCP shares a receive window across subflows and applies it at the subflow level by adjusting each subflow's rcv_wnd when needed. With the new TCP tracking of the maximum advertised window sequence, rcv_mwnd_seq must stay consistent with these subflow-level rcv_wnd adjustments. Signed-off-by: Simon Baatz <gmbnomis@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260309-tcp_rfc7323_retract_wnd_rfc-v3-2-4c7f96b1ec69@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-7.0-rc3). No conflicts. Adjacent changes: net/netfilter/nft_set_rbtree.c fb7fb4016300 ("netfilter: nf_tables: clone set on flush only") 3aea466a4399 ("netfilter: nft_set_rbtree: don't disable bh when acquiring tree lock") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-04mptcp: pm: in-kernel: always mark signal+subflow endp as usedMatthieu Baerts (NGI0)
Syzkaller managed to find a combination of actions that was generating this warning: msk->pm.local_addr_used == 0 WARNING: net/mptcp/pm_kernel.c:1071 at __mark_subflow_endp_available net/mptcp/pm_kernel.c:1071 [inline], CPU#1: syz.2.17/961 WARNING: net/mptcp/pm_kernel.c:1071 at mptcp_nl_remove_subflow_and_signal_addr net/mptcp/pm_kernel.c:1103 [inline], CPU#1: syz.2.17/961 WARNING: net/mptcp/pm_kernel.c:1071 at mptcp_pm_nl_del_addr_doit+0x81d/0x8f0 net/mptcp/pm_kernel.c:1210, CPU#1: syz.2.17/961 Modules linked in: CPU: 1 UID: 0 PID: 961 Comm: syz.2.17 Not tainted 6.19.0-08368-gfafda3b4b06b #22 PREEMPT(full) Hardware name: QEMU Ubuntu 25.10 PC v2 (i440FX + PIIX, + 10.1 machine, 1996), BIOS 1.17.0-debian-1.17.0-1build1 04/01/2014 RIP: 0010:__mark_subflow_endp_available net/mptcp/pm_kernel.c:1071 [inline] RIP: 0010:mptcp_nl_remove_subflow_and_signal_addr net/mptcp/pm_kernel.c:1103 [inline] RIP: 0010:mptcp_pm_nl_del_addr_doit+0x81d/0x8f0 net/mptcp/pm_kernel.c:1210 Code: 89 c5 e8 46 30 6f fe e9 21 fd ff ff 49 83 ed 80 e8 38 30 6f fe 4c 89 ef be 03 00 00 00 e8 db 49 df fe eb ac e8 24 30 6f fe 90 <0f> 0b 90 e9 1d ff ff ff e8 16 30 6f fe eb 05 e8 0f 30 6f fe e8 9a RSP: 0018:ffffc90001663880 EFLAGS: 00010293 RAX: ffffffff82de1a6c RBX: 0000000000000000 RCX: ffff88800722b500 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff8880158b22d0 R08: 0000000000010425 R09: ffffffffffffffff R10: ffffffff82de18ba R11: 0000000000000000 R12: ffff88800641a640 R13: ffff8880158b1880 R14: ffff88801ec3c900 R15: ffff88800641a650 FS: 00005555722c3500(0000) GS:ffff8880f909d000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f66346e0f60 CR3: 000000001607c000 CR4: 0000000000350ef0 Call Trace: <TASK> genl_family_rcv_msg_doit+0x117/0x180 net/netlink/genetlink.c:1115 genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline] genl_rcv_msg+0x3a8/0x3f0 net/netlink/genetlink.c:1210 netlink_rcv_skb+0x16d/0x240 net/netlink/af_netlink.c:2550 genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219 netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline] netlink_unicast+0x3e9/0x4c0 net/netlink/af_netlink.c:1344 netlink_sendmsg+0x4aa/0x5b0 net/netlink/af_netlink.c:1894 sock_sendmsg_nosec net/socket.c:727 [inline] __sock_sendmsg+0xc9/0xf0 net/socket.c:742 ____sys_sendmsg+0x272/0x3b0 net/socket.c:2592 ___sys_sendmsg+0x2de/0x320 net/socket.c:2646 __sys_sendmsg net/socket.c:2678 [inline] __do_sys_sendmsg net/socket.c:2683 [inline] __se_sys_sendmsg net/socket.c:2681 [inline] __x64_sys_sendmsg+0x110/0x1a0 net/socket.c:2681 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0x143/0x440 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f66346f826d Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffc83d8bdc8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007f6634985fa0 RCX: 00007f66346f826d RDX: 00000000040000b0 RSI: 0000200000000740 RDI: 0000000000000007 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f6634985fa8 R13: 00007f6634985fac R14: 0000000000000000 R15: 0000000000001770 </TASK> The actions that caused that seem to be: - Set the MPTCP subflows limit to 0 - Create an MPTCP endpoint with both the 'signal' and 'subflow' flags - Create a new MPTCP connection from a different address: an ADD_ADDR linked to the MPTCP endpoint will be sent ('signal' flag), but no subflows is initiated ('subflow' flag) - Remove the MPTCP endpoint In this case, msk->pm.local_addr_used has been kept to 0 -- because no subflows have been created -- but the corresponding bit in msk->pm.id_avail_bitmap has been cleared when the ADD_ADDR has been sent. This later causes a splat when removing the MPTCP endpoint because msk->pm.local_addr_used has been kept to 0. Now, if an endpoint has both the signal and subflow flags, but it is not possible to create subflows because of the limits or the c-flag case, then the local endpoint counter is still incremented: the endpoint is used at the end. This avoids issues later when removing the endpoint and calling __mark_subflow_endp_available(), which expects msk->pm.local_addr_used to have been previously incremented if the endpoint was marked as used according to msk->pm.id_avail_bitmap. Note that signal_and_subflow variable is reset to false when the limits and the c-flag case allows subflows creation. Also, local_addr_used is only incremented for non ID0 subflows. Fixes: 85df533a787b ("mptcp: pm: do not ignore 'subflow' if 'signal' flag is also set") Cc: stable@vger.kernel.org Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/613 Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260303-net-mptcp-misc-fixes-7-0-rc2-v1-4-4b5462b6f016@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>