linux.git/net/ipv6, branch master

ipv6: Change allocation flags to match rcu_read_lock section requirements

2026-07-23T16:13:13+00:00

Since __ip6_del_rt_siblings() is now called under rcu_read_lock()
and has only one call site, it must no longer block or yield.

Our stack trace from the syzbot reproducer looks as follows:

__ip6_del_rt_siblings
  rtnl_notify (Here we pass gfp_any() -> GFP_KERNEL)
    nlmsg_notify
      nlmsg_multicast
        nlmsg_multicast_filtered
          netlink_broadcast_filtered (GFP_KERNEL passed from earlier)

netlink_broadcast_filtered() can block when GFP_KERNEL is passed,
which must not happen inside an RCU read-side critical section.

Fix this by changing the allocation flag of rtnl_notify.

Also change the flag passed to nlmsg_new(). Even though it is not
related to the syzbot-generated bug, it still falls under the same
requirements.
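The rule behind this fix can be sketched as a small userspace model. It is not kernel code: the MODEL_* flags, struct model_ctx and the helpers are hypothetical, and model_gfp_any() assumes gfp_any() picks GFP_ATOMIC only in softirq context, which is why it still yields GFP_KERNEL inside an rcu_read_lock() section entered from process context.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for GFP_KERNEL (may sleep) and GFP_ATOMIC. */
enum model_gfp { MODEL_GFP_KERNEL, MODEL_GFP_ATOMIC };

struct model_ctx {
	bool in_softirq;  /* running in softirq context */
	int rcu_depth;    /* nesting depth of rcu_read_lock() */
};

/* Assumed shape of gfp_any(): it only looks at softirq context, so it
 * cannot tell that we are inside an RCU read-side section. */
static enum model_gfp model_gfp_any(const struct model_ctx *ctx)
{
	return ctx->in_softirq ? MODEL_GFP_ATOMIC : MODEL_GFP_KERNEL;
}

/* Sleeping is only allowed outside softirq and outside RCU sections. */
static bool model_may_sleep(const struct model_ctx *ctx)
{
	return !ctx->in_softirq && ctx->rcu_depth == 0;
}

/* A flag is safe if it never sleeps, or if the context may sleep. */
static bool model_flag_is_safe(const struct model_ctx *ctx, enum model_gfp gfp)
{
	return gfp == MODEL_GFP_ATOMIC || model_may_sleep(ctx);
}
```

In this model the flag choice has to account for the RCU section itself, which a softirq-only check cannot see.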

Reported-by: syzbot+84d4a405ed798b40c96d@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=84d4a405ed798b40c96d
Fixes: bd11ff421d36 ("ipv6: Get rid of RTNL for SIOCDELRT and RTM_DELROUTE.")
Signed-off-by: Nikola Z. Ivanov 
Reviewed-by: Ido Schimmel 
Link: https://patch.msgid.link/20260719105759.558050-1-zlatistiv@gmail.com
Signed-off-by: Jakub Kicinski

net: ipv6: fix dif and sdif mismatch in raw6_icmp_error

2026-07-23T15:31:44+00:00

In raw6_icmp_error(), raw_v6_match() is called with inet6_iif(skb) passed
to both the 'dif' and 'sdif' arguments. This is a copy-paste or typo error,
as the last argument should represent the secondary interface index (sdif).

This mismatch breaks ICMPv6 error handling for IPv6 raw sockets in VRF
(Virtual Routing and Forwarding) environments. When a raw socket is bound
to a VRF master device, raw_v6_match() fails to find a match because it is
not given the correct sdif value, causing the socket to miss relevant
ICMPv6 error notifications.

Fix this by properly passing inet6_sdif(skb) as the last argument to
raw_v6_match().
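A minimal sketch of why passing the same index twice breaks the lookup. model_bound_dev_match() is hypothetical and only loosely modelled on the bound-device test done during raw socket lookup; it ignores unbound sockets and l3mdev_accept, and the ifindex values in the checks are made up. The point is only that a check fed dif twice can never match a socket bound to the index carried in sdif.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bound-device check: a socket bound to bound_ifindex
 * matches if that index equals either the ingress index (dif) or the
 * secondary index (sdif). Unbound sockets are out of scope here. */
static bool model_bound_dev_match(int bound_ifindex, int dif, int sdif)
{
	return bound_ifindex == dif || bound_ifindex == sdif;
}
```

Calling it as model_bound_dev_match(bound, dif, dif) reproduces the copy-paste bug; passing the real sdif restores the second comparison.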

Fixes: 5108ab4bf446fa ("net: ipv6: add second dif to raw socket lookups")
Signed-off-by: Li RongQing 
Reviewed-by: Joe Damato 
Link: https://patch.msgid.link/20260717143230.1836-1-lirongqing@baidu.com
Signed-off-by: Jakub Kicinski

net: gre: fix lltx regression for GRE tunnels with SEQ/CSUM

2026-07-23T11:01:57+00:00

Before commit 00d066a4d4ed ("netdev_features: convert NETIF_F_LLTX to
dev->lltx"), NETIF_F_LLTX was set unconditionally in both
__gre_tunnel_init() and ip6gre_tnl_init_features() alongside
GRE_FEATURES:

    dev->features |= GRE_FEATURES | NETIF_F_LLTX;

When that commit converted NETIF_F_LLTX to the dev->lltx flag, it
placed 'dev->lltx = true' after the SEQ/CSUM early returns instead
of before them. This causes GRE/GRETAP/ip6gre tunnels with SEQ or
CSUM+encap to lose lockless TX, reintroducing _xmit_lock acquisition
around their ndo_start_xmit. Since GRE xmit re-enters the stack via
ip_tunnel_xmit(), holding _xmit_lock risks ABBA deadlock with the
underlay device.

  CPU0                        CPU1
  ----                        ----
  lock(&qdisc_xmit_lock_key#6);
                              lock(&qdisc_xmit_lock_key#3);
                              lock(&qdisc_xmit_lock_key#6);
  lock(&qdisc_xmit_lock_key#3);

Fix by moving dev->lltx = true before the early returns in both
functions, restoring the original unconditional behavior.
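The ordering problem reduces to the following sketch. struct model_dev and both init functions are hypothetical stand-ins for __gre_tunnel_init() and ip6gre_tnl_init_features(), with has_seq standing in for the SEQ (or CSUM+encap) condition that triggers the early return.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device: lltx models the lockless-TX flag. */
struct model_dev {
	bool lltx;     /* true: core does not take _xmit_lock around xmit */
	bool has_seq;  /* tunnel configured with SEQ, hits the early return */
};

/* Regressed ordering: SEQ tunnels return before the flag is set. */
static void model_init_regressed(struct model_dev *dev)
{
	if (dev->has_seq)
		return;
	dev->lltx = true;
}

/* Fixed ordering: set the flag unconditionally, then bail out early. */
static void model_init_fixed(struct model_dev *dev)
{
	dev->lltx = true;
	if (dev->has_seq)
		return;
	/* feature setup that only applies without SEQ would follow here */
}
```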

Fixes: 00d066a4d4ed ("netdev_features: convert NETIF_F_LLTX to dev->lltx")
Signed-off-by: Yun Zhou 
Reviewed-by: Ido Schimmel 
Link: https://patch.msgid.link/20260713150945.1779628-1-yun.zhou@windriver.com
Signed-off-by: Paolo Abeni

ila: reload IPv6 header after pskb_may_pull in checksum adjust

2026-07-22T21:00:41+00:00

ila_csum_adjust_transport() caches ip6h = ipv6_hdr(skb) before calling
pskb_may_pull(). On a non-linear skb whose transport header sits in a page
fragment, pskb_may_pull() can call __pskb_pull_tail() / pskb_expand_head()
and free the old skb head, leaving ip6h dangling; the following
get_csum_diff(ip6h, p) then reads freed memory. ila_update_ipv6_locator()
uses ip6h (and the iaddr derived from it) again after the csum-adjust
call and additionally writes the new locator through that pointer.

Impact: a remote IPv6 packet routed through a configured ILA
csum-adjust-transport route or receive-side mapping triggers a
slab-use-after-free in ila_update_ipv6_locator() (KASAN). The route or
mapping requires CAP_NET_ADMIN to configure, but trigger packets are
unauthenticated once it exists.

Reload ip6h after each pskb_may_pull() in ila_csum_adjust_transport()
before the csum-diff read. In ila_update_ipv6_locator() only the
ILA_CSUM_ADJUST_TRANSPORT case pulls the skb, so reload ip6h and iaddr in
that case alone before the destination-address write; the neutral-map
modes never pull and keep their cached pointers.
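The reload pattern can be illustrated in userspace, assuming a toy buffer in place of an skb. struct model_skb, model_ipv6_hdr() and model_may_pull() are hypothetical; model_may_pull() always moves the data and frees the old head when it has to grow, which is the case that leaves a cached ip6h dangling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical skb stand-in: heap head plus network header offset. */
struct model_skb {
	unsigned char *head;
	size_t len;     /* bytes in the linear head */
	size_t nh_off;  /* offset of the IPv6 header within head */
};

/* Like ipv6_hdr(): recomputed from the current head on every call. */
static unsigned char *model_ipv6_hdr(const struct model_skb *skb)
{
	return skb->head + skb->nh_off;
}

/* Growing the linear area moves the data and frees the old head. */
static bool model_may_pull(struct model_skb *skb, size_t need)
{
	unsigned char *nhead;

	if (need <= skb->len)
		return true;
	nhead = malloc(need);
	if (!nhead)
		return false;
	memcpy(nhead, skb->head, skb->len);
	memset(nhead + skb->len, 0, need - skb->len);
	free(skb->head);
	skb->head = nhead;
	skb->len = need;
	return true;
}

/* Fixed pattern: re-derive the header pointer after the pull. */
static int model_read_after_pull(struct model_skb *skb, size_t need)
{
	unsigned char *ip6h = model_ipv6_hdr(skb);
	int before = ip6h[0];          /* fine: head has not moved yet */

	if (!model_may_pull(skb, need))
		return -1;
	ip6h = model_ipv6_hdr(skb);    /* reload: the old head was freed */
	assert(ip6h[0] == before);     /* same bytes, new location */
	return ip6h[0];
}
```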

Fixes: 33f11d16142b ("ila: Create net/ipv6/ila directory")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Bommarito 
Reviewed-by: Simon Horman 
Reviewed-by: Antoine Tenart 
Link: https://patch.msgid.link/20260714114903.3763420-1-michael.bommarito@gmail.com
Signed-off-by: Jakub Kicinski

tcp: initialize standalone TCP-AO response padding

2026-07-21T22:24:36+00:00

tcp_v4_send_ack() and tcp_v6_send_response() construct standalone TCP
responses with TCP-AO options.  The option length carries the actual MAC
length, but the TCP header length includes the option rounded up to a
four-byte boundary.

tcp_ao_hash_hdr() writes the MAC only.  Thus, when the MAC length is not
four-byte aligned, the one to three bytes after the MAC are left
uninitialized and may be transmitted.  For the normal TCP-AO hashing
mode, those bytes also have to be initialized before computing the MAC.

Initialize only the alignment padding in the TCP-AO branches, before
hashing the header.  Use TCPOPT_NOP, as in the normal TCP-AO output path.
This avoids adding work to non-AO TCP responses while preserving a valid
authenticated header.
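A sketch of the option layout and padding, under stated assumptions: it follows the RFC 5925 TCP-AO option format (kind 29, length, KeyID, RNextKeyID, MAC), the helper name and MODEL_* constants are hypothetical, and the MAC bytes are simply passed in. In the kernel path the padding has to be in place before tcp_ao_hash_hdr() computes the MAC; this model only shows which bytes need initializing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_TCPOPT_NOP 1   /* TCP NOP option kind */
#define MODEL_TCPOPT_AO  29  /* TCP-AO option kind (RFC 5925) */

/* Writes kind, length, KeyID, RNextKeyID and MAC into buf, then fills the
 * bytes up to the next 4-byte boundary with NOPs. Returns the padded
 * length counted by the TCP header length. buf must hold that many bytes. */
static size_t model_write_ao_option(unsigned char *buf, unsigned char keyid,
				    unsigned char rnext,
				    const unsigned char *mac, size_t maclen)
{
	size_t optlen = 4 + maclen;                /* value of the length byte */
	size_t padded = (optlen + 3) & ~(size_t)3; /* round up to 4 bytes */

	buf[0] = MODEL_TCPOPT_AO;
	buf[1] = (unsigned char)optlen;
	buf[2] = keyid;
	buf[3] = rnext;
	memcpy(buf + 4, mac, maclen);
	/* Initialize only the alignment padding after the option. */
	memset(buf + optlen, MODEL_TCPOPT_NOP, padded - optlen);
	return padded;
}
```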

Fixes: decde2586b34 ("net/tcp: Add TCP-AO sign to twsk")
Fixes: da7dfaa6d6f7 ("net/tcp: Consistently align TCP-AO option in the header")
Cc: stable@vger.kernel.org
Reported-by: Yizhou Zhao 
Reported-by: Yuxiang Yang 
Reported-by: Ao Wang 
Reported-by: Xuewei Feng 
Reported-by: Qi Li 
Reported-by: Ke Xu 
Suggested-by: Eric Dumazet 
Signed-off-by: Yizhou Zhao 
Reviewed-by: Eric Dumazet 
Link: https://patch.msgid.link/20260713105631.8616-1-zhaoyz24@mails.tsinghua.edu.cn
Signed-off-by: Jakub Kicinski

tcp: fix TIME_WAIT socket reference leak on PSP policy failure

2026-07-17T09:53:55+00:00

Release the TIME_WAIT socket reference and jump to discard_it
upon PSP policy failure in both IPv4 and IPv6 receive paths.
This prevents a memory leak of tcp_tw_bucket structures.
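The shape of the fix is the usual rule that every exit after a lookup drops the reference it took. This is a hypothetical refcount model, not the real IPv4/IPv6 receive functions; only the discard_it label name is borrowed from the commit text.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical refcounted stand-in for a TIME_WAIT socket. */
struct model_tw {
	int refcnt;
};

/* Lookup takes a reference on behalf of the caller. */
static struct model_tw *model_lookup(struct model_tw *tw)
{
	tw->refcnt++;
	return tw;
}

/* The real code would free the bucket once the count reaches zero. */
static void model_tw_put(struct model_tw *tw)
{
	tw->refcnt--;
}

/* Returns 0 when processed, -1 when discarded. Both paths drop the ref. */
static int model_rcv(struct model_tw *entry, bool policy_ok)
{
	struct model_tw *tw = model_lookup(entry);

	if (!policy_ok) {
		model_tw_put(tw);   /* the release the fix adds */
		goto discard_it;
	}
	/* normal TIME_WAIT handling, which ends by dropping the reference */
	model_tw_put(tw);
	return 0;

discard_it:
	return -1;
}
```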

Fixes: 659a2899a57d ("tcp: add datapath logic for PSP with inline key exchange")
Signed-off-by: Eric Dumazet 
Reviewed-by: Kuniyuki Iwashima 
Reviewed-by: Daniel Zahka 
Link: https://patch.msgid.link/20260710181317.4060230-1-edumazet@google.com
Signed-off-by: Paolo Abeni

Merge tag 'nf-26-07-10' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

2026-07-17T09:19:37+00:00

Florian Westphal says:

====================
netfilter: updates for net

The following patchset contains Netfilter fixes for *net*.
These are bug fixes; patches 6 and 9 fix issues introduced in the
last PR and in 7.1-rc1.

1) Reject unsupported target families in xt_nat_checkentry().
   From Wyatt Feng.

2) Fix inverted time_after() check in ecache_work_evict_list().
   This caused pointless work reschedules and thus a much longer time
   to clear the pending event backlog.  From Yizhou Zhao.

3) Fix a use-after-free in br_ip6_fragment() caused by a dangling prevhdr
   pointer.  From Xiang Mei.

4) Fix incorrect conntrack zone comparison in nf_conncount tuple
   deduplication: pass IP_CT_DIR_ORIGINAL, not the zone direction.
   From Yizhou Zhao.

5) Add bridge tunnel flowtable regression test for a bug that
   got fixed in the previous PR.  From Zhengyang Chen.

6) Use the correct direction when setting up tunnel routes in the flowtable
   xmit path.  From Pablo Neira Ayuso.  This fixes a bug added in the
   previous PR.

7) Reload IP header after potential skb head reallocation in IPVS.

8) Fix incorrect IPv6 transport offsets in TCP application code. Correct the
   ICMPv6 header offset to ensure proper checksumming with extension headers,
   from Julian Anastasov.  This is a followup to the previous PR.

9) Remove the null-termination requirement for xt_physdev masks; it broke
   device names with 15 characters.

netfilter pull request nf-26-07-10

* tag 'nf-26-07-10' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: xt_physdev: masks are not c-strings
  ipvs: fix more places with wrong ipv6 transport offsets
  ipvs: reload ip header after head reallocation
  netfilter: flowtable: use correct direction to set up tunnel route
  selftests: netfilter: add bridge tunnel flowtable regression
  netfilter: nf_conncount: fix zone comparison in tuple dedup
  netfilter: bridge: fix stale prevhdr pointer in br_ip6_fragment()
  netfilter: ecache: fix inverted time_after() check
  netfilter: xt_nat: reject unsupported target families
====================

Link: https://patch.msgid.link/20260710143733.29741-1-fw@strlen.de
Signed-off-by: Paolo Abeni
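Item 2 of the netfilter pull above fixes an inverted time_after() check. The sketch below copies the wraparound-safe comparison style of time_after() (without the kernel macro's typecheck() wrappers) and uses a hypothetical model_evict() loop, not ecache_work_evict_list(), to show how swapping the arguments ends every batch immediately, which fits the pointless reschedules described there.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* True if a is later than b, even when the counter has wrapped. */
static bool model_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical budgeted loop: evict until the deadline has passed.
 * Returns the number of items processed in this batch. */
static int model_evict(unsigned long start, unsigned long deadline,
		       int items, bool inverted)
{
	unsigned long now = start;
	int done = 0;

	while (done < items) {
		bool stop = inverted ? model_time_after(deadline, now)
				     : model_time_after(now, deadline);

		if (stop)
			break;
		done++;
		now++;          /* pretend each eviction costs one tick */
	}
	return done;
}
```

With start 0 and deadline 5 the correct check processes 6 items; the inverted one stops before doing any work.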

Merge tag 'ipsec-2026-07-10' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec

2026-07-11T10:48:08+00:00

Steffen Klassert says:

====================
pull request (net): ipsec 2026-07-10

1) xfrm: propagate -EINPROGRESS from validate_xmit_xfrm()
   Return -EINPROGRESS from xfrm_output_one when validate_xmit_xfrm
   requeues the packet asynchronously, so the caller doesn't treat it
   as a real error and free the skb.

2) xfrm: fix stale skb->prev after async crypto steals a GSO segment
   Re-derive skb->prev from the fragment list after async crypto splits
   a GSO skb, keeping the linked-list pointers valid.

3) xfrm: nat_keepalive: avoid double free on send error
   Hold a state ref while the nat_keepalive timer is active and drop the
   timer before freeing the state, preventing a re-entered free on send
   error.

4) xfrm: fix sk_dst_cache double-free in xfrm_user_policy()
   Null the skb dst cache before freeing the policy so a later skb
   destructor doesn't double-free it.

5) xfrm: cache the offload ifindex for netlink dumps
   Cache the device ifindex at state-add time and use it for netlink
   dumps instead of dereferencing dst->dev, which may have changed by
   the time the dump runs.

6) xfrm: reject optional IPTFS templates in outbound policies
   Reject outbound policies with an optional IPTFS template;
   IPTFS must always be used if configured.

7) xfrm: clear mode callbacks after failed mode setup
   Clear the mode->init_flags and init_state callbacks on the error path
   after xfrm_init_mode fails, so a partially-initialised mode isn't
   reused in xfrm_state_construct.

8) xfrm: iptfs: propagate SKBFL_SHARED_FRAG in iptfs_skb_add_frags()
   Propagate SKBFL_SHARED_FRAG from the original skb to fragments
   allocated by iptfs_skb_add_frags, keeping shared-fragment accounting
   correct after IPTFS reassembly.

9) xfrm6: clear dst.dev on error to avoid double netdev_put in xfrm6_fill_dst()
   Clear dst->dev on the error path of xfrm6_fill_dst() so the caller
   doesn't release the netdev reference twice via dst_release.

10) xfrm: policy: preallocate inexact bins before xfrm_hash_rebuild reinsert
    Preallocate all inexact hash bins before existing entries are
    reinserted during xfrm_hash_rebuild, so reinsertion always hits an
    existing bin.

Please pull or let me know if there are problems.

ipsec-2026-07-10

* tag 'ipsec-2026-07-10' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
  xfrm: policy: preallocate inexact bins before xfrm_hash_rebuild reinsert
  xfrm6: clear dst.dev on error to avoid double netdev_put in xfrm6_fill_dst()
  xfrm: iptfs: propagate SKBFL_SHARED_FRAG in iptfs_skb_add_frags()
  xfrm: clear mode callbacks after failed mode setup
  xfrm: reject optional IPTFS templates in outbound policies
  xfrm: cache the offload ifindex for netlink dumps
  xfrm: fix sk_dst_cache double-free in xfrm_user_policy()
  xfrm: nat_keepalive: avoid double free on send error
  xfrm: fix stale skb->prev after async crypto steals a GSO segment
  xfrm: propagate -EINPROGRESS from validate_xmit_xfrm()
====================

Link: https://patch.msgid.link/20260710090349.343389-1-steffen.klassert@secunet.com
Signed-off-by: Paolo Abeni
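Item 1 of the ipsec pull above comes down to an ownership rule around -EINPROGRESS. A minimal model, assuming a toy packet with ownership flags and hypothetical model_output_one()/model_output() functions in place of xfrm_output_one() and its caller; the model reports success to its own caller on -EINPROGRESS, which may differ from what the real xfrm code propagates.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical packet: flags record who owns it instead of real memory. */
struct model_pkt {
	bool freed;
	bool queued_async;
};

enum model_step { MODEL_STEP_OK, MODEL_STEP_ASYNC, MODEL_STEP_FAIL };

/* Output step that may hand the packet to asynchronous crypto. */
static int model_output_one(struct model_pkt *p, enum model_step step)
{
	switch (step) {
	case MODEL_STEP_ASYNC:
		p->queued_async = true;   /* async path now owns p */
		return -EINPROGRESS;
	case MODEL_STEP_FAIL:
		return -EINVAL;
	default:
		return 0;
	}
}

/* Caller: -EINPROGRESS is not an error, so the packet must not be freed. */
static int model_output(struct model_pkt *p, enum model_step step)
{
	int err = model_output_one(p, step);

	if (err == -EINPROGRESS)
		return 0;          /* ownership transferred, do not touch p */
	if (err) {
		p->freed = true;   /* genuine error: drop the packet */
		return err;
	}
	return 0;
}
```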

netfilter: bridge: fix stale prevhdr pointer in br_ip6_fragment()

2026-07-10T14:28:47+00:00

br_ip6_fragment() gets prevhdr, a pointer into the skb head, from
ip6_find_1stfragopt(), then calls skb_checksum_help().  For a cloned skb
skb_checksum_help() reallocates the head via pskb_expand_head(), leaving
prevhdr dangling.  It is later dereferenced in ip6_frag_next(), causing a
use-after-free write.

Save prevhdr's offset before skb_checksum_help() and recompute it after,
like commit ef0efcd3bd3f ("ipv6: Fix dangling pointer when ipv6
fragment").

  BUG: KASAN: slab-use-after-free in ip6_frag_next (net/ipv6/ip6_output.c:857)
  Write of size 1 at addr ffff888013ff5016 by task exploit/141
  Call Trace:
   ...
   kasan_report (mm/kasan/report.c:595)
   ip6_frag_next (net/ipv6/ip6_output.c:857)
   br_ip6_fragment (net/ipv6/netfilter.c:212)
   nf_ct_bridge_post (net/bridge/netfilter/nf_conntrack_bridge.c:407)
   nf_hook_slow (net/netfilter/core.c:619)
   br_forward_finish (net/bridge/br_forward.c:66)
   __br_forward (net/bridge/br_forward.c:115)
   maybe_deliver (net/bridge/br_forward.c:191)
   br_flood (net/bridge/br_forward.c:245)
   br_handle_frame_finish (net/bridge/br_input.c:229)
   br_handle_frame (net/bridge/br_input.c:442)
   ...
   packet_sendmsg (net/packet/af_packet.c:3114)
   ...
   do_syscall_64 (arch/x86/entry/syscall_64.c:94)
   entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121)
  Kernel panic - not syncing: Fatal exception in interrupt
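The offset-saving approach can be sketched with a toy buffer. struct model_buf and model_checksum_help() are hypothetical; model_checksum_help() always reallocates, mimicking what skb_checksum_help() can do for a cloned skb. The value 44 is the IPv6 Fragment header's next-header number (NEXTHDR_FRAGMENT), used here only as a sample byte to write.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical skb head that a helper may reallocate. */
struct model_buf {
	unsigned char *head;
	size_t len;
};

/* Always moves the data to a new allocation and frees the old head. */
static int model_checksum_help(struct model_buf *b)
{
	unsigned char *nhead = malloc(b->len);

	if (!nhead)
		return -1;
	memcpy(nhead, b->head, b->len);
	free(b->head);
	b->head = nhead;
	return 0;
}

/* Save the interior pointer as an offset before the call that may
 * reallocate, then rebuild it from the new head afterwards. */
static int model_fragment(struct model_buf *b, unsigned char **prevhdr)
{
	size_t prevhdr_off = (size_t)(*prevhdr - b->head);

	if (model_checksum_help(b))
		return -1;
	*prevhdr = b->head + prevhdr_off;
	**prevhdr = 44;   /* safe: writes into the current head */
	return 0;
}
```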

Fixes: 764dd163ac92 ("netfilter: nf_conntrack_bridge: add support for IPv6")
Cc: stable@vger.kernel.org
Reported-by: AutonomousCodeSecurity@microsoft.com
Signed-off-by: Xiang Mei (Microsoft) 
Signed-off-by: Florian Westphal

netfilter: handle unreadable frags

2026-07-08T13:33:44+00:00

sashiko reports:
 When an skb with unreadable fragments (such as from devmem TCP, where
 skb_frags_readable(skb) returns false) is processed by the u32 module,
 skb_copy_bits() will safely return a negative error code [..]

xt_u32: bail out with hotdrop in this case.
gather_frags: return -1, just as if we had no fragment header.
nfnetlink_queue: restrict to the linear part.
nfnetlink_log: restrict to the linear part.

v2:
 - skb_zerocopy helpers don't copy the readable flag, i.e. nfnetlink_queue
   is broken too
 - xt_u32 shouldn't return true if hotdrop was set

Fixes: 65249feb6b3d ("net: add support for skbs with unreadable frags")
Cc: stable@vger.kernel.org
Acked-by: Mina Almasry 
Signed-off-by: Florian Westphal
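The xt_u32 part of this change can be modelled as follows. struct model_frag_pkt, model_copy_bits() and model_u32_match() are hypothetical; model_copy_bits() only mimics the contract that skb_copy_bits() returns a negative value when it cannot copy, here when the range reaches into unreadable fragment bytes. The match sets hotdrop and returns false, in line with the v2 note that xt_u32 must not return true once hotdrop is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical packet: bytes past linear_len model fragment data. */
struct model_frag_pkt {
	const unsigned char *data;  /* whole payload, linear part first */
	size_t linear_len;          /* bytes in the linear head */
	size_t total_len;           /* linear + fragment bytes */
	bool frags_readable;        /* false models unreadable (devmem) frags */
};

/* Negative on failure, 0 on success, like skb_copy_bits(). */
static int model_copy_bits(const struct model_frag_pkt *p, size_t off,
			   void *to, size_t len)
{
	if (len > p->total_len || off > p->total_len - len)
		return -1;   /* out of range */
	if (off + len > p->linear_len && !p->frags_readable)
		return -1;   /* would read unreadable fragment data */
	memcpy(to, p->data + off, len);
	return 0;
}

/* u32-style match on a 32-bit big-endian word at off. */
static bool model_u32_match(const struct model_frag_pkt *p, size_t off,
			    unsigned int want, bool *hotdrop)
{
	unsigned char b[4];
	unsigned int val;

	if (model_copy_bits(p, off, b, sizeof(b)) < 0) {
		*hotdrop = true;   /* drop the packet ... */
		return false;      /* ... and never report a match */
	}
	val = (unsigned int)b[0] << 24 | (unsigned int)b[1] << 16 |
	      (unsigned int)b[2] << 8 | b[3];
	return val == want;
}
```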