<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/net/ipv4, branch master</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>net: skbuff: propagate shared-frag marker through frag-transfer helpers</title>
<updated>2026-05-21T09:31:05+00:00</updated>
<author>
<name>Hyunwoo Kim</name>
<email>imv4bel@gmail.com</email>
</author>
<published>2026-05-15T22:28:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=48f6a5356a33dd78e7144ae1faef95ffc990aae0'/>
<id>48f6a5356a33dd78e7144ae1faef95ffc990aae0</id>
<content type='text'>
Two frag-transfer helpers (__pskb_copy_fclone() and skb_shift()) fail
to propagate the SKBFL_SHARED_FRAG bit in skb_shinfo()-&gt;flags when
moving frags from source to destination.  __pskb_copy_fclone() defers
the rest of the shinfo metadata to skb_copy_header() after copying
frag descriptors, but that helper only carries over gso_{size,segs,
type} and never touches skb_shinfo()-&gt;flags; skb_shift() moves frag
descriptors directly and leaves flags untouched.  As a result, the
destination skb keeps a reference to the same externally-owned or
page-cache-backed pages while reporting skb_has_shared_frag() as
false.

The mismatch is harmful in any in-place writer that uses
skb_has_shared_frag() to decide whether shared pages must be detoured
through skb_cow_data().  ESP input is one such writer (esp4.c,
esp6.c), and a single nft 'dup to &lt;local&gt;' rule -- or any other
nf_dup_ipv4() / xt_TEE caller -- is enough to land a pskb_copy()'d
skb in esp_input() with the marker stripped, letting an unprivileged
user write into the page cache of a root-owned read-only file via
authencesn-ESN stray writes.

Set SKBFL_SHARED_FRAG on the destination whenever frag descriptors
were actually moved from the source.  skb_copy() and skb_copy_expand()
share skb_copy_header() too but linearize all paged data into freshly
allocated head storage and emerge with nr_frags == 0, so
skb_has_shared_frag() returns false on its own; they need no change.

The same omission exists in skb_gro_receive() and skb_gro_receive_list().
The former moves the incoming skb's frag descriptors into the
accumulator's last sub-skb via two paths (a direct frag-move loop and
the head_frag + memcpy path); the latter chains the incoming skb whole
onto p's frag_list.  Downstream skb_segment() reads only
skb_shinfo(p)-&gt;flags, and skb_segment_list() reuses each sub-skb's
shinfo as the nskb -- both p and lp must carry the marker.

The same omission also exists in tcp_clone_payload(), which builds an
MTU probe skb by moving frag descriptors from skbs on sk_write_queue
into a freshly allocated nskb.  The helper falls into the same family
and warrants the same fix for consistency; no TCP TX-side in-place
writer is currently known to reach a user page through this gap, but
a future consumer depending on the marker would regress silently.

The same omission exists in skb_segment(): the per-iteration flag
merge takes only head_skb's flag, and the inner switch that rebinds
frag_skb to list_skb on head_skb-frags exhaustion does not fold the
new frag_skb's flag into nskb.  Fold frag_skb's flag at both sites
so segments drawing frags from frag_list members carry the marker.

Fixes: cef401de7be8 ("net: fix possible wrong checksum generation")
Fixes: f4c50a4034e6 ("xfrm: esp: avoid in-place decrypt on shared skb frags")
Suggested-by: Sabrina Dubroca &lt;sd@queasysnail.net&gt;
Suggested-by: Sultan Alsawaf &lt;sultan@kerneltoast.com&gt;
Suggested-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
Suggested-by: Lin Ma &lt;malin89@huawei.com&gt;
Suggested-by: Jingguo Tan &lt;tanjingguo@huawei.com&gt;
Suggested-by: Aaron Esau &lt;aaron1esau@gmail.com&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Hyunwoo Kim &lt;imv4bel@gmail.com&gt;
Tested-by: Rajat Gupta &lt;rajat.gupta@oss.qualcomm.com&gt;
Link: https://patch.msgid.link/ageeJfJHwgzmKXbh@v4bel
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Two frag-transfer helpers (__pskb_copy_fclone() and skb_shift()) fail
to propagate the SKBFL_SHARED_FRAG bit in skb_shinfo()-&gt;flags when
moving frags from source to destination.  __pskb_copy_fclone() defers
the rest of the shinfo metadata to skb_copy_header() after copying
frag descriptors, but that helper only carries over gso_{size,segs,
type} and never touches skb_shinfo()-&gt;flags; skb_shift() moves frag
descriptors directly and leaves flags untouched.  As a result, the
destination skb keeps a reference to the same externally-owned or
page-cache-backed pages while reporting skb_has_shared_frag() as
false.

The mismatch is harmful in any in-place writer that uses
skb_has_shared_frag() to decide whether shared pages must be detoured
through skb_cow_data().  ESP input is one such writer (esp4.c,
esp6.c), and a single nft 'dup to &lt;local&gt;' rule -- or any other
nf_dup_ipv4() / xt_TEE caller -- is enough to land a pskb_copy()'d
skb in esp_input() with the marker stripped, letting an unprivileged
user write into the page cache of a root-owned read-only file via
authencesn-ESN stray writes.

Set SKBFL_SHARED_FRAG on the destination whenever frag descriptors
were actually moved from the source.  skb_copy() and skb_copy_expand()
share skb_copy_header() too but linearize all paged data into freshly
allocated head storage and emerge with nr_frags == 0, so
skb_has_shared_frag() returns false on its own; they need no change.

The same omission exists in skb_gro_receive() and skb_gro_receive_list().
The former moves the incoming skb's frag descriptors into the
accumulator's last sub-skb via two paths (a direct frag-move loop and
the head_frag + memcpy path); the latter chains the incoming skb whole
onto p's frag_list.  Downstream skb_segment() reads only
skb_shinfo(p)-&gt;flags, and skb_segment_list() reuses each sub-skb's
shinfo as the nskb -- both p and lp must carry the marker.

The same omission also exists in tcp_clone_payload(), which builds an
MTU probe skb by moving frag descriptors from skbs on sk_write_queue
into a freshly allocated nskb.  The helper falls into the same family
and warrants the same fix for consistency; no TCP TX-side in-place
writer is currently known to reach a user page through this gap, but
a future consumer depending on the marker would regress silently.

The same omission exists in skb_segment(): the per-iteration flag
merge takes only head_skb's flag, and the inner switch that rebinds
frag_skb to list_skb on head_skb-frags exhaustion does not fold the
new frag_skb's flag into nskb.  Fold frag_skb's flag at both sites
so segments drawing frags from frag_list members carry the marker.

Fixes: cef401de7be8 ("net: fix possible wrong checksum generation")
Fixes: f4c50a4034e6 ("xfrm: esp: avoid in-place decrypt on shared skb frags")
Suggested-by: Sabrina Dubroca &lt;sd@queasysnail.net&gt;
Suggested-by: Sultan Alsawaf &lt;sultan@kerneltoast.com&gt;
Suggested-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
Suggested-by: Lin Ma &lt;malin89@huawei.com&gt;
Suggested-by: Jingguo Tan &lt;tanjingguo@huawei.com&gt;
Suggested-by: Aaron Esau &lt;aaron1esau@gmail.com&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Hyunwoo Kim &lt;imv4bel@gmail.com&gt;
Tested-by: Rajat Gupta &lt;rajat.gupta@oss.qualcomm.com&gt;
Link: https://patch.msgid.link/ageeJfJHwgzmKXbh@v4bel
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: fix stale per-CPU tcp_tw_isn leak enabling ISN prediction</title>
<updated>2026-05-21T02:14:06+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2026-05-19T08:46:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=1bbf0ced1d9db73ac7893c2187f3459288603e0d'/>
<id>1bbf0ced1d9db73ac7893c2187f3459288603e0d</id>
<content type='text'>
Blamed commit moved the TIME_WAIT-derived ISN from the skb control
block to a per-CPU variable, assuming the value would always be consumed
by tcp_conn_request() for the same packet that wrote it. That assumption
is violated by multiple drop paths between the producer
(__this_cpu_write(tcp_tw_isn, isn) in tcp_v{4,6}_rcv()) and the consumer
(tcp_conn_request()):

 - min_ttl / min_hopcount check
 - xfrm policy check
 - tcp_inbound_hash() MD5/AO mismatch
 - tcp_filter() eBPF/SO_ATTACH_FILTER drop
 - th-&gt;syn &amp;&amp; th-&gt;fin discard in tcp_rcv_state_process() TCP_LISTEN
 - psp_sk_rx_policy_check() in tcp_v{4,6}_do_rcv()
 - tcp_checksum_complete() in tcp_v{4,6}_do_rcv()
 - tcp_v{4,6}_cookie_check() returning NULL

When a packet is dropped on any of these paths, tcp_tw_isn is left set.

The next SYN processed on the same CPU then consumes the non zero value in
tcp_conn_request(), receiving a potentially predictable ISN.

This patch moves back tcp_tw_isn to skb-&gt;cb[], getting rid of the per-cpu
variable.

Note that tcp_v{4,6}_fill_cb() do not set it.

Very litle impact on overall code size/complexity:

$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 0/0 grow/shrink: 2/1 up/down: 8/-15 (-7)
Function                                     old     new   delta
tcp_v6_rcv                                  3038    3042      +4
tcp_v4_rcv                                  3035    3039      +4
tcp_conn_request                            2938    2923     -15
Total: Before=24436060, After=24436053, chg -0.00%

Fixes: 41eecbd712b7 ("tcp: replace TCP_SKB_CB(skb)-&gt;tcp_tw_isn with a per-cpu field")
Reported-by: Chris Mason &lt;clm@meta.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@google.com&gt;
Link: https://patch.msgid.link/20260519084611.2485277-1-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Blamed commit moved the TIME_WAIT-derived ISN from the skb control
block to a per-CPU variable, assuming the value would always be consumed
by tcp_conn_request() for the same packet that wrote it. That assumption
is violated by multiple drop paths between the producer
(__this_cpu_write(tcp_tw_isn, isn) in tcp_v{4,6}_rcv()) and the consumer
(tcp_conn_request()):

 - min_ttl / min_hopcount check
 - xfrm policy check
 - tcp_inbound_hash() MD5/AO mismatch
 - tcp_filter() eBPF/SO_ATTACH_FILTER drop
 - th-&gt;syn &amp;&amp; th-&gt;fin discard in tcp_rcv_state_process() TCP_LISTEN
 - psp_sk_rx_policy_check() in tcp_v{4,6}_do_rcv()
 - tcp_checksum_complete() in tcp_v{4,6}_do_rcv()
 - tcp_v{4,6}_cookie_check() returning NULL

When a packet is dropped on any of these paths, tcp_tw_isn is left set.

The next SYN processed on the same CPU then consumes the non zero value in
tcp_conn_request(), receiving a potentially predictable ISN.

This patch moves back tcp_tw_isn to skb-&gt;cb[], getting rid of the per-cpu
variable.

Note that tcp_v{4,6}_fill_cb() do not set it.

Very litle impact on overall code size/complexity:

$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 0/0 grow/shrink: 2/1 up/down: 8/-15 (-7)
Function                                     old     new   delta
tcp_v6_rcv                                  3038    3042      +4
tcp_v4_rcv                                  3035    3039      +4
tcp_conn_request                            2938    2923     -15
Total: Before=24436060, After=24436053, chg -0.00%

Fixes: 41eecbd712b7 ("tcp: replace TCP_SKB_CB(skb)-&gt;tcp_tw_isn with a per-cpu field")
Reported-by: Chris Mason &lt;clm@meta.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@google.com&gt;
Link: https://patch.msgid.link/20260519084611.2485277-1-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv4: use WARN_ON_ONCE() in ip_rt_bug()</title>
<updated>2026-05-21T02:00:36+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2026-05-19T19:32:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e4bdef4d320b2fe73b8ebfc0cc0507fa9dc4a3b7'/>
<id>e4bdef4d320b2fe73b8ebfc0cc0507fa9dc4a3b7</id>
<content type='text'>
It turns out ip_rt_bug() can be called more than expected.

syzbot will still panic (because of panic_on_warn=1), but non debug
kernels will no longer die while repeating stack traces on the console.

Fixes: c378a9c019cf ("ipv4: Give backtrace in ip_rt_bug().")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Reviewed-by: Jiayuan Chen &lt;jiayuan.chen@linux.dev&gt;
Link: https://patch.msgid.link/20260519193248.4018872-1-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It turns out ip_rt_bug() can be called more than expected.

syzbot will still panic (because of panic_on_warn=1), but non debug
kernels will no longer die while repeating stack traces on the console.

Fixes: c378a9c019cf ("ipv4: Give backtrace in ip_rt_bug().")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Reviewed-by: Jiayuan Chen &lt;jiayuan.chen@linux.dev&gt;
Link: https://patch.msgid.link/20260519193248.4018872-1-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv4: icmp: reject broadcast/multicast routes</title>
<updated>2026-05-21T02:00:02+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2026-05-19T20:08:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=7eb72c1e3984150c45f77aa4299f7c2598a68e9b'/>
<id>7eb72c1e3984150c45f77aa4299f7c2598a68e9b</id>
<content type='text'>
syzbot was able to trigger ip_rt_bug() in a loop, using an IPv4 packet
with a crafted IPOPT_SSRR option:

  options: ipv4_options {
    options: array[ipv4_option] {
      union ipv4_option {
        ssrr: ipv4_option_route[IPOPT_SSRR] {
         type: const = 0x89 (1 bytes)
         length: len = 0x7 (1 bytes)
         pointer: int8 = 0xa2 (1 bytes)
         data: array[ipv4_addr] {
           union ipv4_addr {
             broadcast: const = 0xffffffff (4 bytes)
           }
         }
       }
     }

Change __icmp_send() to not send ICMP to broadcast/multicast destinations.

Fixes: c378a9c019cf ("ipv4: Give backtrace in ip_rt_bug().")
Reported-by: syzbot+c13a57c2639c2c0d03a6@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6a0cc169.170a0220.1f6c2d.0004.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Link: https://patch.msgid.link/20260519200836.4141061-1-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
syzbot was able to trigger ip_rt_bug() in a loop, using an IPv4 packet
with a crafted IPOPT_SSRR option:

  options: ipv4_options {
    options: array[ipv4_option] {
      union ipv4_option {
        ssrr: ipv4_option_route[IPOPT_SSRR] {
         type: const = 0x89 (1 bytes)
         length: len = 0x7 (1 bytes)
         pointer: int8 = 0xa2 (1 bytes)
         data: array[ipv4_addr] {
           union ipv4_addr {
             broadcast: const = 0xffffffff (4 bytes)
           }
         }
       }
     }

Change __icmp_send() to not send ICMP to broadcast/multicast destinations.

Fixes: c378a9c019cf ("ipv4: Give backtrace in ip_rt_bug().")
Reported-by: syzbot+c13a57c2639c2c0d03a6@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6a0cc169.170a0220.1f6c2d.0004.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Link: https://patch.msgid.link/20260519200836.4141061-1-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>udp: Fix UDP length on last GSO_PARTIAL segment</title>
<updated>2026-05-20T22:03:47+00:00</updated>
<author>
<name>Gal Pressman</name>
<email>gal@nvidia.com</email>
</author>
<published>2026-05-18T06:22:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=78effd896eee11ac9db9bcbb53e7bbcad96073d7'/>
<id>78effd896eee11ac9db9bcbb53e7bbcad96073d7</id>
<content type='text'>
Following the cited commit, __udp_gso_segment() writes single MSS length
in the UDP header.
The cited patch doesn't account for the fact that the last segment could
be a GSO skb by itself. This could happen when the size of the packet is
a multiple of MSS, hence the first segment is also the last one (there
is no need for a remainder skb).

When the post-loop segment is a GSO skb, assign the single MSS length in
the UDP header.

Fixes: b10b446ce7ad ("udp: gso: Use single MSS length in UDP header for GSO_PARTIAL")
Reported-by: Matthew Schwartz &lt;matthew.schwartz@linux.dev&gt;
Closes: https://lore.kernel.org/all/6c3fb15e-711d-4b8d-b152-e03d9b05293f@linux.dev/
Tested-by: Matthew Schwartz &lt;matthew.schwartz@linux.dev&gt;
Reviewed-by: Dragos Tatulea &lt;dtatulea@nvidia.com&gt;
Signed-off-by: Gal Pressman &lt;gal@nvidia.com&gt;
Link: https://patch.msgid.link/20260518062250.3019914-3-gal@nvidia.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Following the cited commit, __udp_gso_segment() writes single MSS length
in the UDP header.
The cited patch doesn't account for the fact that the last segment could
be a GSO skb by itself. This could happen when the size of the packet is
a multiple of MSS, hence the first segment is also the last one (there
is no need for a remainder skb).

When the post-loop segment is a GSO skb, assign the single MSS length in
the UDP header.

Fixes: b10b446ce7ad ("udp: gso: Use single MSS length in UDP header for GSO_PARTIAL")
Reported-by: Matthew Schwartz &lt;matthew.schwartz@linux.dev&gt;
Closes: https://lore.kernel.org/all/6c3fb15e-711d-4b8d-b152-e03d9b05293f@linux.dev/
Tested-by: Matthew Schwartz &lt;matthew.schwartz@linux.dev&gt;
Reviewed-by: Dragos Tatulea &lt;dtatulea@nvidia.com&gt;
Signed-off-by: Gal Pressman &lt;gal@nvidia.com&gt;
Link: https://patch.msgid.link/20260518062250.3019914-3-gal@nvidia.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>udp: gso: Fix handling checksum in __udp_gso_segment</title>
<updated>2026-05-20T22:02:47+00:00</updated>
<author>
<name>Alice Mikityanska</name>
<email>alice@isovalent.com</email>
</author>
<published>2026-05-18T06:22:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=5f17ae0f595aeb560155ce98edbe44d3eacc7e40'/>
<id>5f17ae0f595aeb560155ce98edbe44d3eacc7e40</id>
<content type='text'>
The cited commit started using msslen for uh-&gt;len, but still uses newlen
to adjust uh-&gt;check. Although the checksum is ignored in most cases due
to the hardware offload, __udp_gso_segment attempts to maintain the
correct one. Fix uh-&gt;check and adjust it by the right value.

Additionally, after the fix, newlen becomes assigned and unused before
the loop. The code can be simplified a bit if mss adjustment is dropped,
so that newlen becomes equal to msslen before the loop, and msslen can
be also dropped, saving a few lines of code.

This brings us back to one variable, drops an unneeded arithmetic for
mss, and fixes the UDP checksum.

Fixes: b10b446ce7ad ("udp: gso: Use single MSS length in UDP header for GSO_PARTIAL")
Signed-off-by: Alice Mikityanska &lt;alice@isovalent.com&gt;
Reviewed-by: Willem de Bruijn &lt;willemb@google.com&gt;
Signed-off-by: Gal Pressman &lt;gal@nvidia.com&gt;
Link: https://patch.msgid.link/20260518062250.3019914-2-gal@nvidia.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The cited commit started using msslen for uh-&gt;len, but still uses newlen
to adjust uh-&gt;check. Although the checksum is ignored in most cases due
to the hardware offload, __udp_gso_segment attempts to maintain the
correct one. Fix uh-&gt;check and adjust it by the right value.

Additionally, after the fix, newlen becomes assigned and unused before
the loop. The code can be simplified a bit if mss adjustment is dropped,
so that newlen becomes equal to msslen before the loop, and msslen can
be also dropped, saving a few lines of code.

This brings us back to one variable, drops an unneeded arithmetic for
mss, and fixes the UDP checksum.

Fixes: b10b446ce7ad ("udp: gso: Use single MSS length in UDP header for GSO_PARTIAL")
Signed-off-by: Alice Mikityanska &lt;alice@isovalent.com&gt;
Reviewed-by: Willem de Bruijn &lt;willemb@google.com&gt;
Signed-off-by: Gal Pressman &lt;gal@nvidia.com&gt;
Link: https://patch.msgid.link/20260518062250.3019914-2-gal@nvidia.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv4: raw: reject IP_HDRINCL packets with ihl &lt; 5</title>
<updated>2026-05-15T22:55:02+00:00</updated>
<author>
<name>Michael Bommarito</name>
<email>michael.bommarito@gmail.com</email>
</author>
<published>2026-05-12T20:51:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=915fab69823a14c170dbaa3b41978768e0fe62fc'/>
<id>915fab69823a14c170dbaa3b41978768e0fe62fc</id>
<content type='text'>
raw_send_hdrinc() validates that the caller-supplied IPv4 header
fits within the message length:

    iphlen = iph-&gt;ihl * 4;
    err = -EINVAL;
    if (iphlen &gt; length)
        goto error_free;

    if (iphlen &gt;= sizeof(*iph)) {
        /* fix up saddr, tot_len, id, csum, transport_header */
    }

It does not, however, reject ihl &lt; 5.  For such a packet the
"if (iphlen &gt;= sizeof(*iph))" branch is skipped, leaving the
crafted iphdr untouched, but the packet is still handed to
__ip_local_out() and onward.  Downstream consumers that read
iph-&gt;ihl assume a sane value: net/ipv4/ah4.c:ah_output() in
particular subtracts sizeof(struct iphdr) from top_iph-&gt;ihl * 4
and passes the (signed-int-negative, then cast to size_t)
result to memcpy(), producing an OOB access of length close to
SIZE_MAX and a host kernel panic.

An IPv4 header with ihl &lt; 5 is malformed by definition (RFC 791:
"Internet Header Length is the length of the internet header in
32 bit words ... Note that the minimum value for a correct header
is 5.").  The kernel should not be willing to inject such a
packet into its own output path.

Reject "iphlen &lt; sizeof(*iph)" alongside the existing
"iphlen &gt; length" check.  This matches the principle that locally
constructed packets that re-enter the IP stack must pass the same
basic sanity tests that a foreign packet would be subjected to.

Once this lands, the "if (iphlen &gt;= sizeof(*iph))" wrapper around
the fixup branch becomes redundant; left in place to keep the
patch minimal and backport-friendly.  A follow-up can unwrap it.

Note that commit 86f4c90a1c5c ("ipv4, ipv6: ensure raw socket
message is big enough to hold an IP header") ensures the message
buffer is large enough to hold an iphdr, but does not constrain
the self-reported iph-&gt;ihl.

Reachability: the malformed packet source is any caller with
CAP_NET_RAW, including an unprivileged process in a user+net
namespace on a kernel with CONFIG_USER_NS=y.  The reproduced AH
crash also requires a matching xfrm AH policy on the outgoing
route; a container granted CAP_NET_ADMIN can install that state
and policy in its netns.  Loopback bypasses xfrm_output, so the
trigger uses a real netdev.

Reproduced on UML + KASAN: kernel-mode fault at addr 0x0 with
memcpy_orig at the crash site.  Same shape reproduces inside a
rootless Docker container with --cap-add NET_ADMIN on a stock
distro kernel.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Suggested-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Signed-off-by: Michael Bommarito &lt;michael.bommarito@gmail.com&gt;
Link: https://patch.msgid.link/77ec2b5e8111961c2c39883c92e8aa2709039c17.1778614451.git.michael.bommarito@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
raw_send_hdrinc() validates that the caller-supplied IPv4 header
fits within the message length:

    iphlen = iph-&gt;ihl * 4;
    err = -EINVAL;
    if (iphlen &gt; length)
        goto error_free;

    if (iphlen &gt;= sizeof(*iph)) {
        /* fix up saddr, tot_len, id, csum, transport_header */
    }

It does not, however, reject ihl &lt; 5.  For such a packet the
"if (iphlen &gt;= sizeof(*iph))" branch is skipped, leaving the
crafted iphdr untouched, but the packet is still handed to
__ip_local_out() and onward.  Downstream consumers that read
iph-&gt;ihl assume a sane value: net/ipv4/ah4.c:ah_output() in
particular subtracts sizeof(struct iphdr) from top_iph-&gt;ihl * 4
and passes the (signed-int-negative, then cast to size_t)
result to memcpy(), producing an OOB access of length close to
SIZE_MAX and a host kernel panic.

An IPv4 header with ihl &lt; 5 is malformed by definition (RFC 791:
"Internet Header Length is the length of the internet header in
32 bit words ... Note that the minimum value for a correct header
is 5.").  The kernel should not be willing to inject such a
packet into its own output path.

Reject "iphlen &lt; sizeof(*iph)" alongside the existing
"iphlen &gt; length" check.  This matches the principle that locally
constructed packets that re-enter the IP stack must pass the same
basic sanity tests that a foreign packet would be subjected to.

Once this lands, the "if (iphlen &gt;= sizeof(*iph))" wrapper around
the fixup branch becomes redundant; left in place to keep the
patch minimal and backport-friendly.  A follow-up can unwrap it.

Note that commit 86f4c90a1c5c ("ipv4, ipv6: ensure raw socket
message is big enough to hold an IP header") ensures the message
buffer is large enough to hold an iphdr, but does not constrain
the self-reported iph-&gt;ihl.

Reachability: the malformed packet source is any caller with
CAP_NET_RAW, including an unprivileged process in a user+net
namespace on a kernel with CONFIG_USER_NS=y.  The reproduced AH
crash also requires a matching xfrm AH policy on the outgoing
route; a container granted CAP_NET_ADMIN can install that state
and policy in its netns.  Loopback bypasses xfrm_output, so the
trigger uses a real netdev.

Reproduced on UML + KASAN: kernel-mode fault at addr 0x0 with
memcpy_orig at the crash site.  Same shape reproduces inside a
rootless Docker container with --cap-add NET_ADMIN on a stock
distro kernel.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Suggested-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Signed-off-by: Michael Bommarito &lt;michael.bommarito@gmail.com&gt;
Link: https://patch.msgid.link/77ec2b5e8111961c2c39883c92e8aa2709039c17.1778614451.git.michael.bommarito@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'net-7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net</title>
<updated>2026-05-14T15:57:43+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-05-14T15:57:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=66182ca873a4e87b3496eca79d57f86b76d7f52d'/>
<id>66182ca873a4e87b3496eca79d57f86b76d7f52d</id>
<content type='text'>
Pull networking fixes from Paolo Abeni:
 "Including fixes from netfilter.

  Previous releases - regressions:

   - ethtool: fix NULL pointer dereference in phy_reply_size

   - netfilter:
      - allocate hook ops while under mutex
      - close dangling table module init race
      - restore nf_conntrack helper propagation via expectation

   - tcp:
      - fix potential UAF in reqsk_timer_handler().
      - fix out-of-bounds access for twsk in tcp_ao_established_key().

   - vsock: fix empty payload in tap skb for non-linear buffers

   - hsr: fix NULL pointer dereference in hsr_get_node_data()

   - eth:
      - cortina: fix RX drop accounting
      - ice: fix locking in ice_dcb_rebuild()

  Previous releases - always broken:

   - napi: avoid gro timer misfiring at end of busypoll

   - sched:
      - dualpi2: initialize timer earlier in dualpi2_init()
      - sch_cbs: Call qdisc_reset for child qdisc

   - shaper:
      - fix ordering issue in net_shaper_commit()
      - reject handle IDs exceeding internal bit-width

   - ipv6: flowlabel: enforce per-netns limit for unprivileged callers

   - tls: fix off-by-one in sg_chain entry count for wrapped sk_msg ring

   - smc: avoid NULL deref of conn-&gt;lnk in smc_msg_event tracepoint

   - sctp: revalidate list cursor after sctp_sendmsg_to_asoc() in SCTP_SENDALL

   - batman-adv:
      - reject new tp_meter sessions during teardown
      - purge non-released claims

   - eth:
      - i40e: cleanup PTP registration on probe failure
      - idpf: fix double free and use-after-free in aux device error paths
      - ena: fix potential use-after-free in get_timestamp"

* tag 'net-7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits)
  net: phy: DP83TC811: add reading of abilities
  net: tls: prevent chain-after-chain in plain text SG
  net: tls: fix off-by-one in sg_chain entry count for wrapped sk_msg ring
  net/smc: reject CHID-0 ACCEPT that matches an empty ism_dev slot
  macsec: use rcu_work to defer TX SA crypto cleanup out of softirq
  macsec: use rcu_work to defer RX SA crypto cleanup out of softirq
  macsec: introduce dedicated workqueue for SA crypto cleanup
  net: net_failover: Fix the deadlock in slave register
  MAINTAINERS: update atlantic driver maintainer
  selftests/tc-testing: Add QFQ/CBS qlen underflow test
  net/sched: sch_cbs: Call qdisc_reset for child qdisc
  FDDI: defza: Sanitise the reset safety timer
  net: ethernet: ravb: Do not check URAM suspension when WoL is active
  ethtool: fix ethnl_bitmap32_not_zero() bit interval semantics
  net/smc: avoid NULL deref of conn-&gt;lnk in smc_msg_event tracepoint
  net/smc: fix sleep-inside-lock in __smc_setsockopt() causing local DoS
  net: atm: fix skb leak in sigd_send() default branch
  net: ethtool: phy: avoid NULL deref when PHY driver is unbound
  net: atlantic: preserve PCI wake-from-D3 on shutdown when WOL enabled
  net: shaper: reject QUEUE scope handle with missing id
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull networking fixes from Paolo Abeni:
 "Including fixes from netfilter.

  Previous releases - regressions:

   - ethtool: fix NULL pointer dereference in phy_reply_size

   - netfilter:
      - allocate hook ops while under mutex
      - close dangling table module init race
      - restore nf_conntrack helper propagation via expectation

   - tcp:
      - fix potential UAF in reqsk_timer_handler().
      - fix out-of-bounds access for twsk in tcp_ao_established_key().

   - vsock: fix empty payload in tap skb for non-linear buffers

   - hsr: fix NULL pointer dereference in hsr_get_node_data()

   - eth:
      - cortina: fix RX drop accounting
      - ice: fix locking in ice_dcb_rebuild()

  Previous releases - always broken:

   - napi: avoid gro timer misfiring at end of busypoll

   - sched:
      - dualpi2: initialize timer earlier in dualpi2_init()
      - sch_cbs: Call qdisc_reset for child qdisc

   - shaper:
      - fix ordering issue in net_shaper_commit()
      - reject handle IDs exceeding internal bit-width

   - ipv6: flowlabel: enforce per-netns limit for unprivileged callers

   - tls: fix off-by-one in sg_chain entry count for wrapped sk_msg ring

   - smc: avoid NULL deref of conn-&gt;lnk in smc_msg_event tracepoint

   - sctp: revalidate list cursor after sctp_sendmsg_to_asoc() in SCTP_SENDALL

   - batman-adv:
      - reject new tp_meter sessions during teardown
      - purge non-released claims

   - eth:
      - i40e: cleanup PTP registration on probe failure
      - idpf: fix double free and use-after-free in aux device error paths
      - ena: fix potential use-after-free in get_timestamp"

* tag 'net-7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits)
  net: phy: DP83TC811: add reading of abilities
  net: tls: prevent chain-after-chain in plain text SG
  net: tls: fix off-by-one in sg_chain entry count for wrapped sk_msg ring
  net/smc: reject CHID-0 ACCEPT that matches an empty ism_dev slot
  macsec: use rcu_work to defer TX SA crypto cleanup out of softirq
  macsec: use rcu_work to defer RX SA crypto cleanup out of softirq
  macsec: introduce dedicated workqueue for SA crypto cleanup
  net: net_failover: Fix the deadlock in slave register
  MAINTAINERS: update atlantic driver maintainer
  selftests/tc-testing: Add QFQ/CBS qlen underflow test
  net/sched: sch_cbs: Call qdisc_reset for child qdisc
  FDDI: defza: Sanitise the reset safety timer
  net: ethernet: ravb: Do not check URAM suspension when WoL is active
  ethtool: fix ethnl_bitmap32_not_zero() bit interval semantics
  net/smc: avoid NULL deref of conn-&gt;lnk in smc_msg_event tracepoint
  net/smc: fix sleep-inside-lock in __smc_setsockopt() causing local DoS
  net: atm: fix skb leak in sigd_send() default branch
  net: ethtool: phy: avoid NULL deref when PHY driver is unbound
  net: atlantic: preserve PCI wake-from-D3 on shutdown when WOL enabled
  net: shaper: reject QUEUE scope handle with missing id
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: Fix out-of-bounds access for twsk in tcp_ao_established_key().</title>
<updated>2026-05-12T00:50:15+00:00</updated>
<author>
<name>Kuniyuki Iwashima</name>
<email>kuniyu@google.com</email>
</author>
<published>2026-05-08T12:08:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=03cb001ef87b3f8d859cf7f96329acf3d6235d29'/>
<id>03cb001ef87b3f8d859cf7f96329acf3d6235d29</id>
<content type='text'>
lockdep_sock_is_held() was added in tcp_ao_established_key()
by the cited commit.

It can be called from tcp_v[46]_timewait_ack() with twsk.

Since it does not have sk-&gt;sk_lock, the lockdep annotation
results in out-of-bound access.

  $ pahole -C tcp_timewait_sock vmlinux | grep size
  	/* size: 288, cachelines: 5, members: 8 */
  $ pahole -C sock vmlinux | grep sk_lock
  	socket_lock_t              sk_lock;              /*   440   192 */

Let's not use lockdep_sock_is_held() for TCP_TIME_WAIT.

Fixes: 6b2d11e2d8fc ("net/tcp: Add missing lockdep annotations for TCP-AO hlist traversals")
Reported-by: Damiano Melotti &lt;melotti@google.com&gt;
Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@google.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Link: https://patch.msgid.link/20260508120853.4098365-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
lockdep_sock_is_held() was added in tcp_ao_established_key()
by the cited commit.

It can be called from tcp_v[46]_timewait_ack() with twsk.

Since it does not have sk-&gt;sk_lock, the lockdep annotation
results in out-of-bound access.

  $ pahole -C tcp_timewait_sock vmlinux | grep size
  	/* size: 288, cachelines: 5, members: 8 */
  $ pahole -C sock vmlinux | grep sk_lock
  	socket_lock_t              sk_lock;              /*   440   192 */

Let's not use lockdep_sock_is_held() for TCP_TIME_WAIT.

Fixes: 6b2d11e2d8fc ("net/tcp: Add missing lockdep annotations for TCP-AO hlist traversals")
Reported-by: Damiano Melotti &lt;melotti@google.com&gt;
Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@google.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Link: https://patch.msgid.link/20260508120853.4098365-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf</title>
<updated>2026-05-10T01:42:54+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-05-10T01:42:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=515186b7be488f37c63c2436fc2d1a160ef9bb95'/>
<id>515186b7be488f37c63c2436fc2d1a160ef9bb95</id>
<content type='text'>
Pull bpf fixes from Alexei Starovoitov:

 - Fix sk_local_storage diag dump via netlink (Amery Hung)

 - Fix off-by-one in arena direct-value access (Junyoung Jang)

 - Reject TCP_NODELAY in bpf-tcp congestion control (KaFai Wan)

 - Fix type confusion in bpf_*_sock() (Kuniyuki Iwashima)

 - Reject TX-only AF_XDP sockets (Linpu Yu)

 - Don't run arg-tracking analysis twice on main subprog (Paul Chaignon)

 - Fix NULL pointer dereference in bpf_sk_storage_clone and fib lookup
   (Weiming Shi)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Fix off-by-one boundary validation in arena direct-value access
  xskmap: reject TX-only AF_XDP sockets
  bpf: Don't run arg-tracking analysis twice on main subprog
  bpf: Free reuseport cBPF prog after RCU grace period.
  bpf: tcp: Fix type confusion in sol_tcp_sockopt().
  bpf: tcp: Fix type confusion in bpf_skc_to_tcp6_sock().
  bpf: tcp: Fix type confusion in bpf_skc_to_tcp_sock().
  mptcp: bpf: Fix type confusion in bpf_mptcp_sock_from_subflow()
  selftest: bpf: Add test for bpf_tcp_sock() and RAW socket.
  bpf: tcp: Fix type confusion in bpf_tcp_sock().
  tools/headers: Regenerate stddef.h to fix BPF selftests
  bpf: Fix sk_local_storage diag dumping uninitialized special fields
  bpf: Fix NULL pointer dereference in bpf_skb_fib_lookup()
  sockmap: Fix sk_psock_drop() race vs sock_map_{unhash,close,destroy}().
  bpf: Fix NULL pointer dereference in bpf_sk_storage_clone and diag paths
  selftests/bpf: Verify bpf-tcp-cc rejects TCP_NODELAY
  selftests/bpf: Test TCP_NODELAY in TCP hdr opt callbacks
  bpf: Reject TCP_NODELAY in bpf-tcp-cc
  bpf: Reject TCP_NODELAY in TCP header option callbacks
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull bpf fixes from Alexei Starovoitov:

 - Fix sk_local_storage diag dump via netlink (Amery Hung)

 - Fix off-by-one in arena direct-value access (Junyoung Jang)

 - Reject TCP_NODELAY in bpf-tcp congestion control (KaFai Wan)

 - Fix type confusion in bpf_*_sock() (Kuniyuki Iwashima)

 - Reject TX-only AF_XDP sockets (Linpu Yu)

 - Don't run arg-tracking analysis twice on main subprog (Paul Chaignon)

 - Fix NULL pointer dereference in bpf_sk_storage_clone and fib lookup
   (Weiming Shi)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Fix off-by-one boundary validation in arena direct-value access
  xskmap: reject TX-only AF_XDP sockets
  bpf: Don't run arg-tracking analysis twice on main subprog
  bpf: Free reuseport cBPF prog after RCU grace period.
  bpf: tcp: Fix type confusion in sol_tcp_sockopt().
  bpf: tcp: Fix type confusion in bpf_skc_to_tcp6_sock().
  bpf: tcp: Fix type confusion in bpf_skc_to_tcp_sock().
  mptcp: bpf: Fix type confusion in bpf_mptcp_sock_from_subflow()
  selftest: bpf: Add test for bpf_tcp_sock() and RAW socket.
  bpf: tcp: Fix type confusion in bpf_tcp_sock().
  tools/headers: Regenerate stddef.h to fix BPF selftests
  bpf: Fix sk_local_storage diag dumping uninitialized special fields
  bpf: Fix NULL pointer dereference in bpf_skb_fib_lookup()
  sockmap: Fix sk_psock_drop() race vs sock_map_{unhash,close,destroy}().
  bpf: Fix NULL pointer dereference in bpf_sk_storage_clone and diag paths
  selftests/bpf: Verify bpf-tcp-cc rejects TCP_NODELAY
  selftests/bpf: Test TCP_NODELAY in TCP hdr opt callbacks
  bpf: Reject TCP_NODELAY in bpf-tcp-cc
  bpf: Reject TCP_NODELAY in TCP header option callbacks
</pre>
</div>
</content>
</entry>
</feed>
