<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/include/linux/skbuff.h, branch v4.9.243</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>skbuff: fix a data race in skb_queue_len()</title>
<updated>2020-10-01T18:40:06+00:00</updated>
<author>
<name>Qian Cai</name>
<email>cai@lca.pw</email>
</author>
<published>2020-02-04T18:40:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=ce5d34a31676bb33b578577d64dcf4111a03eceb'/>
<id>ce5d34a31676bb33b578577d64dcf4111a03eceb</id>
<content type='text'>
[ Upstream commit 86b18aaa2b5b5bb48e609cd591b3d2d0fdbe0442 ]

sk_buff.qlen can be accessed concurrently as noticed by KCSAN,

 BUG: KCSAN: data-race in __skb_try_recv_from_queue / unix_dgram_sendmsg

 read to 0xffff8a1b1d8a81c0 of 4 bytes by task 5371 on cpu 96:
  unix_dgram_sendmsg+0x9a9/0xb70 include/linux/skbuff.h:1821
				 net/unix/af_unix.c:1761
  ____sys_sendmsg+0x33e/0x370
  ___sys_sendmsg+0xa6/0xf0
  __sys_sendmsg+0x69/0xf0
  __x64_sys_sendmsg+0x51/0x70
  do_syscall_64+0x91/0xb47
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

 write to 0xffff8a1b1d8a81c0 of 4 bytes by task 1 on cpu 99:
  __skb_try_recv_from_queue+0x327/0x410 include/linux/skbuff.h:2029
  __skb_try_recv_datagram+0xbe/0x220
  unix_dgram_recvmsg+0xee/0x850
  ____sys_recvmsg+0x1fb/0x210
  ___sys_recvmsg+0xa2/0xf0
  __sys_recvmsg+0x66/0xf0
  __x64_sys_recvmsg+0x51/0x70
  do_syscall_64+0x91/0xb47
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Since only the read is operating as lockless, it could introduce a logic
bug in unix_recvq_full() due to the load tearing. Fix it by adding
a lockless variant of skb_queue_len() and unix_recvq_full() where
READ_ONCE() is on the read while WRITE_ONCE() is on the write similar to
the commit d7d16a89350a ("net: add skb_queue_empty_lockless()").

Signed-off-by: Qian Cai &lt;cai@lca.pw&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 86b18aaa2b5b5bb48e609cd591b3d2d0fdbe0442 ]

sk_buff.qlen can be accessed concurrently as noticed by KCSAN,

 BUG: KCSAN: data-race in __skb_try_recv_from_queue / unix_dgram_sendmsg

 read to 0xffff8a1b1d8a81c0 of 4 bytes by task 5371 on cpu 96:
  unix_dgram_sendmsg+0x9a9/0xb70 include/linux/skbuff.h:1821
				 net/unix/af_unix.c:1761
  ____sys_sendmsg+0x33e/0x370
  ___sys_sendmsg+0xa6/0xf0
  __sys_sendmsg+0x69/0xf0
  __x64_sys_sendmsg+0x51/0x70
  do_syscall_64+0x91/0xb47
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

 write to 0xffff8a1b1d8a81c0 of 4 bytes by task 1 on cpu 99:
  __skb_try_recv_from_queue+0x327/0x410 include/linux/skbuff.h:2029
  __skb_try_recv_datagram+0xbe/0x220
  unix_dgram_recvmsg+0xee/0x850
  ____sys_recvmsg+0x1fb/0x210
  ___sys_recvmsg+0xa2/0xf0
  __sys_recvmsg+0x66/0xf0
  __x64_sys_recvmsg+0x51/0x70
  do_syscall_64+0x91/0xb47
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Since only the read is operating as lockless, it could introduce a logic
bug in unix_recvq_full() due to the load tearing. Fix it by adding
a lockless variant of skb_queue_len() and unix_recvq_full() where
READ_ONCE() is on the read while WRITE_ONCE() is on the write similar to
the commit d7d16a89350a ("net: add skb_queue_empty_lockless()").

Signed-off-by: Qian Cai &lt;cai@lca.pw&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: add __must_check to skb_put_padto()</title>
<updated>2020-10-01T18:40:01+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2020-09-09T08:27:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=48668eb1ede2fff6648420a8e8f626622e29b1ee'/>
<id>48668eb1ede2fff6648420a8e8f626622e29b1ee</id>
<content type='text'>
[ Upstream commit 4a009cb04aeca0de60b73f37b102573354214b52 ]

skb_put_padto() and __skb_put_padto() callers
must check return values or risk use-after-free.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 4a009cb04aeca0de60b73f37b102573354214b52 ]

skb_put_padto() and __skb_put_padto() callers
must check return values or risk use-after-free.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net/flow_dissector: switch to siphash</title>
<updated>2019-11-10T10:23:31+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2019-10-22T14:57:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=1f94465d13ace2d4610c4eb2b362454ce2a9d87c'/>
<id>1f94465d13ace2d4610c4eb2b362454ce2a9d87c</id>
<content type='text'>
commit 55667441c84fa5e0911a0aac44fb059c15ba6da2 upstream.

UDP IPv6 packets auto flowlabels are using a 32bit secret
(static u32 hashrnd in net/core/flow_dissector.c) and
apply jhash() over fields known by the receivers.

Attackers can easily infer the 32bit secret and use this information
to identify a device and/or user, since this 32bit secret is only
set at boot time.

Really, using jhash() to generate cookies sent on the wire
is a serious security concern.

Trying to change the rol32(hash, 16) in ip6_make_flowlabel() would be
a dead end. Trying to periodically change the secret (like in sch_sfq.c)
could change paths taken in the network for long lived flows.

Let's switch to siphash, as we did in commit df453700e8d8
("inet: switch IP ID generator to siphash")

Using a cryptographically strong pseudo random function will solve this
privacy issue and more generally remove other weak points in the stack.

Packet schedulers using skb_get_hash_perturb() benefit from this change.

Fixes: b56774163f99 ("ipv6: Enable auto flow labels by default")
Fixes: 42240901f7c4 ("ipv6: Implement different admin modes for automatic flow labels")
Fixes: 67800f9b1f4e ("ipv6: Call skb_get_hash_flowi6 to get skb-&gt;hash in ip6_make_flowlabel")
Fixes: cb1ce2ef387b ("ipv6: Implement automatic flow label generation on transmit")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: Jonathan Berger &lt;jonathann1@walla.com&gt;
Reported-by: Amit Klein &lt;aksecurity@gmail.com&gt;
Reported-by: Benny Pinkas &lt;benny@pinkas.net&gt;
Cc: Tom Herbert &lt;tom@herbertland.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Mahesh Bandewar &lt;maheshb@google.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 55667441c84fa5e0911a0aac44fb059c15ba6da2 upstream.

UDP IPv6 packets auto flowlabels are using a 32bit secret
(static u32 hashrnd in net/core/flow_dissector.c) and
apply jhash() over fields known by the receivers.

Attackers can easily infer the 32bit secret and use this information
to identify a device and/or user, since this 32bit secret is only
set at boot time.

Really, using jhash() to generate cookies sent on the wire
is a serious security concern.

Trying to change the rol32(hash, 16) in ip6_make_flowlabel() would be
a dead end. Trying to periodically change the secret (like in sch_sfq.c)
could change paths taken in the network for long lived flows.

Let's switch to siphash, as we did in commit df453700e8d8
("inet: switch IP ID generator to siphash")

Using a cryptographically strong pseudo random function will solve this
privacy issue and more generally remove other weak points in the stack.

Packet schedulers using skb_get_hash_perturb() benefit from this change.

Fixes: b56774163f99 ("ipv6: Enable auto flow labels by default")
Fixes: 42240901f7c4 ("ipv6: Implement different admin modes for automatic flow labels")
Fixes: 67800f9b1f4e ("ipv6: Call skb_get_hash_flowi6 to get skb-&gt;hash in ip6_make_flowlabel")
Fixes: cb1ce2ef387b ("ipv6: Implement automatic flow label generation on transmit")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: Jonathan Berger &lt;jonathann1@walla.com&gt;
Reported-by: Amit Klein &lt;aksecurity@gmail.com&gt;
Reported-by: Benny Pinkas &lt;benny@pinkas.net&gt;
Cc: Tom Herbert &lt;tom@herbertland.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Mahesh Bandewar &lt;maheshb@google.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: create skb_gso_validate_mac_len()</title>
<updated>2019-02-20T09:18:28+00:00</updated>
<author>
<name>Daniel Axtens</name>
<email>dja@axtens.net</email>
</author>
<published>2019-02-14T10:31:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=84d8c3a7e1a48df8d21d100181b1d97a3adc50e5'/>
<id>84d8c3a7e1a48df8d21d100181b1d97a3adc50e5</id>
<content type='text'>
commit 2b16f048729bf35e6c28a40cbfad07239f9dcd90 upstream

If you take a GSO skb, and split it into packets, will the MAC
length (L2 + L3 + L4 headers + payload) of those packets be small
enough to fit within a given length?

Move skb_gso_mac_seglen() to skbuff.h with other related functions
like skb_gso_network_seglen() so we can use it, and then create
skb_gso_validate_mac_len to do the full calculation.

Signed-off-by: Daniel Axtens &lt;dja@axtens.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
[jwang: cherry pick for CVE-2018-1000026]
Signed-off-by: Jack Wang &lt;jinpu.wang@cloud.ionos.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 2b16f048729bf35e6c28a40cbfad07239f9dcd90 upstream

If you take a GSO skb, and split it into packets, will the MAC
length (L2 + L3 + L4 headers + payload) of those packets be small
enough to fit within a given length?

Move skb_gso_mac_seglen() to skbuff.h with other related functions
like skb_gso_network_seglen() so we can use it, and then create
skb_gso_validate_mac_len to do the full calculation.

Signed-off-by: Daniel Axtens &lt;dja@axtens.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
[jwang: cherry pick for CVE-2018-1000026]
Signed-off-by: Jack Wang &lt;jinpu.wang@cloud.ionos.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: Fix usage of pskb_trim_rcsum</title>
<updated>2019-01-31T07:12:33+00:00</updated>
<author>
<name>Ross Lagerwall</name>
<email>ross.lagerwall@citrix.com</email>
</author>
<published>2019-01-17T15:34:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=04663e84f0429d3777ccb2c3f895fc8a37bddf31'/>
<id>04663e84f0429d3777ccb2c3f895fc8a37bddf31</id>
<content type='text'>
[ Upstream commit 6c57f0458022298e4da1729c67bd33ce41c14e7a ]

In certain cases, pskb_trim_rcsum() may change skb pointers.
Reinitialize header pointers afterwards to avoid potential
use-after-frees. Add a note in the documentation of
pskb_trim_rcsum(). Found by KASAN.

Signed-off-by: Ross Lagerwall &lt;ross.lagerwall@citrix.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 6c57f0458022298e4da1729c67bd33ce41c14e7a ]

In certain cases, pskb_trim_rcsum() may change skb pointers.
Reinitialize header pointers afterwards to avoid potential
use-after-frees. Add a note in the documentation of
pskb_trim_rcsum(). Found by KASAN.

Signed-off-by: Ross Lagerwall &lt;ross.lagerwall@citrix.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ip: use rb trees for IP frag queue.</title>
<updated>2018-10-18T07:13:26+00:00</updated>
<author>
<name>Peter Oskolkov</name>
<email>posk@google.com</email>
</author>
<published>2018-10-10T19:30:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=10043954eadac2d8f8c1886190f7a7ee584ff939'/>
<id>10043954eadac2d8f8c1886190f7a7ee584ff939</id>
<content type='text'>
(commit fa0f527358bd900ef92f925878ed6bfbd51305cc upstream)

Similar to TCP OOO RX queue, it makes sense to use rb trees to store
IP fragments, so that OOO fragments are inserted faster.

Tested:

- a follow-up patch contains a rather comprehensive ip defrag
  self-test (functional)
- ran neper `udp_stream -c -H &lt;host&gt; -F 100 -l 300 -T 20`:
    netstat --statistics
    Ip:
        282078937 total packets received
        0 forwarded
        0 incoming packets discarded
        946760 incoming packets delivered
        18743456 requests sent out
        101 fragments dropped after timeout
        282077129 reassemblies required
        944952 packets reassembled ok
        262734239 packet reassembles failed
   (The numbers/stats above are somewhat better re:
    reassemblies vs a kernel without this patchset. More
    comprehensive performance testing TBD).

Reported-by: Jann Horn &lt;jannh@google.com&gt;
Reported-by: Juha-Matti Tilli &lt;juha-matti.tilli@iki.fi&gt;
Suggested-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Peter Oskolkov &lt;posk@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
(commit fa0f527358bd900ef92f925878ed6bfbd51305cc upstream)

Similar to TCP OOO RX queue, it makes sense to use rb trees to store
IP fragments, so that OOO fragments are inserted faster.

Tested:

- a follow-up patch contains a rather comprehensive ip defrag
  self-test (functional)
- ran neper `udp_stream -c -H &lt;host&gt; -F 100 -l 300 -T 20`:
    netstat --statistics
    Ip:
        282078937 total packets received
        0 forwarded
        0 incoming packets discarded
        946760 incoming packets delivered
        18743456 requests sent out
        101 fragments dropped after timeout
        282077129 reassemblies required
        944952 packets reassembled ok
        262734239 packet reassembles failed
   (The numbers/stats above are somewhat better re:
    reassemblies vs a kernel without this patchset. More
    comprehensive performance testing TBD).

Reported-by: Jann Horn &lt;jannh@google.com&gt;
Reported-by: Juha-Matti Tilli &lt;juha-matti.tilli@iki.fi&gt;
Suggested-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Peter Oskolkov &lt;posk@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: add rb_to_skb() and other rb tree helpers</title>
<updated>2018-10-18T07:13:26+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2018-10-10T19:30:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b475cf3bf1e8212b0287c6d15249e2c942693ae5'/>
<id>b475cf3bf1e8212b0287c6d15249e2c942693ae5</id>
<content type='text'>
Geeralize private netem_rb_to_skb()

TCP rtx queue will soon be converted to rb-tree,
so we will need skb_rbtree_walk() helpers.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
(cherry picked from commit 18a4c0eab2623cc95be98a1e6af1ad18e7695977)
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Geeralize private netem_rb_to_skb()

TCP rtx queue will soon be converted to rb-tree,
so we will need skb_rbtree_walk() helpers.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
(cherry picked from commit 18a4c0eab2623cc95be98a1e6af1ad18e7695977)
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends</title>
<updated>2018-10-18T07:13:26+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2018-10-10T19:30:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=791521e2e377f66ef5ee6e5002dec758234d8d32'/>
<id>791521e2e377f66ef5ee6e5002dec758234d8d32</id>
<content type='text'>
After working on IP defragmentation lately, I found that some large
packets defeat CHECKSUM_COMPLETE optimization because of NIC adding
zero paddings on the last (small) fragment.

While removing the padding with pskb_trim_rcsum(), we set skb-&gt;ip_summed
to CHECKSUM_NONE, forcing a full csum validation, even if all prior
fragments had CHECKSUM_COMPLETE set.

We can instead compute the checksum of the part we are trimming,
usually smaller than the part we keep.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
(cherry picked from commit 88078d98d1bb085d72af8437707279e203524fa5)
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
After working on IP defragmentation lately, I found that some large
packets defeat CHECKSUM_COMPLETE optimization because of NIC adding
zero paddings on the last (small) fragment.

While removing the padding with pskb_trim_rcsum(), we set skb-&gt;ip_summed
to CHECKSUM_NONE, forcing a full csum validation, even if all prior
fragments had CHECKSUM_COMPLETE set.

We can instead compute the checksum of the part we are trimming,
usually smaller than the part we keep.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
(cherry picked from commit 88078d98d1bb085d72af8437707279e203524fa5)
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: modify skb_rbtree_purge to return the truesize of all purged skbs.</title>
<updated>2018-10-18T07:13:25+00:00</updated>
<author>
<name>Peter Oskolkov</name>
<email>posk@google.com</email>
</author>
<published>2018-10-10T19:30:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=871695951ec6f6b0b1a258c9bb5336bfeffab409'/>
<id>871695951ec6f6b0b1a258c9bb5336bfeffab409</id>
<content type='text'>
Tested: see the next patch is the series.

Suggested-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Peter Oskolkov &lt;posk@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
(cherry picked from commit 385114dec8a49b5e5945e77ba7de6356106713f4)
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Tested: see the next patch is the series.

Suggested-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Peter Oskolkov &lt;posk@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
(cherry picked from commit 385114dec8a49b5e5945e77ba7de6356106713f4)
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>inet: frags: get rid of ipfrag_skb_cb/FRAG_CB</title>
<updated>2018-10-18T07:13:25+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2018-10-10T19:30:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=316986fe4dcac011b4f85d9bbef1edf4953c0219'/>
<id>316986fe4dcac011b4f85d9bbef1edf4953c0219</id>
<content type='text'>
ip_defrag uses skb-&gt;cb[] to store the fragment offset, and unfortunately
this integer is currently in a different cache line than skb-&gt;next,
meaning that we use two cache lines per skb when finding the insertion point.

By aliasing skb-&gt;ip_defrag_offset and skb-&gt;dev, we pack all the fields
in a single cache line and save precious memory bandwidth.

Note that after the fast path added by Changli Gao in commit
d6bebca92c66 ("fragment: add fast path for in-order fragments")
this change wont help the fast path, since we still need
to access prev-&gt;len (2nd cache line), but will show great
benefits when slow path is entered, since we perform
a linear scan of a potentially long list.

Also, note that this potential long list is an attack vector,
we might consider also using an rb-tree there eventually.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
(cherry picked from commit bf66337140c64c27fa37222b7abca7e49d63fb57)
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
ip_defrag uses skb-&gt;cb[] to store the fragment offset, and unfortunately
this integer is currently in a different cache line than skb-&gt;next,
meaning that we use two cache lines per skb when finding the insertion point.

By aliasing skb-&gt;ip_defrag_offset and skb-&gt;dev, we pack all the fields
in a single cache line and save precious memory bandwidth.

Note that after the fast path added by Changli Gao in commit
d6bebca92c66 ("fragment: add fast path for in-order fragments")
this change wont help the fast path, since we still need
to access prev-&gt;len (2nd cache line), but will show great
benefits when slow path is entered, since we perform
a linear scan of a potentially long list.

Also, note that this potential long list is an attack vector,
we might consider also using an rb-tree there eventually.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
(cherry picked from commit bf66337140c64c27fa37222b7abca7e49d63fb57)
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
