<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/net/ipv4/datagram.c, branch v5.19</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>ipv4: Avoid using RTO_ONLINK with ip_route_connect().</title>
<updated>2022-04-22T12:06:03+00:00</updated>
<author>
<name>Guillaume Nault</name>
<email>gnault@redhat.com</email>
</author>
<published>2022-04-20T23:21:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=67e1e2f4854bb2c0dd2b8440cf090016a0e1a091'/>
<id>67e1e2f4854bb2c0dd2b8440cf090016a0e1a091</id>
<content type='text'>
Now that ip_rt_fix_tos() doesn't reset -&gt;flowi4_scope unconditionally,
we don't have to rely on the RTO_ONLINK bit to properly set the scope
of a flowi4 structure. We can just set -&gt;flowi4_scope explicitly and
avoid using RTO_ONLINK in -&gt;flowi4_tos.

This patch converts callers of ip_route_connect(). Instead of setting
the tos parameter with RT_CONN_FLAGS(sk), as all callers do, we can:

  1- Drop the tos parameter from ip_route_connect(): its value was
     entirely based on sk, which is also passed as parameter.

  2- Set -&gt;flowi4_scope depending on the SOCK_LOCALROUTE socket option
     instead of always initialising it with RT_SCOPE_UNIVERSE (let's
     define ip_sock_rt_scope() for this purpose).

  3- Avoid overloading -&gt;flowi4_tos with RTO_ONLINK: since the scope is
     now properly initialised, we don't need to tell ip_rt_fix_tos() to
     adjust -&gt;flowi4_scope for us. So let's define ip_sock_rt_tos(),
     which is the same as RT_CONN_FLAGS() but without the RTO_ONLINK
     bit overload.

Note:
  In the original ip_route_connect() code, __ip_route_output_key()
  might clear the RTO_ONLINK bit of fl4-&gt;flowi4_tos (because of
  ip_rt_fix_tos()). Therefore flowi4_update_output() had to reuse the
  original tos variable. Now that we don't set RTO_ONLINK any more,
  this is not a problem and we can use fl4-&gt;flowi4_tos in
  flowi4_update_output().
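
Illustration only, not part of the patch: a minimal Python model of the
flag handling described above. The numeric values mirror the kernel's
RTO_ONLINK, RT_SCOPE_UNIVERSE and RT_SCOPE_LINK constants. The function
signatures are simplified assumptions that take plain values instead of
a socket, flow_key() is a hypothetical helper, and the RT_TOS() masking
of the TOS byte is left out.

```python
RTO_ONLINK = 0x01          # flag bit historically overloaded into flowi4_tos
RT_SCOPE_UNIVERSE = 0      # destination may be anywhere
RT_SCOPE_LINK = 253        # destination must be on a directly attached link


def rt_conn_flags(tos, localroute):
    """Old style: the SOCK_LOCALROUTE state rides along in the tos value."""
    return tos | (RTO_ONLINK if localroute else 0)


def ip_sock_rt_tos(tos):
    """New style: the tos value carries no scope information."""
    return tos


def ip_sock_rt_scope(localroute):
    """New style: the scope is derived from SOCK_LOCALROUTE directly."""
    return RT_SCOPE_LINK if localroute else RT_SCOPE_UNIVERSE


def flow_key(tos, localroute):
    """Hypothetical stand-in for the flowi4 fields set for a connect()."""
    return {"tos": ip_sock_rt_tos(tos), "scope": ip_sock_rt_scope(localroute)}
```

In this model the tos field never changes with SOCK_LOCALROUTE, which is
why it can be reused as-is after the route lookup.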

Signed-off-by: Guillaume Nault &lt;gnault@redhat.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Now that ip_rt_fix_tos() doesn't reset -&gt;flowi4_scope unconditionally,
we don't have to rely on the RTO_ONLINK bit to properly set the scope
of a flowi4 structure. We can just set -&gt;flowi4_scope explicitly and
avoid using RTO_ONLINK in -&gt;flowi4_tos.

This patch converts callers of ip_route_connect(). Instead of setting
the tos parameter with RT_CONN_FLAGS(sk), as all callers do, we can:

  1- Drop the tos parameter from ip_route_connect(): its value was
     entirely based on sk, which is also passed as parameter.

  2- Set -&gt;flowi4_scope depending on the SOCK_LOCALROUTE socket option
     instead of always initialising it with RT_SCOPE_UNIVERSE (let's
     define ip_sock_rt_scope() for this purpose).

  3- Avoid overloading -&gt;flowi4_tos with RTO_ONLINK: since the scope is
     now properly initialised, we don't need to tell ip_rt_fix_tos() to
     adjust -&gt;flowi4_scope for us. So let's define ip_sock_rt_tos(),
     which is the same as RT_CONN_FLAGS() but without the RTO_ONLINK
     bit overload.

Note:
  In the original ip_route_connect() code, __ip_route_output_key()
  might clear the RTO_ONLINK bit of fl4-&gt;flowi4_tos (because of
  ip_rt_fix_tos()). Therefore flowi4_update_output() had to reuse the
  original tos variable. Now that we don't set RTO_ONLINK any more,
  this is not a problem and we can use fl4-&gt;flowi4_tos in
  flowi4_update_output().

Signed-off-by: Guillaume Nault &lt;gnault@redhat.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net/ipv4/datagram.c: remove superfluous header files from datagram.c</title>
<updated>2021-09-29T10:39:33+00:00</updated>
<author>
<name>Mianhan Liu</name>
<email>liumh1@shanghaitech.edu.cn</email>
</author>
<published>2021-09-29T05:31:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6a832a6c72b9365299406f13c799b63dafe03677'/>
<id>6a832a6c72b9365299406f13c799b63dafe03677</id>
<content type='text'>
datagram.c doesn't use any macro or function declared in linux/ip.h.
Thus, these files can be removed from datagram.c safely without
affecting the compilation of the net/ipv4 module.

Signed-off-by: Mianhan Liu &lt;liumh1@shanghaitech.edu.cn&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
datagram.c doesn't use any macro or function declared in linux/ip.h.
Thus, these files can be removed from datagram.c safely without
affecting the compilation of the net/ipv4 module.

Signed-off-by: Mianhan Liu &lt;liumh1@shanghaitech.edu.cn&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>inet: stop leaking jiffies on the wire</title>
<updated>2019-11-01T21:57:52+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2019-11-01T17:32:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a904a0693c189691eeee64f6c6b188bd7dc244e9'/>
<id>a904a0693c189691eeee64f6c6b188bd7dc244e9</id>
<content type='text'>
Historically linux tried to stick to RFC 791, 1122, 2003
for IPv4 ID field generation.

RFC 6864 made clear that no matter how hard we try,
we cannot ensure uniqueness of IP ID within maximum
lifetime for all datagrams with a given source
address/destination address/protocol tuple.

Linux uses a per socket inet generator (inet_id), initialized
at connection startup with an XOR of 'jiffies' and other
fields that appear clear on the wire.

Thiemo Nagel pointed out that this strategy is a privacy
concern as this provides 16 bits of entropy to fingerprint
devices.

Let's switch to a random starting point; this is just as
good as far as RFC 6864 is concerned and does not leak
anything critical.
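
A minimal Python sketch of the idea, not the kernel code: the helper
names are hypothetical, and secrets.randbits() stands in for the
kernel's random number source.

```python
import secrets

IP_ID_SPACE = 2 ** 16  # the IPv4 ID field is 16 bits wide


def legacy_initial_id(jiffies, other_fields):
    """Old scheme: starting point derived from jiffies and wire-visible data."""
    return (jiffies ^ other_fields) % IP_ID_SPACE


def random_initial_id():
    """New scheme: starting point drawn from a random source."""
    return secrets.randbits(16)


def next_id(current):
    """Per-socket counter, wrapping at 16 bits."""
    return (current + 1) % IP_ID_SPACE
```

An observer who knows other_fields can XOR it back out of the legacy
value and recover the low 16 bits of jiffies; the random starting point
carries no such information.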

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: Thiemo Nagel &lt;tnagel@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Historically linux tried to stick to RFC 791, 1122, 2003
for IPv4 ID field generation.

RFC 6864 made clear that no matter how hard we try,
we cannot ensure uniqueness of IP ID within maximum
lifetime for all datagrams with a given source
address/destination address/protocol tuple.

Linux uses a per socket inet generator (inet_id), initialized
at connection startup with an XOR of 'jiffies' and other
fields that appear clear on the wire.

Thiemo Nagel pointed out that this strategy is a privacy
concern as this provides 16 bits of entropy to fingerprint
devices.

Let's switch to a random starting point; this is just as
good as far as RFC 6864 is concerned and does not leak
anything critical.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: Thiemo Nagel &lt;tnagel@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>udp: correct reuseport selection with connected sockets</title>
<updated>2019-09-16T07:02:18+00:00</updated>
<author>
<name>Willem de Bruijn</name>
<email>willemb@google.com</email>
</author>
<published>2019-09-13T01:16:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=acdcecc61285faed359f1a3568c32089cc3a8329'/>
<id>acdcecc61285faed359f1a3568c32089cc3a8329</id>
<content type='text'>
UDP reuseport groups can hold a mix of unconnected and connected sockets.
Ensure that connections only receive all traffic to their 4-tuple.

Fast reuseport returns on the first reuseport match on the assumption
that all matches are equal. Only if connections are present, return to
the previous behavior of scoring all sockets.

Record if connections are present and if so (1) treat such connected
sockets as an independent match from the group, (2) only return
2-tuple matches from reuseport and (3) do not return on the first
2-tuple reuseport match to allow for a higher scoring match later.

New field has_conns is set without locks. No other fields in the
bitmap are modified at runtime and the field is only ever set
unconditionally, so an RMW cannot miss a change.

Fixes: e32ea7e74727 ("soreuseport: fast reuseport UDP socket selection")
Link: http://lkml.kernel.org/r/CA+FuTSfRP09aJNYRt04SS6qj22ViiOEWaWmLAwX0psk8-PGNxw@mail.gmail.com
Signed-off-by: Willem de Bruijn &lt;willemb@google.com&gt;
Acked-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Acked-by: Craig Gallek &lt;kraig@google.com&gt;
Signed-off-by: Willem de Bruijn &lt;willemb@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
UDP reuseport groups can hold a mix of unconnected and connected sockets.
Ensure that connections only receive all traffic to their 4-tuple.

Fast reuseport returns on the first reuseport match on the assumption
that all matches are equal. Only if connections are present, return to
the previous behavior of scoring all sockets.

Record if connections are present and if so (1) treat such connected
sockets as an independent match from the group, (2) only return
2-tuple matches from reuseport and (3) do not return on the first
2-tuple reuseport match to allow for a higher scoring match later.

New field has_conns is set without locks. No other fields in the
bitmap are modified at runtime and the field is only ever set
unconditionally, so an RMW cannot miss a change.

Fixes: e32ea7e74727 ("soreuseport: fast reuseport UDP socket selection")
Link: http://lkml.kernel.org/r/CA+FuTSfRP09aJNYRt04SS6qj22ViiOEWaWmLAwX0psk8-PGNxw@mail.gmail.com
Signed-off-by: Willem de Bruijn &lt;willemb@google.com&gt;
Acked-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Acked-by: Craig Gallek &lt;kraig@google.com&gt;
Signed-off-by: Willem de Bruijn &lt;willemb@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152</title>
<updated>2019-05-30T18:26:32+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2019-05-27T06:55:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2874c5fd284268364ece81a7bd936f3c8168e567'/>
<id>2874c5fd284268364ece81a7bd936f3c8168e567</id>
<content type='text'>
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license as published by
  the free software foundation either version 2 of the license or at
  your option any later version

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Allison Randal &lt;allison@lohutok.net&gt;
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license as published by
  the free software foundation either version 2 of the license or at
  your option any later version

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Allison Randal &lt;allison@lohutok.net&gt;
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv4: Allow sending multicast packets on specific i/f using VRF socket</title>
<updated>2018-10-03T05:28:17+00:00</updated>
<author>
<name>Robert Shearman</name>
<email>rshearma@vyatta.att-mail.com</email>
</author>
<published>2018-10-01T08:40:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=854da991733d1b4f60042423875e32be7f8f0421'/>
<id>854da991733d1b4f60042423875e32be7f8f0421</id>
<content type='text'>
It is useful to be able to use the same socket for listening in a
specific VRF, as for sending multicast packets out of a specific
interface. However, the bound device on the socket currently takes
precedence and results in the packets not being sent.

Relax the condition on overriding the output interface to use for
sending packets out of UDP, raw and ping sockets to allow multicast
packets to be sent using the specified multicast interface.

Signed-off-by: Robert Shearman &lt;rshearma@vyatta.att-mail.com&gt;
Signed-off-by: Mike Manning &lt;mmanning@vyatta.att-mail.com&gt;
Reviewed-by: David Ahern &lt;dsahern@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It is useful to be able to use the same socket for listening in a
specific VRF, as for sending multicast packets out of a specific
interface. However, the bound device on the socket currently takes
precedence and results in the packets not being sent.

Relax the condition on overriding the output interface to use for
sending packets out of UDP, raw and ping sockets to allow multicast
packets to be sent using the specified multicast interface.

Signed-off-by: Robert Shearman &lt;rshearma@vyatta.att-mail.com&gt;
Signed-off-by: Mike Manning &lt;mmanning@vyatta.att-mail.com&gt;
Reviewed-by: David Ahern &lt;dsahern@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: Set sk_txhash from a random number</title>
<updated>2015-07-30T05:44:04+00:00</updated>
<author>
<name>Tom Herbert</name>
<email>tom@herbertland.com</email>
</author>
<published>2015-07-28T23:02:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=877d1f6291f8e391237e324be58479a3e3a7407c'/>
<id>877d1f6291f8e391237e324be58479a3e3a7407c</id>
<content type='text'>
This patch creates sk_set_txhash and eliminates protocol specific
inet_set_txhash and ip6_set_txhash. sk_set_txhash simply sets a
random number instead of performing flow dissection. sk_set_txhash
is also allowed to be called multiple times for the same socket,
we'll need this when redoing the hash for negative routing advice.
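
A minimal Python sketch of the behaviour described above. The Sock class
and the non-zero guard are assumptions made for illustration; they are
not taken from this commit message.

```python
import secrets


class Sock:
    """Hypothetical stand-in for struct sock, holding only the TX hash."""

    def __init__(self):
        self.sk_txhash = 0  # assumed here to mean "no hash set yet"


def sk_set_txhash(sk):
    """Pick a random 32-bit hash; no flow dissection is involved."""
    # Assumption: 0 is kept as the "unset" marker, so map it to 1.
    sk.sk_txhash = secrets.randbits(32) or 1


sk = Sock()
sk_set_txhash(sk)   # first call, e.g. when the socket gets connected
sk_set_txhash(sk)   # calling again simply rerolls the hash
```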

Signed-off-by: Tom Herbert &lt;tom@herbertland.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch creates sk_set_txhash and eliminates protocol specific
inet_set_txhash and ip6_set_txhash. sk_set_txhash simply sets a
random number instead of performing flow dissection. sk_set_txhash
is also allowed to be called multiple times for the same socket,
we'll need this when redoing the hash for negative routing advice.

Signed-off-by: Tom Herbert &lt;tom@herbertland.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv6: lock socket in ip6_datagram_connect()</title>
<updated>2015-07-16T00:25:51+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2015-07-14T06:10:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=03645a11a570d52e70631838cb786eb4253eb463'/>
<id>03645a11a570d52e70631838cb786eb4253eb463</id>
<content type='text'>
ip6_datagram_connect() is doing a lot of socket changes without
socket being locked.

This looks wrong, at least for udp_lib_rehash() which could corrupt
lists because of concurrent udp_sk(sk)-&gt;udp_portaddr_hash accesses.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
ip6_datagram_connect() is doing a lot of socket changes without
socket being locked.

This looks wrong, at least for udp_lib_rehash() which could corrupt
lists because of concurrent udp_sk(sk)-&gt;udp_portaddr_hash accesses.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: Save TX flow hash in sock and set in skbuf on xmit</title>
<updated>2014-07-08T04:14:21+00:00</updated>
<author>
<name>Tom Herbert</name>
<email>therbert@google.com</email>
</author>
<published>2014-07-02T04:32:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b73c3d0e4f0e1961e15bec18720e48aabebe2109'/>
<id>b73c3d0e4f0e1961e15bec18720e48aabebe2109</id>
<content type='text'>
For a connected socket we can precompute the flow hash for setting
in skb-&gt;hash on output. This is a performance advantage over
calculating the skb-&gt;hash for every packet on the connection. The
computation is done using the common hash algorithm to be consistent
with computations done for packets of the connection in other states
where there is no socket (e.g. time-wait, syn-recv, syn-cookies).

This patch adds sk_txhash to the sock structure. inet_set_txhash and
ip6_set_txhash functions are added which are called from points in
TCP and UDP where socket moves to established state.

skb_set_hash_from_sk is a function which sets skb-&gt;hash from the
sock txhash value. This is called in UDP and TCP transmit path when
transmitting within the context of a socket.

Tested: ran super_netperf with 200 TCP_RR streams over a vxlan
interface (in this case skb_get_hash called on every TX packet to
create a UDP source port).

Before fix:

  95.02% CPU utilization
  154/256/505 90/95/99% latencies
  1.13042e+06 tps

  Time in functions:
    0.28% skb_flow_dissect
    0.21% __skb_get_hash

After fix:

  94.95% CPU utilization
  156/254/485 90/95/99% latencies
  1.15447e+06 tps

  Neither __skb_get_hash nor skb_flow_dissect appear in perf

Signed-off-by: Tom Herbert &lt;therbert@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
For a connected socket we can precompute the flow hash for setting
in skb-&gt;hash on output. This is a performance advantage over
calculating the skb-&gt;hash for every packet on the connection. The
computation is done using the common hash algorithm to be consistent
with computations done for packets of the connection in other states
where there is no socket (e.g. time-wait, syn-recv, syn-cookies).

This patch adds sk_txhash to the sock structure. inet_set_txhash and
ip6_set_txhash functions are added which are called from points in
TCP and UDP where socket moves to established state.

skb_set_hash_from_sk is a function which sets skb-&gt;hash from the
sock txhash value. This is called in UDP and TCP transmit path when
transmitting within the context of a socket.

Tested: ran super_netperf with 200 TCP_RR streams over a vxlan
interface (in this case skb_get_hash called on every TX packet to
create a UDP source port).

Before fix:

  95.02% CPU utilization
  154/256/505 90/95/99% latencies
  1.13042e+06 tps

  Time in functions:
    0.28% skb_flow_dissect
    0.21% __skb_get_hash

After fix:

  94.95% CPU utilization
  156/254/485 90/95/99% latencies
  1.15447e+06 tps

  Neither __skb_get_hash nor skb_flow_dissect appear in perf

Signed-off-by: Tom Herbert &lt;therbert@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv4: fix a race in ip4_datagram_release_cb()</title>
<updated>2014-06-11T22:39:18+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2014-06-10T13:43:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9709674e68646cee5a24e3000b3558d25412203a'/>
<id>9709674e68646cee5a24e3000b3558d25412203a</id>
<content type='text'>
Alexey provided an AddressSanitizer[1] report that finally gave a good hint
at the origin of various problems already reported by Dormando
in the past [2].

The problem comes from the fact that UDP can have a lockless TX path, and
concurrent threads can manipulate sk_dst_cache while another thread
is holding the socket lock and calls __sk_dst_set() in
ip4_datagram_release_cb() (this was added in linux-3.8).

It seems that all we need to do is to use sk_dst_check() and
sk_dst_set() so that all the writers hold the same spinlock
(sk-&gt;sk_dst_lock) to prevent corruptions.

The TCP stack does not need this protection, as all sk_dst_cache writers hold
the socket lock.
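
A minimal Python model of the locking rule above, not the kernel code:
the Sock class is a hypothetical stand-in, and threading.Lock plays the
role of the sk_dst_lock spinlock.

```python
import threading


class Sock:
    """Hypothetical stand-in for struct sock with a cached route."""

    def __init__(self):
        self.sk_dst_lock = threading.Lock()
        self.sk_dst_cache = None


def sk_dst_set(sk, dst):
    """Swap the cached route under sk_dst_lock and hand back the old one."""
    with sk.sk_dst_lock:
        old = sk.sk_dst_cache
        sk.sk_dst_cache = dst
    # The caller releases the old entry. Because every writer takes the
    # same lock, each old entry is handed back to exactly one caller.
    return old
```

If one writer bypassed the lock, two callers could observe the same old
entry and both release it, which is the kind of corruption shown in the
reports below.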

[1]
https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel

AddressSanitizer: heap-use-after-free in ipv4_dst_check
Read of size 2 by thread T15453:
 [&lt;ffffffff817daa3a&gt;] ipv4_dst_check+0x1a/0x90 ./net/ipv4/route.c:1116
 [&lt;ffffffff8175b789&gt;] __sk_dst_check+0x89/0xe0 ./net/core/sock.c:531
 [&lt;ffffffff81830a36&gt;] ip4_datagram_release_cb+0x46/0x390 ??:0
 [&lt;ffffffff8175eaea&gt;] release_sock+0x17a/0x230 ./net/core/sock.c:2413
 [&lt;ffffffff81830882&gt;] ip4_datagram_connect+0x462/0x5d0 ??:0
 [&lt;ffffffff81846d06&gt;] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
 [&lt;ffffffff817580ac&gt;] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
 [&lt;ffffffff817596ce&gt;] SyS_connect+0xe/0x10 ./net/socket.c:1682
 [&lt;ffffffff818b0a29&gt;] system_call_fastpath+0x16/0x1b
./arch/x86/kernel/entry_64.S:629

Freed by thread T15455:
 [&lt;ffffffff8178d9b8&gt;] dst_destroy+0xa8/0x160 ./net/core/dst.c:251
 [&lt;ffffffff8178de25&gt;] dst_release+0x45/0x80 ./net/core/dst.c:280
 [&lt;ffffffff818304c1&gt;] ip4_datagram_connect+0xa1/0x5d0 ??:0
 [&lt;ffffffff81846d06&gt;] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
 [&lt;ffffffff817580ac&gt;] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
 [&lt;ffffffff817596ce&gt;] SyS_connect+0xe/0x10 ./net/socket.c:1682
 [&lt;ffffffff818b0a29&gt;] system_call_fastpath+0x16/0x1b
./arch/x86/kernel/entry_64.S:629

Allocated by thread T15453:
 [&lt;ffffffff8178d291&gt;] dst_alloc+0x81/0x2b0 ./net/core/dst.c:171
 [&lt;ffffffff817db3b7&gt;] rt_dst_alloc+0x47/0x50 ./net/ipv4/route.c:1406
 [&lt;     inlined    &gt;] __ip_route_output_key+0x3e8/0xf70
__mkroute_output ./net/ipv4/route.c:1939
 [&lt;ffffffff817dde08&gt;] __ip_route_output_key+0x3e8/0xf70 ./net/ipv4/route.c:2161
 [&lt;ffffffff817deb34&gt;] ip_route_output_flow+0x14/0x30 ./net/ipv4/route.c:2249
 [&lt;ffffffff81830737&gt;] ip4_datagram_connect+0x317/0x5d0 ??:0
 [&lt;ffffffff81846d06&gt;] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
 [&lt;ffffffff817580ac&gt;] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
 [&lt;ffffffff817596ce&gt;] SyS_connect+0xe/0x10 ./net/socket.c:1682
 [&lt;ffffffff818b0a29&gt;] system_call_fastpath+0x16/0x1b
./arch/x86/kernel/entry_64.S:629

[2]
&lt;4&gt;[196727.311203] general protection fault: 0000 [#1] SMP
&lt;4&gt;[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP macvlan bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode ipmi_watchdog ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis tpm tpm_bios ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit ixgbe ptp pps_core mdio
&lt;4&gt;[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.26 #1
&lt;4&gt;[196727.311344] Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013
&lt;4&gt;[196727.311364] task: ffff885e6f069700 ti: ffff885e6f072000 task.ti: ffff885e6f072000
&lt;4&gt;[196727.311377] RIP: 0010:[&lt;ffffffff815f8c7f&gt;]  [&lt;ffffffff815f8c7f&gt;] ipv4_dst_destroy+0x4f/0x80
&lt;4&gt;[196727.311399] RSP: 0018:ffff885effd23a70  EFLAGS: 00010282
&lt;4&gt;[196727.311409] RAX: dead000000200200 RBX: ffff8854c398ecc0 RCX: 0000000000000040
&lt;4&gt;[196727.311423] RDX: dead000000100100 RSI: dead000000100100 RDI: dead000000200200
&lt;4&gt;[196727.311437] RBP: ffff885effd23a80 R08: ffffffff815fd9e0 R09: ffff885d5a590800
&lt;4&gt;[196727.311451] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
&lt;4&gt;[196727.311464] R13: ffffffff81c8c280 R14: 0000000000000000 R15: ffff880e85ee16ce
&lt;4&gt;[196727.311510] FS:  0000000000000000(0000) GS:ffff885effd20000(0000) knlGS:0000000000000000
&lt;4&gt;[196727.311554] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
&lt;4&gt;[196727.311581] CR2: 00007a46751eb000 CR3: 0000005e65688000 CR4: 00000000000407e0
&lt;4&gt;[196727.311625] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
&lt;4&gt;[196727.311669] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
&lt;4&gt;[196727.311713] Stack:
&lt;4&gt;[196727.311733]  ffff8854c398ecc0 ffff8854c398ecc0 ffff885effd23ab0 ffffffff815b7f42
&lt;4&gt;[196727.311784]  ffff88be6595bc00 ffff8854c398ecc0 0000000000000000 ffff8854c398ecc0
&lt;4&gt;[196727.311834]  ffff885effd23ad0 ffffffff815b86c6 ffff885d5a590800 ffff8816827821c0
&lt;4&gt;[196727.311885] Call Trace:
&lt;4&gt;[196727.311907]  &lt;IRQ&gt;
&lt;4&gt;[196727.311912]  [&lt;ffffffff815b7f42&gt;] dst_destroy+0x32/0xe0
&lt;4&gt;[196727.311959]  [&lt;ffffffff815b86c6&gt;] dst_release+0x56/0x80
&lt;4&gt;[196727.311986]  [&lt;ffffffff81620bd5&gt;] tcp_v4_do_rcv+0x2a5/0x4a0
&lt;4&gt;[196727.312013]  [&lt;ffffffff81622b5a&gt;] tcp_v4_rcv+0x7da/0x820
&lt;4&gt;[196727.312041]  [&lt;ffffffff815fd9e0&gt;] ? ip_rcv_finish+0x360/0x360
&lt;4&gt;[196727.312070]  [&lt;ffffffff815de02d&gt;] ? nf_hook_slow+0x7d/0x150
&lt;4&gt;[196727.312097]  [&lt;ffffffff815fd9e0&gt;] ? ip_rcv_finish+0x360/0x360
&lt;4&gt;[196727.312125]  [&lt;ffffffff815fda92&gt;] ip_local_deliver_finish+0xb2/0x230
&lt;4&gt;[196727.312154]  [&lt;ffffffff815fdd9a&gt;] ip_local_deliver+0x4a/0x90
&lt;4&gt;[196727.312183]  [&lt;ffffffff815fd799&gt;] ip_rcv_finish+0x119/0x360
&lt;4&gt;[196727.312212]  [&lt;ffffffff815fe00b&gt;] ip_rcv+0x22b/0x340
&lt;4&gt;[196727.312242]  [&lt;ffffffffa0339680&gt;] ? macvlan_broadcast+0x160/0x160 [macvlan]
&lt;4&gt;[196727.312275]  [&lt;ffffffff815b0c62&gt;] __netif_receive_skb_core+0x512/0x640
&lt;4&gt;[196727.312308]  [&lt;ffffffff811427fb&gt;] ? kmem_cache_alloc+0x13b/0x150
&lt;4&gt;[196727.312338]  [&lt;ffffffff815b0db1&gt;] __netif_receive_skb+0x21/0x70
&lt;4&gt;[196727.312368]  [&lt;ffffffff815b0fa1&gt;] netif_receive_skb+0x31/0xa0
&lt;4&gt;[196727.312397]  [&lt;ffffffff815b1ae8&gt;] napi_gro_receive+0xe8/0x140
&lt;4&gt;[196727.312433]  [&lt;ffffffffa00274f1&gt;] ixgbe_poll+0x551/0x11f0 [ixgbe]
&lt;4&gt;[196727.312463]  [&lt;ffffffff815fe00b&gt;] ? ip_rcv+0x22b/0x340
&lt;4&gt;[196727.312491]  [&lt;ffffffff815b1691&gt;] net_rx_action+0x111/0x210
&lt;4&gt;[196727.312521]  [&lt;ffffffff815b0db1&gt;] ? __netif_receive_skb+0x21/0x70
&lt;4&gt;[196727.312552]  [&lt;ffffffff810519d0&gt;] __do_softirq+0xd0/0x270
&lt;4&gt;[196727.312583]  [&lt;ffffffff816cef3c&gt;] call_softirq+0x1c/0x30
&lt;4&gt;[196727.312613]  [&lt;ffffffff81004205&gt;] do_softirq+0x55/0x90
&lt;4&gt;[196727.312640]  [&lt;ffffffff81051c85&gt;] irq_exit+0x55/0x60
&lt;4&gt;[196727.312668]  [&lt;ffffffff816cf5c3&gt;] do_IRQ+0x63/0xe0
&lt;4&gt;[196727.312696]  [&lt;ffffffff816c5aaa&gt;] common_interrupt+0x6a/0x6a
&lt;4&gt;[196727.312722]  &lt;EOI&gt;
&lt;1&gt;[196727.313071] RIP  [&lt;ffffffff815f8c7f&gt;] ipv4_dst_destroy+0x4f/0x80
&lt;4&gt;[196727.313100]  RSP &lt;ffff885effd23a70&gt;
&lt;4&gt;[196727.313377] ---[ end trace 64b3f14fae0f2e29 ]---
&lt;0&gt;[196727.380908] Kernel panic - not syncing: Fatal exception in interrupt

Reported-by: Alexey Preobrazhensky &lt;preobr@google.com&gt;
Reported-by: dormando &lt;dormando@rydia.ne&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Fixes: 8141ed9fcedb2 ("ipv4: Add a socket release callback for datagram sockets")
Cc: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Alexey provided an AddressSanitizer[1] report that finally gave a good hint
at the origin of various problems already reported by Dormando
in the past [2].

The problem comes from the fact that UDP can have a lockless TX path, and
concurrent threads can manipulate sk_dst_cache while another thread
is holding the socket lock and calls __sk_dst_set() in
ip4_datagram_release_cb() (this was added in linux-3.8).

It seems that all we need to do is to use sk_dst_check() and
sk_dst_set() so that all the writers hold the same spinlock
(sk-&gt;sk_dst_lock) to prevent corruptions.

The TCP stack does not need this protection, as all sk_dst_cache writers hold
the socket lock.

[1]
https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel

AddressSanitizer: heap-use-after-free in ipv4_dst_check
Read of size 2 by thread T15453:
 [&lt;ffffffff817daa3a&gt;] ipv4_dst_check+0x1a/0x90 ./net/ipv4/route.c:1116
 [&lt;ffffffff8175b789&gt;] __sk_dst_check+0x89/0xe0 ./net/core/sock.c:531
 [&lt;ffffffff81830a36&gt;] ip4_datagram_release_cb+0x46/0x390 ??:0
 [&lt;ffffffff8175eaea&gt;] release_sock+0x17a/0x230 ./net/core/sock.c:2413
 [&lt;ffffffff81830882&gt;] ip4_datagram_connect+0x462/0x5d0 ??:0
 [&lt;ffffffff81846d06&gt;] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
 [&lt;ffffffff817580ac&gt;] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
 [&lt;ffffffff817596ce&gt;] SyS_connect+0xe/0x10 ./net/socket.c:1682
 [&lt;ffffffff818b0a29&gt;] system_call_fastpath+0x16/0x1b
./arch/x86/kernel/entry_64.S:629

Freed by thread T15455:
 [&lt;ffffffff8178d9b8&gt;] dst_destroy+0xa8/0x160 ./net/core/dst.c:251
 [&lt;ffffffff8178de25&gt;] dst_release+0x45/0x80 ./net/core/dst.c:280
 [&lt;ffffffff818304c1&gt;] ip4_datagram_connect+0xa1/0x5d0 ??:0
 [&lt;ffffffff81846d06&gt;] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
 [&lt;ffffffff817580ac&gt;] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
 [&lt;ffffffff817596ce&gt;] SyS_connect+0xe/0x10 ./net/socket.c:1682
 [&lt;ffffffff818b0a29&gt;] system_call_fastpath+0x16/0x1b
./arch/x86/kernel/entry_64.S:629

Allocated by thread T15453:
 [&lt;ffffffff8178d291&gt;] dst_alloc+0x81/0x2b0 ./net/core/dst.c:171
 [&lt;ffffffff817db3b7&gt;] rt_dst_alloc+0x47/0x50 ./net/ipv4/route.c:1406
 [&lt;     inlined    &gt;] __ip_route_output_key+0x3e8/0xf70
__mkroute_output ./net/ipv4/route.c:1939
 [&lt;ffffffff817dde08&gt;] __ip_route_output_key+0x3e8/0xf70 ./net/ipv4/route.c:2161
 [&lt;ffffffff817deb34&gt;] ip_route_output_flow+0x14/0x30 ./net/ipv4/route.c:2249
 [&lt;ffffffff81830737&gt;] ip4_datagram_connect+0x317/0x5d0 ??:0
 [&lt;ffffffff81846d06&gt;] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
 [&lt;ffffffff817580ac&gt;] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
 [&lt;ffffffff817596ce&gt;] SyS_connect+0xe/0x10 ./net/socket.c:1682
 [&lt;ffffffff818b0a29&gt;] system_call_fastpath+0x16/0x1b
./arch/x86/kernel/entry_64.S:629

[2]
&lt;4&gt;[196727.311203] general protection fault: 0000 [#1] SMP
&lt;4&gt;[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP macvlan bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode ipmi_watchdog ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis tpm tpm_bios ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit ixgbe ptp pps_core mdio
&lt;4&gt;[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.26 #1
&lt;4&gt;[196727.311344] Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013
&lt;4&gt;[196727.311364] task: ffff885e6f069700 ti: ffff885e6f072000 task.ti: ffff885e6f072000
&lt;4&gt;[196727.311377] RIP: 0010:[&lt;ffffffff815f8c7f&gt;]  [&lt;ffffffff815f8c7f&gt;] ipv4_dst_destroy+0x4f/0x80
&lt;4&gt;[196727.311399] RSP: 0018:ffff885effd23a70  EFLAGS: 00010282
&lt;4&gt;[196727.311409] RAX: dead000000200200 RBX: ffff8854c398ecc0 RCX: 0000000000000040
&lt;4&gt;[196727.311423] RDX: dead000000100100 RSI: dead000000100100 RDI: dead000000200200
&lt;4&gt;[196727.311437] RBP: ffff885effd23a80 R08: ffffffff815fd9e0 R09: ffff885d5a590800
&lt;4&gt;[196727.311451] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
&lt;4&gt;[196727.311464] R13: ffffffff81c8c280 R14: 0000000000000000 R15: ffff880e85ee16ce
&lt;4&gt;[196727.311510] FS:  0000000000000000(0000) GS:ffff885effd20000(0000) knlGS:0000000000000000
&lt;4&gt;[196727.311554] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
&lt;4&gt;[196727.311581] CR2: 00007a46751eb000 CR3: 0000005e65688000 CR4: 00000000000407e0
&lt;4&gt;[196727.311625] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
&lt;4&gt;[196727.311669] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
&lt;4&gt;[196727.311713] Stack:
&lt;4&gt;[196727.311733]  ffff8854c398ecc0 ffff8854c398ecc0 ffff885effd23ab0 ffffffff815b7f42
&lt;4&gt;[196727.311784]  ffff88be6595bc00 ffff8854c398ecc0 0000000000000000 ffff8854c398ecc0
&lt;4&gt;[196727.311834]  ffff885effd23ad0 ffffffff815b86c6 ffff885d5a590800 ffff8816827821c0
&lt;4&gt;[196727.311885] Call Trace:
&lt;4&gt;[196727.311907]  &lt;IRQ&gt;
&lt;4&gt;[196727.311912]  [&lt;ffffffff815b7f42&gt;] dst_destroy+0x32/0xe0
&lt;4&gt;[196727.311959]  [&lt;ffffffff815b86c6&gt;] dst_release+0x56/0x80
&lt;4&gt;[196727.311986]  [&lt;ffffffff81620bd5&gt;] tcp_v4_do_rcv+0x2a5/0x4a0
&lt;4&gt;[196727.312013]  [&lt;ffffffff81622b5a&gt;] tcp_v4_rcv+0x7da/0x820
&lt;4&gt;[196727.312041]  [&lt;ffffffff815fd9e0&gt;] ? ip_rcv_finish+0x360/0x360
&lt;4&gt;[196727.312070]  [&lt;ffffffff815de02d&gt;] ? nf_hook_slow+0x7d/0x150
&lt;4&gt;[196727.312097]  [&lt;ffffffff815fd9e0&gt;] ? ip_rcv_finish+0x360/0x360
&lt;4&gt;[196727.312125]  [&lt;ffffffff815fda92&gt;] ip_local_deliver_finish+0xb2/0x230
&lt;4&gt;[196727.312154]  [&lt;ffffffff815fdd9a&gt;] ip_local_deliver+0x4a/0x90
&lt;4&gt;[196727.312183]  [&lt;ffffffff815fd799&gt;] ip_rcv_finish+0x119/0x360
&lt;4&gt;[196727.312212]  [&lt;ffffffff815fe00b&gt;] ip_rcv+0x22b/0x340
&lt;4&gt;[196727.312242]  [&lt;ffffffffa0339680&gt;] ? macvlan_broadcast+0x160/0x160 [macvlan]
&lt;4&gt;[196727.312275]  [&lt;ffffffff815b0c62&gt;] __netif_receive_skb_core+0x512/0x640
&lt;4&gt;[196727.312308]  [&lt;ffffffff811427fb&gt;] ? kmem_cache_alloc+0x13b/0x150
&lt;4&gt;[196727.312338]  [&lt;ffffffff815b0db1&gt;] __netif_receive_skb+0x21/0x70
&lt;4&gt;[196727.312368]  [&lt;ffffffff815b0fa1&gt;] netif_receive_skb+0x31/0xa0
&lt;4&gt;[196727.312397]  [&lt;ffffffff815b1ae8&gt;] napi_gro_receive+0xe8/0x140
&lt;4&gt;[196727.312433]  [&lt;ffffffffa00274f1&gt;] ixgbe_poll+0x551/0x11f0 [ixgbe]
&lt;4&gt;[196727.312463]  [&lt;ffffffff815fe00b&gt;] ? ip_rcv+0x22b/0x340
&lt;4&gt;[196727.312491]  [&lt;ffffffff815b1691&gt;] net_rx_action+0x111/0x210
&lt;4&gt;[196727.312521]  [&lt;ffffffff815b0db1&gt;] ? __netif_receive_skb+0x21/0x70
&lt;4&gt;[196727.312552]  [&lt;ffffffff810519d0&gt;] __do_softirq+0xd0/0x270
&lt;4&gt;[196727.312583]  [&lt;ffffffff816cef3c&gt;] call_softirq+0x1c/0x30
&lt;4&gt;[196727.312613]  [&lt;ffffffff81004205&gt;] do_softirq+0x55/0x90
&lt;4&gt;[196727.312640]  [&lt;ffffffff81051c85&gt;] irq_exit+0x55/0x60
&lt;4&gt;[196727.312668]  [&lt;ffffffff816cf5c3&gt;] do_IRQ+0x63/0xe0
&lt;4&gt;[196727.312696]  [&lt;ffffffff816c5aaa&gt;] common_interrupt+0x6a/0x6a
&lt;4&gt;[196727.312722]  &lt;EOI&gt;
&lt;1&gt;[196727.313071] RIP  [&lt;ffffffff815f8c7f&gt;] ipv4_dst_destroy+0x4f/0x80
&lt;4&gt;[196727.313100]  RSP &lt;ffff885effd23a70&gt;
&lt;4&gt;[196727.313377] ---[ end trace 64b3f14fae0f2e29 ]---
&lt;0&gt;[196727.380908] Kernel panic - not syncing: Fatal exception in interrupt

Reported-by: Alexey Preobrazhensky &lt;preobr@google.com&gt;
Reported-by: dormando &lt;dormando@rydia.ne&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Fixes: 8141ed9fcedb2 ("ipv4: Add a socket release callback for datagram sockets")
Cc: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
