<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/net/ipv4, branch v3.16-rc5</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>ipv4: fix dst race in sk_dst_get()</title>
<updated>2014-06-26T00:41:44+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2014-06-24T17:05:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f88649721268999bdff09777847080a52004f691'/>
<id>f88649721268999bdff09777847080a52004f691</id>
<content type='text'>
When the IP route cache was removed in linux-3.6, we broke the assumption
that dst entries were all freed after an RCU grace period. DST_NOCACHE
dst entries were supposed to be freed from dst_release(). But it appears
we want to keep such dst entries around, either in UDP sockets or tunnels.

In sk_dst_get() we need to make sure the dst refcount is not 0
before incrementing it, or else we might end up freeing a dst
twice.

DST_NOCACHE set on a dst does not mean this dst cannot be attached
to a socket or a tunnel.

Then, before actually freeing it, we need to observe an RCU grace period
to make sure all other CPUs can catch the fact that the dst is no longer
usable.
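
As an illustration only, the refcount rule above can be modeled in a few
lines of Python. The Dst class, its lock, and this simplified sk_dst_get()
are hypothetical stand-ins, not the kernel code; the lock merely plays the
role of an atomic increment-unless-zero operation.

```python
import threading


class Dst:
    """Toy stand-in for a dst entry (hypothetical, not the kernel struct)."""

    def __init__(self):
        self.refcnt = 1
        self._lock = threading.Lock()  # stands in for an atomic operation

    def inc_not_zero(self):
        # Take a reference only if the count has not already dropped to 0.
        with self._lock:
            if self.refcnt == 0:
                return False
            self.refcnt += 1
            return True

    def release(self):
        # Drop one reference; True means the caller dropped the last one.
        with self._lock:
            self.refcnt -= 1
            return self.refcnt == 0


def sk_dst_get(dst):
    # Simplified model: hand out the dst only if a reference could be taken.
    if dst is not None and dst.inc_not_zero():
        return dst
    return None
```

Once the count has reached 0, sk_dst_get() in this model returns None
instead of reviving an entry that is already on its way to being freed.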

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: Dormando &lt;dormando@rydia.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When the IP route cache was removed in linux-3.6, we broke the assumption
that dst entries were all freed after an RCU grace period. DST_NOCACHE
dst entries were supposed to be freed from dst_release(). But it appears
we want to keep such dst entries around, either in UDP sockets or tunnels.

In sk_dst_get() we need to make sure the dst refcount is not 0
before incrementing it, or else we might end up freeing a dst
twice.

DST_NOCACHE set on a dst does not mean this dst cannot be attached
to a socket or a tunnel.

Then, before actually freeing it, we need to observe an RCU grace period
to make sure all other CPUs can catch the fact that the dst is no longer
usable.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: Dormando &lt;dormando@rydia.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: fix tcp_match_skb_to_sack() for unaligned SACK at end of an skb</title>
<updated>2014-06-20T03:50:49+00:00</updated>
<author>
<name>Neal Cardwell</name>
<email>ncardwell@google.com</email>
</author>
<published>2014-06-19T01:15:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2cd0d743b05e87445c54ca124a9916f22f16742e'/>
<id>2cd0d743b05e87445c54ca124a9916f22f16742e</id>
<content type='text'>
If there is an MSS change (or misbehaving receiver) that causes a SACK
to arrive that covers the end of an skb but is less than one MSS, then
tcp_match_skb_to_sack() was rounding up pkt_len to the full length of
the skb ("Round if necessary..."), then chopping all bytes off the skb
and creating a zero-byte skb in the write queue.

This was visible now because the recently simplified TLP logic in
bef1909ee3ed1c ("tcp: fixing TLP's FIN recovery") could find that 0-byte
skb at the end of the write queue, and now that we do not check that
skb's length we could send it as a TLP probe.

Consider the following example scenario:

 mss: 1000
 skb: seq: 0 end_seq: 4000  len: 4000
 SACK: start_seq: 3999 end_seq: 4000

The tcp_match_skb_to_sack() code will compute:

 in_sack = false
 pkt_len = start_seq - TCP_SKB_CB(skb)-&gt;seq = 3999 - 0 = 3999
 new_len = (pkt_len / mss) * mss = (3999/1000)*1000 = 3000
 new_len += mss = 4000

Previously we would find the new_len &gt; skb-&gt;len check failing, so we
would fall through and set pkt_len = new_len = 4000 and chop off
pkt_len of 4000 from the 4000-byte skb, leaving a 0-byte segment
afterward in the write queue.

With this new commit, we notice that the new new_len &gt;= skb-&gt;len check
succeeds, so that we return without trying to fragment.
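
The arithmetic above can be replayed with a small Python model. This is a
sketch under assumptions: split_len() is a hypothetical helper, it covers
only the case where the SACK starts inside the skb (in_sack is false) and
pkt_len exceeds one MSS, and it is not the kernel function.

```python
def split_len(skb_len, pkt_len, mss):
    """Return the length to split off, or None when no split should happen."""
    # Round pkt_len down to a multiple of mss, then up by one mss
    # if it was not already aligned.
    new_len = (pkt_len // mss) * mss
    if new_len != pkt_len:
        new_len += mss
    # Bytes that would stay in the original skb after the split.
    remaining = max(skb_len - new_len, 0)
    if remaining == 0:
        # Splitting would leave a 0-byte skb, so do not fragment.
        return None
    return new_len
```

With the numbers from the example, split_len(4000, 3999, 1000) returns
None, while split_len(4000, 2500, 1000) returns 3000.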

Fixes: adb92db857ee ("tcp: Make SACK code to split only at mss boundaries")
Reported-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Cc: Ilpo Jarvinen &lt;ilpo.jarvinen@helsinki.fi&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If there is an MSS change (or misbehaving receiver) that causes a SACK
to arrive that covers the end of an skb but is less than one MSS, then
tcp_match_skb_to_sack() was rounding up pkt_len to the full length of
the skb ("Round if necessary..."), then chopping all bytes off the skb
and creating a zero-byte skb in the write queue.

This was visible now because the recently simplified TLP logic in
bef1909ee3ed1c ("tcp: fixing TLP's FIN recovery") could find that 0-byte
skb at the end of the write queue, and now that we do not check that
skb's length we could send it as a TLP probe.

Consider the following example scenario:

 mss: 1000
 skb: seq: 0 end_seq: 4000  len: 4000
 SACK: start_seq: 3999 end_seq: 4000

The tcp_match_skb_to_sack() code will compute:

 in_sack = false
 pkt_len = start_seq - TCP_SKB_CB(skb)-&gt;seq = 3999 - 0 = 3999
 new_len = (pkt_len / mss) * mss = (3999/1000)*1000 = 3000
 new_len += mss = 4000

Previously we would find the new_len &gt; skb-&gt;len check failing, so we
would fall through and set pkt_len = new_len = 4000 and chop off
pkt_len of 4000 from the 4000-byte skb, leaving a 0-byte segment
afterward in the write queue.

With this new commit, we notice that the new new_len &gt;= skb-&gt;len check
succeeds, so that we return without trying to fragment.

Fixes: adb92db857ee ("tcp: Make SACK code to split only at mss boundaries")
Reported-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Cc: Ilpo Jarvinen &lt;ilpo.jarvinen@helsinki.fi&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: remove unnecessary tcp_sk assignment.</title>
<updated>2014-06-17T04:35:00+00:00</updated>
<author>
<name>Dave Jones</name>
<email>davej@redhat.com</email>
</author>
<published>2014-06-16T20:30:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=17846376f21c07c1b9ddfdef1a01bf3828fc1e06'/>
<id>17846376f21c07c1b9ddfdef1a01bf3828fc1e06</id>
<content type='text'>
This variable is overwritten by the child socket assignment before
it ever gets used.

Signed-off-by: Dave Jones &lt;davej@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This variable is overwritten by the child socket assignment before
it ever gets used.

Signed-off-by: Dave Jones &lt;davej@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>udp: ipv4: do not waste time in __udp4_lib_mcast_demux_lookup</title>
<updated>2014-06-13T22:39:24+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2014-06-12T23:13:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=63c6f81cdde58c41da62a8d8a209592e42a0203e'/>
<id>63c6f81cdde58c41da62a8d8a209592e42a0203e</id>
<content type='text'>
It's too easy to add thousands of UDP sockets to a particular bucket
and slow down an innocent multicast receiver.

Early demux is supposed to be an optimization; we should avoid spending
too much time in it.

It is interesting to note that __udp4_lib_demux_lookup() only tries to
match the first socket in the chain.

10 is the threshold we already have in __udp4_lib_lookup() to switch
to the secondary hash.

Fixes: 421b3885bf6d5 ("udp: ipv4: Add udp early demux")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: David Held &lt;drheld@google.com&gt;
Cc: Shawn Bohrer &lt;sbohrer@rgmadvisors.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It's too easy to add thousands of UDP sockets to a particular bucket
and slow down an innocent multicast receiver.

Early demux is supposed to be an optimization; we should avoid spending
too much time in it.

It is interesting to note that __udp4_lib_demux_lookup() only tries to
match the first socket in the chain.

10 is the threshold we already have in __udp4_lib_lookup() to switch
to the secondary hash.

Fixes: 421b3885bf6d5 ("udp: ipv4: Add udp early demux")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: David Held &lt;drheld@google.com&gt;
Cc: Shawn Bohrer &lt;sbohrer@rgmadvisors.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next</title>
<updated>2014-06-12T21:27:40+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2014-06-12T21:27:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f9da455b93f6ba076935b4ef4589f61e529ae046'/>
<id>f9da455b93f6ba076935b4ef4589f61e529ae046</id>
<content type='text'>
Pull networking updates from David Miller:

 1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov.

 2) Multiqueue support in xen-netback and xen-netfront, from Andrew J
    Benniston.

 3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn
    Mork.

 4) BPF now has a "random" opcode, from Chema Gonzalez.

 5) Add more BPF documentation and improve test framework, from Daniel
    Borkmann.

 6) Support TCP fastopen over ipv6, from Daniel Lee.

 7) Add software TSO helper functions and use them to support software
    TSO in mvneta and mv643xx_eth drivers.  From Ezequiel Garcia.

 8) Support software TSO in fec driver too, from Nimrod Andy.

 9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli.

10) Handle broadcasts more gracefully over macvlan when there are large
    numbers of interfaces configured, from Herbert Xu.

11) Allow more control over fwmark used for non-socket based responses,
    from Lorenzo Colitti.

12) Do TCP congestion window limiting based upon measurements, from Neal
    Cardwell.

13) Support busy polling in SCTP, from Neal Horman.

14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru.

15) Bridge promisc mode handling improvements from Vlad Yasevich.

16) Don't use inetpeer entries to implement ID generation any more, it
    performs poorly, from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits)
  rtnetlink: fix userspace API breakage for iproute2 &lt; v3.9.0
  tcp: fixing TLP's FIN recovery
  net: fec: Add software TSO support
  net: fec: Add Scatter/gather support
  net: fec: Increase buffer descriptor entry number
  net: fec: Factorize feature setting
  net: fec: Enable IP header hardware checksum
  net: fec: Factorize the .xmit transmit function
  bridge: fix compile error when compiling without IPv6 support
  bridge: fix smatch warning / potential null pointer dereference
  via-rhine: fix full-duplex with autoneg disable
  bnx2x: Enlarge the dorq threshold for VFs
  bnx2x: Check for UNDI in uncommon branch
  bnx2x: Fix 1G-baseT link
  bnx2x: Fix link for KR with swapped polarity lane
  sctp: Fix sk_ack_backlog wrap-around problem
  net/core: Add VF link state control policy
  net/fsl: xgmac_mdio is dependent on OF_MDIO
  net/fsl: Make xgmac_mdio read error message useful
  net_sched: drr: warn when qdisc is not work conserving
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull networking updates from David Miller:

 1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov.

 2) Multiqueue support in xen-netback and xen-netfront, from Andrew J
    Benniston.

 3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn
    Mork.

 4) BPF now has a "random" opcode, from Chema Gonzalez.

 5) Add more BPF documentation and improve test framework, from Daniel
    Borkmann.

 6) Support TCP fastopen over ipv6, from Daniel Lee.

 7) Add software TSO helper functions and use them to support software
    TSO in mvneta and mv643xx_eth drivers.  From Ezequiel Garcia.

 8) Support software TSO in fec driver too, from Nimrod Andy.

 9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli.

10) Handle broadcasts more gracefully over macvlan when there are large
    numbers of interfaces configured, from Herbert Xu.

11) Allow more control over fwmark used for non-socket based responses,
    from Lorenzo Colitti.

12) Do TCP congestion window limiting based upon measurements, from Neal
    Cardwell.

13) Support busy polling in SCTP, from Neal Horman.

14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru.

15) Bridge promisc mode handling improvements from Vlad Yasevich.

16) Don't use inetpeer entries to implement ID generation any more, it
    performs poorly, from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits)
  rtnetlink: fix userspace API breakage for iproute2 &lt; v3.9.0
  tcp: fixing TLP's FIN recovery
  net: fec: Add software TSO support
  net: fec: Add Scatter/gather support
  net: fec: Increase buffer descriptor entry number
  net: fec: Factorize feature setting
  net: fec: Enable IP header hardware checksum
  net: fec: Factorize the .xmit transmit function
  bridge: fix compile error when compiling without IPv6 support
  bridge: fix smatch warning / potential null pointer dereference
  via-rhine: fix full-duplex with autoneg disable
  bnx2x: Enlarge the dorq threshold for VFs
  bnx2x: Check for UNDI in uncommon branch
  bnx2x: Fix 1G-baseT link
  bnx2x: Fix link for KR with swapped polarity lane
  sctp: Fix sk_ack_backlog wrap-around problem
  net/core: Add VF link state control policy
  net/fsl: xgmac_mdio is dependent on OF_MDIO
  net/fsl: Make xgmac_mdio read error message useful
  net_sched: drr: warn when qdisc is not work conserving
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: fixing TLP's FIN recovery</title>
<updated>2014-06-12T18:05:51+00:00</updated>
<author>
<name>Per Hurtig</name>
<email>per.hurtig@kau.se</email>
</author>
<published>2014-06-12T15:08:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=bef1909ee3ed1ca39231b260a8d3b4544ecd0c8f'/>
<id>bef1909ee3ed1ca39231b260a8d3b4544ecd0c8f</id>
<content type='text'>
Fix to a problem observed when losing a FIN segment that does not
contain data.  In such situations, TLP is unable to recover from
*any* tail loss and instead adds at least PTO ms to the
retransmission process, i.e., RTO = RTO + PTO.

Signed-off-by: Per Hurtig &lt;per.hurtig@kau.se&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Nandita Dukkipati &lt;nanditad@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix to a problem observed when losing a FIN segment that does not
contain data.  In such situations, TLP is unable to recover from
*any* tail loss and instead adds at least PTO ms to the
retransmission process, i.e., RTO = RTO + PTO.

Signed-off-by: Per Hurtig &lt;per.hurtig@kau.se&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Nandita Dukkipati &lt;nanditad@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2014-06-11T23:02:55+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2014-06-11T23:02:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=902455e00720018d1dbd38327c3fd5bda6d844ee'/>
<id>902455e00720018d1dbd38327c3fd5bda6d844ee</id>
<content type='text'>
Conflicts:
	net/core/rtnetlink.c
	net/core/skbuff.c

Both conflicts were very simple overlapping changes.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Conflicts:
	net/core/rtnetlink.c
	net/core/skbuff.c

Both conflicts were very simple overlapping changes.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: Add skb_gro_postpull_rcsum to udp and vxlan</title>
<updated>2014-06-11T22:46:13+00:00</updated>
<author>
<name>Tom Herbert</name>
<email>therbert@google.com</email>
</author>
<published>2014-06-11T01:54:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6bae1d4cc395ad46613e40c9e865ee171dc9de5c'/>
<id>6bae1d4cc395ad46613e40c9e865ee171dc9de5c</id>
<content type='text'>
Need to call skb_gro_postpull_rcsum for GRO to work with checksum complete.

Signed-off-by: Tom Herbert &lt;therbert@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Need to call skb_gro_postpull_rcsum for GRO to work with checksum complete.

Signed-off-by: Tom Herbert &lt;therbert@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: Save software checksum complete</title>
<updated>2014-06-11T22:46:13+00:00</updated>
<author>
<name>Tom Herbert</name>
<email>therbert@google.com</email>
</author>
<published>2014-06-11T01:54:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7e3cead5172927732f51fde77fef6f521e22f209'/>
<id>7e3cead5172927732f51fde77fef6f521e22f209</id>
<content type='text'>
In skb_checksum complete, if we need to compute the checksum for the
packet (via skb_checksum) save the result as CHECKSUM_COMPLETE.
Subsequent checksum verification can use this.

Also, add a csum_complete_sw flag to distinguish between software and
hardware generated checksum complete; we should always be able to trust
the software computation.

Signed-off-by: Tom Herbert &lt;therbert@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In skb_checksum complete, if we need to compute the checksum for the
packet (via skb_checksum) save the result as CHECKSUM_COMPLETE.
Subsequent checksum verification can use this.

Also, add a csum_complete_sw flag to distinguish between software and
hardware generated checksum complete; we should always be able to trust
the software computation.

Signed-off-by: Tom Herbert &lt;therbert@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv4: fix a race in ip4_datagram_release_cb()</title>
<updated>2014-06-11T22:39:18+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2014-06-10T13:43:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9709674e68646cee5a24e3000b3558d25412203a'/>
<id>9709674e68646cee5a24e3000b3558d25412203a</id>
<content type='text'>
Alexey gave an AddressSanitizer [1] report that finally gave a good hint
at the origin of various problems already reported by Dormando
in the past [2].

The problem comes from the fact that UDP can have a lockless TX path, so
concurrent threads can manipulate sk_dst_cache while another thread
holds the socket lock and calls __sk_dst_set() in
ip4_datagram_release_cb() (this was added in linux-3.8).

It seems that all we need to do is use sk_dst_check() and
sk_dst_set() so that all the writers hold the same spinlock
(sk-&gt;sk_dst_lock) to prevent corruption.

The TCP stack does not need this protection, as all sk_dst_cache writers
hold the socket lock.
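
A minimal Python model of the locking rule described above, assuming
hypothetical Sock and sk_dst_set() stand-ins. A threading.Lock plays the
role of sk_dst_lock, and the release callback is an assumption made for
illustration; this is not the kernel implementation.

```python
import threading


class Sock:
    """Toy socket holding a cached dst (hypothetical stand-in)."""

    def __init__(self):
        self.sk_dst_cache = None
        self.sk_dst_lock = threading.Lock()


def sk_dst_set(sk, new_dst, release):
    # Every writer swaps the cached entry under the same lock, so two
    # writers can never both pick up (and later release) the same old entry.
    with sk.sk_dst_lock:
        old = sk.sk_dst_cache
        sk.sk_dst_cache = new_dst
    # Release the previous entry outside the lock, exactly once.
    if old is not None:
        release(old)
```

Because the swap happens under one lock, each old entry is handed to
release() by exactly one writer in this model.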

[1]
https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel

AddressSanitizer: heap-use-after-free in ipv4_dst_check
Read of size 2 by thread T15453:
 [&lt;ffffffff817daa3a&gt;] ipv4_dst_check+0x1a/0x90 ./net/ipv4/route.c:1116
 [&lt;ffffffff8175b789&gt;] __sk_dst_check+0x89/0xe0 ./net/core/sock.c:531
 [&lt;ffffffff81830a36&gt;] ip4_datagram_release_cb+0x46/0x390 ??:0
 [&lt;ffffffff8175eaea&gt;] release_sock+0x17a/0x230 ./net/core/sock.c:2413
 [&lt;ffffffff81830882&gt;] ip4_datagram_connect+0x462/0x5d0 ??:0
 [&lt;ffffffff81846d06&gt;] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
 [&lt;ffffffff817580ac&gt;] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
 [&lt;ffffffff817596ce&gt;] SyS_connect+0xe/0x10 ./net/socket.c:1682
 [&lt;ffffffff818b0a29&gt;] system_call_fastpath+0x16/0x1b
./arch/x86/kernel/entry_64.S:629

Freed by thread T15455:
 [&lt;ffffffff8178d9b8&gt;] dst_destroy+0xa8/0x160 ./net/core/dst.c:251
 [&lt;ffffffff8178de25&gt;] dst_release+0x45/0x80 ./net/core/dst.c:280
 [&lt;ffffffff818304c1&gt;] ip4_datagram_connect+0xa1/0x5d0 ??:0
 [&lt;ffffffff81846d06&gt;] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
 [&lt;ffffffff817580ac&gt;] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
 [&lt;ffffffff817596ce&gt;] SyS_connect+0xe/0x10 ./net/socket.c:1682
 [&lt;ffffffff818b0a29&gt;] system_call_fastpath+0x16/0x1b
./arch/x86/kernel/entry_64.S:629

Allocated by thread T15453:
 [&lt;ffffffff8178d291&gt;] dst_alloc+0x81/0x2b0 ./net/core/dst.c:171
 [&lt;ffffffff817db3b7&gt;] rt_dst_alloc+0x47/0x50 ./net/ipv4/route.c:1406
 [&lt;     inlined    &gt;] __ip_route_output_key+0x3e8/0xf70
__mkroute_output ./net/ipv4/route.c:1939
 [&lt;ffffffff817dde08&gt;] __ip_route_output_key+0x3e8/0xf70 ./net/ipv4/route.c:2161
 [&lt;ffffffff817deb34&gt;] ip_route_output_flow+0x14/0x30 ./net/ipv4/route.c:2249
 [&lt;ffffffff81830737&gt;] ip4_datagram_connect+0x317/0x5d0 ??:0
 [&lt;ffffffff81846d06&gt;] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
 [&lt;ffffffff817580ac&gt;] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
 [&lt;ffffffff817596ce&gt;] SyS_connect+0xe/0x10 ./net/socket.c:1682
 [&lt;ffffffff818b0a29&gt;] system_call_fastpath+0x16/0x1b
./arch/x86/kernel/entry_64.S:629

[2]
&lt;4&gt;[196727.311203] general protection fault: 0000 [#1] SMP
&lt;4&gt;[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP macvlan bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode ipmi_watchdog ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis tpm tpm_bios ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit ixgbe ptp pps_core mdio
&lt;4&gt;[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.26 #1
&lt;4&gt;[196727.311344] Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013
&lt;4&gt;[196727.311364] task: ffff885e6f069700 ti: ffff885e6f072000 task.ti: ffff885e6f072000
&lt;4&gt;[196727.311377] RIP: 0010:[&lt;ffffffff815f8c7f&gt;]  [&lt;ffffffff815f8c7f&gt;] ipv4_dst_destroy+0x4f/0x80
&lt;4&gt;[196727.311399] RSP: 0018:ffff885effd23a70  EFLAGS: 00010282
&lt;4&gt;[196727.311409] RAX: dead000000200200 RBX: ffff8854c398ecc0 RCX: 0000000000000040
&lt;4&gt;[196727.311423] RDX: dead000000100100 RSI: dead000000100100 RDI: dead000000200200
&lt;4&gt;[196727.311437] RBP: ffff885effd23a80 R08: ffffffff815fd9e0 R09: ffff885d5a590800
&lt;4&gt;[196727.311451] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
&lt;4&gt;[196727.311464] R13: ffffffff81c8c280 R14: 0000000000000000 R15: ffff880e85ee16ce
&lt;4&gt;[196727.311510] FS:  0000000000000000(0000) GS:ffff885effd20000(0000) knlGS:0000000000000000
&lt;4&gt;[196727.311554] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
&lt;4&gt;[196727.311581] CR2: 00007a46751eb000 CR3: 0000005e65688000 CR4: 00000000000407e0
&lt;4&gt;[196727.311625] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
&lt;4&gt;[196727.311669] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
&lt;4&gt;[196727.311713] Stack:
&lt;4&gt;[196727.311733]  ffff8854c398ecc0 ffff8854c398ecc0 ffff885effd23ab0 ffffffff815b7f42
&lt;4&gt;[196727.311784]  ffff88be6595bc00 ffff8854c398ecc0 0000000000000000 ffff8854c398ecc0
&lt;4&gt;[196727.311834]  ffff885effd23ad0 ffffffff815b86c6 ffff885d5a590800 ffff8816827821c0
&lt;4&gt;[196727.311885] Call Trace:
&lt;4&gt;[196727.311907]  &lt;IRQ&gt;
&lt;4&gt;[196727.311912]  [&lt;ffffffff815b7f42&gt;] dst_destroy+0x32/0xe0
&lt;4&gt;[196727.311959]  [&lt;ffffffff815b86c6&gt;] dst_release+0x56/0x80
&lt;4&gt;[196727.311986]  [&lt;ffffffff81620bd5&gt;] tcp_v4_do_rcv+0x2a5/0x4a0
&lt;4&gt;[196727.312013]  [&lt;ffffffff81622b5a&gt;] tcp_v4_rcv+0x7da/0x820
&lt;4&gt;[196727.312041]  [&lt;ffffffff815fd9e0&gt;] ? ip_rcv_finish+0x360/0x360
&lt;4&gt;[196727.312070]  [&lt;ffffffff815de02d&gt;] ? nf_hook_slow+0x7d/0x150
&lt;4&gt;[196727.312097]  [&lt;ffffffff815fd9e0&gt;] ? ip_rcv_finish+0x360/0x360
&lt;4&gt;[196727.312125]  [&lt;ffffffff815fda92&gt;] ip_local_deliver_finish+0xb2/0x230
&lt;4&gt;[196727.312154]  [&lt;ffffffff815fdd9a&gt;] ip_local_deliver+0x4a/0x90
&lt;4&gt;[196727.312183]  [&lt;ffffffff815fd799&gt;] ip_rcv_finish+0x119/0x360
&lt;4&gt;[196727.312212]  [&lt;ffffffff815fe00b&gt;] ip_rcv+0x22b/0x340
&lt;4&gt;[196727.312242]  [&lt;ffffffffa0339680&gt;] ? macvlan_broadcast+0x160/0x160 [macvlan]
&lt;4&gt;[196727.312275]  [&lt;ffffffff815b0c62&gt;] __netif_receive_skb_core+0x512/0x640
&lt;4&gt;[196727.312308]  [&lt;ffffffff811427fb&gt;] ? kmem_cache_alloc+0x13b/0x150
&lt;4&gt;[196727.312338]  [&lt;ffffffff815b0db1&gt;] __netif_receive_skb+0x21/0x70
&lt;4&gt;[196727.312368]  [&lt;ffffffff815b0fa1&gt;] netif_receive_skb+0x31/0xa0
&lt;4&gt;[196727.312397]  [&lt;ffffffff815b1ae8&gt;] napi_gro_receive+0xe8/0x140
&lt;4&gt;[196727.312433]  [&lt;ffffffffa00274f1&gt;] ixgbe_poll+0x551/0x11f0 [ixgbe]
&lt;4&gt;[196727.312463]  [&lt;ffffffff815fe00b&gt;] ? ip_rcv+0x22b/0x340
&lt;4&gt;[196727.312491]  [&lt;ffffffff815b1691&gt;] net_rx_action+0x111/0x210
&lt;4&gt;[196727.312521]  [&lt;ffffffff815b0db1&gt;] ? __netif_receive_skb+0x21/0x70
&lt;4&gt;[196727.312552]  [&lt;ffffffff810519d0&gt;] __do_softirq+0xd0/0x270
&lt;4&gt;[196727.312583]  [&lt;ffffffff816cef3c&gt;] call_softirq+0x1c/0x30
&lt;4&gt;[196727.312613]  [&lt;ffffffff81004205&gt;] do_softirq+0x55/0x90
&lt;4&gt;[196727.312640]  [&lt;ffffffff81051c85&gt;] irq_exit+0x55/0x60
&lt;4&gt;[196727.312668]  [&lt;ffffffff816cf5c3&gt;] do_IRQ+0x63/0xe0
&lt;4&gt;[196727.312696]  [&lt;ffffffff816c5aaa&gt;] common_interrupt+0x6a/0x6a
&lt;4&gt;[196727.312722]  &lt;EOI&gt;
&lt;1&gt;[196727.313071] RIP  [&lt;ffffffff815f8c7f&gt;] ipv4_dst_destroy+0x4f/0x80
&lt;4&gt;[196727.313100]  RSP &lt;ffff885effd23a70&gt;
&lt;4&gt;[196727.313377] ---[ end trace 64b3f14fae0f2e29 ]---
&lt;0&gt;[196727.380908] Kernel panic - not syncing: Fatal exception in interrupt

Reported-by: Alexey Preobrazhensky &lt;preobr@google.com&gt;
Reported-by: dormando &lt;dormando@rydia.ne&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Fixes: 8141ed9fcedb2 ("ipv4: Add a socket release callback for datagram sockets")
Cc: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Alexey gave an AddressSanitizer [1] report that finally gave a good hint
at the origin of various problems already reported by Dormando
in the past [2].

The problem comes from the fact that UDP can have a lockless TX path, so
concurrent threads can manipulate sk_dst_cache while another thread
holds the socket lock and calls __sk_dst_set() in
ip4_datagram_release_cb() (this was added in linux-3.8).

It seems that all we need to do is use sk_dst_check() and
sk_dst_set() so that all the writers hold the same spinlock
(sk-&gt;sk_dst_lock) to prevent corruption.

The TCP stack does not need this protection, as all sk_dst_cache writers
hold the socket lock.

[1]
https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel

AddressSanitizer: heap-use-after-free in ipv4_dst_check
Read of size 2 by thread T15453:
 [&lt;ffffffff817daa3a&gt;] ipv4_dst_check+0x1a/0x90 ./net/ipv4/route.c:1116
 [&lt;ffffffff8175b789&gt;] __sk_dst_check+0x89/0xe0 ./net/core/sock.c:531
 [&lt;ffffffff81830a36&gt;] ip4_datagram_release_cb+0x46/0x390 ??:0
 [&lt;ffffffff8175eaea&gt;] release_sock+0x17a/0x230 ./net/core/sock.c:2413
 [&lt;ffffffff81830882&gt;] ip4_datagram_connect+0x462/0x5d0 ??:0
 [&lt;ffffffff81846d06&gt;] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
 [&lt;ffffffff817580ac&gt;] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
 [&lt;ffffffff817596ce&gt;] SyS_connect+0xe/0x10 ./net/socket.c:1682
 [&lt;ffffffff818b0a29&gt;] system_call_fastpath+0x16/0x1b
./arch/x86/kernel/entry_64.S:629

Freed by thread T15455:
 [&lt;ffffffff8178d9b8&gt;] dst_destroy+0xa8/0x160 ./net/core/dst.c:251
 [&lt;ffffffff8178de25&gt;] dst_release+0x45/0x80 ./net/core/dst.c:280
 [&lt;ffffffff818304c1&gt;] ip4_datagram_connect+0xa1/0x5d0 ??:0
 [&lt;ffffffff81846d06&gt;] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
 [&lt;ffffffff817580ac&gt;] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
 [&lt;ffffffff817596ce&gt;] SyS_connect+0xe/0x10 ./net/socket.c:1682
 [&lt;ffffffff818b0a29&gt;] system_call_fastpath+0x16/0x1b
./arch/x86/kernel/entry_64.S:629

Allocated by thread T15453:
 [&lt;ffffffff8178d291&gt;] dst_alloc+0x81/0x2b0 ./net/core/dst.c:171
 [&lt;ffffffff817db3b7&gt;] rt_dst_alloc+0x47/0x50 ./net/ipv4/route.c:1406
 [&lt;     inlined    &gt;] __ip_route_output_key+0x3e8/0xf70
__mkroute_output ./net/ipv4/route.c:1939
 [&lt;ffffffff817dde08&gt;] __ip_route_output_key+0x3e8/0xf70 ./net/ipv4/route.c:2161
 [&lt;ffffffff817deb34&gt;] ip_route_output_flow+0x14/0x30 ./net/ipv4/route.c:2249
 [&lt;ffffffff81830737&gt;] ip4_datagram_connect+0x317/0x5d0 ??:0
 [&lt;ffffffff81846d06&gt;] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
 [&lt;ffffffff817580ac&gt;] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
 [&lt;ffffffff817596ce&gt;] SyS_connect+0xe/0x10 ./net/socket.c:1682
 [&lt;ffffffff818b0a29&gt;] system_call_fastpath+0x16/0x1b
./arch/x86/kernel/entry_64.S:629

[2]
&lt;4&gt;[196727.311203] general protection fault: 0000 [#1] SMP
&lt;4&gt;[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP macvlan bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode ipmi_watchdog ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis tpm tpm_bios ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit ixgbe ptp pps_core mdio
&lt;4&gt;[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.26 #1
&lt;4&gt;[196727.311344] Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013
&lt;4&gt;[196727.311364] task: ffff885e6f069700 ti: ffff885e6f072000 task.ti: ffff885e6f072000
&lt;4&gt;[196727.311377] RIP: 0010:[&lt;ffffffff815f8c7f&gt;]  [&lt;ffffffff815f8c7f&gt;] ipv4_dst_destroy+0x4f/0x80
&lt;4&gt;[196727.311399] RSP: 0018:ffff885effd23a70  EFLAGS: 00010282
&lt;4&gt;[196727.311409] RAX: dead000000200200 RBX: ffff8854c398ecc0 RCX: 0000000000000040
&lt;4&gt;[196727.311423] RDX: dead000000100100 RSI: dead000000100100 RDI: dead000000200200
&lt;4&gt;[196727.311437] RBP: ffff885effd23a80 R08: ffffffff815fd9e0 R09: ffff885d5a590800
&lt;4&gt;[196727.311451] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
&lt;4&gt;[196727.311464] R13: ffffffff81c8c280 R14: 0000000000000000 R15: ffff880e85ee16ce
&lt;4&gt;[196727.311510] FS:  0000000000000000(0000) GS:ffff885effd20000(0000) knlGS:0000000000000000
&lt;4&gt;[196727.311554] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
&lt;4&gt;[196727.311581] CR2: 00007a46751eb000 CR3: 0000005e65688000 CR4: 00000000000407e0
&lt;4&gt;[196727.311625] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
&lt;4&gt;[196727.311669] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
&lt;4&gt;[196727.311713] Stack:
&lt;4&gt;[196727.311733]  ffff8854c398ecc0 ffff8854c398ecc0 ffff885effd23ab0 ffffffff815b7f42
&lt;4&gt;[196727.311784]  ffff88be6595bc00 ffff8854c398ecc0 0000000000000000 ffff8854c398ecc0
&lt;4&gt;[196727.311834]  ffff885effd23ad0 ffffffff815b86c6 ffff885d5a590800 ffff8816827821c0
&lt;4&gt;[196727.311885] Call Trace:
&lt;4&gt;[196727.311907]  &lt;IRQ&gt;
&lt;4&gt;[196727.311912]  [&lt;ffffffff815b7f42&gt;] dst_destroy+0x32/0xe0
&lt;4&gt;[196727.311959]  [&lt;ffffffff815b86c6&gt;] dst_release+0x56/0x80
&lt;4&gt;[196727.311986]  [&lt;ffffffff81620bd5&gt;] tcp_v4_do_rcv+0x2a5/0x4a0
&lt;4&gt;[196727.312013]  [&lt;ffffffff81622b5a&gt;] tcp_v4_rcv+0x7da/0x820
&lt;4&gt;[196727.312041]  [&lt;ffffffff815fd9e0&gt;] ? ip_rcv_finish+0x360/0x360
&lt;4&gt;[196727.312070]  [&lt;ffffffff815de02d&gt;] ? nf_hook_slow+0x7d/0x150
&lt;4&gt;[196727.312097]  [&lt;ffffffff815fd9e0&gt;] ? ip_rcv_finish+0x360/0x360
&lt;4&gt;[196727.312125]  [&lt;ffffffff815fda92&gt;] ip_local_deliver_finish+0xb2/0x230
&lt;4&gt;[196727.312154]  [&lt;ffffffff815fdd9a&gt;] ip_local_deliver+0x4a/0x90
&lt;4&gt;[196727.312183]  [&lt;ffffffff815fd799&gt;] ip_rcv_finish+0x119/0x360
&lt;4&gt;[196727.312212]  [&lt;ffffffff815fe00b&gt;] ip_rcv+0x22b/0x340
&lt;4&gt;[196727.312242]  [&lt;ffffffffa0339680&gt;] ? macvlan_broadcast+0x160/0x160 [macvlan]
&lt;4&gt;[196727.312275]  [&lt;ffffffff815b0c62&gt;] __netif_receive_skb_core+0x512/0x640
&lt;4&gt;[196727.312308]  [&lt;ffffffff811427fb&gt;] ? kmem_cache_alloc+0x13b/0x150
&lt;4&gt;[196727.312338]  [&lt;ffffffff815b0db1&gt;] __netif_receive_skb+0x21/0x70
&lt;4&gt;[196727.312368]  [&lt;ffffffff815b0fa1&gt;] netif_receive_skb+0x31/0xa0
&lt;4&gt;[196727.312397]  [&lt;ffffffff815b1ae8&gt;] napi_gro_receive+0xe8/0x140
&lt;4&gt;[196727.312433]  [&lt;ffffffffa00274f1&gt;] ixgbe_poll+0x551/0x11f0 [ixgbe]
&lt;4&gt;[196727.312463]  [&lt;ffffffff815fe00b&gt;] ? ip_rcv+0x22b/0x340
&lt;4&gt;[196727.312491]  [&lt;ffffffff815b1691&gt;] net_rx_action+0x111/0x210
&lt;4&gt;[196727.312521]  [&lt;ffffffff815b0db1&gt;] ? __netif_receive_skb+0x21/0x70
&lt;4&gt;[196727.312552]  [&lt;ffffffff810519d0&gt;] __do_softirq+0xd0/0x270
&lt;4&gt;[196727.312583]  [&lt;ffffffff816cef3c&gt;] call_softirq+0x1c/0x30
&lt;4&gt;[196727.312613]  [&lt;ffffffff81004205&gt;] do_softirq+0x55/0x90
&lt;4&gt;[196727.312640]  [&lt;ffffffff81051c85&gt;] irq_exit+0x55/0x60
&lt;4&gt;[196727.312668]  [&lt;ffffffff816cf5c3&gt;] do_IRQ+0x63/0xe0
&lt;4&gt;[196727.312696]  [&lt;ffffffff816c5aaa&gt;] common_interrupt+0x6a/0x6a
&lt;4&gt;[196727.312722]  &lt;EOI&gt;
&lt;1&gt;[196727.313071] RIP  [&lt;ffffffff815f8c7f&gt;] ipv4_dst_destroy+0x4f/0x80
&lt;4&gt;[196727.313100]  RSP &lt;ffff885effd23a70&gt;
&lt;4&gt;[196727.313377] ---[ end trace 64b3f14fae0f2e29 ]---
&lt;0&gt;[196727.380908] Kernel panic - not syncing: Fatal exception in interrupt

Reported-by: Alexey Preobrazhensky &lt;preobr@google.com&gt;
Reported-by: dormando &lt;dormando@rydia.ne&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Fixes: 8141ed9fcedb2 ("ipv4: Add a socket release callback for datagram sockets")
Cc: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
