<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/net/ipv6, branch v3.2.75</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>net: add validation for the socket syscall protocol argument</title>
<updated>2015-12-30T02:26:03+00:00</updated>
<author>
<name>Hannes Frederic Sowa</name>
<email>hannes@stressinduktion.org</email>
</author>
<published>2015-12-14T21:03:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=ef6d51d24d878be2291d7af783441356eb77649d'/>
<id>ef6d51d24d878be2291d7af783441356eb77649d</id>
<content type='text'>
[ Upstream commit 79462ad02e861803b3840cc782248c7359451cd9 ]

郭永刚 reported that one could simply crash the kernel as root by
using a simple program:

	int socket_fd;
	struct sockaddr_in addr;
	addr.sin_port = 0;
	addr.sin_addr.s_addr = INADDR_ANY;
	addr.sin_family = 10;

	socket_fd = socket(10,3,0x40000000);
	connect(socket_fd , &amp;addr,16);

AF_INET, AF_INET6 sockets actually only support 8-bit protocol
identifiers. inet_sock's skc_protocol field thus is sized accordingly,
thus larger protocol identifiers simply cut off the higher bits and
store a zero in the protocol fields.

This could lead to e.g. NULL function pointer because as a result of
the cut off inet_num is zero and we call down to inet_autobind, which
is NULL for raw sockets.

kernel: Call Trace:
kernel:  [&lt;ffffffff816db90e&gt;] ? inet_autobind+0x2e/0x70
kernel:  [&lt;ffffffff816db9a4&gt;] inet_dgram_connect+0x54/0x80
kernel:  [&lt;ffffffff81645069&gt;] SYSC_connect+0xd9/0x110
kernel:  [&lt;ffffffff810ac51b&gt;] ? ptrace_notify+0x5b/0x80
kernel:  [&lt;ffffffff810236d8&gt;] ? syscall_trace_enter_phase2+0x108/0x200
kernel:  [&lt;ffffffff81645e0e&gt;] SyS_connect+0xe/0x10
kernel:  [&lt;ffffffff81779515&gt;] tracesys_phase2+0x84/0x89

I found no particular commit which introduced this problem.

CVE: CVE-2015-8543
Cc: Cong Wang &lt;cwang@twopensource.com&gt;
Reported-by: 郭永刚 &lt;guoyonggang@360.cn&gt;
Signed-off-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
[bwh: Backported to 3.2: open-code U8_MAX]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 79462ad02e861803b3840cc782248c7359451cd9 ]

郭永刚 reported that one could simply crash the kernel as root by
using a simple program:

	int socket_fd;
	struct sockaddr_in addr;
	addr.sin_port = 0;
	addr.sin_addr.s_addr = INADDR_ANY;
	addr.sin_family = 10;

	socket_fd = socket(10,3,0x40000000);
	connect(socket_fd , &amp;addr,16);

AF_INET, AF_INET6 sockets actually only support 8-bit protocol
identifiers. inet_sock's skc_protocol field thus is sized accordingly,
thus larger protocol identifiers simply cut off the higher bits and
store a zero in the protocol fields.

This could lead to e.g. NULL function pointer because as a result of
the cut off inet_num is zero and we call down to inet_autobind, which
is NULL for raw sockets.

kernel: Call Trace:
kernel:  [&lt;ffffffff816db90e&gt;] ? inet_autobind+0x2e/0x70
kernel:  [&lt;ffffffff816db9a4&gt;] inet_dgram_connect+0x54/0x80
kernel:  [&lt;ffffffff81645069&gt;] SYSC_connect+0xd9/0x110
kernel:  [&lt;ffffffff810ac51b&gt;] ? ptrace_notify+0x5b/0x80
kernel:  [&lt;ffffffff810236d8&gt;] ? syscall_trace_enter_phase2+0x108/0x200
kernel:  [&lt;ffffffff81645e0e&gt;] SyS_connect+0xe/0x10
kernel:  [&lt;ffffffff81779515&gt;] tracesys_phase2+0x84/0x89

I found no particular commit which introduced this problem.

CVE: CVE-2015-8543
Cc: Cong Wang &lt;cwang@twopensource.com&gt;
Reported-by: 郭永刚 &lt;guoyonggang@360.cn&gt;
Signed-off-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
[bwh: Backported to 3.2: open-code U8_MAX]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv6: add complete rcu protection around np-&gt;opt</title>
<updated>2015-12-30T02:26:02+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2015-11-30T03:37:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=5bf369b4470d3618af67b572a82d76b92ce1abd1'/>
<id>5bf369b4470d3618af67b572a82d76b92ce1abd1</id>
<content type='text'>
[ Upstream commit 45f6fad84cc305103b28d73482b344d7f5b76f39 ]

This patch addresses multiple problems :

UDP/RAW sendmsg() need to get a stable struct ipv6_txoptions
while socket is not locked : Other threads can change np-&gt;opt
concurrently. Dmitry posted a syzkaller
(http://github.com/google/syzkaller) program desmonstrating
use-after-free.

Starting with TCP/DCCP lockless listeners, tcp_v6_syn_recv_sock()
and dccp_v6_request_recv_sock() also need to use RCU protection
to dereference np-&gt;opt once (before calling ipv6_dup_options())

This patch adds full RCU protection to np-&gt;opt

Reported-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
[bwh: Backported to 3.2:
 - Drop changes to l2tp
 - Fix an additional use of np-&gt;opt in tcp_v6_send_synack()
 - Fold in commit 43264e0bd963 ("ipv6: remove unnecessary codes in tcp_ipv6.c")
 - Adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 45f6fad84cc305103b28d73482b344d7f5b76f39 ]

This patch addresses multiple problems :

UDP/RAW sendmsg() need to get a stable struct ipv6_txoptions
while socket is not locked : Other threads can change np-&gt;opt
concurrently. Dmitry posted a syzkaller
(http://github.com/google/syzkaller) program desmonstrating
use-after-free.

Starting with TCP/DCCP lockless listeners, tcp_v6_syn_recv_sock()
and dccp_v6_request_recv_sock() also need to use RCU protection
to dereference np-&gt;opt once (before calling ipv6_dup_options())

This patch adds full RCU protection to np-&gt;opt

Reported-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
[bwh: Backported to 3.2:
 - Drop changes to l2tp
 - Fix an additional use of np-&gt;opt in tcp_v6_send_synack()
 - Fold in commit 43264e0bd963 ("ipv6: remove unnecessary codes in tcp_ipv6.c")
 - Adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv6: distinguish frag queues by device for multicast and link-local packets</title>
<updated>2015-12-30T02:26:02+00:00</updated>
<author>
<name>Michal Kubeček</name>
<email>mkubecek@suse.cz</email>
</author>
<published>2015-11-24T14:07:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=55989a061426b670247fa315b7822ce2bd4ef2b4'/>
<id>55989a061426b670247fa315b7822ce2bd4ef2b4</id>
<content type='text'>
[ Upstream commit 264640fc2c5f4f913db5c73fa3eb1ead2c45e9d7 ]

If a fragmented multicast packet is received on an ethernet device which
has an active macvlan on top of it, each fragment is duplicated and
received both on the underlying device and the macvlan. If some
fragments for macvlan are processed before the whole packet for the
underlying device is reassembled, the "overlapping fragments" test in
ip6_frag_queue() discards the whole fragment queue.

To resolve this, add device ifindex to the search key and require it to
match reassembling multicast packets and packets to link-local
addresses.

Note: similar patch has been already submitted by Yoshifuji Hideaki in

  http://patchwork.ozlabs.org/patch/220979/

but got lost and forgotten for some reason.

Signed-off-by: Michal Kubecek &lt;mkubecek@suse.cz&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 264640fc2c5f4f913db5c73fa3eb1ead2c45e9d7 ]

If a fragmented multicast packet is received on an ethernet device which
has an active macvlan on top of it, each fragment is duplicated and
received both on the underlying device and the macvlan. If some
fragments for macvlan are processed before the whole packet for the
underlying device is reassembled, the "overlapping fragments" test in
ip6_frag_queue() discards the whole fragment queue.

To resolve this, add device ifindex to the search key and require it to
match reassembling multicast packets and packets to link-local
addresses.

Note: similar patch has been already submitted by Yoshifuji Hideaki in

  http://patchwork.ozlabs.org/patch/220979/

but got lost and forgotten for some reason.

Signed-off-by: Michal Kubecek &lt;mkubecek@suse.cz&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>snmp: Remove duplicate OUTMCAST stat increment</title>
<updated>2015-12-30T02:26:01+00:00</updated>
<author>
<name>Neil Horman</name>
<email>nhorman@tuxdriver.com</email>
</author>
<published>2015-11-16T18:09:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=ca194f27b9dcd0d389b83f27c3a5c84caacb6246'/>
<id>ca194f27b9dcd0d389b83f27c3a5c84caacb6246</id>
<content type='text'>
[ Upstream commit 41033f029e393a64e81966cbe34d66c6cf8a2e7e ]

the OUTMCAST stat is double incremented, getting bumped once in the mcast code
itself, and again in the common ip output path.  Remove the mcast bump, as its
not needed

Validated by the reporter, with good results

Signed-off-by: Neil Horman &lt;nhorman@tuxdriver.com&gt;
Reported-by: Claus Jensen &lt;claus.jensen@microsemi.com&gt;
CC: Claus Jensen &lt;claus.jensen@microsemi.com&gt;
CC: David Miller &lt;davem@davemloft.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 41033f029e393a64e81966cbe34d66c6cf8a2e7e ]

the OUTMCAST stat is double incremented, getting bumped once in the mcast code
itself, and again in the common ip output path.  Remove the mcast bump, as its
not needed

Validated by the reporter, with good results

Signed-off-by: Neil Horman &lt;nhorman@tuxdriver.com&gt;
Reported-by: Claus Jensen &lt;claus.jensen@microsemi.com&gt;
CC: Claus Jensen &lt;claus.jensen@microsemi.com&gt;
CC: David Miller &lt;davem@davemloft.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: ip6mr: fix static mfc/dev leaks on table destruction</title>
<updated>2015-12-30T02:25:57+00:00</updated>
<author>
<name>Nikolay Aleksandrov</name>
<email>nikolay@cumulusnetworks.com</email>
</author>
<published>2015-11-20T12:54:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=611da32f948f22d20208891bbf08ceea70d3086e'/>
<id>611da32f948f22d20208891bbf08ceea70d3086e</id>
<content type='text'>
commit 4c6980462f32b4f282c5d8e5f7ea8070e2937725 upstream.

Similar to ipv4, when destroying an mrt table the static mfc entries and
the static devices are kept, which leads to devices that can never be
destroyed (because of refcnt taken) and leaked memory. Make sure that
everything is cleaned up on netns destruction.

Fixes: 8229efdaef1e ("netns: ip6mr: enable namespace support in ipv6 multicast forwarding code")
CC: Benjamin Thery &lt;benjamin.thery@bull.net&gt;
Signed-off-by: Nikolay Aleksandrov &lt;nikolay@cumulusnetworks.com&gt;
Reviewed-by: Cong Wang &lt;cwang@twopensource.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 4c6980462f32b4f282c5d8e5f7ea8070e2937725 upstream.

Similar to ipv4, when destroying an mrt table the static mfc entries and
the static devices are kept, which leads to devices that can never be
destroyed (because of refcnt taken) and leaked memory. Make sure that
everything is cleaned up on netns destruction.

Fixes: 8229efdaef1e ("netns: ip6mr: enable namespace support in ipv6 multicast forwarding code")
CC: Benjamin Thery &lt;benjamin.thery@bull.net&gt;
Signed-off-by: Nikolay Aleksandrov &lt;nikolay@cumulusnetworks.com&gt;
Reviewed-by: Cong Wang &lt;cwang@twopensource.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ip6mr: call del_timer_sync() in ip6mr_free_table()</title>
<updated>2015-12-30T02:25:57+00:00</updated>
<author>
<name>WANG Cong</name>
<email>xiyou.wangcong@gmail.com</email>
</author>
<published>2015-03-31T18:01:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=64344727d8b615a46f5b0e0630f992cc0122058f'/>
<id>64344727d8b615a46f5b0e0630f992cc0122058f</id>
<content type='text'>
commit 7ba0c47c34a1ea5bc7a24ca67309996cce0569b5 upstream.

We need to wait for the flying timers, since we
are going to free the mrtable right after it.

Cc: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 7ba0c47c34a1ea5bc7a24ca67309996cce0569b5 upstream.

We need to wait for the flying timers, since we
are going to free the mrtable right after it.

Cc: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv6: fix tunnel error handling</title>
<updated>2015-11-27T12:48:22+00:00</updated>
<author>
<name>Michal Kubeček</name>
<email>mkubecek@suse.cz</email>
</author>
<published>2015-11-03T07:51:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=29fbb7880914dcdefe243d1bb26990c64134404b'/>
<id>29fbb7880914dcdefe243d1bb26990c64134404b</id>
<content type='text'>
commit ebac62fe3d24c0ce22dd83afa7b07d1a2aaef44d upstream.

Both tunnel6_protocol and tunnel46_protocol share the same error
handler, tunnel6_err(), which traverses through tunnel6_handlers list.
For ipip6 tunnels, we need to traverse tunnel46_handlers as we do e.g.
in tunnel46_rcv(). Current code can generate an ICMPv6 error message
with an IPv4 packet embedded in it.

Fixes: 73d605d1abbd ("[IPSEC]: changing API of xfrm6_tunnel_register")
Signed-off-by: Michal Kubecek &lt;mkubecek@suse.cz&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit ebac62fe3d24c0ce22dd83afa7b07d1a2aaef44d upstream.

Both tunnel6_protocol and tunnel46_protocol share the same error
handler, tunnel6_err(), which traverses through tunnel6_handlers list.
For ipip6 tunnels, we need to traverse tunnel46_handlers as we do e.g.
in tunnel46_rcv(). Current code can generate an ICMPv6 error message
with an IPv4 packet embedded in it.

Fixes: 73d605d1abbd ("[IPSEC]: changing API of xfrm6_tunnel_register")
Signed-off-by: Michal Kubecek &lt;mkubecek@suse.cz&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: add length argument to skb_copy_and_csum_datagram_iovec</title>
<updated>2015-11-17T15:54:45+00:00</updated>
<author>
<name>Sabrina Dubroca</name>
<email>sd@queasysnail.net</email>
</author>
<published>2015-10-15T12:25:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=127500d724f8c43f452610c9080444eedb5eaa6c'/>
<id>127500d724f8c43f452610c9080444eedb5eaa6c</id>
<content type='text'>
Without this length argument, we can read past the end of the iovec in
memcpy_toiovec because we have no way of knowing the total length of the
iovec's buffers.

This is needed for stable kernels where 89c22d8c3b27 ("net: Fix skb
csum races when peeking") has been backported but that don't have the
ioviter conversion, which is almost all the stable trees &lt;= 3.18.

This also fixes a kernel crash for NFS servers when the client uses
 -onfsvers=3,proto=udp to mount the export.

Signed-off-by: Sabrina Dubroca &lt;sd@queasysnail.net&gt;
Reviewed-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
[bwh: Backported to 3.2: adjust context in include/linux/skbuff.h]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Without this length argument, we can read past the end of the iovec in
memcpy_toiovec because we have no way of knowing the total length of the
iovec's buffers.

This is needed for stable kernels where 89c22d8c3b27 ("net: Fix skb
csum races when peeking") has been backported but that don't have the
ioviter conversion, which is almost all the stable trees &lt;= 3.18.

This also fixes a kernel crash for NFS servers when the client uses
 -onfsvers=3,proto=udp to mount the export.

Signed-off-by: Sabrina Dubroca &lt;sd@queasysnail.net&gt;
Reviewed-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
[bwh: Backported to 3.2: adjust context in include/linux/skbuff.h]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv6: update ip6_rt_last_gc every time GC is run</title>
<updated>2015-10-13T02:46:13+00:00</updated>
<author>
<name>Michal Kubeček</name>
<email>mkubecek@suse.cz</email>
</author>
<published>2013-08-01T08:04:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=24277c76396d5ff90832f0a60c6c5987566256ea'/>
<id>24277c76396d5ff90832f0a60c6c5987566256ea</id>
<content type='text'>
commit 49a18d86f66d33a20144ecb5a34bba0d1856b260 upstream.

As pointed out by Eric Dumazet, net-&gt;ipv6.ip6_rt_last_gc should
hold the last time garbage collector was run so that we should
update it whenever fib6_run_gc() calls fib6_clean_all(), not only
if we got there from ip6_dst_gc().

Signed-off-by: Michal Kubecek &lt;mkubecek@suse.cz&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
Cc: Konstantin Khlebnikov &lt;khlebnikov@yandex-team.ru&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 49a18d86f66d33a20144ecb5a34bba0d1856b260 upstream.

As pointed out by Eric Dumazet, net-&gt;ipv6.ip6_rt_last_gc should
hold the last time garbage collector was run so that we should
update it whenever fib6_run_gc() calls fib6_clean_all(), not only
if we got there from ip6_dst_gc().

Signed-off-by: Michal Kubecek &lt;mkubecek@suse.cz&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
Cc: Konstantin Khlebnikov &lt;khlebnikov@yandex-team.ru&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv6: prevent fib6_run_gc() contention</title>
<updated>2015-10-13T02:46:13+00:00</updated>
<author>
<name>Michal Kubeček</name>
<email>mkubecek@suse.cz</email>
</author>
<published>2013-08-01T08:04:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=2491f0184830c5f1e1c794283177179937def4b7'/>
<id>2491f0184830c5f1e1c794283177179937def4b7</id>
<content type='text'>
commit 2ac3ac8f86f2fe065d746d9a9abaca867adec577 upstream.

On a high-traffic router with many processors and many IPv6 dst
entries, soft lockup in fib6_run_gc() can occur when number of
entries reaches gc_thresh.

This happens because fib6_run_gc() uses fib6_gc_lock to allow
only one thread to run the garbage collector but ip6_dst_gc()
doesn't update net-&gt;ipv6.ip6_rt_last_gc until fib6_run_gc()
returns. On a system with many entries, this can take some time
so that in the meantime, other threads pass the tests in
ip6_dst_gc() (ip6_rt_last_gc is still not updated) and wait for
the lock. They then have to run the garbage collector one after
another which blocks them for quite long.

Resolve this by replacing special value ~0UL of expire parameter
to fib6_run_gc() by explicit "force" parameter to choose between
spin_lock_bh() and spin_trylock_bh() and call fib6_run_gc() with
force=false if gc_thresh is reached but not max_size.

Signed-off-by: Michal Kubecek &lt;mkubecek@suse.cz&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
Cc: Konstantin Khlebnikov &lt;khlebnikov@yandex-team.ru&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 2ac3ac8f86f2fe065d746d9a9abaca867adec577 upstream.

On a high-traffic router with many processors and many IPv6 dst
entries, soft lockup in fib6_run_gc() can occur when number of
entries reaches gc_thresh.

This happens because fib6_run_gc() uses fib6_gc_lock to allow
only one thread to run the garbage collector but ip6_dst_gc()
doesn't update net-&gt;ipv6.ip6_rt_last_gc until fib6_run_gc()
returns. On a system with many entries, this can take some time
so that in the meantime, other threads pass the tests in
ip6_dst_gc() (ip6_rt_last_gc is still not updated) and wait for
the lock. They then have to run the garbage collector one after
another which blocks them for quite long.

Resolve this by replacing special value ~0UL of expire parameter
to fib6_run_gc() by explicit "force" parameter to choose between
spin_lock_bh() and spin_trylock_bh() and call fib6_run_gc() with
force=false if gc_thresh is reached but not max_size.

Signed-off-by: Michal Kubecek &lt;mkubecek@suse.cz&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
Cc: Konstantin Khlebnikov &lt;khlebnikov@yandex-team.ru&gt;
</pre>
</div>
</content>
</entry>
</feed>
