linux-stable.git/net/ipv4, branch v3.12.52

net: fix IP early demux races

2016-01-05T17:18:01+00:00

[ Upstream commit 5037e9ef9454917b047f9f3a19b4dd179fbf7cd4 ]

David Wilder reported crashes caused by dst reuse.


  I am seeing a crash on a distro V4.2.3 kernel caused by a double
  release of a dst_entry.  In ipv4_dst_destroy() the call to
  list_empty() finds a poisoned next pointer, indicating the dst_entry
  has already been removed from the list and freed. The crash occurs
  18 to 24 hours into a run of a network stress exerciser.


Thanks to his detailed report and analysis, we were able to understand
the core issue.

IP early demux can attach a dst to an skb after a lookup in TCP/UDP
sockets.

When the socket's dst cache is not properly set, we want to store the
dst into sk->sk_dst_cache for future IP early demux lookups, by
acquiring a stable refcount on the dst.

The problem is that this acquisition simply uses atomic_inc(). That
works well unless the dst was already queued for destruction: if
DST_NOCACHE was set on the dst, dst_release() queues it for
destruction as soon as it notices the refcount went to zero.

We need to make sure the current refcount is not zero before
incrementing it, or we risk a double free, as David reported.
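A minimal userspace sketch of the difference, assuming C11 atomics as a stand-in for the kernel's atomic_t. A plain increment can revive a count that already reached zero, while an increment-if-not-zero loop (the idea behind the kernel's atomic_inc_not_zero()) refuses to. The struct and function names are hypothetical and are not the helpers this patch adds.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a refcounted dst: refcnt == 0 means the
 * object has already been handed to the destruction path. */
struct model_dst {
	atomic_int refcnt;
};

/* Racy pattern: a plain increment "revives" an object whose count
 * already reached zero, which later leads to a second release. */
static void hold_unsafe(struct model_dst *dst)
{
	atomic_fetch_add(&dst->refcnt, 1);
}

/* Increment only if the count is not zero. Returns false if the
 * object is already being destroyed, so the caller must not cache it. */
static bool hold_if_nonzero(struct model_dst *dst)
{
	int old = atomic_load(&dst->refcnt);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&dst->refcnt, &old, old + 1))
			return true;
	}
	return false;
}
```

A caller that gets false from hold_if_nonzero() simply skips caching the dst, which is the behavior the stable fix aims for on the early demux paths.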

This patch, being a stable candidate, adds two new helpers and uses
them only from the problematic IP early demux paths.

It might be possible to merge skb_dst_force() and skb_dst_force_safe()
in net-next, but I prefer having the smallest patch for stable
kernels: some skb_dst_force() callers may not expect skb->dst to be
cleared suddenly.

This can probably be backported to linux-3.6 kernels.

Reported-by: David J. Wilder 
Tested-by: David J. Wilder 
Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller 
Signed-off-by: Jiri Slaby

net: add validation for the socket syscall protocol argument

2016-01-05T17:04:00+00:00

[ Upstream commit 79462ad02e861803b3840cc782248c7359451cd9 ]

郭永刚 reported that one could simply crash the kernel as root by
using a simple program:

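	#include <netinet/in.h>
	#include <sys/socket.h>

	int main(void)
	{
		int socket_fd;
		struct sockaddr_in addr = { 0 };

		addr.sin_port = 0;
		addr.sin_addr.s_addr = INADDR_ANY;
		addr.sin_family = 10;			/* AF_INET6 */

		socket_fd = socket(10, 3, 0x40000000);	/* SOCK_RAW, oversized protocol */
		connect(socket_fd, (struct sockaddr *)&addr, 16);
		return 0;
	}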

AF_INET and AF_INET6 sockets actually support only 8-bit protocol
identifiers. inet_sock's skc_protocol field is sized accordingly, so
larger protocol identifiers are simply truncated to their low bits;
for 0x40000000 this stores a zero in the protocol field.
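To illustrate the truncation, here is a small sketch assuming a hypothetical socket model with an 8-bit protocol field, plus the kind of range check that rejects such values before they are stored. MODEL_PROTO_MAX and the function names are illustrative; the exact check added by the patch is not reproduced in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: the protocol lives in an 8-bit field. */
struct model_sock {
	uint8_t protocol;
};

/* Assumed bound: valid 8-bit protocol numbers are 0..255. */
#define MODEL_PROTO_MAX 256

/* Without validation, 0x40000000 is silently reduced modulo 256,
 * i.e. it becomes 0. */
static void store_unchecked(struct model_sock *sk, int protocol)
{
	sk->protocol = (uint8_t)protocol;
}

/* With validation, out-of-range values are rejected up front
 * (the kernel would fail the socket() call instead of returning false). */
static bool store_checked(struct model_sock *sk, int protocol)
{
	if (protocol < 0 || protocol >= MODEL_PROTO_MAX)
		return false;
	sk->protocol = (uint8_t)protocol;
	return true;
}
```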

This could lead to, e.g., a NULL function pointer dereference: as a
result of the truncation, inet_num is zero, so we call down to
inet_autobind(), which relies on a get_port handler that is NULL for
raw sockets.

kernel: Call Trace:
kernel:  [] ? inet_autobind+0x2e/0x70
kernel:  [] inet_dgram_connect+0x54/0x80
kernel:  [] SYSC_connect+0xd9/0x110
kernel:  [] ? ptrace_notify+0x5b/0x80
kernel:  [] ? syscall_trace_enter_phase2+0x108/0x200
kernel:  [] SyS_connect+0xe/0x10
kernel:  [] tracesys_phase2+0x84/0x89

I found no particular commit which introduced this problem.

CVE: CVE-2015-8543
Cc: Cong Wang 
Reported-by: 郭永刚 
Signed-off-by: Hannes Frederic Sowa 
Signed-off-by: David S. Miller 
Signed-off-by: Jiri Slaby

netfilter: ipt_rpfilter: remove the nh_scope test in rpfilter_lookup_reverse

2016-01-05T16:00:01+00:00

commit cc4998febd567d1c671684abce5595344bd4e8b2 upstream.

The --accept-local option applies when res.type == RTN_LOCAL, i.e. to
routes from the local table. For those routes, however,
fib_create_info() sets the fib_info's nh->nh_scope to
RT_SCOPE_NOWHERE (> RT_SCOPE_HOST):

	if (cfg->fc_scope == RT_SCOPE_HOST) {
		struct fib_nh *nh = fi->fib_nh;

		/* Local address is added. */
		if (nhs != 1 || nh->nh_gw)
			goto err_inval;
		nh->nh_scope = RT_SCOPE_NOWHERE;   <===
		nh->nh_dev = dev_get_by_index(net, fi->fib_nh->nh_oif);
		err = -ENODEV;
		if (!nh->nh_dev)
			goto failure;

But in our rpfilter_lookup_reverse():

	if (dev_match || flags & XT_RPFILTER_LOOSE)
		return FIB_RES_NH(res).nh_scope <= RT_SCOPE_HOST;

If nh->nh_scope > RT_SCOPE_HOST, this check fails, so packets that
should be accepted via the --accept-local option never pass.

It seems the following test is bogus and can be removed to fix this issue:

	if (dev_match || flags & XT_RPFILTER_LOOSE)
		return FIB_RES_NH(res).nh_scope <= RT_SCOPE_HOST;

IPv6 does not have this issue.
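A simplified model of the check before and after removing the scope test, with the RT_SCOPE_* values copied from <linux/rtnetlink.h> and rpfilter_lookup_reverse() reduced to the two lines quoted above. The function names are hypothetical, and the false return for the non-matching case is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Route scope values as defined in <linux/rtnetlink.h>. */
enum {
	RT_SCOPE_UNIVERSE = 0,
	RT_SCOPE_SITE = 200,
	RT_SCOPE_LINK = 253,
	RT_SCOPE_HOST = 254,
	RT_SCOPE_NOWHERE = 255,
};

/* Old behavior (simplified): a matching route passes only if its
 * next hop scope is at most RT_SCOPE_HOST. */
static bool rpfilter_old(bool dev_match, bool loose, int nh_scope)
{
	if (dev_match || loose)
		return nh_scope <= RT_SCOPE_HOST;
	return false;
}

/* With the scope test removed, the next hop scope no longer matters. */
static bool rpfilter_new(bool dev_match, bool loose)
{
	return dev_match || loose;
}
```

For a local route, fib_create_info() sets nh_scope to RT_SCOPE_NOWHERE, so rpfilter_old() rejects it even on a device match, while rpfilter_new() accepts it.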

Signed-off-by: Xin Long 
Acked-by: Florian Westphal 
Signed-off-by: Pablo Neira Ayuso 
Signed-off-by: Jiri Slaby

net: ipmr: fix static mfc/dev leaks on table destruction

2016-01-05T15:11:12+00:00

[ Upstream commit 0e615e9601a15efeeb8942cf7cd4dadba0c8c5a7 ]

When an mrt table is destroyed, the static MFC entries and the static
devices are kept, which leads to devices that can never be destroyed
(because of the refcount taken on them) and to leaked memory, for example:
unreferenced object 0xffff880034c144c0 (size 192):
  comm "mfc-broken", pid 4777, jiffies 4320349055 (age 46001.964s)
  hex dump (first 32 bytes):
    98 53 f0 34 00 88 ff ff 98 53 f0 34 00 88 ff ff  .S.4.....S.4....
    ef 0a 0a 14 01 02 03 04 00 00 00 00 01 00 00 00  ................
  backtrace:
    [] kmemleak_alloc+0x4e/0xb0
    [] kmem_cache_alloc+0x190/0x300
    [] ip_mroute_setsockopt+0x5cb/0x910
    [] do_ip_setsockopt.isra.11+0x105/0xff0
    [] ip_setsockopt+0x30/0xa0
    [] raw_setsockopt+0x33/0x90
    [] sock_common_setsockopt+0x14/0x20
    [] SyS_setsockopt+0x71/0xc0
    [] entry_SYSCALL_64_fastpath+0x16/0x7a
    [] 0xffffffffffffffff

Make sure that everything is cleaned on netns destruction.
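A sketch of the cleanup rule, assuming a hypothetical singly linked list of MFC entries with a static flag: a routine flush keeps static entries, while table destruction must free them too. The names and the `all` parameter are illustrative, not taken from ipmr.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical MFC entry: static entries are user-configured and
 * normally survive a routine flush. */
struct mfc_entry {
	bool is_static;
	struct mfc_entry *next;
};

/* Free entries from the list. With all == false only dynamic entries
 * are freed; with all == true static entries go too, which is what
 * table destruction needs. Returns the number of entries freed. */
static size_t clean_table(struct mfc_entry **head, bool all)
{
	size_t freed = 0;
	struct mfc_entry **pp = head;

	while (*pp) {
		struct mfc_entry *e = *pp;

		if (e->is_static && !all) {
			pp = &e->next;	/* keep it and move on */
			continue;
		}
		*pp = e->next;		/* unlink, then free */
		free(e);
		freed++;
	}
	return freed;
}

/* Helper for the usage example: push a new entry at the head. */
static struct mfc_entry *push(struct mfc_entry *head, bool is_static)
{
	struct mfc_entry *e = malloc(sizeof(*e));

	if (!e)
		abort();
	e->is_static = is_static;
	e->next = head;
	return e;
}
```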

Signed-off-by: Nikolay Aleksandrov 
Reviewed-by: Cong Wang 
Signed-off-by: David S. Miller 
Signed-off-by: Jiri Slaby

tcp: initialize tp->copied_seq in case of cross SYN connection

2016-01-05T15:11:11+00:00

[ Upstream commit 142a2e7ece8d8ac0e818eb2c91f99ca894730e2a ]

Dmitry provided a syzkaller-generated (http://github.com/google/syzkaller)
program that triggers the WARNING at net/ipv4/tcp.c:1729 in
tcp_recvmsg():

WARN_ON(tp->copied_seq != tp->rcv_nxt &&
        !(flags & (MSG_PEEK | MSG_TRUNC)));

His program specifically attempts a cross SYN TCP exchange, i.e. a
simultaneous open, which we support (for the pleasure of hackers?),
but it looks like we lack proper tp->copied_seq initialization.
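A heavily simplified model of the two counters in the WARN_ON() above, assuming the fix amounts to setting copied_seq to rcv_nxt once the peer's SYN has been accepted on the cross SYN path. The struct and function names are hypothetical, and the MSG_PEEK/MSG_TRUNC part of the condition is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the two sequence counters involved. */
struct model_tcp {
	uint32_t rcv_nxt;	/* next sequence number expected from the peer */
	uint32_t copied_seq;	/* head of data not yet read by the user */
};

/* Receiving the peer's SYN moves rcv_nxt past it. If copied_seq is not
 * moved as well, recvmsg() sees "unread" data that never existed. */
static void on_peer_syn(struct model_tcp *tp, uint32_t peer_isn,
			bool init_copied)
{
	tp->rcv_nxt = peer_isn + 1;
	if (init_copied)
		tp->copied_seq = tp->rcv_nxt;
}

/* Mirrors the WARN_ON() condition, ignoring the flags test. */
static bool recvmsg_would_warn(const struct model_tcp *tp)
{
	return tp->copied_seq != tp->rcv_nxt;
}
```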

Thanks again, Dmitry, for your report and testing.

Signed-off-by: Eric Dumazet 
Reported-by: Dmitry Vyukov 
Tested-by: Dmitry Vyukov 
Signed-off-by: David S. Miller 
Signed-off-by: Jiri Slaby

tcp: md5: fix lockdep annotation

2016-01-05T15:11:10+00:00

[ Upstream commit 1b8e6a01e19f001e9f93b39c32387961c91ed3cc ]

When a passive TCP socket is created, we eventually call
tcp_md5_do_add() with sk pointing to the child. It is not owned by the
user yet (we will add this socket to the listener's accept queue a bit
later anyway).

But we do own the spinlock, so amend the lockdep annotation to avoid
the following splat:

[ 8451.090932] net/ipv4/tcp_ipv4.c:923 suspicious rcu_dereference_protected() usage!
[ 8451.090932]
[ 8451.090932] other info that might help us debug this:
[ 8451.090932]
[ 8451.090934]
[ 8451.090934] rcu_scheduler_active = 1, debug_locks = 1
[ 8451.090936] 3 locks held by socket_sockopt_/214795:
[ 8451.090936]  #0:  (rcu_read_lock){.+.+..}, at: [] __netif_receive_skb_core+0x151/0xe90
[ 8451.090947]  #1:  (rcu_read_lock){.+.+..}, at: [] ip_local_deliver_finish+0x43/0x2b0
[ 8451.090952]  #2:  (slock-AF_INET){+.-...}, at: [] sk_clone_lock+0x1c5/0x500
[ 8451.090958]
[ 8451.090958] stack backtrace:
[ 8451.090960] CPU: 7 PID: 214795 Comm: socket_sockopt_

[ 8451.091215] Call Trace:
[ 8451.091216]    [] dump_stack+0x55/0x76
[ 8451.091229]  [] lockdep_rcu_suspicious+0xeb/0x110
[ 8451.091235]  [] tcp_md5_do_add+0x1bf/0x1e0
[ 8451.091239]  [] tcp_v4_syn_recv_sock+0x1f1/0x4c0
[ 8451.091242]  [] ? tcp_v4_md5_hash_skb+0x167/0x190
[ 8451.091246]  [] tcp_check_req+0x3c8/0x500
[ 8451.091249]  [] ? tcp_v4_inbound_md5_hash+0x11e/0x190
[ 8451.091253]  [] tcp_v4_rcv+0x3c0/0x9f0
[ 8451.091256]  [] ? ip_local_deliver_finish+0x43/0x2b0
[ 8451.091260]  [] ip_local_deliver_finish+0xb6/0x2b0
[ 8451.091263]  [] ? ip_local_deliver_finish+0x43/0x2b0
[ 8451.091267]  [] ip_local_deliver+0x48/0x80
[ 8451.091270]  [] ip_rcv_finish+0x160/0x700
[ 8451.091273]  [] ip_rcv+0x29e/0x3d0
[ 8451.091277]  [] __netif_receive_skb_core+0xb47/0xe90
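A userspace model of the amended annotation, assuming the old condition accepted only "socket owned by user" and the new one also accepts the socket spinlock being held (lock #2, slock-AF_INET, in the splat above). Plain booleans stand in for sock_owned_by_user() and lockdep_is_held(); all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the lock state the annotation inspects. */
struct model_sock {
	bool owned_by_user;	/* stands in for sock_owned_by_user(sk) */
	bool slock_held;	/* stands in for lockdep_is_held(&sk->sk_lock.slock) */
};

/* Old condition: only user ownership counts as protection. */
static bool md5_deref_ok_old(const struct model_sock *sk)
{
	return sk->owned_by_user;
}

/* Amended condition: holding the socket spinlock also counts, which
 * covers the freshly cloned child in the passive open path. */
static bool md5_deref_ok_new(const struct model_sock *sk)
{
	return sk->owned_by_user || sk->slock_held;
}
```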

Fixes: a8afca0329988 ("tcp: md5: protects md5sig_info with RCU")
Signed-off-by: Eric Dumazet 
Reported-by: Willem de Bruijn 
Signed-off-by: David S. Miller 
Signed-off-by: Jiri Slaby

ipmr: fix possible race resulting from improper usage of IP_INC_STATS_BH() in preemptible context.

2015-11-14T15:48:47+00:00

[ Upstream commit 44f49dd8b5a606870a1f21101522a0f9c4414784 ]

Fixes the following kernel BUG:

BUG: using __this_cpu_add() in preemptible [00000000] code: bash/2758
caller is __this_cpu_preempt_check+0x13/0x15
CPU: 0 PID: 2758 Comm: bash Tainted: P           O   3.18.19 #2
 ffffffff8170eaca ffff880110d1b788 ffffffff81482b2a 0000000000000000
 0000000000000000 ffff880110d1b7b8 ffffffff812010ae ffff880007cab800
 ffff88001a060800 ffff88013a899108 ffff880108b84240 ffff880110d1b7c8
Call Trace:
[] dump_stack+0x52/0x80
[] check_preemption_disabled+0xce/0xe1
[] __this_cpu_preempt_check+0x13/0x15
[] ipmr_queue_xmit+0x647/0x70c
[] ip_mr_forward+0x32f/0x34e
[] ip_mroute_setsockopt+0xe03/0x108c
[] ? get_parent_ip+0x11/0x42
[] ? pollwake+0x4d/0x51
[] ? default_wake_function+0x0/0xf
[] ? get_parent_ip+0x11/0x42
[] ? __wake_up_common+0x45/0x77
[] ? _raw_spin_unlock_irqrestore+0x1d/0x32
[] ? __wake_up_sync_key+0x4a/0x53
[] ? sock_def_readable+0x71/0x75
[] do_ip_setsockopt+0x9d/0xb55
[] ? unix_seqpacket_sendmsg+0x3f/0x41
[] ? sock_sendmsg+0x6d/0x86
[] ? sockfd_lookup_light+0x12/0x5d
[] ? SyS_sendto+0xf3/0x11b
[] ? new_sync_read+0x82/0xaa
[] compat_ip_setsockopt+0x3b/0x99
[] compat_raw_setsockopt+0x11/0x32
[] compat_sock_common_setsockopt+0x18/0x1f
[] compat_SyS_setsockopt+0x1a9/0x1cf
[] compat_SyS_socketcall+0x180/0x1e3
[] cstar_dispatch+0x7/0x1e

Signed-off-by: Ani Sinha 
Acked-by: Eric Dumazet 
Signed-off-by: David S. Miller 
Signed-off-by: Jiri Slaby

net: add length argument to skb_copy_and_csum_datagram_iovec

2015-10-26T12:36:01+00:00

Without this length argument, we can read past the end of the iovec in
memcpy_toiovec because we have no way of knowing the total length of the
iovec's buffers.
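A sketch of why the extra length helps, assuming a simplified copy routine that, like memcpy_toiovec(), walks the iovec array with no entry count and stops only when the data runs out. The checked variant takes the total buffer length from the caller and refuses oversized copies. Both function names are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/uio.h>

/* Walks iov with no entry count, relying on len to stop: if len exceeds
 * the total buffer length, it runs past the end of the iovec array. */
static void copy_to_iovec_unbounded(const struct iovec *iov,
				    const void *src, size_t len)
{
	const unsigned char *p = src;

	while (len > 0) {
		size_t chunk = iov->iov_len < len ? iov->iov_len : len;

		memcpy(iov->iov_base, p, chunk);
		p += chunk;
		len -= chunk;
		iov++;		/* no bound check: this is the hazard */
	}
}

/* The caller states how much room the buffers have (buf_len), so an
 * oversized copy is refused before any iovec entry is touched. */
static int copy_to_iovec_checked(const struct iovec *iov, const void *src,
				 size_t len, size_t buf_len)
{
	if (len > buf_len)
		return -1;
	copy_to_iovec_unbounded(iov, src, len);
	return 0;
}
```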

This is needed for stable kernels where 89c22d8c3b27 ("net: Fix skb
csum races when peeking") has been backported but which don't have the
iov_iter conversion, i.e. almost all stable trees <= 3.18.

This also fixes a kernel crash on NFS servers when the client uses
-onfsvers=3,proto=udp to mount the export.

Signed-off-by: Sabrina Dubroca 
Reviewed-by: Hannes Frederic Sowa 
Signed-off-by: Jiri Slaby

inet: frags: fix defragmented packet's IP header for af_packet

2015-08-27T07:27:02+00:00

[ Upstream commit 0848f6428ba3a2e42db124d41ac6f548655735bf ]

When ip_frag_queue() computes positions, it assumes that the passed
sk_buff does not contain L2 headers.

However, when PACKET_FANOUT_FLAG_DEFRAG is used, IP reassembly
functions can be called on outgoing packets that contain L2 headers.

Also, the IPv4 header checksum is not updated after reassembly.
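For reference, a generic sketch of the standard IPv4 header checksum (RFC 1071 ones'-complement sum), which has to be recomputed once reassembly rewrites header fields such as tot_len and frag_off. This is a plain userspace function with a hypothetical name, not the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ones'-complement sum of the header's 16-bit words, treating the
 * checksum field (bytes 10-11) as zero. hdr_len is the header length
 * in bytes (IHL * 4). Returns the checksum in host byte order. */
static uint16_t ipv4_header_checksum(const uint8_t *hdr, size_t hdr_len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < hdr_len; i += 2) {
		if (i == 10)
			continue;	/* skip the checksum field itself */
		sum += (uint32_t)(hdr[i] << 8 | hdr[i + 1]);
	}
	/* Fold carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

Any change to tot_len or frag_off changes this sum, so a reassembled header that keeps its old checksum field will fail validation.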

Fixes: 7736d33f4262 ("packet: Add pre-defragmentation support for ipv4 fanouts.")
Signed-off-by: Edward Hyunkoo Jee 
Signed-off-by: Eric Dumazet 
Cc: Willem de Bruijn 
Cc: Jerry Chu 
Signed-off-by: David S. Miller 
Signed-off-by: Jiri Slaby

ipv6: lock socket in ip6_datagram_connect()

2015-08-27T07:27:01+00:00

[ Upstream commit 03645a11a570d52e70631838cb786eb4253eb463 ]

ip6_datagram_connect() makes a lot of socket changes without the
socket being locked.

This looks wrong, at least for udp_lib_rehash(), which could corrupt
lists because of concurrent udp_sk(sk)->udp_portaddr_hash accesses.
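A minimal sketch of the locking pattern, assuming a pthread mutex as a stand-in for lock_sock()/release_sock(): the public entry point takes the lock and delegates all socket changes to a helper that expects it to be held. The names are hypothetical; the exact shape of the kernel change is not shown in this message.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical socket model; lock stands in for the socket lock. */
struct model_sock {
	pthread_mutex_t lock;
	int connected_port;
};

/* Performs the state changes (e.g. a rehash on the new address).
 * Caller must hold sk->lock. */
static int model_connect_locked(struct model_sock *sk, int port)
{
	sk->connected_port = port;
	return 0;
}

/* Public entry point: every socket change happens under the lock,
 * so concurrent connects cannot interleave their updates. */
static int model_connect(struct model_sock *sk, int port)
{
	int res;

	pthread_mutex_lock(&sk->lock);
	res = model_connect_locked(sk, port);
	pthread_mutex_unlock(&sk->lock);
	return res;
}
```

Splitting the work into a locked wrapper and a helper that assumes the lock keeps the original function body intact, which suits a small stable fix.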

Signed-off-by: Eric Dumazet 
Acked-by: Herbert Xu 
Signed-off-by: David S. Miller 
Signed-off-by: Jiri Slaby