linux-stable.git/net/ipv4, branch v4.4.35

tcp: take care of truncations done by sk_filter()

2016-11-21T09:06:40+00:00

[ Upstream commit ac6e780070e30e4c35bd395acfe9191e6268bdd3 ]

With syzkaller's help, Marco Grassi found a bug in the TCP stack that
crashes in tcp_collapse().

The root cause is that sk_filter() can truncate the incoming skb,
but the TCP stack was not expecting this to happen.
It was probably expecting a simple DROP or ACCEPT behavior.

We first need to make sure that no part of the TCP header can be removed.
Then we need to adjust TCP_SKB_CB(skb)->end_seq.
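The two steps can be illustrated with a small user-space model. Everything here is hypothetical: struct toy_skb, toy_filter_trim_cap() and toy_tcp_filter() are stand-ins, not kernel APIs. The filter may shrink the packet but never below a cap equal to the TCP header length, and however many payload bytes it removes are subtracted from end_seq.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an skb; not the kernel's struct sk_buff. */
struct toy_skb {
	unsigned int len;  /* bytes in the packet: TCP header + payload */
	unsigned int doff; /* TCP header length in 32-bit words */
	uint32_t seq;      /* sequence number of the first payload byte */
	uint32_t end_seq;  /* seq + payload length (ignoring SYN/FIN) */
};

/* Hypothetical filter that wants to keep only 'keep' bytes but never
 * trims below 'cap' bytes. Always accepts (returns 0). */
static int toy_filter_trim_cap(struct toy_skb *skb, unsigned int keep,
			       unsigned int cap)
{
	unsigned int new_len = keep < cap ? cap : keep;

	if (new_len < skb->len)
		skb->len = new_len;
	return 0;
}

/* Cap the trim at the TCP header length, then shrink end_seq by the
 * number of payload bytes the filter removed. */
static int toy_tcp_filter(struct toy_skb *skb, unsigned int keep)
{
	unsigned int eaten = skb->len;
	int err = toy_filter_trim_cap(skb, keep, skb->doff * 4);

	if (!err) {
		eaten -= skb->len;
		skb->end_seq -= eaten;
	}
	assert(skb->len >= skb->doff * 4); /* header never truncated */
	return err;
}
```

With a 20-byte header and 100 payload bytes, asking the filter to keep only 4 bytes leaves just the header, and end_seq falls back to seq.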

Many thanks to the syzkaller team and Marco for giving us a reproducer.

Signed-off-by: Eric Dumazet 
Reported-by: Marco Grassi 
Reported-by: Vladis Dronov 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

ipv4: use new_gw for redirect neigh lookup

2016-11-21T09:06:40+00:00

[ Upstream commit 969447f226b451c453ddc83cac6144eaeac6f2e3 ]

In v2.6, ip_rt_redirect() calls arp_bind_neighbour() which returns 0
and then the state of the neigh for the new_gw is checked. If the state
isn't valid then the redirected route is deleted. This behavior is
maintained up to v3.5.7 by check_peer_redirect() because rt->rt_gateway
is assigned to peer->redirect_learned.a4 before calling
ipv4_neigh_lookup().

After commit 5943634fc559 ("ipv4: Maintain redirect and PMTU info in
struct rtable again."), ipv4_neigh_lookup() is performed without the
rt_gateway assigned to the new_gw. In the case when rt_gateway (old_gw)
isn't zero, the function uses it as the key. The neigh is most likely
valid since the old_gw is the one that sends the ICMP redirect message.
Then the new_gw is assigned to fib_nh_exception. The problem is that the
new_gw ARP may never get resolved, and the traffic is blackholed.

So, use the new_gw for neigh lookup.
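Why the lookup key matters can be shown with a toy neighbour table. The names struct toy_neigh, toy_neigh_valid(), toy_redirect_ok_old_key() and toy_redirect_ok_new_key() are hypothetical and only model the validity check described above, not the kernel's neighbour code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical neighbour-cache entry: an IPv4 address and whether its
 * link-layer address has been resolved. */
struct toy_neigh {
	uint32_t addr;
	int valid;
};

/* Linear lookup keyed by address; no entry means not resolved. */
static int toy_neigh_valid(const struct toy_neigh *tbl, size_t n,
			   uint32_t key)
{
	assert(tbl != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (tbl[i].addr == key)
			return tbl[i].valid;
	return 0;
}

/* Buggy variant: keys the lookup on old_gw, the router that just sent
 * the ICMP redirect, so it almost always finds a valid entry. */
static int toy_redirect_ok_old_key(const struct toy_neigh *tbl, size_t n,
				   uint32_t old_gw, uint32_t new_gw)
{
	(void)new_gw;
	return toy_neigh_valid(tbl, n, old_gw);
}

/* Fixed variant: keys the lookup on new_gw, the gateway traffic will
 * actually be sent to. */
static int toy_redirect_ok_new_key(const struct toy_neigh *tbl, size_t n,
				   uint32_t old_gw, uint32_t new_gw)
{
	(void)old_gw;
	return toy_neigh_valid(tbl, n, new_gw);
}
```

With old_gw resolved and new_gw unresolved, the old-key check reports success while the new-key check correctly reports that new_gw is not usable yet.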

Changes from v1:
 - use __ipv4_neigh_lookup instead (per Eric Dumazet).

Fixes: 5943634fc559 ("ipv4: Maintain redirect and PMTU info in struct rtable again.")
Signed-off-by: Stephen Suryaputra Lin 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

fib_trie: Correct /proc/net/route off by one error

2016-11-21T09:06:40+00:00

[ Upstream commit fd0285a39b1cb496f60210a9a00ad33a815603e7 ]

The display of /proc/net/route has had a couple of issues because, when I
originally rewrote most of fib_trie, I made the iterator track the next
value to use instead of the current one.

In addition, it had an off-by-one error where I was tracking the first piece
of data as position 0, even though in reality that position belonged to the
SEQ_START_TOKEN.

This patch updates the code so the iterator tracks the last reported
position and key instead of the next expected position and key.  In
addition it shifts things so that all of the leaves start at 1 instead of
trying to report leaves starting with offset 0 as being valid.  With these
two issues addressed, this should resolve any off-by-one errors that were
present in the display of /proc/net/route.
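The position convention can be sketched as follows: position 0 is the header line (the role SEQ_START_TOKEN plays), and leaves occupy positions 1..n. The iterator remembers the last position it reported, so a read resumed at the next position continues from the following leaf. struct toy_iter and toy_seq_start() are hypothetical and do not mirror the fib_trie code.

```c
#include <assert.h>

enum { TOY_HEADER = -1, TOY_END = -2 };

/* Hypothetical iterator that tracks the LAST reported position. */
struct toy_iter {
	long pos;  /* last reported position, -1 before anything is shown */
	long leaf; /* leaf index reported at 'pos' (valid when pos >= 1) */
};

/* Map a position to an item: 0 is the header, positions 1..nleaves are
 * leaves 0..nleaves-1. Returns a leaf index, TOY_HEADER or TOY_END. */
static long toy_seq_start(struct toy_iter *it, long pos, long nleaves)
{
	long item;

	assert(pos >= 0);
	if (pos == 0)
		item = TOY_HEADER;
	else if (it->pos >= 1 && pos == it->pos + 1)
		/* resume right after the last reported leaf */
		item = it->leaf + 1 < nleaves ? it->leaf + 1 : TOY_END;
	else
		/* no usable state: compute from the start */
		item = pos <= nleaves ? pos - 1 : TOY_END;

	if (item != TOY_END) {
		it->pos = pos;
		if (item != TOY_HEADER)
			it->leaf = item;
	}
	return item;
}
```

Position 0 never maps to a leaf, so the first leaf is reported at position 1 whether the iterator resumes or starts fresh.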

Fixes: 25b97c016b26 ("ipv4: off-by-one in continuation handling in /proc/net/route")
Cc: Andy Whitcroft 
Reported-by: Jason Baron 
Tested-by: Jason Baron 
Signed-off-by: Alexander Duyck 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

tcp: fix potential memory corruption

2016-11-21T09:06:39+00:00

[ Upstream commit ac9e70b17ecd7c6e933ff2eaf7ab37429e71bf4d ]

Imagine the initial value of max_skb_frags is 17 and the last
skb in the write queue has 15 frags.

Then max_skb_frags is lowered to 14 or a smaller value.

tcp_sendmsg() will then be allowed to add additional page frags
and eventually go past MAX_SKB_FRAGS, overflowing struct
skb_shared_info.
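A minimal sketch of how this can happen, assuming the "start a new skb" test compares the current frag count to the sysctl with an equality check: once the limit is lowered below the current count, equality never holds again. The names toy_need_new_skb_eq(), toy_need_new_skb_ge(), toy_fill() and TOY_MAX_SKB_FRAGS are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_SKB_FRAGS 17 /* hypothetical hard array size */

/* Equality test: misses the case frags > limit. */
static bool toy_need_new_skb_eq(int frags, int limit)
{
	return frags == limit;
}

/* Range test: also catches a limit lowered below the current count. */
static bool toy_need_new_skb_ge(int frags, int limit)
{
	return frags >= limit;
}

/* Append up to 'want' frags to an skb that already holds 'frags',
 * stopping when 'need_new' says to start a new skb. Returns the final
 * count; anything above TOY_MAX_SKB_FRAGS would overflow the array. */
static int toy_fill(int frags, int limit, int want,
		    bool (*need_new)(int, int))
{
	assert(limit > 0);
	while (want-- > 0 && !need_new(frags, limit))
		frags++;
	return frags;
}
```

With 15 frags and a limit lowered to 14, the equality test lets five more frags in (20, past the 17-slot array), while the >= test stops immediately at 15.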

Fixes: 5f74f82ea34c ("net:Add sysctl_max_skb_frags")
Signed-off-by: Eric Dumazet 
Cc: Hans Westgaard Ry 
Cc: Håkon Bugge 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

dctcp: avoid bogus doubling of cwnd after loss

2016-11-21T09:06:39+00:00

[ Upstream commit ce6dd23329b1ee6a794acf5f7e40f8e89b8317ee ]

If a congestion control module doesn't provide an .undo_cwnd function,
tcp_undo_cwnd_reduction() will set cwnd to

   tp->snd_cwnd = max(tp->snd_cwnd, tp->snd_ssthresh << 1);

... which makes sense for reno (it sets ssthresh to half the current cwnd),
but it makes no sense for dctcp, which sets ssthresh based on the current
congestion estimate.

This can cause severe growth of cwnd (eventually overflowing u32).

Fix this by saving the last cwnd on loss and restoring cwnd based on that,
similar to cubic and other algorithms.
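The growth can be reproduced with a toy model of repeated spurious losses. The names struct toy_tp, toy_dctcp_ssthresh(), toy_undo_generic(), toy_undo_saved() and toy_run() are hypothetical; the ssthresh formula is a simplified DCTCP-style reduction by alpha/2048 (alpha in 0..1024), assumed for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy sender state; not the kernel's tcp_sock. */
struct toy_tp {
	uint32_t cwnd;
	uint32_t ssthresh;
	uint32_t loss_cwnd; /* cwnd saved when the loss was detected */
};

static uint32_t toy_max(uint32_t a, uint32_t b)
{
	return a > b ? a : b;
}

/* DCTCP-style ssthresh: cut cwnd by alpha/2048 and save the pre-loss
 * cwnd for a later undo. */
static uint32_t toy_dctcp_ssthresh(struct toy_tp *tp, uint32_t alpha)
{
	assert(alpha <= 1024);
	tp->loss_cwnd = tp->cwnd;
	return toy_max(tp->cwnd -
		       (uint32_t)(((uint64_t)tp->cwnd * alpha) >> 11), 2U);
}

/* Generic fallback: assumes Reno, where ssthresh is half of cwnd. */
static void toy_undo_generic(struct toy_tp *tp)
{
	tp->cwnd = toy_max(tp->cwnd, tp->ssthresh << 1);
}

/* Fixed undo: restore the cwnd saved at loss time. */
static void toy_undo_saved(struct toy_tp *tp)
{
	tp->cwnd = toy_max(tp->cwnd, tp->loss_cwnd);
}

/* Simulate 'cycles' spurious losses, each reduced and then undone. */
static uint32_t toy_run(uint32_t cwnd, uint32_t alpha, int cycles,
			int fixed)
{
	struct toy_tp tp = { cwnd, 0, 0 };

	for (int i = 0; i < cycles; i++) {
		tp.ssthresh = toy_dctcp_ssthresh(&tp, alpha);
		tp.cwnd = tp.ssthresh; /* reduce to ssthresh on loss */
		if (fixed)
			toy_undo_saved(&tp);
		else
			toy_undo_generic(&tp);
	}
	return tp.cwnd;
}
```

Starting from cwnd 100 with alpha 64, three cycles of the generic undo give 194, 376 and then 730, roughly doubling each time, while the saved-cwnd undo returns to 100 every time.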

Fixes: e3118e8359bb7c ("net: tcp: add DCTCP congestion control algorithm")
Cc: Lawrence Brakmo 
Cc: Andrew Shewmaker 
Cc: Glenn Judd 
Acked-by: Daniel Borkmann 
Signed-off-by: Florian Westphal 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

udp: fix IP_CHECKSUM handling

2016-11-15T06:46:39+00:00

[ Upstream commit 10df8e6152c6c400a563a673e9956320bfce1871 ]

The first bug was introduced in commit ad6f939ab193 ("ip: Add offset parameter to
ip_cmsg_recv"): Tom missed that IPv4 UDP messages can be received on an
AF_INET6 socket. ip_cmsg_recv(msg, skb) should have been replaced by
ip_cmsg_recv_offset(msg, skb, sizeof(struct udphdr));

Then commit e6afc8ace6dd ("udp: remove headers from UDP packets before
queueing") forgot to adjust the offsets now that UDP headers are pulled
before skbs are put in the receive queue.
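Why the offset must match the bytes actually covered can be seen in the checksum arithmetic itself. This is a user-space sketch of RFC 1071-style one's-complement sums, not the kernel's csum_* helpers; toy_csum(), toy_csum_sub() and toy_payload_csum() are hypothetical names. Subtracting the header's partial sum from the whole-packet sum yields the payload sum only if the subtracted range is exactly the header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum of big-endian 16-bit words, folded to 16 bits.
 * An odd trailing byte is padded with zero. */
static uint16_t toy_csum(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;

	while (len > 1) {
		sum += (uint32_t)p[0] << 8 | p[1];
		p += 2;
		len -= 2;
	}
	if (len)
		sum += (uint32_t)p[0] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* One's-complement subtraction: a - b == a + ~b, with end-around carry. */
static uint16_t toy_csum_sub(uint16_t a, uint16_t b)
{
	uint32_t sum = (uint32_t)a + (uint16_t)~b;

	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Remove the first 'offset' bytes (e.g. an 8-byte UDP header) from a
 * whole-packet sum. 'offset' must be even to keep word alignment. */
static uint16_t toy_payload_csum(uint16_t full, const uint8_t *pkt,
				 size_t offset)
{
	assert(offset % 2 == 0);
	return toy_csum_sub(full, toy_csum(pkt, offset));
}
```

With the correct offset of 8 the result equals the payload's own sum; with offset 0 (as if the header had not been accounted for) the header bytes stay in the reported value.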

Fixes: ad6f939ab193 ("ip: Add offset parameter to ip_cmsg_recv")
Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")
Signed-off-by: Eric Dumazet 
Cc: Sam Kumar 
Cc: Willem de Bruijn 
Tested-by: Willem de Bruijn 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

ipv4: use the right lock for ping_group_range

2016-11-15T06:46:38+00:00

[ Upstream commit 396a30cce15d084b2b1a395aa6d515c3d559c674 ]

This reverts commit a681574c99be23e4d20b769bf0e543239c364af5
("ipv4: disable BH in set_ping_group_range()") because we never
read ping_group_range in BH context (unlike local_port_range).

Then, since we already have a lock for ping_group_range, the places
using ip_local_ports.lock for ping_group_range are clearly typos.

We might consider sharing a single lock for both ping_group_range
and local_port_range to save space, but that should be done in
net-next.
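The rule being restored is simply that every reader and writer of a range takes the lock that belongs to that range. A minimal user-space sketch, with a pthread mutex standing in for the kernel's own lock type; struct toy_range, toy_range_set() and toy_range_get() are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical range that carries its own lock. */
struct toy_range {
	pthread_mutex_t lock;
	unsigned int low, high;
};

/* Writer: update both bounds under the range's own lock so readers
 * never see a half-updated pair. */
static void toy_range_set(struct toy_range *r, unsigned int low,
			  unsigned int high)
{
	assert(r != NULL);
	pthread_mutex_lock(&r->lock);
	r->low = low;
	r->high = high;
	pthread_mutex_unlock(&r->lock);
}

/* Reader: take the same lock, never another range's lock. */
static void toy_range_get(struct toy_range *r, unsigned int *low,
			  unsigned int *high)
{
	pthread_mutex_lock(&r->lock);
	*low = r->low;
	*high = r->high;
	pthread_mutex_unlock(&r->lock);
}
```

Taking a different range's lock here (the typo the commit fixes) would compile and appear to work, but would not exclude a concurrent writer of this range.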

Fixes: a681574c99be ("ipv4: disable BH in set_ping_group_range()")
Fixes: ba6b918ab234 ("ping: move ping_group_range out of CONFIG_SYSCTL")
Cc: Eric Dumazet 
Cc: Eric Salo 
Signed-off-by: Cong Wang 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

ipv4: disable BH in set_ping_group_range()

2016-11-15T06:46:38+00:00

[ Upstream commit a681574c99be23e4d20b769bf0e543239c364af5 ]

In commit 4ee3bd4a8c746 ("ipv4: disable BH when changing ip local port
range"), Cong added BH protection in set_local_port_range() but missed
that the same fix was needed in set_ping_group_range().

Fixes: b8f1a55639e6 ("udp: Add function to make source port for UDP tunnels")
Signed-off-by: Eric Dumazet 
Reported-by: Eric Salo 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

net: add recursion limit to GRO

2016-11-15T06:46:38+00:00

[ Upstream commit fcd91dd449867c6bfe56a81cabba76b829fd05cd ]

Currently, GRO can do unlimited recursion through the gro_receive
handlers.  This was fixed for tunneling protocols by limiting tunnel GRO
to one level with encap_mark, but both VLAN and TEB still have this
problem.  Thus, the kernel is vulnerable to a stack overflow if we
receive a packet composed entirely of VLAN headers.

This patch adds a recursion counter to the GRO layer to prevent stack
overflow.  When a gro_receive function hits the recursion limit, GRO is
aborted for this skb and it is processed normally.  This recursion
counter is put in the GRO CB, but could be turned into a percpu counter
if we run out of space in the CB.

Thanks to Vladimír Beneš  for the initial bug report.

Fixes: CVE-2016-7039
Fixes: 9b174d88c257 ("net: Add Transparent Ethernet Bridging GRO support.")
Fixes: 66e5133f19e9 ("vlan: Add GRO support for non hardware accelerated vlan")
Signed-off-by: Sabrina Dubroca 
Reviewed-by: Jiri Benc 
Acked-by: Hannes Frederic Sowa 
Acked-by: Tom Herbert 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

ipmr, ip6mr: fix scheduling while atomic and a deadlock with ipmr_get_route

2016-11-15T06:46:37+00:00

[ Upstream commit 2cf750704bb6d7ed8c7d732e071dd1bc890ea5e8 ]

Since the commit below, the ipmr/ip6mr rtnl_unicast() code uses the portid
instead of the previous dst_pid, which was copied from in_skb's portid.
Since the skb is new, the portid is 0 at that point, so the packets are sent
to the kernel and we get scheduling while atomic or a deadlock (depending
on where it happens) by trying to acquire rtnl twice.
Also, since this is RTM_GETROUTE, it can be triggered by a normal user.
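The failure mode can be modeled in a few lines: netlink portid 0 addresses the kernel, so a reply whose destination is taken from the freshly allocated skb (portid 0) loops back into a kernel handler that wants the lock the sender already holds. struct toy_ctx, toy_reply_dst(), toy_kernel_rcv() and toy_unicast_reply() are hypothetical, and a boolean flag stands in for the rtnl mutex.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical context; 'rtnl_held' stands in for the rtnl mutex. */
struct toy_ctx {
	bool rtnl_held;
	uint32_t delivered; /* portid of the last user-space delivery */
};

/* The new reply skb's own portid is 0, so the destination must be
 * copied from the request skb, as dst_pid used to be. */
static uint32_t toy_reply_dst(uint32_t in_skb_portid,
			      uint32_t new_skb_portid, bool fixed)
{
	return fixed ? in_skb_portid : new_skb_portid;
}

/* Kernel-side receiver: needs "rtnl". Re-entering while it is held is
 * the deadlock (or sleep in atomic context) seen in the trace. */
static int toy_kernel_rcv(struct toy_ctx *c)
{
	if (c->rtnl_held)
		return -EDEADLK;
	c->rtnl_held = true;
	/* ... process the message ... */
	c->rtnl_held = false;
	return 0;
}

/* Send a reply while holding "rtnl"; portid 0 loops back to the kernel. */
static int toy_unicast_reply(struct toy_ctx *c, uint32_t dst)
{
	int err = 0;

	assert(!c->rtnl_held);
	c->rtnl_held = true;
	if (dst == 0)
		err = toy_kernel_rcv(c);
	else
		c->delivered = dst;
	c->rtnl_held = false;
	return err;
}
```

Using the new skb's portid sends the reply back into the kernel and hits the lock again; copying the requester's portid delivers it to user space as intended.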

Here's the sleeping while atomic trace:
[ 7858.212557] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
[ 7858.212748] in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0
[ 7858.212881] 2 locks held by swapper/0/0:
[ 7858.213013]  #0:  (((&mrt->ipmr_expire_timer))){+.-...}, at: [] call_timer_fn+0x5/0x350
[ 7858.213422]  #1:  (mfc_unres_lock){+.....}, at: [] ipmr_expire_process+0x25/0x130
[ 7858.213807] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc7+ #179
[ 7858.213934] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 7858.214108]  0000000000000000 ffff88005b403c50 ffffffff813a7804 0000000000000000
[ 7858.214412]  ffffffff81a1338e ffff88005b403c78 ffffffff810a4a72 ffffffff81a1338e
[ 7858.214716]  000000000000026c 0000000000000000 ffff88005b403ca8 ffffffff810a4b9f
[ 7858.215251] Call Trace:
[ 7858.215412]    [] dump_stack+0x85/0xc1
[ 7858.215662]  [] ___might_sleep+0x192/0x250
[ 7858.215868]  [] __might_sleep+0x6f/0x100
[ 7858.216072]  [] mutex_lock_nested+0x33/0x4d0
[ 7858.216279]  [] ? netlink_lookup+0x25f/0x460
[ 7858.216487]  [] rtnetlink_rcv+0x1b/0x40
[ 7858.216687]  [] netlink_unicast+0x19c/0x260
[ 7858.216900]  [] rtnl_unicast+0x20/0x30
[ 7858.217128]  [] ipmr_destroy_unres+0xa9/0xf0
[ 7858.217351]  [] ipmr_expire_process+0x8f/0x130
[ 7858.217581]  [] ? ipmr_net_init+0x180/0x180
[ 7858.217785]  [] ? ipmr_net_init+0x180/0x180
[ 7858.217990]  [] call_timer_fn+0xa5/0x350
[ 7858.218192]  [] ? call_timer_fn+0x5/0x350
[ 7858.218415]  [] ? ipmr_net_init+0x180/0x180
[ 7858.218656]  [] run_timer_softirq+0x260/0x640
[ 7858.218865]  [] ? __do_softirq+0xbb/0x54f
[ 7858.219068]  [] __do_softirq+0xe8/0x54f
[ 7858.219269]  [] irq_exit+0xb8/0xc0
[ 7858.219463]  [] smp_apic_timer_interrupt+0x42/0x50
[ 7858.219678]  [] apic_timer_interrupt+0x8c/0xa0
[ 7858.219897]    [] ? native_safe_halt+0x6/0x10
[ 7858.220165]  [] ? trace_hardirqs_on+0xd/0x10
[ 7858.220373]  [] default_idle+0x23/0x190
[ 7858.220574]  [] arch_cpu_idle+0xf/0x20
[ 7858.220790]  [] default_idle_call+0x4c/0x60
[ 7858.221016]  [] cpu_startup_entry+0x39b/0x4d0
[ 7858.221257]  [] rest_init+0x135/0x140
[ 7858.221469]  [] start_kernel+0x50e/0x51b
[ 7858.221670]  [] ? early_idt_handler_array+0x120/0x120
[ 7858.221894]  [] x86_64_start_reservations+0x2a/0x2c
[ 7858.222113]  [] x86_64_start_kernel+0x13b/0x14a

Fixes: 2942e9005056 ("[RTNETLINK]: Use rtnl_unicast() for rtnetlink unicasts")
Signed-off-by: Nikolay Aleksandrov 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman