linux-stable.git/net/core/dev.c, branch linux-4.8.y

net: mangle zero checksum in skb_checksum_help()

2016-11-21T09:11:34+00:00

[ Upstream commit 4f2e4ad56a65f3b7d64c258e373cb71e8d2499f4 ]

Sending zero checksum is ok for TCP, but not for UDP.

UDPv6 receiver should by default drop a frame with a 0 checksum,
and UDPv4 would not verify the checksum and might accept a corrupted
packet.

Simply replace such checksum by 0xffff, regardless of transport.

This error was caught on SIT tunnels, but seems generic.
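
As an illustration of the replacement described above, here is a minimal
userspace sketch, not the kernel code. The helper names and the
MANGLED_ZERO_CSUM macro are assumptions made for this example (the macro
stands in for the kernel's CSUM_MANGLED_0), and the 32-bit accumulator is
only safe for short buffers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's CSUM_MANGLED_0. */
#define MANGLED_ZERO_CSUM 0xffffu

/* Fold a 32-bit one's-complement sum down to 16 bits and invert it. */
static uint16_t fold_csum(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Sum big-endian 16-bit words; an odd trailing byte is zero-padded. */
static uint32_t sum_words(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)data[i] << 8 | data[i + 1];
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;
	return sum;
}

/* Compute the checksum and never hand back 0, regardless of transport. */
static uint16_t checksum_never_zero(const uint8_t *data, size_t len)
{
	uint16_t csum = fold_csum(sum_words(data, len));

	return csum ? csum : MANGLED_ZERO_CSUM;
}
```

In one's-complement arithmetic 0x0000 and 0xffff both represent zero, so
the substitution still verifies correctly on the receiver.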

Signed-off-by: Eric Dumazet 
Cc: Maciej Żenczykowski 
Cc: Willem de Bruijn 
Acked-by: Maciej Żenczykowski 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

packet: on direct_xmit, limit tso and csum to supported devices

2016-11-15T06:48:53+00:00

[ Upstream commit 104ba78c98808ae837d1f63aae58c183db5505df ]

When transmitting on a packet socket with PACKET_VNET_HDR and
PACKET_QDISC_BYPASS, validate device support for features requested
in vnet_hdr.

Drop TSO packets sent to devices that do not support TSO or have the
feature disabled. Note that the latter currently do process those
packets correctly, regardless of not advertising the feature.

Because of SKB_GSO_DODGY, it is not sufficient to test device features
with netif_needs_gso. Full validate_xmit_skb is needed.

Switch to software checksum for non-TSO packets that request checksum
offload if that device feature is unsupported or disabled. Note that
similar to the TSO case, device drivers may perform checksum offload
correctly even when not advertising it.

When switching to software checksum, packets hit skb_checksum_help,
which has two BUG_ON checks that fire if the checksum field is not in
the linear segment. Packet sockets always allocate at least up to
csum_start + csum_off + 2 as linear.
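
The linear-segment condition above can be sketched as a simple bounds
check. This is an illustrative model only: struct pkt_view and its field
names are hypothetical and merely mirror the quantities named in the text.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of the fields involved. */
struct pkt_view {
	size_t linear_len;	/* bytes available in the linear area */
	size_t csum_start;	/* offset where checksumming starts */
	size_t csum_offset;	/* offset of the checksum field from csum_start */
};

/* True if the 2-byte checksum field lies entirely in the linear area. */
static bool csum_field_is_linear(const struct pkt_view *p)
{
	return p->csum_start + p->csum_offset + 2 <= p->linear_len;
}
```

A packet whose linear area ends before csum_start + csum_offset + 2 is the
case the software checksum path cannot handle.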

Tested by running github.com/wdebruij/kerneltools/psock_txring_vnet.c

  ethtool -K eth0 tso off tx on
  psock_txring_vnet -d $dst -s $src -i eth0 -l 2000 -n 1 -q -v
  psock_txring_vnet -d $dst -s $src -i eth0 -l 2000 -n 1 -q -v -N

  ethtool -K eth0 tx off
  psock_txring_vnet -d $dst -s $src -i eth0 -l 1000 -n 1 -q -v -G
  psock_txring_vnet -d $dst -s $src -i eth0 -l 1000 -n 1 -q -v -G -N

v2:
  - add EXPORT_SYMBOL_GPL(validate_xmit_skb_list)

Fixes: d346a3fae3ff ("packet: introduce PACKET_QDISC_BYPASS socket option")
Signed-off-by: Willem de Bruijn 
Acked-by: Eric Dumazet 
Acked-by: Daniel Borkmann 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

net: add recursion limit to GRO

2016-11-15T06:48:52+00:00

[ Upstream commit fcd91dd449867c6bfe56a81cabba76b829fd05cd ]

Currently, GRO can do unlimited recursion through the gro_receive
handlers.  This was fixed for tunneling protocols by limiting tunnel GRO
to one level with encap_mark, but both VLAN and TEB still have this
problem.  Thus, the kernel is vulnerable to a stack overflow, if we
receive a packet composed entirely of VLAN headers.

This patch adds a recursion counter to the GRO layer to prevent stack
overflow.  When a gro_receive function hits the recursion limit, GRO is
aborted for this skb and it is processed normally.  This recursion
counter is put in the GRO CB, but could be turned into a percpu counter
if we run out of space in the CB.
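
The counter mechanism can be sketched in userspace as follows. This is a
toy model rather than the patch itself: struct gro_state, call_handler(),
the toy handler and the limit of 15 are all assumptions chosen for the
example.

```c
#include <stddef.h>

#define RECURSION_LIMIT 15	/* assumed value for this sketch */

/* Hypothetical per-packet state, standing in for the GRO CB. */
struct gro_state {
	int recursion_counter;
	int flush;		/* set when GRO is aborted for this packet */
};

typedef int (*gro_handler)(struct gro_state *st,
			   const unsigned char *hdr, size_t left);

/* Every nested handler is entered through this wrapper. */
static int call_handler(gro_handler cb, struct gro_state *st,
			const unsigned char *hdr, size_t left)
{
	if (++st->recursion_counter == RECURSION_LIMIT) {
		st->flush = 1;	/* give up on GRO, process normally */
		return -1;
	}
	return cb(st, hdr, left);
}

/* Toy handler: each 0x81 byte represents one more nested header. */
static int toy_vlan_receive(struct gro_state *st,
			    const unsigned char *hdr, size_t left)
{
	if (left == 0 || hdr[0] != 0x81)
		return 0;	/* reached the payload */
	return call_handler(toy_vlan_receive, st, hdr + 1, left - 1);
}
```

Feeding this a buffer made up entirely of 0x81 bytes stops at the limit
instead of recursing once per byte.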

Thanks to Vladimír Beneš for the initial bug report.

Fixes: CVE-2016-7039
Fixes: 9b174d88c257 ("net: Add Transparent Ethernet Bridging GRO support.")
Fixes: 66e5133f19e9 ("vlan: Add GRO support for non hardware accelerated vlan")
Signed-off-by: Sabrina Dubroca 
Reviewed-by: Jiri Benc 
Acked-by: Hannes Frederic Sowa 
Acked-by: Tom Herbert 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

net: core: Correctly iterate over lower adjacency list

2016-11-15T06:48:52+00:00

[ Upstream commit e4961b0768852d9eb7383e1a5df178eacb714656 ]

Tamir reported the following trace when processing ARP requests received
via a vlan device on top of a VLAN-aware bridge:

 NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [swapper/1:0]
[...]
 CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W       4.8.0-rc7 #1
 Hardware name: Mellanox Technologies Ltd. "MSN2100-CB2F"/"SA001017", BIOS 5.6.5 06/07/2016
 task: ffff88017edfea40 task.stack: ffff88017ee10000
 RIP: 0010:[]  [] netdev_all_lower_get_next_rcu+0x33/0x60
[...]
 Call Trace:
  
  [] mlxsw_sp_port_lower_dev_hold+0x5a/0xa0 [mlxsw_spectrum]
  [] mlxsw_sp_router_netevent_event+0x80/0x150 [mlxsw_spectrum]
  [] notifier_call_chain+0x4a/0x70
  [] atomic_notifier_call_chain+0x1a/0x20
  [] call_netevent_notifiers+0x1b/0x20
  [] neigh_update+0x306/0x740
  [] neigh_event_ns+0x4e/0xb0
  [] arp_process+0x66f/0x700
  [] ? common_interrupt+0x8c/0x8c
  [] arp_rcv+0x139/0x1d0
  [] ? vlan_do_receive+0xda/0x320
  [] __netif_receive_skb_core+0x524/0xab0
  [] ? dev_queue_xmit+0x10/0x20
  [] ? br_forward_finish+0x3d/0xc0 [bridge]
  [] ? br_handle_vlan+0xf6/0x1b0 [bridge]
  [] __netif_receive_skb+0x18/0x60
  [] netif_receive_skb_internal+0x40/0xb0
  [] netif_receive_skb+0x1c/0x70
  [] br_pass_frame_up+0xc6/0x160 [bridge]
  [] ? deliver_clone+0x37/0x50 [bridge]
  [] ? br_flood+0xcc/0x160 [bridge]
  [] br_handle_frame_finish+0x224/0x4f0 [bridge]
  [] br_handle_frame+0x174/0x300 [bridge]
  [] __netif_receive_skb_core+0x329/0xab0
  [] ? find_next_bit+0x15/0x20
  [] ? cpumask_next_and+0x32/0x50
  [] ? load_balance+0x178/0x9b0
  [] __netif_receive_skb+0x18/0x60
  [] netif_receive_skb_internal+0x40/0xb0
  [] netif_receive_skb+0x1c/0x70
  [] mlxsw_sp_rx_listener_func+0x61/0xb0 [mlxsw_spectrum]
  [] mlxsw_core_skb_receive+0x187/0x200 [mlxsw_core]
  [] mlxsw_pci_cq_tasklet+0x63a/0x9b0 [mlxsw_pci]
  [] tasklet_action+0xf6/0x110
  [] __do_softirq+0xf6/0x280
  [] irq_exit+0xdf/0xf0
  [] do_IRQ+0x54/0xd0
  [] common_interrupt+0x8c/0x8c

The problem is that netdev_all_lower_get_next_rcu() never advances the
iterator, thereby causing the loop over the lower adjacency list to run
forever.

Fix this by advancing the iterator, thereby avoiding the infinite loop.
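
A minimal sketch of the iterator pattern, assuming a circular list with a
sentinel head. The struct and function names are hypothetical and the RCU
aspects of the real helper are left out.

```c
#include <stddef.h>

struct list_head {
	struct list_head *next;
};

/* Hypothetical adjacency entry; list must stay the first member. */
struct adj {
	struct list_head list;
	int id;
};

/* Return the next entry, or NULL once the walk is back at the head. */
static struct adj *lower_get_next(struct list_head *head,
				  struct list_head **iter)
{
	struct list_head *next = (*iter)->next;

	if (next == head)
		return NULL;
	*iter = next;	/* without this step the caller loops forever */
	return (struct adj *)next;	/* valid: list is the first member */
}

/* Walk the whole list and count its entries. */
static int count_lower(struct list_head *head)
{
	struct list_head *iter = head;
	int n = 0;

	while (lower_get_next(head, &iter))
		n++;
	return n;
}
```

If the assignment to *iter is dropped, every call returns the first entry
again, which matches the soft lockup seen in the trace.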

Fixes: 7ce856aaaf13 ("mlxsw: spectrum: Add couple of lower device helper functions")
Signed-off-by: Ido Schimmel 
Reported-by: Tamir Winetroub 
Reviewed-by: Jiri Pirko 
Acked-by: David Ahern 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

net: Add netdev all_adj_list refcnt propagation to fix panic

2016-11-15T06:48:51+00:00

[ Upstream commit 93409033ae653f1c9a949202fb537ab095b2092f ]

This is a respin of a patch to fix a relatively easily reproducible kernel
panic related to the all_adj_list handling for netdevs in recent kernels.

The following sequence of commands will reproduce the issue:

ip link add link eth0 name eth0.100 type vlan id 100
ip link add link eth0 name eth0.200 type vlan id 200
ip link add name testbr type bridge
ip link set eth0.100 master testbr
ip link set eth0.200 master testbr
ip link add link testbr mac0 type macvlan
ip link delete dev testbr

This creates an upper/lower tree of (excuse the poor ASCII art):

            /---eth0.100-eth0
mac0-testbr-
            \---eth0.200-eth0

When testbr is deleted, the all_adj_lists are walked, and eth0 is deleted twice from
the mac0 list. Unfortunately, during setup in __netdev_upper_dev_link, only one
reference to eth0 is added, so this results in a panic.

This change adds reference count propagation so things are handled properly.
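
The idea of propagating reference counts can be sketched with a toy
adjacency table. This is a simplified model under stated assumptions:
devices are plain integers, and adj_list, adj_insert() and adj_remove()
are hypothetical names, not the kernel's helpers.

```c
#include <stddef.h>

#define MAX_ADJ 8

struct adj_entry {
	int dev;	/* toy device id */
	int ref_nr;	/* number of paths that reach this device */
};

struct adj_list {
	struct adj_entry e[MAX_ADJ];
	size_t n;
};

static struct adj_entry *adj_find(struct adj_list *l, int dev)
{
	size_t i;

	for (i = 0; i < l->n; i++)
		if (l->e[i].dev == dev)
			return &l->e[i];
	return NULL;
}

/* Add ref_nr references to dev, creating the entry if needed. */
static int adj_insert(struct adj_list *l, int dev, int ref_nr)
{
	struct adj_entry *e = adj_find(l, dev);

	if (e) {
		e->ref_nr += ref_nr;
		return 0;
	}
	if (l->n == MAX_ADJ)
		return -1;
	l->e[l->n].dev = dev;
	l->e[l->n].ref_nr = ref_nr;
	l->n++;
	return 0;
}

/* Drop ref_nr references; the entry goes away when none remain. */
static int adj_remove(struct adj_list *l, int dev, int ref_nr)
{
	struct adj_entry *e = adj_find(l, dev);

	if (!e || e->ref_nr < ref_nr)
		return -1;	/* the unbalanced case behind the panic */
	e->ref_nr -= ref_nr;
	if (e->ref_nr == 0)
		*e = l->e[--l->n];
	return 0;
}
```

In the topology above, eth0 is reachable from testbr over two paths, so
mac0's list should take two references to it when mac0 is linked; the two
removals during teardown then balance.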

Matthias Schiffer reported a similar crash in batman-adv:

https://github.com/freifunk-gluon/gluon/issues/680
https://www.open-mesh.org/issues/247

which this patch also seems to resolve.

Signed-off-by: Andrew Collins 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

bonding: Fix bonding crash

2016-09-04T18:41:12+00:00

The following steps will crash the kernel:

  (a) Create bonding master
      > modprobe bonding miimon=50
  (b) Create macvlan bridge on eth2
      > ip link add link eth2 dev mvl0 address aa:0:0:0:0:01 \
	   type macvlan
  (c) Now try adding eth2 into the bond
      > echo +eth2 > /sys/class/net/bond0/bonding/slaves
      

Bonding does a lot of work before checking whether the device being
enslaved is busy.

In this case, when the notifier call chain sends notifications,
bond_netdev_event() assumes that rx_handler/rx_handler_data is
registered, while bond_enslave() has not progressed far enough to
register the rx_handler for the new slave.

This patch adds an rx_handler check that can be performed right at the
beginning of the enslave code to avoid getting into this situation.
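
The early check can be sketched as below. This is a userspace toy, not the
bonding code: struct toy_dev, toy_enslave() and the two dummy handlers are
hypothetical names introduced for the example.

```c
#include <errno.h>
#include <stddef.h>

typedef int (*rx_handler_t)(void);

/* Hypothetical device with a single receive-handler slot. */
struct toy_dev {
	rx_handler_t rx_handler;
	int enslaved;
};

/* Dummy handlers standing in for the macvlan and bonding receive hooks. */
static int macvlan_rx(void) { return 1; }
static int bond_rx(void) { return 2; }

static int rx_handler_busy(const struct toy_dev *dev)
{
	return dev && dev->rx_handler != NULL;
}

static int toy_enslave(struct toy_dev *slave, rx_handler_t handler)
{
	/* Check first, before any setup that could fire notifiers. */
	if (rx_handler_busy(slave))
		return -EBUSY;

	/* ... remaining enslave work would go here ... */
	slave->rx_handler = handler;
	slave->enslaved = 1;
	return 0;
}
```

A device that already carries a macvlan handler is rejected up front, so
no notifier ever sees a half-initialised slave.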

Signed-off-by: Mahesh Bandewar 
Acked-by: Eric Dumazet 
Signed-off-by: David S. Miller

net: remove type_check from dev_get_nest_level()

2016-08-13T22:15:54+00:00

The idea for type_check in dev_get_nest_level() was to count the number
of nested devices of the same type (currently, only macvlan or vlan
devices).
This prevented the false positive lockdep warning on configurations such
as:

eth0 <--- macvlan0 <--- vlan0 <--- macvlan1

However, this doesn't prevent a warning on a configuration such as:

eth0 <--- macvlan0 <--- vlan0
eth1 <--- vlan1 <--- macvlan1

In this case, all the locks end up with a nesting subclass of 1, so
lockdep thinks that there is still a deadlock:

- in the first case we have (macvlan_netdev_addr_lock_key, 1) and then
  take (vlan_netdev_xmit_lock_key, 1)
- in the second case, we have (vlan_netdev_xmit_lock_key, 1) and then
  take (macvlan_netdev_addr_lock_key, 1)

By removing the linktype check in dev_get_nest_level() and always
incrementing the nesting depth, lockdep considers this configuration
valid.
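
A small sketch of a nesting-depth computation that ignores device type,
assuming each device simply records its lower devices. The toy_dev
structure and nest_level() are illustrative names, not the kernel's
definitions.

```c
#include <stddef.h>

#define MAX_LOWER 4

/* Hypothetical device node that only knows its lower devices. */
struct toy_dev {
	const struct toy_dev *lower[MAX_LOWER];
	size_t n_lower;
};

/* Depth of the deepest chain of lower devices, whatever their type. */
static int nest_level(const struct toy_dev *dev)
{
	int max_nest = -1;
	size_t i;

	for (i = 0; i < dev->n_lower; i++) {
		int nest = nest_level(dev->lower[i]);

		if (nest > max_nest)
			max_nest = nest;
	}
	return max_nest + 1;
}
```

With this rule, macvlan0 and vlan1 in the example sit at depth 1 while
vlan0 and macvlan1 sit at depth 2, so the two lock orders no longer share
the same subclass.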

Signed-off-by: Sabrina Dubroca 
Signed-off-by: David S. Miller

Merge branch 'salted-string-hash'

2016-07-28T19:26:31+00:00

This changes the vfs dentry hashing to mix in the parent pointer at the
_beginning_ of the hash, rather than at the end.

That actually improves both the hash and the code generation, because we
can move more of the computation to the "static" part of the dcache
setup, and do less at lookup runtime.

It turns out that a lot of other hash users also really wanted to mix in
a base pointer as a 'salt' for the hash, and so the slightly extended
interface ends up working well for other cases too.

Users that want a string hash that is purely about the string pass in a
'salt' pointer of NULL.
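
To illustrate mixing the salt in at the beginning, here is a toy string
hash seeded with a pointer value. The function name and the mixing step
are assumptions for this sketch and are not the kernel's hash.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy salted hash: the salt pointer seeds the state before any byte. */
static uint32_t salted_hash(const void *salt, const char *name, size_t len)
{
	uint64_t hash = (uint64_t)(uintptr_t)salt;
	size_t i;

	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)name[i];

		/* Illustrative mixing step, not the kernel's. */
		hash = (hash + ((uint64_t)c << 4) + (c >> 4)) * 11;
	}
	return (uint32_t)(hash ^ (hash >> 32));
}
```

Passing NULL as the salt gives a hash that depends only on the string,
which is the behaviour described for users that do not need a base
pointer.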

* merge branch 'salted-string-hash':
  fs/dcache.c: Save one 32-bit multiply in dcache lookup
  vfs: make the string hashes salt the hash

net: add ndo to setup/query xdp prog in adapter rx

2016-07-20T04:46:31+00:00

Add one new netdev op for drivers implementing the BPF_PROG_TYPE_XDP
filter. The single op is used for both setup/query of the xdp program,
modelled after ndo_setup_tc.
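
The single-op design can be sketched as one entry point that dispatches
on a command field. All names below (toy_xdp, toy_adapter, toy_ndo_xdp and
the TOY_XDP_* commands) are hypothetical and only model the shape of such
an interface.

```c
#include <stdbool.h>
#include <stddef.h>

enum toy_xdp_command {
	TOY_XDP_SETUP_PROG,	/* attach, or detach when prog is NULL */
	TOY_XDP_QUERY_PROG,	/* report whether a program is attached */
};

/* One request structure shared by both commands. */
struct toy_xdp {
	enum toy_xdp_command command;
	union {
		const void *prog;	/* input for SETUP */
		bool prog_attached;	/* output for QUERY */
	} u;
};

/* Hypothetical adapter state. */
struct toy_adapter {
	const void *rx_prog;
};

/* Single driver op handling both setup and query. */
static int toy_ndo_xdp(struct toy_adapter *ad, struct toy_xdp *xdp)
{
	switch (xdp->command) {
	case TOY_XDP_SETUP_PROG:
		ad->rx_prog = xdp->u.prog;
		return 0;
	case TOY_XDP_QUERY_PROG:
		xdp->u.prog_attached = ad->rx_prog != NULL;
		return 0;
	}
	return -1;	/* unknown command */
}
```

Keeping one op with a command field means a driver implements a single
callback, and new commands can be added without growing the ops table.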

Signed-off-by: Brenden Blanco 
Signed-off-by: David S. Miller

net: tracepoint napi:napi_poll add work and budget

2016-07-09T22:05:02+00:00

An important piece of information for the napi_poll tracepoint is
the work done (packets processed) by the napi_poll() call. Add
both the work done and the budget, as they are related.
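
Why the two values belong together can be seen in a toy poll loop. This
is a sketch under assumptions: toy_poll(), poll_and_trace() and
struct toy_trace are hypothetical and do not reflect the tracepoint's
actual record layout.

```c
/* Hypothetical trace record carrying both values. */
struct toy_trace {
	int work;	/* packets actually processed */
	int budget;	/* maximum the poll was allowed to process */
};

/* Process at most budget packets from the pending count. */
static int toy_poll(int *pending, int budget)
{
	int work = *pending < budget ? *pending : budget;

	*pending -= work;
	return work;
}

/* Run one poll and capture what a trace consumer would see. */
static struct toy_trace poll_and_trace(int *pending, int budget)
{
	struct toy_trace t;

	t.work = toy_poll(pending, budget);
	t.budget = budget;
	return t;
}
```

A record with work equal to budget suggests the queue was not drained,
while work below budget suggests the poll emptied it; neither value alone
conveys that.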

Handle the trace_napi_poll() parameter change in dropwatch/drop_monitor
and in the python perf script netdev-times.py in a backward-compatible
way, as python fortunately supports optional parameter handling.

Signed-off-by: Jesper Dangaard Brouer 
Signed-off-by: David S. Miller