linux-stable.git/net/packet, branch v3.16.40

packet: fix race condition in packet_set_ring

2017-02-23T03:54:41+00:00

commit 84ac7260236a49c79eede91617700174c2c19b0c upstream.

When packet_set_ring creates a ring buffer it will initialize a
struct timer_list if the packet version is TPACKET_V3. This value
can then be raced by a different thread calling setsockopt to
set the version to TPACKET_V1 before packet_set_ring has finished.

This leads to a use-after-free on a function pointer in the
struct timer_list when the socket is closed as the previously
initialized timer will not be deleted.

The bug is fixed by taking lock_sock(sk) in packet_setsockopt when
changing the packet version while also taking the lock at the start
of packet_set_ring.

Fixes: f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.")
Signed-off-by: Philip Pettersson 
Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller 
Signed-off-by: Ben Hutchings

packet: on direct_xmit, limit tso and csum to supported devices

2017-02-23T03:54:24+00:00

commit 104ba78c98808ae837d1f63aae58c183db5505df upstream.

When transmitting on a packet socket with PACKET_VNET_HDR and
PACKET_QDISC_BYPASS, validate device support for features requested
in vnet_hdr.

Drop TSO packets sent to devices that do not support TSO or have the
feature disabled. Note that the latter currently do process those
packets correctly, regardless of not advertising the feature.

Because of SKB_GSO_DODGY, it is not sufficient to test device features
with netif_needs_gso. Full validate_xmit_skb is needed.

Switch to software checksum for non-TSO packets that request checksum
offload if that device feature is unsupported or disabled. Note that
similar to the TSO case, device drivers may perform checksum offload
correctly even when not advertising it.

When switching to software checksum, packets hit skb_checksum_help,
which has two BUG_ON checksum not in linear segment. Packet sockets
always allocate at least up to csum_start + csum_off + 2 as linear.

Tested by running github.com/wdebruij/kerneltools/psock_txring_vnet.c

  ethtool -K eth0 tso off tx on
  psock_txring_vnet -d $dst -s $src -i eth0 -l 2000 -n 1 -q -v
  psock_txring_vnet -d $dst -s $src -i eth0 -l 2000 -n 1 -q -v -N

  ethtool -K eth0 tx off
  psock_txring_vnet -d $dst -s $src -i eth0 -l 1000 -n 1 -q -v -G
  psock_txring_vnet -d $dst -s $src -i eth0 -l 1000 -n 1 -q -v -G -N

v2:
  - add EXPORT_SYMBOL_GPL(validate_xmit_skb_list)

Fixes: d346a3fae3ff ("packet: introduce PACKET_QDISC_BYPASS socket option")
Signed-off-by: Willem de Bruijn 
Acked-by: Eric Dumazet 
Acked-by: Daniel Borkmann 
Signed-off-by: David S. Miller 
[bwh: Backported to 3.16: open-code the necessary checks as we don't have
 validate_xmit_skb_list()]
Signed-off-by: Ben Hutchings

packet: fix heap info leak in PACKET_DIAG_MCLIST sock_diag interface

2016-06-15T20:29:23+00:00

commit 309cf37fe2a781279b7675d4bb7173198e532867 upstream.

Because we miss to wipe the remainder of i->addr[] in packet_mc_add(),
pdiag_put_mclist() leaks uninitialized heap bytes via the
PACKET_DIAG_MCLIST netlink attribute.

Fix this by explicitly memset(0)ing the remaining bytes in i->addr[].

Fixes: eea68e2f1a00 ("packet: Report socket mclist info via diag module")
Signed-off-by: Mathias Krause 
Cc: Eric W. Biederman 
Cc: Pavel Emelyanov 
Acked-by: Pavel Emelyanov 
Signed-off-by: David S. Miller 
Signed-off-by: Ben Hutchings

packet: infer protocol from ethernet header if unset

2015-12-18T16:06:18+00:00

commit c72219b75fde768efccf7666342282fab7f9e4e7 upstream.

In case no struct sockaddr_ll has been passed to packet
socket's sendmsg() when doing a TX_RING flush run, then
skb->protocol is set to po->num instead, which is the protocol
passed via socket(2)/bind(2).

Applications only xmitting can go the path of allocating the
socket as socket(PF_PACKET, , 0) and do a bind(2) on the
TX_RING with sll_protocol of 0. That way, register_prot_hook()
is neither called on creation nor on bind time, which saves
cycles when there's no interest in capturing anyway.

That leaves us however with po->num 0 instead and therefore
the TX_RING flush run sets skb->protocol to 0 as well. Eric
reported that this leads to problems when using tools like
trafgen over bonding device. I.e. the bonding's hash function
could invoke the kernel's flow dissector, which depends on
skb->protocol being properly set. In the current situation, all
the traffic is then directed to a single slave.

Fix it up by inferring skb->protocol from the Ethernet header
when not set and we have ARPHRD_ETHER device type. This is only
done in case of SOCK_RAW and where we have a dev->hard_header_len
length. In case of ARPHRD_ETHER devices, this is guaranteed to
cover ETH_HLEN, and therefore being accessed on the skb after
the skb_store_bits().

Reported-by: Eric Dumazet 
Signed-off-by: Daniel Borkmann 
Acked-by: Willem de Bruijn 
Signed-off-by: David S. Miller 
[ luis: backported to 3.16: adjusted context ]
Signed-off-by: Luis Henriques

packet: always probe for transport header

2015-12-18T16:06:18+00:00

commit 8fd6c80d9dd938ca338c70698533a7e304752846 upstream.

We concluded that the skb_probe_transport_header() should better be
called unconditionally. Avoiding the call into the flow dissector has
also not really much to do with the direct xmit mode.

While it seems that only virtio_net code makes use of GSO from non
RX/TX ring packet socket paths, we should probe for a transport header
nevertheless before they hit devices.

Reference: http://thread.gmane.org/gmane.linux.network/386173/
Signed-off-by: Daniel Borkmann 
Acked-by: Jason Wang 
Signed-off-by: David S. Miller 
Signed-off-by: Luis Henriques

packet: fix tpacket_snd max frame len

2015-12-14T10:17:12+00:00

commit 5cfb4c8d05b4409c4044cb9c05b19705c1d9818b upstream.

Since it's introduction in commit 69e3c75f4d54 ("net: TX_RING and
packet mmap"), TX_RING could be used from SOCK_DGRAM and SOCK_RAW
side. When used with SOCK_DGRAM only, the size_max > dev->mtu +
reserve check should have reserve as 0, but currently, this is
unconditionally set (in it's original form as dev->hard_header_len).

I think this is not correct since tpacket_fill_skb() would then
take dev->mtu and dev->hard_header_len into account for SOCK_DGRAM,
the extra VLAN_HLEN could be possible in both cases. Presumably, the
reserve code was copied from packet_snd(), but later on missed the
check. Make it similar as we have it in packet_snd().

Fixes: 69e3c75f4d54 ("net: TX_RING and packet mmap")
Signed-off-by: Daniel Borkmann 
Acked-by: Willem de Bruijn 
Signed-off-by: David S. Miller 
Signed-off-by: Luis Henriques

packet: only allow extra vlan len on ethernet devices

2015-12-14T10:17:12+00:00

commit 3c70c132488794e2489ab045559b0ce0afcf17de upstream.

Packet sockets can be used by various net devices and are not
really restricted to ARPHRD_ETHER device types. However, when
currently checking for the extra 4 bytes that can be transmitted
in VLAN case, our assumption is that we generally probe on
ARPHRD_ETHER devices. Therefore, before looking into Ethernet
header, check the device type first.

This also fixes the issue where non-ARPHRD_ETHER devices could
have no dev->hard_header_len in TX_RING SOCK_RAW case, and thus
the check would test unfilled linear part of the skb (instead
of non-linear).

Fixes: 57f89bfa2140 ("network: Allow af_packet to transmit +4 bytes for VLAN packets.")
Fixes: 52f1454f629f ("packet: allow to transmit +4 byte in TX_RING slot for VLAN case")
Signed-off-by: Daniel Borkmann 
Acked-by: Willem de Bruijn 
Signed-off-by: David S. Miller 
Signed-off-by: Luis Henriques

packet: do skb_probe_transport_header when we actually have data

2015-12-14T10:17:12+00:00

commit efdfa2f7848f64517008136fb41f53c4a1faf93a upstream.

In tpacket_fill_skb() commit c1aad275b029 ("packet: set transport
header before doing xmit") and later on 40893fd0fd4e ("net: switch
to use skb_probe_transport_header()") was probing for a transport
header on the skb from a ring buffer slot, but at a time, where
the skb has _not even_ been filled with data yet. So that call into
the flow dissector is pretty useless. Lets do it after we've set
up the skb frags.

Fixes: c1aad275b029 ("packet: set transport header before doing xmit")
Reported-by: Eric Dumazet 
Signed-off-by: Daniel Borkmann 
Acked-by: Jason Wang 
Signed-off-by: David S. Miller 
Signed-off-by: Luis Henriques

packet: fix match_fanout_group()

2015-12-13T17:49:32+00:00

commit 161642e24fee40fba2c5bc2ceacc00d118a22d65 upstream.

Recent TCP listener patches exposed a prior af_packet bug :
match_fanout_group() blindly assumes it is always safe
to cast sk to a packet socket to compare fanout with af_packet_priv

But SYNACK packets can be sent while attached to request_sock, which
are smaller than a "struct sock".

We can read non existent memory and crash.

Fixes: c0de08d04215 ("af_packet: don't emit packet on orig fanout group")
Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet 
Cc: Willem de Bruijn 
Cc: Eric Leblond 
Signed-off-by: David S. Miller 
Signed-off-by: Luis Henriques

packet: race condition in packet_bind

2015-12-09T11:40:54+00:00

commit 30f7ea1c2b5f5fb7462c5ae44fe2e40cb2d6a474 upstream.

There is a race conditions between packet_notifier and packet_bind{_spkt}.

It happens if packet_notifier(NETDEV_UNREGISTER) executes between the
time packet_bind{_spkt} takes a reference on the new netdevice and the
time packet_do_bind sets po->ifindex.
In this case the notification can be missed.
If this happens during a dev_change_net_namespace this can result in the
netdevice to be moved to the new namespace while the packet_sock in the
old namespace still holds a reference on it. When the netdevice is later
deleted in the new namespace the deletion hangs since the packet_sock
is not found in the new namespace' &net->packet.sklist.
It can be reproduced with the script below.

This patch makes packet_do_bind check again for the presence of the
netdevice in the packet_sock's namespace after the synchronize_net
in unregister_prot_hook.
More in general it also uses the rcu lock for the duration of the bind
to stop dev_change_net_namespace/rollback_registered_many from
going past the synchronize_net following unlist_netdevice, so that
no NETDEV_UNREGISTER notifications can happen on the new netdevice
while the bind is executing. In order to do this some code from
packet_bind{_spkt} is consolidated into packet_do_dev.

import socket, os, time, sys
proto=7
realDev='em1'
vlanId=400
if len(sys.argv) > 1:
   vlanId=int(sys.argv[1])
dev='vlan%d' % vlanId

os.system('taskset -p 0x10 %d' % os.getpid())

s = socket.socket(socket.PF_PACKET, socket.SOCK_RAW, proto)
os.system('ip link add link %s name %s type vlan id %d' %
          (realDev, dev, vlanId))
os.system('ip netns add dummy')

pid=os.fork()

if pid == 0:
   # dev should be moved while packet_do_bind is in synchronize net
   os.system('taskset -p 0x20000 %d' % os.getpid())
   os.system('ip link set %s netns dummy' % dev)
   os.system('ip netns exec dummy ip link del %s' % dev)
   s.close()
   sys.exit(0)

time.sleep(.004)
try:
   s.bind(('%s' % dev, proto+1))
except:
   print 'Could not bind socket'
   s.close()
   os.system('ip netns del dummy')
   sys.exit(0)

os.waitpid(pid, 0)
s.close()
os.system('ip netns del dummy')
sys.exit(0)

Signed-off-by: Francesco Ruggeri 
Signed-off-by: David S. Miller 
Signed-off-by: Luis Henriques