linux-stable.git/drivers/net/vxlan.c, branch linux-3.14.y

Fix race condition between vxlan_sock_add and vxlan_sock_release

2014-12-16T17:34:27+00:00

[ Upstream commit 00c83b01d58068dfeb2e1351cca6fccf2a83fa8f ]

Currently, when trying to reuse a socket, vxlan_sock_add will grab
vn->sock_lock, locate a reusable socket, inc refcount and release
vn->sock_lock.

But vxlan_sock_release() first decrements the refcount and only then grabs
that lock. The refcnt operations are atomic, but since the deferred works
each hold a vs->refcnt, the following interleaving can happen, leading to
a use after free (especially after vxlan_igmp_leave):

  CPU 1                            CPU 2

deferred work                    vxlan_sock_add
  ...                              ...
                                   spin_lock(&vn->sock_lock)
                                   vs = vxlan_find_sock();
  vxlan_sock_release
    dec vs->refcnt, reaches 0
    spin_lock(&vn->sock_lock)
                                   vxlan_sock_hold(vs), refcnt=1
                                   spin_unlock(&vn->sock_lock)
    hlist_del_rcu(&vs->hlist);
    vxlan_notify_del_rx_port(vs)
    spin_unlock(&vn->sock_lock)

So when we look for a reusable socket, we check if it wasn't freed
already before reusing it.
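
The check described above amounts to "take a reference only if the count
has not already reached zero". Below is a minimal userspace sketch of that
pattern using C11 atomics. The struct and function names are hypothetical
stand-ins, not the driver's code; in the kernel the same idea is normally
expressed with atomic_add_unless(), though the message above does not say
which primitive the patch uses.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct vxlan_sock; only the refcount matters. */
struct sock_entry {
	atomic_int refcnt;
};

/*
 * Take a reference only if the object is still live (refcnt > 0).
 * Returns false when the count already dropped to zero, meaning a
 * concurrent release is tearing the object down and it must not be reused.
 */
static bool sock_entry_get_unless_zero(struct sock_entry *s)
{
	int old = atomic_load(&s->refcnt);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&s->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when the caller released the last one. */
static bool sock_entry_put(struct sock_entry *s)
{
	return atomic_fetch_sub(&s->refcnt, 1) == 1;
}
```

A lookup path that gets false back would treat the socket as not reusable
instead of resurrecting an object whose teardown has already started.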

Signed-off-by: Marcelo Ricardo Leitner 
Fixes: 7c47cedf43a8b3 ("vxlan: move IGMP join/leave to work queue")
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

vxlan: Do not reuse sockets for a different address family

2014-11-21T17:23:00+00:00

[ Upstream commit 19ca9fc1445b76b60d34148f7ff837b055f5dcf3 ]

Currently, we only match against the local port number in order to reuse
a socket. But if this new vxlan wants an IPv6 socket and an IPv4 one is
already bound to that port, vxlan will reuse the IPv4 socket as IPv6 and
a panic will follow. The following steps reproduce it:

   # ip link add vxlan6 type vxlan id 42 group 229.10.10.10 \
       srcport 5000 6000 dev eth0
   # ip link add vxlan7 type vxlan id 43 group ff0e::110 \
       srcport 5000 6000 dev eth0
   # ip link set vxlan6 up
   # ip link set vxlan7 up
   

[    4.187481] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
...
[    4.188076] Call Trace:
[    4.188085]  [] ? ipv6_sock_mc_join+0x3a/0x630
[    4.188098]  [] vxlan_igmp_join+0x66/0xd0 [vxlan]
[    4.188113]  [] process_one_work+0x220/0x710
[    4.188125]  [] ? process_one_work+0x1b4/0x710
[    4.188138]  [] worker_thread+0x11b/0x3a0
[    4.188149]  [] ? process_one_work+0x710/0x710

So address family must also match in order to reuse a socket.
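
As an illustration of the matching rule, here is a minimal userspace
sketch of a lookup keyed on both address family and port. The table
layout and the names vx_sock_entry and find_reusable_sock are
assumptions made for the example; the driver's own lookup is structured
differently.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>   /* AF_INET, AF_INET6, sa_family_t */

/* Hypothetical record describing an already-open tunnel socket. */
struct vx_sock_entry {
	sa_family_t family;   /* AF_INET or AF_INET6 */
	uint16_t    port;     /* local UDP port */
};

/*
 * Return an entry only when BOTH the port and the address family match.
 * Matching on the port alone would hand an IPv4 socket to an IPv6 user.
 */
static struct vx_sock_entry *find_reusable_sock(struct vx_sock_entry *tbl,
						size_t n,
						sa_family_t family,
						uint16_t port)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].port == port && tbl[i].family == family)
			return &tbl[i];
	}
	return NULL;
}
```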

Reported-by: Jean-Tsung Hsiao 
Signed-off-by: Marcelo Ricardo Leitner 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

vxlan: fix a free after use

2014-11-14T16:59:43+00:00

[ Upstream commit 7a9f526fc3ee49b6034af2f243676ee0a27dcaa8 ]

pskb_may_pull may change skb->data and make the eth pointer obsolete,
so eth needs to be reloaded.
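
The same hazard exists in userspace with any call that may move a buffer.
The sketch below uses realloc() as a rough analogue of pskb_may_pull():
a pointer derived from the buffer before the call must be derived again
afterwards. struct buf, buf_ensure and first_byte_after_ensure are
hypothetical names for this example only.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer standing in for the skb linear area. */
struct buf {
	unsigned char *data;
	size_t len;
};

/*
 * Make sure at least 'need' bytes are addressable. Like pskb_may_pull(),
 * this may move the storage, so b->data can change. Returns 1 on success.
 */
static int buf_ensure(struct buf *b, size_t need)
{
	unsigned char *p;

	if (need <= b->len)
		return 1;
	p = realloc(b->data, need);
	if (!p)
		return 0;
	memset(p + b->len, 0, need - b->len);   /* zero the new tail */
	b->data = p;
	b->len = need;
	return 1;
}

/* Read the first header byte after ensuring 'need' bytes are present. */
static int first_byte_after_ensure(struct buf *b, size_t need)
{
	const unsigned char *hdr = b->data;   /* may go stale below */

	if (!buf_ensure(b, need))
		return -1;
	hdr = b->data;   /* reload: the old pointer may refer to freed memory */
	return hdr[0];
}
```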

Fixes: 91269e390d062 ("vxlan: using pskb_may_pull as early as possible")
Cc: Eric Dumazet 
Signed-off-by: Li RongQing 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

vxlan: using pskb_may_pull as early as possible

2014-11-14T16:59:43+00:00

[ Upstream commit 91269e390d062b526432f2ef1352b8df82e0e0bc ]

pskb_may_pull should be used to check whether skb->data has enough space;
skb->len cannot ensure that.

Cc: Cong Wang 
Signed-off-by: Li RongQing 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

vxlan: fix a use after free in vxlan_encap_bypass

2014-11-14T16:59:42+00:00

[ Upstream commit ce6502a8f9572179f044a4d62667c4645256d6e4 ]

Once netif_rx() returns, the skb it handled may already have been freed
and should not be used.
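
A common way to avoid this kind of bug is to copy whatever is needed out
of the object before handing it over. The userspace sketch below shows
that pattern; consume() is a hypothetical stand-in for netif_rx() that
takes ownership and frees the packet, and the struct names are
assumptions for the example.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical packet and counters, for illustration only. */
struct pkt {
	size_t len;
};

struct rx_stats {
	unsigned long rx_packets;
	unsigned long rx_bytes;
	unsigned long rx_dropped;
};

/* Takes ownership of 'p' and frees it. Returns 0 on success. */
static int consume(struct pkt *p)
{
	free(p);
	return 0;
}

static void deliver(struct pkt *p, struct rx_stats *st)
{
	size_t len = p->len;   /* read BEFORE ownership is transferred */

	if (consume(p) == 0) {
		st->rx_packets++;
		st->rx_bytes += len;   /* uses the saved copy, never p->len */
	} else {
		st->rx_dropped++;
	}
}
```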

Signed-off-by: Li RongQing 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

vxlan: fix incorrect initializer in union vxlan_addr

2014-10-15T06:36:41+00:00

[ Upstream commit a45e92a599e77ee6a850eabdd0141633fde03915 ]

The first initializer in the following

        union vxlan_addr ipa = {
            .sin.sin_addr.s_addr = tip,
            .sa.sa_family = AF_INET,
        };

is optimised away by the compiler, due to the second initializer,
therefore initialising .sin.sin_addr.s_addr always to 0.
This results in netlink messages indicating an L3 miss never containing
the missed IP address. This was observed with GCC 4.8 and 4.9; I do not
know about previous versions.
The problem affects user space programs relying on an IP address being
sent as part of a netlink message indicating an L3 miss.

Changing
            .sa.sa_family = AF_INET,
to
            .sin.sin_family = AF_INET,
fixes the problem.
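
The fixed form can be tried in userspace. The sketch below declares a
union shaped like the one named above (the name addr_u and the helper
make_ipv4_addr are assumptions for the example) and keeps both
designators inside the same union member, .sin, so neither initializer
displaces the other. A plausible reading of the original failure is that
a union holds only one initialized member, so a later designator naming
.sa replaces what was set through .sin.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Userspace union mirroring the shape of union vxlan_addr (assumed). */
union addr_u {
	struct sockaddr_in  sin;
	struct sockaddr_in6 sin6;
	struct sockaddr     sa;
};

/* Build an IPv4 address; both designators go through the .sin member. */
static union addr_u make_ipv4_addr(in_addr_t ip)
{
	union addr_u a = {
		.sin.sin_addr.s_addr = ip,
		.sin.sin_family = AF_INET,
	};

	return a;
}
```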

Signed-off-by: Gerhard Stenzel 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

vxlan: use dev->needed_headroom instead of dev->hard_header_len

2014-06-26T19:15:40+00:00

[ Upstream commit 2853af6a2ea1a8ed09b09dd4fb578e7f435e8d34 ]

When we mirror packets from a vxlan tunnel to another device,
the mirror device should see the same packets (that is, without the
outer header). Because the vxlan tunnel sets dev->hard_header_len,
tcf_mirred() resets the mac header back to the outer mac, so the mirror
device actually sees packets with outer headers.

Vxlan tunnel should set dev->needed_headroom instead of
dev->hard_header_len, like what other ip tunnels do. This fixes
the above problem.

Cc: "David S. Miller" 
Cc: stephen hemminger 
Cc: Pravin B Shelar 
Signed-off-by: Cong Wang 
Signed-off-by: Cong Wang 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

net: vxlan: fix crash when interface is created with no group

2014-04-14T13:50:04+00:00

[ Upstream commit 5933a7bbb5de66482ea8aa874a7ebaf8e67603c4 ]

If the vxlan interface is created without explicit group definition,
there are corner cases which may cause kernel panic.

For instance, in the following scenario:

node A:
$ ip link add dev vxlan42  address 2c:c2:60:00:10:20 type vxlan id 42
$ ip addr add dev vxlan42 10.0.0.1/24
$ ip link set up dev vxlan42
$ arp -i vxlan42 -s 10.0.0.2 2c:c2:60:00:01:02
$ bridge fdb add dev vxlan42 to 2c:c2:60:00:01:02 dst 
$ ping 10.0.0.2

node B:
$ ip link add dev vxlan42 address 2c:c2:60:00:01:02 type vxlan id 42
$ ip addr add dev vxlan42 10.0.0.2/24
$ ip link set up dev vxlan42
$ arp -i vxlan42 -s 10.0.0.1 2c:c2:60:00:10:20

node B crashes:

 vxlan42: 2c:c2:60:00:10:20 migrated from 4011:eca4:c0a8:6466:c0a8:6415:8e09:2118 to (invalid address)
 vxlan42: 2c:c2:60:00:10:20 migrated from 4011:eca4:c0a8:6466:c0a8:6415:8e09:2118 to (invalid address)
 BUG: unable to handle kernel NULL pointer dereference at 0000000000000046
 IP: [] ip6_route_output+0x58/0x82
 PGD 7bd89067 PUD 7bd4e067 PMD 0
 Oops: 0000 [#1] SMP
 Modules linked in:
 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.14.0-rc8-hvx-xen-00019-g97a5221-dirty #154
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 task: ffff88007c774f50 ti: ffff88007c79c000 task.ti: ffff88007c79c000
 RIP: 0010:[]  [] ip6_route_output+0x58/0x82
 RSP: 0018:ffff88007fd03668  EFLAGS: 00010282
 RAX: 0000000000000000 RBX: ffffffff8186a000 RCX: 0000000000000040
 RDX: 0000000000000000 RSI: ffff88007b0e4a80 RDI: ffff88007fd03754
 RBP: ffff88007fd03688 R08: ffff88007b0e4a80 R09: 0000000000000000
 R10: 0200000a0100000a R11: 0001002200000000 R12: ffff88007fd03740
 R13: ffff88007b0e4a80 R14: ffff88007b0e4a80 R15: ffff88007bba0c50
 FS:  0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: 0000000000000046 CR3: 000000007bb60000 CR4: 00000000000006e0
 Stack:
  0000000000000000 ffff88007fd037a0 ffffffff8186a000 ffff88007fd03740
  ffff88007fd036c8 ffffffff814320bb 0000000000006e49 ffff88007b8b7360
  ffff88007bdbf200 ffff88007bcbc000 ffff88007b8b7000 ffff88007b8b7360
 Call Trace:
  
  [] ip6_dst_lookup_tail+0x2d/0xa4
  [] ip6_dst_lookup+0x10/0x12
  [] vxlan_xmit_one+0x32a/0x68c
  [] ? _raw_spin_unlock_irqrestore+0x12/0x14
  [] ? lock_timer_base.isra.23+0x26/0x4b
  [] vxlan_xmit+0x66a/0x6a8
  [] ? ipt_do_table+0x35f/0x37e
  [] ? selinux_ip_postroute+0x41/0x26e
  [] dev_hard_start_xmit+0x2ce/0x3ce
  [] __dev_queue_xmit+0x2d0/0x392
  [] ? eth_header+0x28/0xb5
  [] dev_queue_xmit+0xb/0xd
  [] neigh_resolve_output+0x134/0x152
  [] ip_finish_output2+0x236/0x299
  [] ip_finish_output+0x98/0x9d
  [] ip_output+0x62/0x67
  [] dst_output+0xf/0x11
  [] ip_local_out+0x1b/0x1f
  [] ip_send_skb+0x11/0x37
  [] ip_push_pending_frames+0x2f/0x33
  [] icmp_push_reply+0x106/0x115
  [] icmp_reply+0x142/0x164
  [] icmp_echo.part.16+0x46/0x48
  [] ? nf_iterate+0x43/0x80
  [] ? xfrm4_policy_check.constprop.11+0x52/0x52
  [] icmp_echo+0x25/0x27
  [] icmp_rcv+0x1d2/0x20a
  [] ? xfrm4_policy_check.constprop.11+0x52/0x52
  [] ip_local_deliver_finish+0xd6/0x14f
  [] ? xfrm4_policy_check.constprop.11+0x52/0x52
  [] NF_HOOK.constprop.10+0x4c/0x53
  [] ip_local_deliver+0x4a/0x4f
  [] ip_rcv_finish+0x253/0x26a
  [] ? inet_add_protocol+0x3e/0x3e
  [] NF_HOOK.constprop.10+0x4c/0x53
  [] ip_rcv+0x2a6/0x2ec
  [] __netif_receive_skb_core+0x43e/0x478
  [] ? virtqueue_poll+0x16/0x27
  [] __netif_receive_skb+0x55/0x5a
  [] process_backlog+0x76/0x12f
  [] net_rx_action+0xa2/0x1ab
  [] __do_softirq+0xca/0x1d1
  [] irq_exit+0x3e/0x85
  [] do_IRQ+0xa9/0xc4
  [] common_interrupt+0x6d/0x6d
  
  [] ? native_safe_halt+0x6/0x8
  [] default_idle+0x9/0xd
  [] arch_cpu_idle+0x13/0x1c
  [] cpu_startup_entry+0xbc/0x137
  [] start_secondary+0x1a0/0x1a5
 Code: 24 14 e8 f1 e5 01 00 31 d2 a8 32 0f 95 c2 49 8b 44 24 2c 49 0b 44 24 24 74 05 83 ca 04 eb 1c 4d 85 ed 74 17 49 8b 85 a8 02 00 00 <66> 8b 40 46 66 c1 e8 07 83 e0 07 c1 e0 03 09 c2 4c 89 e6 48 89
 RIP  [] ip6_route_output+0x58/0x82
  RSP 
 CR2: 0000000000000046
 ---[ end trace 4612329caab37efd ]---

When a vxlan interface is created without an explicit group definition,
the default_dst protocol family is initialized to AF_UNSPEC and the driver
assumes IPv4 configuration. On the other hand, the default_dst protocol
family is used to differentiate between the IPv4 and IPv6 cases and, since
AF_UNSPEC != AF_INET, the processing takes the IPv6 path.

Making the IPv4 assumption explicit by setting the default_dst protocol
family to AF_INET and preventing mixing of IPv4 and IPv6 addresses in
snooped fdb entries fixes the corner case crashes.
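
The two parts of the fix can be sketched in userspace as follows. This is
only an illustration under assumed names (effective_family and
snoop_family_ok are hypothetical helpers); the driver applies the default
at interface setup and performs the family check in its own snooping
code.

```c
#include <stdbool.h>
#include <sys/socket.h>   /* AF_UNSPEC, AF_INET, AF_INET6, sa_family_t */

/* With no group configured, fall back to IPv4 explicitly. */
static sa_family_t effective_family(sa_family_t configured)
{
	return configured == AF_UNSPEC ? AF_INET : configured;
}

/*
 * Accept a snooped source address only if its family matches the
 * tunnel's family, so IPv4 and IPv6 entries are never mixed.
 */
static bool snoop_family_ok(sa_family_t default_family,
			    sa_family_t src_family)
{
	return effective_family(default_family) == src_family;
}
```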

Signed-off-by: Mike Rapoport 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

vxlan: fix nonfunctional neigh_reduce()

2014-03-24T19:35:10+00:00

The VXLAN neigh_reduce() code has been completely non-functional since
check-in. Specific errors:

1) The original code drops all packets with a multicast destination address,
	even though neighbor solicitations are sent to the solicited-node
	address, a multicast address. The code after this check was never run.
2) The neighbor table lookup used the IPv6 header destination, which is the
	solicited node address, rather than the target address from the
	neighbor solicitation. So neighbor lookups would always fail if
	execution got this far. The same wrong address was used for L3MISSes.
3) The code calls ndisc_send_na(), which does a send on the tunnel device.
	The context for neigh_reduce() is the transmit path, vxlan_xmit(),
	where the host or a bridge-attached neighbor is trying to transmit
	a neighbor solicitation. To respond to it, the tunnel endpoint needs
	to do a *receive* of the appropriate neighbor advertisement. Doing a
	send would only try to send the advertisement, encapsulated, to the
	remote destinations in the fdb -- hosts that definitely did not do the
	corresponding solicitation.
4) The code uses the tunnel endpoint IPv6 forwarding flag to determine the
	isrouter flag in the advertisement. This has nothing to do with whether
	or not the target is a router, and generally won't be set since the
	tunnel endpoint is bridging, not routing, traffic.

The patch below creates a proxy neighbor advertisement to respond to
neighbor solicitations as intended, providing proper IPv6 support for
neighbor reduction.
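
To make points 1) and 2) concrete: the IPv6 destination of a neighbor
solicitation is the solicited-node multicast address, which is built from
the prefix ff02::1:ff00:0/104 plus the low 24 bits of the target, so it is
both multicast and different from the target itself. The userspace sketch
below derives it; the function name solicited_node is an assumption for
the example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/*
 * Build the solicited-node multicast address for 'target':
 * ff02:0:0:0:0:1:ffXX:XXXX, where XX:XXXX are the low 24 bits of target.
 */
static struct in6_addr solicited_node(const struct in6_addr *target)
{
	struct in6_addr a;

	memset(&a, 0, sizeof(a));
	a.s6_addr[0]  = 0xff;
	a.s6_addr[1]  = 0x02;
	a.s6_addr[11] = 0x01;
	a.s6_addr[12] = 0xff;
	memcpy(&a.s6_addr[13], &target->s6_addr[13], 3);
	return a;
}
```

A lookup keyed on this address instead of the target carried inside the
solicitation cannot find the neighbor entry.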

Signed-off-by: David L Stevens 
Signed-off-by: David S. Miller

vxlan: fix potential NULL dereference in arp_reduce()

2014-03-18T20:09:34+00:00

This patch fixes a NULL pointer dereference in the event of an
skb allocation failure in arp_reduce().

Signed-off-by: David L Stevens 
Acked-by: Cong Wang 
Signed-off-by: David S. Miller