linux-stable.git/net/core, branch v4.6.4

neigh: Explicitly declare RCU-bh read side critical section in neigh_xmit()

2016-07-11T16:30:01+00:00

[ Upstream commit b560f03ddfb072bca65e9440ff0dc4f9b1d1f056 ]

neigh_xmit() expects to be called inside an RCU-bh read side critical
section, and while one of its two current callers gets this right, the
other one doesn't.

More specifically, neigh_xmit() has two callers, mpls_forward() and
mpls_output(), and while both callers call neigh_xmit() under
rcu_read_lock(), this provides sufficient protection for neigh_xmit()
only in the case of mpls_forward(), as that is always called from
softirq context and therefore doesn't need explicit BH protection,
while mpls_output() can be called from process context with softirqs
enabled.

When mpls_output() is called from process context, with softirqs
enabled, we can be preempted by a softirq at any time, and RCU-bh
considers the completion of a softirq as signaling the end of any
pending read-side critical sections, so if we do get a softirq
while we are in the part of neigh_xmit() that expects to be run inside
an RCU-bh read side critical section, we can end up with an unexpected
RCU grace period running right in the middle of that critical section,
making things go boom.

This patch fixes this impedance mismatch in the callee, by making
neigh_xmit() always take rcu_read_{,un}lock_bh() around the code that
expects to be treated as an RCU-bh read side critical section, as this
seems a safer option than fixing it in the callers.
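
The callee-side approach described above can be sketched in userspace as
follows. This is only an illustration, not the kernel code: the mock_*
helpers, bh_read_depth, lookup_and_output() and xmit() are hypothetical
stand-ins, and a plain counter plays the role of the RCU-bh read-side
state that rcu_read_lock_bh()/rcu_read_unlock_bh() manage in the kernel.

```c
#include <assert.h>

/* Hypothetical stand-in for "are we inside an RCU-bh read-side section?" */
static int bh_read_depth;

/* Mock of rcu_read_lock_bh(): enter the read-side critical section. */
static void mock_rcu_read_lock_bh(void)
{
	bh_read_depth++;
}

/* Mock of rcu_read_unlock_bh(): leave the read-side critical section. */
static void mock_rcu_read_unlock_bh(void)
{
	assert(bh_read_depth > 0);
	bh_read_depth--;
}

/* Work that is only valid while the read-side section is held. */
static int lookup_and_output(int pkt)
{
	assert(bh_read_depth > 0);
	return pkt;
}

/*
 * The callee takes the protection itself, so it no longer matters
 * whether the caller runs in softirq or process context.
 */
int xmit(int pkt)
{
	int ret;

	mock_rcu_read_lock_bh();
	ret = lookup_and_output(pkt);
	mock_rcu_read_unlock_bh();

	return ret;
}
```

The point of the design is that the requirement lives next to the code
that has it: a new caller cannot forget the BH protection, at the cost of
a redundant lock/unlock pair for callers that are already in softirq
context.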

Fixes: 4fd3d7d9e868f ("neigh: Add helper function neigh_xmit")
Signed-off-by: David Barroso 
Signed-off-by: Lennert Buytenhek 
Acked-by: David Ahern 
Acked-by: Robert Shearman 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

net: hwbm: Fix unbalanced spinlock in error case

2016-06-24T17:22:00+00:00

[ Upstream commit b388fc7405e901c7d6f7817d05193c054e761815 ]

When hwbm_pool_add exited in error the spinlock was not released. This
patch fixes this issue.
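
The class of bug described above, and the usual single-exit remedy, can
be sketched in userspace like this. It is not the hwbm code: struct pool,
pool_lock(), pool_unlock() and pool_add() are hypothetical names, and a
C11 atomic_flag stands in for the kernel spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical buffer pool; not the kernel's struct hwbm_pool. */
struct pool {
	atomic_flag lock;	/* stand-in for a spinlock */
	int buf_num;		/* buffers currently in the pool */
	int size;		/* pool capacity */
};

static void pool_lock(struct pool *p)
{
	while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void pool_unlock(struct pool *p)
{
	atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

/* Returns the number of buffers added, or -1 if the request does not fit. */
int pool_add(struct pool *p, int count)
{
	int ret;

	assert(p != NULL);
	pool_lock(p);

	if (count < 0 || count > p->size - p->buf_num) {
		ret = -1;
		goto out_unlock;	/* error path still releases the lock */
	}

	p->buf_num += count;
	ret = count;

out_unlock:
	pool_unlock(p);
	return ret;
}
```

Routing every return through one unlock label keeps the lock balanced
even when further error checks are added later.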

Fixes: 8cb2d8bf57e6 ("net: add a hardware buffer management helper API")
Reported-by: Jean-Jacques Hiblot 
Cc: 
Signed-off-by: Gregory CLEMENT 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec

2016-05-04T20:35:31+00:00

Steffen Klassert says:

====================
pull request (net): ipsec 2016-05-04

1) The flowcache can hit an OOM condition if too
   many entries are in the gc_list. Fix this by
   counting the entries in the gc_list and refusing
   new allocations if the value is too high.

2) The inner headers are invalid after an xfrm transformation,
   so reset the skb encapsulation field to ensure nobody tries
   to access the inner headers. Otherwise tunnel devices stacked
   on top of xfrm may build the outer headers based on wrong
   information.

3) Add pmtu handling to vti, we need it to report
   pmtu information for locally generated packets.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller

net: fix infoleak in rtnetlink

2016-05-04T20:19:42+00:00

The stack object “map” has a total size of 32 bytes. Its last 4
bytes are padding generated by compiler. These padding bytes are
not initialized and sent out via “nla_put”.
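
The layout problem can be reproduced in userspace with the sketch below.
struct link_ifmap is a hypothetical mirror whose field names are modelled
on struct rtnl_link_ifmap (an assumption, since the message above only
gives the sizes), and fill_ifmap() shows one common remedy: zero the
whole object before filling in the fields. Compilers in practice leave
the tail padding alone on member stores, which is what this relies on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * 28 bytes of fields; on typical 64-bit ABIs the compiler rounds the
 * struct up to 32 bytes, leaving 4 bytes of tail padding.
 */
struct link_ifmap {
	uint64_t mem_start;
	uint64_t mem_end;
	uint64_t base_addr;
	uint16_t irq;
	uint8_t  dma;
	uint8_t  port;
};

/* Number of padding bytes after the last field. */
size_t ifmap_tail_padding(void)
{
	return sizeof(struct link_ifmap) -
	       (offsetof(struct link_ifmap, port) + sizeof(uint8_t));
}

/* Fill *out so that every byte, padding included, has a defined value. */
void fill_ifmap(struct link_ifmap *out, uint64_t mem_start, uint64_t mem_end,
		uint64_t base_addr, uint16_t irq, uint8_t dma, uint8_t port)
{
	assert(out != NULL);
	memset(out, 0, sizeof(*out));	/* clears the tail padding as well */
	out->mem_start = mem_start;
	out->mem_end   = mem_end;
	out->base_addr = base_addr;
	out->irq       = irq;
	out->dma       = dma;
	out->port      = port;
}
```

Without the memset(), copying sizeof(struct link_ifmap) bytes to another
party would also copy whatever the stack held in the padding bytes.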

Signed-off-by: Kangjie Lu 
Signed-off-by: David S. Miller

net: Disable segmentation if checksumming is not supported

2016-05-03T20:00:54+00:00

The mlx4 and mlx5 drivers do not support IPv6 checksum offload for
tunnels.  With this being the case we should disable GSO in addition to
the checksum offload features when we find that a device cannot perform
a checksum on a given packet type.
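
The rule above amounts to a feature-mask adjustment, sketched here in
userspace. All names and bit values (FEAT_*, enum pkt_type,
can_checksum(), harmonize()) are hypothetical and chosen for
illustration; the kernel's real NETIF_F_* flags and helpers differ.

```c
#include <stdint.h>

typedef uint32_t features_t;

/* Hypothetical feature bits, for illustration only. */
#define FEAT_CSUM_V4	(1u << 0)
#define FEAT_CSUM_V6	(1u << 1)
#define FEAT_SG		(1u << 2)
#define FEAT_GSO_TCP4	(1u << 3)
#define FEAT_GSO_TCP6	(1u << 4)

#define FEAT_CSUM_MASK	(FEAT_CSUM_V4 | FEAT_CSUM_V6)
#define FEAT_GSO_MASK	(FEAT_GSO_TCP4 | FEAT_GSO_TCP6)

enum pkt_type { PKT_IPV4, PKT_IPV6 };

/* Can the device checksum this packet type with the given features? */
static int can_checksum(features_t f, enum pkt_type t)
{
	return t == PKT_IPV4 ? (f & FEAT_CSUM_V4) != 0
			     : (f & FEAT_CSUM_V6) != 0;
}

/*
 * If the device cannot checksum this packet type, drop the checksum
 * features and the segmentation features together, since segmented
 * frames would need checksums the device cannot produce.
 */
features_t harmonize(features_t f, enum pkt_type t)
{
	if (!can_checksum(f, t))
		f &= ~(FEAT_CSUM_MASK | FEAT_GSO_MASK);
	return f;
}
```

For a device advertising FEAT_CSUM_V4, FEAT_SG and both GSO bits, an
IPv6 packet is left with FEAT_SG only, while an IPv4 packet keeps the
full set.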

Signed-off-by: Alexander Duyck 
Signed-off-by: David S. Miller

vlan: pull on __vlan_insert_tag error path and fix csum correction

2016-04-16T03:20:11+00:00

When __vlan_insert_tag() fails from skb_vlan_push() path due to the
skb_cow_head(), we need to undo the __skb_push() in the error path
as well that was done earlier to move skb->data pointer to mac header.

Moreover, I noticed that when in the non-error path the __skb_pull()
is done and the original offset to mac header was non-zero, we fixup
from a wrong skb->data offset in the checksum complete processing.

So the skb_postpush_rcsum() really needs to be done before __skb_pull()
where skb->data still points to the mac header start and thus operates
under the same conditions as in __vlan_insert_tag().

Fixes: 93515d53b133 ("net: move vlan pop/push functions into common code")
Signed-off-by: Daniel Borkmann 
Reviewed-by: Jiri Pirko 
Signed-off-by: David S. Miller

GRE: Disable segmentation offloads w/ CSUM and we are encapsulated via FOU

2016-04-07T20:56:33+00:00

This patch fixes an issue I found in which we were dropping frames if we
had enabled checksums on GRE headers that were encapsulated by either FOU
or GUE.  Without this patch I was barely able to get 1 Gb/s of throughput.
With this patch applied I am now at least getting around 6 Gb/s.

The issue is due to the fact that with FOU or GUE applied we do not provide
a transport offset pointing to the GRE header, nor do we offload it in
software as the GRE header is completely skipped by GSO and treated like a
VXLAN or GENEVE type header.  As such we need to prevent the stack from
generating it and also prevent GRE from generating it via any interface we
create.

Fixes: c3483384ee511 ("gro: Allow tunnel stacking in the case of FOU/GUE")
Signed-off-by: Alexander Duyck 
Signed-off-by: David S. Miller

net: add the AF_KCM entries to family name tables

2016-04-06T20:59:01+00:00

This is for the recent kcm driver, which introduces AF_KCM(41) in
b7ac4eb(kcm: Kernel Connection Multiplexor module).

Signed-off-by: Dexuan Cui 
Cc: Signed-off-by: Tom Herbert 
Signed-off-by: David S. Miller

Revert "netpoll: Fix extra refcount release in netpoll_cleanup()"

2016-04-05T23:34:44+00:00

This reverts commit 543e3a8da5a4c453e992d5351ef405d5e32f27d7.

Direct callers of __netpoll_setup() depend on it to set np->dev,
so we can't simply move that assignment up to netpoll_setup().

Reported-by: Bart Van Assche 
Signed-off-by: Bjorn Helgaas 
Signed-off-by: David S. Miller

tun, bpf: fix suspicious RCU usage in tun_{attach, detach}_filter

2016-04-01T18:33:46+00:00

Sasha Levin reported a suspicious rcu_dereference_protected() warning
found while fuzzing with trinity that is similar to this one:

  [   52.765684] net/core/filter.c:2262 suspicious rcu_dereference_protected() usage!
  [   52.765688] other info that might help us debug this:
  [   52.765695] rcu_scheduler_active = 1, debug_locks = 1
  [   52.765701] 1 lock held by a.out/1525:
  [   52.765704]  #0:  (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x17/0x20
  [   52.765721] stack backtrace:
  [   52.765728] CPU: 1 PID: 1525 Comm: a.out Not tainted 4.5.0+ #264
  [...]
  [   52.765768] Call Trace:
  [   52.765775]  [] dump_stack+0x85/0xc8
  [   52.765784]  [] lockdep_rcu_suspicious+0xd5/0x110
  [   52.765792]  [] sk_detach_filter+0x82/0x90
  [   52.765801]  [] tun_detach_filter+0x35/0x90 [tun]
  [   52.765810]  [] __tun_chr_ioctl+0x354/0x1130 [tun]
  [   52.765818]  [] ? selinux_file_ioctl+0x130/0x210
  [   52.765827]  [] tun_chr_ioctl+0x13/0x20 [tun]
  [   52.765834]  [] do_vfs_ioctl+0x96/0x690
  [   52.765843]  [] ? security_file_ioctl+0x43/0x60
  [   52.765850]  [] SyS_ioctl+0x79/0x90
  [   52.765858]  [] do_syscall_64+0x62/0x140
  [   52.765866]  [] entry_SYSCALL64_slow_path+0x25/0x25

The same can be triggered with PROVE_RCU (+ PROVE_RCU_REPEATEDLY) enabled
from tun_attach_filter() when user space calls ioctl(tun_fd, TUN{ATTACH,
DETACH}FILTER, ...) for adding/removing a BPF filter on tap devices.

Since the fix in f91ff5b9ff52 ("net: sk_{detach|attach}_filter() rcu
fixes") sk_attach_filter()/sk_detach_filter() now dereference the
filter with rcu_dereference_protected(), checking whether the socket
lock is held in the control path.

Since its introduction in 994051625981 ("tun: socket filter support"),
tap filters are managed under RTNL lock from __tun_chr_ioctl(). Thus the
sock_owned_by_user(sk) doesn't apply in this specific case and therefore
triggers the false positive.

Extend the BPF API with __sk_attach_filter()/__sk_detach_filter() pair
that is used by tap filters and pass in lockdep_rtnl_is_held() for the
rcu_dereference_protected() checks instead.

Reported-by: Sasha Levin 
Signed-off-by: Daniel Borkmann 
Signed-off-by: David S. Miller