linux-stable.git/net/netlink, branch linux-3.16.y

netlink: Don't shift on 64 for ngroups

2018-11-20T18:05:54+00:00

commit 91874ecf32e41b5d86a4cb9d60e0bee50d828058 upstream.

It's legal to have 64 groups for netlink_sock.

As user-supplied nladdr->nl_groups is __u32, it's possible to subscribe
only to first 32 groups.

The check for correctness of .bind() userspace supplied parameter
is done by applying mask made from ngroups shift. Which broke Android
as they have 64 groups and the shift for mask resulted in an overflow.

Fixes: 61f4b23769f0 ("netlink: Don't shift with UB on nlk->ngroups")
Cc: "David S. Miller" 
Cc: Herbert Xu 
Cc: Steffen Klassert 
Cc: netdev@vger.kernel.org
Reported-and-Tested-by: Nathan Chancellor 
Signed-off-by: Dmitry Safonov 
Signed-off-by: David S. Miller 
Signed-off-by: Ben Hutchings

netlink: Don't shift with UB on nlk->ngroups

2018-11-20T18:05:53+00:00

commit 61f4b23769f0cc72ae62c9a81cf08f0397d40da8 upstream.

On i386 nlk->ngroups might be 32 or 0. Which leads to UB, resulting in
hang during boot.
Check for 0 ngroups and use (unsigned long long) as a type to shift.

Fixes: 7acf9d4237c4 ("netlink: Do not subscribe to non-existent groups").
Reported-by: kernel test robot 
Signed-off-by: Dmitry Safonov 
Signed-off-by: David S. Miller 
Signed-off-by: Ben Hutchings

netlink: Do not subscribe to non-existent groups

2018-11-20T18:05:52+00:00

commit 7acf9d4237c46894e0fa0492dd96314a41742e84 upstream.

Make ABI more strict about subscribing to group > ngroups.
Code doesn't check for that and it looks bogus.
(one can subscribe to non-existing group)
Still, it's possible to bind() to all possible groups with (-1)

Cc: "David S. Miller" 
Cc: Herbert Xu 
Cc: Steffen Klassert 
Cc: netdev@vger.kernel.org
Signed-off-by: Dmitry Safonov 
Signed-off-by: David S. Miller 
Signed-off-by: Ben Hutchings

netlink: fix uninit-value in netlink_sendmsg

2018-10-21T07:45:17+00:00

commit 6091f09c2f79730d895149bcfe3d66140288cd0e upstream.

syzbot reported :

BUG: KMSAN: uninit-value in ffs arch/x86/include/asm/bitops.h:432 [inline]
BUG: KMSAN: uninit-value in netlink_sendmsg+0xb26/0x1310 net/netlink/af_netlink.c:1851

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet 
Reported-by: syzbot 
Signed-off-by: David S. Miller 
Signed-off-by: Ben Hutchings

netlink: make sure nladdr has correct size in netlink_connect()

2018-06-16T21:22:48+00:00

commit 7880287981b60a6808f39f297bb66936e8bdf57a upstream.

KMSAN reports use of uninitialized memory in the case when |alen| is
smaller than sizeof(struct sockaddr_nl), and therefore |nladdr| isn't
fully copied from the userspace.

Signed-off-by: Alexander Potapenko 
Fixes: 1da177e4c3f41524 ("Linux-2.6.12-rc2")
Reviewed-by: Eric Dumazet 
Signed-off-by: David S. Miller 
Signed-off-by: Ben Hutchings

netlink: avoid a double skb free in genlmsg_mcast()

2018-06-16T21:22:11+00:00

commit 02a2385f37a7c6594c9d89b64c4a1451276f08eb upstream.

nlmsg_multicast() consumes always the skb, thus the original skb must be
freed only when this function is called with a clone.

Fixes: cb9f7a9a5c96 ("netlink: ensure to loop over all netns in genlmsg_multicast_allns()")
Reported-by: Ben Hutchings 
Signed-off-by: Nicolas Dichtel 
Signed-off-by: David S. Miller 
Signed-off-by: Ben Hutchings

netlink: ensure to loop over all netns in genlmsg_multicast_allns()

2018-06-16T21:22:11+00:00

commit cb9f7a9a5c96a773bbc9c70660dc600cfff82f82 upstream.

Nowadays, nlmsg_multicast() returns only 0 or -ESRCH but this was not the
case when commit 134e63756d5f was pushed.
However, there was no reason to stop the loop if a netns does not have
listeners.
Returns -ESRCH only if there was no listeners in all netns.

To avoid having the same problem in the future, I didn't take the
assumption that nlmsg_multicast() returns only 0 or -ESRCH.

Fixes: 134e63756d5f ("genetlink: make netns aware")
CC: Johannes Berg 
Signed-off-by: Nicolas Dichtel 
Signed-off-by: David S. Miller 
Signed-off-by: Ben Hutchings

netlink: Add netns check on taps

2018-01-01T20:52:09+00:00

commit 93c647643b48f0131f02e45da3bd367d80443291 upstream.

Currently, a nlmon link inside a child namespace can observe systemwide
netlink activity.  Filter the traffic so that nlmon can only sniff
netlink messages from its own netns.

Test case:

    vpnns -- bash -c "ip link add nlmon0 type nlmon; \
                      ip link set nlmon0 up; \
                      tcpdump -i nlmon0 -q -w /tmp/nlmon.pcap -U" &
    sudo ip xfrm state add src 10.1.1.1 dst 10.1.1.2 proto esp \
        spi 0x1 mode transport \
        auth sha1 0x6162633132330000000000000000000000000000 \
        enc aes 0x00000000000000000000000000000000
    grep --binary abc123 /tmp/nlmon.pcap

Signed-off-by: Kevin Cernekee 
Signed-off-by: David S. Miller 
Signed-off-by: Ben Hutchings

netlink: remove mmapped netlink support

2017-04-04T21:21:56+00:00

commit d1b4c689d4130bcfd3532680b64db562300716b6 upstream.

mmapped netlink has a number of unresolved issues:

- TX zerocopy support had to be disabled more than a year ago via
  commit 4682a0358639b29cf ("netlink: Always copy on mmap TX.")
  because the content of the mmapped area can change after netlink
  attribute validation but before message processing.

- RX support was implemented mainly to speed up nfqueue dumping packet
  payload to userspace.  However, since commit ae08ce0021087a5d812d2
  ("netfilter: nfnetlink_queue: zero copy support") we avoid one copy
  with the socket-based interface too (via the skb_zerocopy helper).

The other problem is that skbs attached to mmaped netlink socket
behave different from normal skbs:

- they don't have a shinfo area, so all functions that use skb_shinfo()
(e.g. skb_clone) cannot be used.

- reserving headroom prevents userspace from seeing the content as
it expects message to start at skb->head.
See for instance
commit aa3a022094fa ("netlink: not trim skb for mmaped socket when dump").

- skbs handed e.g. to netlink_ack must have non-NULL skb->sk, else we
crash because it needs the sk to check if a tx ring is attached.

Also not obvious, leads to non-intuitive bug fixes such as 7c7bdf359
("netfilter: nfnetlink: use original skbuff when acking batches").

mmaped netlink also didn't play nicely with the skb_zerocopy helper
used by nfqueue and openvswitch.  Daniel Borkmann fixed this via
commit 6bb0fef489f6 ("netlink, mmap: fix edge-case leakages in nf queue
zero-copy")' but at the cost of also needing to provide remaining
length to the allocation function.

nfqueue also has problems when used with mmaped rx netlink:
- mmaped netlink doesn't allow use of nfqueue batch verdict messages.
  Problem is that in the mmap case, the allocation time also determines
  the ordering in which the frame will be seen by userspace (A
  allocating before B means that A is located in earlier ring slot,
  but this also means that B might get a lower sequence number then A
  since seqno is decided later.  To fix this we would need to extend the
  spinlocked region to also cover the allocation and message setup which
  isn't desirable.
- nfqueue can now be configured to queue large (GSO) skbs to userspace.
  Queing GSO packets is faster than having to force a software segmentation
  in the kernel, so this is a desirable option.  However, with a mmap based
  ring one has to use 64kb per ring slot element, else mmap has to fall back
  to the socket path (NL_MMAP_STATUS_COPY) for all large packets.

To use the mmap interface, userspace not only has to probe for mmap netlink
support, it also has to implement a recv/socket receive path in order to
handle messages that exceed the size of an rx ring element.

Cc: Daniel Borkmann 
Cc: Ken-ichirou MATSUZAWA 
Cc: Pablo Neira Ayuso 
Cc: Patrick McHardy 
Cc: Thomas Graf 
Signed-off-by: Florian Westphal 
Signed-off-by: David S. Miller 
[bwh: Backported to 3.16: deleted code and documentation is different in places]
Signed-off-by: Ben Hutchings 
Cc: Shi Yuejie

netlink: do not enter direct reclaim from netlink_dump()

2017-02-23T03:54:09+00:00

commit d35c99ff77ecb2eb239731b799386f3b3637a31e upstream.

Since linux-3.15, netlink_dump() can use up to 16384 bytes skb
allocations.

Due to struct skb_shared_info ~320 bytes overhead, we end up using
order-3 (on x86) page allocations, that might trigger direct reclaim and
add stress.

The intent was really to attempt a large allocation but immediately
fallback to a smaller one (order-1 on x86) in case of memory stress.

On recent kernels (linux-4.4), we can remove __GFP_DIRECT_RECLAIM to
meet the goal. Old kernels would need to remove __GFP_WAIT

While we are at it, since we do an order-3 allocation, allow to use
all the allocated bytes instead of 16384 to reduce syscalls during
large dumps.

iproute2 already uses 32KB recvmsg() buffer sizes.

Alexei provided an initial patch downsizing to SKB_WITH_OVERHEAD(16384)

Fixes: 9063e21fb026 ("netlink: autosize skb lengthes")
Signed-off-by: Eric Dumazet 
Reported-by: Alexei Starovoitov 
Cc: Greg Thelen 
Reviewed-by: Greg Rose 
Acked-by: Alexei Starovoitov 
Signed-off-by: David S. Miller 
[bwh: Backported to 3.16:
 - Adjust context
 - Mask out __GFP_WAIT, as suggested]
Signed-off-by: Ben Hutchings