linux.git/net/ipv6, branch v5.4-rc2

ipv6: Handle missing host route in __ipv6_ifa_notify

2019-10-05T01:08:58+00:00

Rajendra reported a kernel panic when a link was taken down:

    [ 6870.263084] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
    [ 6870.271856] IP: [] __ipv6_ifa_notify+0x154/0x290

    [ 6870.570501] Call Trace:
    [ 6870.573238] [] ? ipv6_ifa_notify+0x26/0x40
    [ 6870.579665] [] ? addrconf_dad_completed+0x4c/0x2c0
    [ 6870.586869] [] ? ipv6_dev_mc_inc+0x196/0x260
    [ 6870.593491] [] ? addrconf_dad_work+0x10a/0x430
    [ 6870.600305] [] ? __switch_to_asm+0x34/0x70
    [ 6870.606732] [] ? process_one_work+0x18a/0x430
    [ 6870.613449] [] ? worker_thread+0x4d/0x490
    [ 6870.619778] [] ? process_one_work+0x430/0x430
    [ 6870.626495] [] ? kthread+0xd9/0xf0
    [ 6870.632145] [] ? __switch_to_asm+0x34/0x70
    [ 6870.638573] [] ? kthread_park+0x60/0x60
    [ 6870.644707] [] ? ret_from_fork+0x57/0x70
    [ 6870.650936] Code: 31 c0 31 d2 41 b9 20 00 08 02 b9 09 00 00 0

addrconf_dad_work is scheduled when a device is brought up. There is a
race between addrconf_dad_work getting scheduled and taking the rtnl
lock, and a process taking the link down (under rtnl). The latter
removes the host route from the inet6_addr as part of addrconf_ifdown,
which is run for NETDEV_DOWN. The former attempts to use the host route
in __ipv6_ifa_notify. If the down event wins the race for the rtnl and
removes the host route first, the BUG listed above occurs.

Since the DAD sequence cannot be aborted, add a check for the missing
host route in __ipv6_ifa_notify. The only way this should happen is due
to the previously mentioned race. The host route is created when the
address is added to an interface; it is only removed on a down event
where the address is kept. Add a warning if the host route is missing
AND the device is up; this is a situation that should never happen.
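The check described above can be sketched as a small user-space model.
This is not the kernel patch: struct model_route, struct model_addr,
enum notify_result and model_ifa_notify_newaddr() are hypothetical names
introduced only for illustration, and dev_up stands in for whatever test
the kernel uses to decide that the device is up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects; not the real structures. */
struct model_route {
	int refcnt;
};

struct model_addr {
	struct model_route *host_rt; /* NULL once a down event removed it */
	bool dev_up;                 /* assumed "device is up" indicator */
};

enum notify_result {
	NOTIFY_ROUTE_USED,   /* host route present: normal path */
	NOTIFY_SKIPPED_DOWN, /* route missing because the link went down */
	NOTIFY_SKIPPED_WARN  /* route missing while the device is up */
};

/* Model of the new-address notification: never touch a missing host route. */
static enum notify_result model_ifa_notify_newaddr(const struct model_addr *ifp)
{
	assert(ifp != NULL);

	if (ifp->host_rt == NULL) {
		/* Lost the race with the down event, or something unexpected. */
		return ifp->dev_up ? NOTIFY_SKIPPED_WARN : NOTIFY_SKIPPED_DOWN;
	}
	return NOTIFY_ROUTE_USED;
}
```

In this model the warning case is reported through the return value; a
caller would log it, which mirrors the "missing AND up" warning above.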

Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional")
Reported-by: Rajendra Dendukuri 
Signed-off-by: David Ahern 
Reviewed-by: Eric Dumazet 
Signed-off-by: David S. Miller

Revert "ipv6: Handle race in addrconf_dad_work"

2019-10-04T21:31:10+00:00

This reverts commit a3ce2a21bb8969ae27917281244fa91bf5f286d7.

Eric reported test failures with this commit. After digging into it,
the bottom line is that the DAD sequence is not to be messed with.
There are too many cases that are expected to proceed regardless
of whether a device is up.

Revert the patch and I will send a different solution for the
problem Rajendra reported.

Signed-off-by: David Ahern 
Cc: Eric Dumazet 
Reviewed-by: Eric Dumazet 
Signed-off-by: David S. Miller

udp: only do GSO if # of segs > 1

2019-10-03T15:47:10+00:00

Prior to this change, an application sending <= 1 MSS worth of data with
UDP GSO enabled would fail if the system had SW GSO enabled, but the
same send would succeed if HW GSO offload was enabled. In addition to
this inconsistency, the error in the SW GSO case does not get back to
the application when sending out of a real device, so the user is
unaware of the failure.

With this change we only perform GSO if the # of segments is > 1 even
if the application has enabled segmentation. I've also updated the
relevant udpgso selftests.
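The rule can be illustrated with a minimal user-space sketch. It is not
the kernel code: model_gso_segs() and model_should_gso() are hypothetical
helpers, and gso_size == 0 is assumed here to mean that the application
did not request segmentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Number of segments needed to carry datalen payload bytes. */
static size_t model_gso_segs(size_t datalen, size_t gso_size)
{
	assert(gso_size > 0);
	return (datalen + gso_size - 1) / gso_size; /* round up */
}

/* Segment only when requested AND the payload spans more than one segment. */
static bool model_should_gso(size_t datalen, size_t gso_size)
{
	if (gso_size == 0)
		return false; /* assumption: 0 means segmentation not requested */
	return model_gso_segs(datalen, gso_size) > 1;
}
```

With a gso_size of 1400, a 1400-byte send is treated as a plain datagram
in this model, while a 1401-byte send is segmented.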

Fixes: bec1f6f69736 ("udp: generate gso with UDP_SEGMENT")
Signed-off-by: Josh Hunt 
Reviewed-by: Willem de Bruijn 
Reviewed-by: Alexander Duyck 
Signed-off-by: David S. Miller

udp: fix gso_segs calculations

2019-10-03T15:47:10+00:00

Commit dfec0ee22c0a ("udp: Record gso_segs when supporting UDP segmentation offload")
added the gso_segs calculation, but incorrectly took sizeof() of the
pointer rather than the underlying data type. In addition, fix the v6 case.
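The class of bug is easy to reproduce outside the kernel. The sketch
below is illustrative only: struct model_udphdr, segs_buggy() and
segs_fixed() are hypothetical, and the arithmetic is an assumed shape of
the calculation, not a copy of the kernel source. Note that on typical
64-bit builds a pointer and an 8-byte UDP header have the same size, so
the two variants agree there and differ on 32-bit builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local 8-byte model of a UDP header; not the kernel's struct udphdr. */
struct model_udphdr {
	uint16_t source;
	uint16_t dest;
	uint16_t len;
	uint16_t check;
};

static size_t div_round_up(size_t n, size_t d)
{
	assert(d > 0);
	return (n + d - 1) / d;
}

/* Buggy: sizeof(uh) is the size of the pointer, not of the header. */
static size_t segs_buggy(const struct model_udphdr *uh, size_t len,
			 size_t gso_size)
{
	return div_round_up(len - sizeof(uh), gso_size);
}

/* Fixed: sizeof(*uh) is the size of the header the pointer refers to. */
static size_t segs_fixed(const struct model_udphdr *uh, size_t len,
			 size_t gso_size)
{
	assert(len >= sizeof(*uh));
	return div_round_up(len - sizeof(*uh), gso_size);
}
```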

Fixes: bec1f6f69736 ("udp: generate gso with UDP_SEGMENT")
Fixes: dfec0ee22c0a ("udp: Record gso_segs when supporting UDP segmentation offload")
Signed-off-by: Josh Hunt 
Reviewed-by: Alexander Duyck 
Acked-by: Willem de Bruijn 
Signed-off-by: David S. Miller

ipv6: drop incoming packets having a v4mapped source address

2019-10-03T15:40:21+00:00

This began with a syzbot report. syzkaller was injecting
IPv6 TCP SYN packets having a v4mapped source address.

After an unsuccessful 4-tuple lookup, TCP creates a request
socket (SYN_RECV) and calls reqsk_queue_hash_req(), which in
turn calls sk_ehashfn(sk).

At this point we have AF_INET6 sockets, and the heuristic
used by sk_ehashfn() to decide whether to hash the IPv4 or the
IPv6 addresses is ipv6_addr_v4mapped(&sk->sk_v6_daddr).

For the particular spoofed packet, we end up hashing V4 addresses
which were not initialized by the TCP IPv6 stack, so KMSAN fired
a warning.

I first fixed sk_ehashfn() to test both source and destination addresses,
but then faced various problems, including user-space programs
like packetdrill that had similar assumptions.

Instead of trying to fix the whole ecosystem, it is better
to admit that we have a dual stack behavior, and that we
cannot build Linux kernels without the V4 stack anyway.

The dual stack API automatically forces the traffic to be IPv4
if v4mapped addresses are used at bind() or connect(), so it makes
no sense to allow IPv6 traffic to use the same v4mapped class.
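The resulting policy can be modeled in user space with the standard
IN6_IS_ADDR_V4MAPPED() macro from <netinet/in.h>. This is a sketch of the
rule, not the kernel change: model_drop_v4mapped_src() and model_parse()
are hypothetical helpers, and the kernel uses its own
ipv6_addr_v4mapped() on the packet header instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* True when an incoming IPv6 packet with this source should be dropped. */
static bool model_drop_v4mapped_src(const struct in6_addr *saddr)
{
	assert(saddr != NULL);
	return IN6_IS_ADDR_V4MAPPED(saddr) != 0; /* ::ffff:a.b.c.d */
}

/* Hypothetical helper: parse a textual IPv6 address, false on bad input. */
static bool model_parse(const char *text, struct in6_addr *out)
{
	return inet_pton(AF_INET6, text, out) == 1;
}
```

In this model a source of ::ffff:192.0.2.1 is dropped, while 2001:db8::1
and ::1 are accepted.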

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet 
Cc: Florian Westphal 
Cc: Hannes Frederic Sowa 
Reported-by: syzbot 
Signed-off-by: David S. Miller

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf

2019-10-02T20:23:13+00:00

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Remove the skb_ext_del from nf_reset, and rename it to a more
   fitting nf_reset_ct(). Patch from Florian Westphal.

2) Fix deadlock in nft_connlimit between packet path updates and
   the garbage collector.
====================

Signed-off-by: David S. Miller

ipv6: Handle race in addrconf_dad_work

2019-10-02T01:43:41+00:00

Rajendra reported a kernel panic when a link was taken down:

[ 6870.263084] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
[ 6870.271856] IP: [] __ipv6_ifa_notify+0x154/0x290

[ 6870.570501] Call Trace:
[ 6870.573238] [] ? ipv6_ifa_notify+0x26/0x40
[ 6870.579665] [] ? addrconf_dad_completed+0x4c/0x2c0
[ 6870.586869] [] ? ipv6_dev_mc_inc+0x196/0x260
[ 6870.593491] [] ? addrconf_dad_work+0x10a/0x430
[ 6870.600305] [] ? __switch_to_asm+0x34/0x70
[ 6870.606732] [] ? process_one_work+0x18a/0x430
[ 6870.613449] [] ? worker_thread+0x4d/0x490
[ 6870.619778] [] ? process_one_work+0x430/0x430
[ 6870.626495] [] ? kthread+0xd9/0xf0
[ 6870.632145] [] ? __switch_to_asm+0x34/0x70
[ 6870.638573] [] ? kthread_park+0x60/0x60
[ 6870.644707] [] ? ret_from_fork+0x57/0x70
[ 6870.650936] Code: 31 c0 31 d2 41 b9 20 00 08 02 b9 09 00 00 0

addrconf_dad_work is scheduled when a device is brought up. There is a
race between addrconf_dad_work getting scheduled and taking the rtnl
lock, and a process taking the link down (under rtnl). The latter
removes the host route from the inet6_addr as part of addrconf_ifdown,
which is run for NETDEV_DOWN. The former attempts to use the host route
in ipv6_ifa_notify. If the down event wins the race for the rtnl and
removes the host route first, the BUG listed above occurs.

This scenario does not occur when the ipv6 address is not kept
(net.ipv6.conf.all.keep_addr_on_down = 0), as addrconf_ifdown sets the
state of the ifp to DEAD. Handle the case where the addresses are kept
by checking IF_READY, which is reset by addrconf_ifdown.

The 'dead' flag for an inet6_addr is set only under rtnl, in
addrconf_ifdown and it means the device is getting removed (or IPv6 is
disabled). The interesting cases for changing the idev flag are
addrconf_notify (NETDEV_UP and NETDEV_CHANGE) and addrconf_ifdown
(reset the flag). The former does not have the idev lock, only rtnl;
the latter has both. Based on that, the existing dead + IF_READY check
can be moved to right after the rtnl_lock in addrconf_dad_work.

Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional")
Reported-by: Rajendra Dendukuri 
Signed-off-by: David Ahern 
Reviewed-by: Eric Dumazet 
Signed-off-by: David S. Miller

netfilter: drop bridge nf reset from nf_reset

2019-10-01T16:42:15+00:00

Commit 174e23810cd31 ("sk_buff: drop all skb extensions on free and skb
scrubbing") made napi recycle always drop skb extensions. The additional
skb_ext_del() that is performed via nf_reset on napi skb recycle is not
needed anymore.

Most nf_reset() calls in the stack are there so that queued skbs won't
block 'rmmod nf_conntrack' indefinitely.

This removes the skb_ext_del from nf_reset, and renames it to a more
fitting nf_reset_ct().

In a few selected places, add a call to skb_ext_reset to make sure that
no active extensions remain.

I am submitting this for "net", because we're still early in the release
cycle.  The patch applies to net-next too, but I think the rename causes
needless divergence between those trees.

Suggested-by: Eric Dumazet 
Signed-off-by: Florian Westphal 
Signed-off-by: Pablo Neira Ayuso

tcp: honor SO_PRIORITY in TIME_WAIT state

2019-09-27T10:05:02+00:00

ctl packets sent on behalf of TIME_WAIT sockets currently
have a zero skb->priority, which can cause various problems.

In this patch we:

- Add a tw_priority field in struct inet_timewait_sock.

- Populate it from sk->sk_priority when a TIME_WAIT is created.

- For IPv4, change ip_send_unicast_reply() and its two
  callers to propagate tw_priority correctly.
  ip_send_unicast_reply() no longer changes sk->sk_priority.

- For IPv6, make sure TIME_WAIT sockets pass their tw_priority
  field to tcp_v6_send_response() and tcp_v6_send_ack().
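The data flow listed above can be sketched with simplified stand-ins.
None of the types or functions below are the kernel's: struct model_sock,
struct model_tw_sock, struct model_skb, model_tw_create() and
model_fill_ctl_skb() are hypothetical and only show how the priority
would travel from the full socket to a ctl packet.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for sock, inet_timewait_sock and sk_buff. */
struct model_sock {
	unsigned int sk_priority;
};

struct model_tw_sock {
	unsigned int tw_priority;
};

struct model_skb {
	unsigned int priority;
};

/* Creating the TIME_WAIT entry: remember the priority of the full socket. */
static void model_tw_create(struct model_tw_sock *tw,
			    const struct model_sock *sk)
{
	assert(tw != NULL && sk != NULL);
	tw->tw_priority = sk->sk_priority;
}

/* Building a ctl packet: use the saved priority, or 0 without an entry. */
static void model_fill_ctl_skb(struct model_skb *skb,
			       const struct model_tw_sock *tw)
{
	assert(skb != NULL);
	skb->priority = tw ? tw->tw_priority : 0;
}
```

Keeping the value in the TIME_WAIT entry means the reply path does not
need to modify the priority of any other socket in this model.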

Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller

ipv6: tcp: provide sk->sk_priority to ctl packets

2019-09-27T10:05:02+00:00

We can populate skb->priority for some ctl packets
instead of always using zero.

Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller