linux-stable.git/net/core, branch linux-3.4.y

net/core: revert "net: fix __netdev_update_features return.." and add comment

2016-10-26T15:15:46+00:00

commit 17b85d29e82cc3c874a497a8bc5764d6a2b043e2 upstream.

This reverts commit 00ee59271777 ("net: fix __netdev_update_features return
on ndo_set_features failure")
and adds a comment explaining why it's okay to return a value other than
0 upon error. Some drivers might actually change flags and return an
error so it's better to fire a spurious notification rather than miss
these.

CC: Michał Mirosław 
Signed-off-by: Nikolay Aleksandrov 
Signed-off-by: David S. Miller 
Signed-off-by: Zefan Li

net: Fix skb csum races when peeking

2016-10-26T15:15:42+00:00

[ Upstream commit 89c22d8c3b278212eef6a8cc66b570bc840a6f5a ]

When we calculate the checksum on the recv path, we store the
result in the skb as an optimisation in case we need the checksum
again down the line.

This is in fact bogus for the MSG_PEEK case as this is done without
any locking.  So multiple threads can peek and then store the result
to the same skb, potentially resulting in bogus skb states.

This patch fixes this by only storing the result if the skb is not
shared.  This preserves the optimisations for the few cases where
it can be done safely due to locking or other reasons, e.g., SIOCINQ.

Signed-off-by: Herbert Xu 
Acked-by: Eric Dumazet 
Signed-off-by: David S. Miller 
Signed-off-by: Ben Hutchings 
Signed-off-by: Zefan Li

net: possible use after free in dst_release

2016-10-26T15:15:42+00:00

commit 07a5d38453599052aff0877b16bb9c1585f08609 upstream.

dst_release should not access dst->flags after decrementing
__refcnt to 0. The dst_entry may be in dst_busy_list and
dst_gc_task may dst_destroy it before dst_release gets a chance
to access dst->flags.

Fixes: d69bbf88c8d0 ("net: fix a race in dst_release()")
Fixes: 27b75c95f10d ("net: avoid RCU for NOCACHE dst")
Signed-off-by: Francesco Ruggeri 
Acked-by: Eric Dumazet 
Signed-off-by: David S. Miller 
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li

net/neighbour: fix crash at dumping device-agnostic proxy entries

2016-10-26T15:15:30+00:00

commit 6adc5fd6a142c6e2c80574c1db0c7c17dedaa42e upstream.

Proxy entries could have null pointer to net-device.

Signed-off-by: Konstantin Khlebnikov 
Fixes: 84920c1420e2 ("net: Allow ipv6 proxies and arp proxies be shown with iproute2")
Signed-off-by: David S. Miller 
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li

net: fix __netdev_update_features return on ndo_set_features failure

2016-10-26T15:15:29+00:00

commit 00ee5927177792a6e139d50b6b7564d35705556a upstream.

If ndo_set_features fails __netdev_update_features() will return -1 but
this is wrong because it is expected to return 0 if no features were
changed (see netdev_update_features()), which will cause a netdev
notifier to be called without any actual changes. Fix this by returning
0 if ndo_set_features fails.

Fixes: 6cb6a27c45ce ("net: Call netdev_features_change() from netdev_update_features()")
CC: Michał Mirosław 
Signed-off-by: Nikolay Aleksandrov 
Signed-off-by: David S. Miller 
Signed-off-by: Zefan Li

net: fix a race in dst_release()

2016-10-26T15:15:24+00:00

commit d69bbf88c8d0b367cf3e3a052f6daadf630ee566 upstream.

Only cpu seeing dst refcount going to 0 can safely
dereference dst->flags.

Otherwise an other cpu might already have freed the dst.

Fixes: 27b75c95f10d ("net: avoid RCU for NOCACHE dst")
Reported-by: Greg Thelen 
Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller 
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li

net: avoid to hang up on sending due to sysctl configuration overflow.

2016-03-21T01:17:56+00:00

commit cdda88912d62f9603d27433338a18be83ef23ac1 upstream.

    I found if we write a larger than 4GB value to some sysctl
variables, the sending syscall will hang up forever, because these
variables are 32 bits, such large values make them overflow to 0 or
negative.

    This patch try to fix overflow or prevent from zero value setup
of below sysctl variables:

net.core.wmem_default
net.core.rmem_default

net.core.rmem_max
net.core.wmem_max

net.ipv4.udp_rmem_min
net.ipv4.udp_wmem_min

net.ipv4.tcp_wmem
net.ipv4.tcp_rmem

Signed-off-by: Eric Dumazet 
Signed-off-by: Li Yu 
Signed-off-by: David S. Miller 
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li

net: Clone skb before setting peeked flag

2016-03-21T01:17:45+00:00

commit 738ac1ebb96d02e0d23bc320302a6ea94c612dec upstream.

Shared skbs must not be modified and this is crucial for broadcast
and/or multicast paths where we use it as an optimisation to avoid
unnecessary cloning.

The function skb_recv_datagram breaks this rule by setting peeked
without cloning the skb first.  This causes funky races which leads
to double-free.

This patch fixes this by cloning the skb and replacing the skb
in the list when setting skb->peeked.

Fixes: a59322be07c9 ("[UDP]: Only increment counter on first peek/recv")
Reported-by: Konstantin Khlebnikov 
Signed-off-by: Herbert Xu 
Signed-off-by: David S. Miller 
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li

net: call rcu_read_lock early in process_backlog

2016-03-21T01:17:43+00:00

commit 2c17d27c36dcce2b6bf689f41a46b9e909877c21 upstream.

Incoming packet should be either in backlog queue or
in RCU read-side section. Otherwise, the final sequence of
flush_backlog() and synchronize_net() may miss packets
that can run without device reference:

CPU 1                  CPU 2
                       skb->dev: no reference
                       process_backlog:__skb_dequeue
                       process_backlog:local_irq_enable

on_each_cpu for
flush_backlog =>       IPI(hardirq): flush_backlog
                       - packet not found in backlog

                       CPU delayed ...
synchronize_net
- no ongoing RCU
read-side sections

netdev_run_todo,
rcu_barrier: no
ongoing callbacks
                       __netif_receive_skb_core:rcu_read_lock
                       - too late
free dev
                       process packet for freed dev

Fixes: 6e583ce5242f ("net: eliminate refcounting in backlog queue")
Cc: Eric W. Biederman 
Cc: Stephen Hemminger 
Signed-off-by: Julian Anastasov 
Signed-off-by: David S. Miller 
[lizf: Backported to 3.4:
 - adjust context
 - no need to change "goto unlock" to "goto out"]
Signed-off-by: Zefan Li

net: do not process device backlog during unregistration

2016-03-21T01:17:43+00:00

commit e9e4dd3267d0c5234c5c0f47440456b10875dec9 upstream.

commit 381c759d9916 ("ipv4: Avoid crashing in ip_error")
fixes a problem where processed packet comes from device
with destroyed inetdev (dev->ip_ptr). This is not expected
because inetdev_destroy is called in NETDEV_UNREGISTER
phase and packets should not be processed after
dev_close_many() and synchronize_net(). Above fix is still
required because inetdev_destroy can be called for other
reasons. But it shows the real problem: backlog can keep
packets for long time and they do not hold reference to
device. Such packets are then delivered to upper levels
at the same time when device is unregistered.
Calling flush_backlog after NETDEV_UNREGISTER_FINAL still
accounts all packets from backlog but before that some packets
continue to be delivered to upper levels long after the
synchronize_net call which is supposed to wait the last
ones. Also, as Eric pointed out, processed packets, mostly
from other devices, can continue to add new packets to backlog.

Fix the problem by moving flush_backlog early, after the
device driver is stopped and before the synchronize_net() call.
Then use netif_running check to make sure we do not add more
packets to backlog. We have to do it in enqueue_to_backlog
context when the local IRQ is disabled. As result, after the
flush_backlog and synchronize_net sequence all packets
should be accounted.

Thanks to Eric W. Biederman for the test script and his
valuable feedback!

Reported-by: Vittorio Gambaletta 
Fixes: 6e583ce5242f ("net: eliminate refcounting in backlog queue")
Cc: Eric W. Biederman 
Cc: Stephen Hemminger 
Signed-off-by: Julian Anastasov 
Signed-off-by: David S. Miller 
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li