linux-stable.git/net/ipv6, branch v3.2.67

ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs

2015-02-20T00:49:24+00:00

commit 4c672e4b42bc8046d63a6eb0a2c6a450a501af32 upstream.

It has been reported that generating an MLD listener report on
devices with large MTUs (e.g. 9000) and a high number of IPv6
addresses can trigger a skb_over_panic():

skbuff: skb_over_panic: text:ffffffff80612a5d len:3776 put:20
head:ffff88046d751000 data:ffff88046d751010 tail:0xed0 end:0xec0
dev:port1
 ------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:100!
invalid opcode: 0000 [#1] SMP
Modules linked in: ixgbe(O)
CPU: 3 PID: 0 Comm: swapper/3 Tainted: G O 3.14.23+ #4
[...]
Call Trace:
 
 [] ? skb_put+0x3a/0x3b
 [] ? add_grhead+0x45/0x8e
 [] ? add_grec+0x394/0x3d4
 [] ? mld_ifc_timer_expire+0x195/0x20d
 [] ? mld_dad_timer_expire+0x45/0x45
 [] ? call_timer_fn.isra.29+0x12/0x68
 [] ? run_timer_softirq+0x163/0x182
 [] ? __do_softirq+0xe0/0x21d
 [] ? irq_exit+0x4e/0xd3
 [] ? smp_apic_timer_interrupt+0x3b/0x46
 [] ? apic_timer_interrupt+0x6a/0x70

mld_newpack() skb allocations are usually requested with dev->mtu
in size, since commit 72e09ad107e7 ("ipv6: avoid high order allocations")
we have changed the limit in order to be less likely to fail.

However, in MLD/IGMP code, we have some rather ugly AVAILABLE(skb)
macros, which determine if we may end up doing an skb_put() for
adding another record. To avoid possible fragmentation, we check
the skb's tailroom as skb->dev->mtu - skb->len, which is a wrong
assumption as the actual max allocation size can be much smaller.

The IGMP case doesn't have this issue as commit 57e1ab6eaddc
("igmp: refine skb allocations") stores the allocation size in
the cb[].

Set a reserved_tailroom to make it fit into the MTU and use
skb_availroom() helper instead. This also allows to get rid of
igmp_skb_size().

Reported-by: Wei Liu 
Fixes: 72e09ad107e7 ("ipv6: avoid high order allocations")
Signed-off-by: Daniel Borkmann 
Cc: Eric Dumazet 
Cc: Hannes Frederic Sowa 
Cc: David L Stevens 
Acked-by: Eric Dumazet 
Acked-by: Hannes Frederic Sowa 
Signed-off-by: David S. Miller 
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings

ipv6: Remove all uses of LL_ALLOCATED_SPACE

2015-02-20T00:49:23+00:00

commit a7ae1992248e5cf9dc5bd35695ab846d27efe15f upstream.

ipv6: Remove all uses of LL_ALLOCATED_SPACE

The macro LL_ALLOCATED_SPACE was ill-conceived.  It applies the
alignment to the sum of needed_headroom and needed_tailroom.  As
the amount that is then reserved for head room is needed_headroom
with alignment, this means that the tail room left may be too small.

This patch replaces all uses of LL_ALLOCATED_SPACE in net/ipv6
with the macro LL_RESERVED_SPACE and direct reference to
needed_tailroom.

This also fixes the problem with needed_headroom changing between
allocating the skb and reserving the head room.

Signed-off-by: Herbert Xu 
Signed-off-by: David S. Miller 
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings

tcp: md5: remove spinlock usage in fast path

2015-01-01T01:27:52+00:00

commit 71cea17ed39fdf1c0634f530ddc6a2c2fc601c2b upstream.

TCP md5 code uses per cpu variables but protects access to them with
a shared spinlock, which is a contention point.

[ tcp_md5sig_pool_lock is locked twice per incoming packet ]

Makes things much simpler, by allocating crypto structures once, first
time a socket needs md5 keys, and not deallocating them as they are
really small.

Next step would be to allow crypto allocations being done in a NUMA
aware way.

Signed-off-by: Eric Dumazet 
Cc: Herbert Xu 
Signed-off-by: David S. Miller 
[bwh: Backported to 3.2:
 - Adjust context
 - Conditions for alloc/free are quite different]
Signed-off-by: Ben Hutchings

drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets

2015-01-01T01:27:51+00:00

commit 5188cd44c55db3e92cd9e77a40b5baa7ed4340f7 upstream.

UFO is now disabled on all drivers that work with virtio net headers,
but userland may try to send UFO/IPv6 packets anyway.  Instead of
sending with ID=0, we should select identifiers on their behalf (as we
used to).

Signed-off-by: Ben Hutchings 
Fixes: 916e4cf46d02 ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data")
Signed-off-by: David S. Miller 
[bwh: For 3.2, net/ipv6/output_core.c is a completely new file]

tcp: be more strict before accepting ECN negociation

2014-12-14T16:24:00+00:00

commit bd14b1b2e29bd6812597f896dde06eaf7c6d2f24 upstream.

It appears some networks play bad games with the two bits reserved for
ECN. This can trigger false congestion notifications and very slow
transferts.

Since RFC 3168 (6.1.1) forbids SYN packets to carry CT bits, we can
disable TCP ECN negociation if it happens we receive mangled CT bits in
the SYN packet.

Signed-off-by: Eric Dumazet 
Cc: Perry Lorier 
Cc: Matt Mathis 
Cc: Yuchung Cheng 
Cc: Neal Cardwell 
Cc: Wilmer van der Gaast 
Cc: Ankur Jain 
Cc: Tom Herbert 
Cc: Dave Täht 
Acked-by: Neal Cardwell 
Signed-off-by: David S. Miller 
Signed-off-by: Ben Hutchings 
Cc: Florian Westphal

ipv6: reuse ip6_frag_id from ip6_ufo_append_data

2014-11-05T20:27:47+00:00

commit 916e4cf46d0204806c062c8c6c4d1f633852c5b6 upstream.

Currently we generate a new fragmentation id on UFO segmentation. It
is pretty hairy to identify the correct net namespace and dst there.
Especially tunnels use IFF_XMIT_DST_RELEASE and thus have no skb_dst
available at all.

This causes unreliable or very predictable ipv6 fragmentation id
generation while segmentation.

Luckily we already have pregenerated the ip6_frag_id in
ip6_ufo_append_data and can use it here.

Signed-off-by: Hannes Frederic Sowa 
Signed-off-by: David S. Miller 
[bwh: Backported to 3.2: adjust filename, indentation]
Signed-off-by: Ben Hutchings

ipv6: reallocate addrconf router for ipv6 address when lo device up

2014-11-05T20:27:47+00:00

It fix the bug 67951 on bugzilla
https://bugzilla.kernel.org/show_bug.cgi?id=67951

The patch can't be applied directly, as it' used the function introduced
by "commit 94e187c0" ip6_rt_put(), that patch can't be applied directly
either.

====================

From: Gao feng 

commit 33d99113b1102c2d2f8603b9ba72d89d915c13f5 upstream.

This commit don't have a stable tag, but it fix the bug
no reply after loopback down-up.It's very worthy to be
applied to stable 3.4 kernels.

The bug is 67951 on bugzilla
https://bugzilla.kernel.org/show_bug.cgi?id=67951


CC: Sabrina Dubroca 
CC: Hannes Frederic Sowa 
Reported-by: Weilong Chen 
Signed-off-by: Weilong Chen 
Signed-off-by: Gao feng 
Acked-by: Hannes Frederic Sowa 
Signed-off-by: David S. Miller 
[weilong: s/ip6_rt_put/dst_release]
Signed-off-by: Chen Weilong 
Signed-off-by: Ben Hutchings

ip: make IP identifiers less predictable

2014-09-13T22:41:48+00:00

[ Upstream commit 04ca6973f7c1a0d8537f2d9906a0cf8e69886d75 ]

In "Counting Packets Sent Between Arbitrary Internet Hosts", Jeffrey and
Jedidiah describe ways exploiting linux IP identifier generation to
infer whether two machines are exchanging packets.

With commit 73f156a6e8c1 ("inetpeer: get rid of ip_id_count"), we
changed IP id generation, but this does not really prevent this
side-channel technique.

This patch adds a random amount of perturbation so that IP identifiers
for a given destination [1] are no longer monotonically increasing after
an idle period.

Note that prandom_u32_max(1) returns 0, so if generator is used at most
once per jiffy, this patch inserts no hole in the ID suite and do not
increase collision probability.

This is jiffies based, so in the worst case (HZ=1000), the id can
rollover after ~65 seconds of idle time, which should be fine.

We also change the hash used in __ip_select_ident() to not only hash
on daddr, but also saddr and protocol, so that ICMP probes can not be
used to infer information for other protocols.

For IPv6, adds saddr into the hash as well, but not nexthdr.

If I ping the patched target, we can see ID are now hard to predict.

21:57:11.008086 IP (...)
    A > target: ICMP echo request, seq 1, length 64
21:57:11.010752 IP (... id 2081 ...)
    target > A: ICMP echo reply, seq 1, length 64

21:57:12.013133 IP (...)
    A > target: ICMP echo request, seq 2, length 64
21:57:12.015737 IP (... id 3039 ...)
    target > A: ICMP echo reply, seq 2, length 64

21:57:13.016580 IP (...)
    A > target: ICMP echo request, seq 3, length 64
21:57:13.019251 IP (... id 3437 ...)
    target > A: ICMP echo reply, seq 3, length 64

[1] TCP sessions uses a per flow ID generator not changed by this patch.

Signed-off-by: Eric Dumazet 
Reported-by: Jeffrey Knockel 
Reported-by: Jedidiah R. Crandall 
Cc: Willy Tarreau 
Cc: Hannes Frederic Sowa 
Signed-off-by: David S. Miller 
Signed-off-by: Ben Hutchings

inetpeer: get rid of ip_id_count

2014-09-13T22:41:48+00:00

[ Upstream commit 73f156a6e8c1074ac6327e0abd1169e95eb66463 ]

Ideally, we would need to generate IP ID using a per destination IP
generator.

linux kernels used inet_peer cache for this purpose, but this had a huge
cost on servers disabling MTU discovery.

1) each inet_peer struct consumes 192 bytes

2) inetpeer cache uses a binary tree of inet_peer structs,
   with a nominal size of ~66000 elements under load.

3) lookups in this tree are hitting a lot of cache lines, as tree depth
   is about 20.

4) If server deals with many tcp flows, we have a high probability of
   not finding the inet_peer, allocating a fresh one, inserting it in
   the tree with same initial ip_id_count, (cf secure_ip_id())

5) We garbage collect inet_peer aggressively.

IP ID generation do not have to be 'perfect'

Goal is trying to avoid duplicates in a short period of time,
so that reassembly units have a chance to complete reassembly of
fragments belonging to one message before receiving other fragments
with a recycled ID.

We simply use an array of generators, and a Jenkin hash using the dst IP
as a key.

ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it
belongs (it is only used from this file)

secure_ip_id() and secure_ipv6_id() no longer are needed.

Rename ip_select_ident_more() to ip_select_ident_segs() to avoid
unnecessary decrement/increment of the number of segments.

Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller 
Signed-off-by: Ben Hutchings

Revert "net: ip, ipv6: handle gso skbs in forwarding path"

2014-08-06T17:07:31+00:00

This reverts commit caa5344994778a2b4725b2d75c74430f76925e4a, which
was commit fe6cc55f3a9a053482a76f5a6b2257cee51b4663 upstream.  In 3.2,
the transport header length is not calculated in the forwarding path,
so skb_gso_network_seglen() returns an incorrect result.  We also have
problems due to the local_df flag not being set correctly.

Signed-off-by: Ben Hutchings