linux-stable.git/net/ipv6/ip6_output.c, branch linux-3.7.y

ipv6: fix header length calculation in ip6_append_data()

2013-02-14T18:48:04+00:00

[ Upstream commit 7efdba5bd9a2f3e2059beeb45c9fa55eefe1bced ]

Commit 299b0767 (ipv6: Fix IPsec slowpath fragmentation problem)
has introduced a error in the header length calculation that
provokes corrupted packets when non-fragmentable extensions
headers (Destination Option or Routing Header Type 2) are used.

rt->rt6i_nfheader_len is the length of the non-fragmentable
extension header, and it should be substracted to
rt->dst.header_len, and not to exthdrlen, as it was done before
commit 299b0767.

This patch reverts to the original and correct behavior. It has
been successfully tested with and without IPsec on packets
that include non-fragmentable extensions headers.

Signed-off-by: Romain Kuntz 
Acked-by: Steffen Klassert 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

net: use a per task frag allocator

2012-09-24T20:31:37+00:00

We currently use a per socket order-0 page cache for tcp_sendmsg()
operations.

This page is used to build fragments for skbs.

Its done to increase probability of coalescing small write() into
single segments in skbs still in write queue (not yet sent)

But it wastes a lot of memory for applications handling many mostly
idle sockets, since each socket holds one page in sk->sk_sndmsg_page

Its also quite inefficient to build TSO 64KB packets, because we need
about 16 pages per skb on arches where PAGE_SIZE = 4096, so we hit
page allocator more than wanted.

This patch adds a per task frag allocator and uses bigger pages,
if available. An automatic fallback is done in case of memory pressure.

(up to 32768 bytes per frag, thats order-3 pages on x86)

This increases TCP stream performance by 20% on loopback device,
but also benefits on other network devices, since 8x less frags are
mapped on transmit and unmapped on tx completion. Alexander Duyck
mentioned a probable performance win on systems with IOMMU enabled.

Its possible some SG enabled hardware cant cope with bigger fragments,
but their ndo_start_xmit() should already handle this, splitting a
fragment in sub fragments, since some arches have PAGE_SIZE=65536

Successfully tested on various ethernet devices.
(ixgbe, igb, bnx2x, tg3, mellanox mlx4)

Signed-off-by: Eric Dumazet 
Cc: Ben Hutchings 
Cc: Vijay Subramanian 
Cc: Alexander Duyck 
Tested-by: Vijay Subramanian 
Signed-off-by: David S. Miller

ipv6: remove some useless RCU read lock

2012-09-10T20:31:18+00:00

After this commit:
	commit 97cac0821af4474ec4ba3a9e7a36b98ed9b6db88
	Author: David S. Miller 
	Date:   Mon Jul 2 22:43:47 2012 -0700

	    ipv6: Store route neighbour in rt6_info struct.

we no longer use RCU to protect route neighbour.

Cc: "David S. Miller" 
Signed-off-by: Cong Wang 
Signed-off-by: David S. Miller

netfilter: nf_conntrack_ipv6: improve fragmentation handling

2012-08-30T01:00:10+00:00

The IPv6 conntrack fragmentation currently has a couple of shortcomings.
Fragmentes are collected in PREROUTING/OUTPUT, are defragmented, the
defragmented packet is then passed to conntrack, the resulting conntrack
information is attached to each original fragment and the fragments then
continue their way through the stack.

Helper invocation occurs in the POSTROUTING hook, at which point only
the original fragments are available. The result of this is that
fragmented packets are never passed to helpers.

This patch improves the situation in the following way:

- If a reassembled packet belongs to a connection that has a helper
  assigned, the reassembled packet is passed through the stack instead
  of the original fragments.

- During defragmentation, the largest received fragment size is stored.
  On output, the packet is refragmented if required. If the largest
  received fragment size exceeds the outgoing MTU, a "packet too big"
  message is generated, thus behaving as if the original fragments
  were passed through the stack from an outside point of view.

- The ipv6_helper() hook function can't receive fragments anymore for
  connections using a helper, so it is switched to use ipv6_skip_exthdr()
  instead of the netfilter specific nf_ct_ipv6_skip_exthdr() and the
  reassembled packets are passed to connection tracking helpers.

The result of this is that we can properly track fragmented packets, but
still generate ICMPv6 Packet too big messages if we would have before.

This patch is also required as a precondition for IPv6 NAT, where NAT
helpers might enlarge packets up to a point that they require
fragmentation. In that case we can't generate Packet too big messages
since the proper MTU can't be calculated in all cases (f.i. when
changing textual representation of a variable amount of addresses),
so the packet is transparently fragmented iff the original packet or
fragments would have fit the outgoing MTU.

IPVS parts by Jesper Dangaard Brouer .

Signed-off-by: Patrick McHardy

inet: Minimize use of cached route inetpeer.

2012-07-11T05:40:11+00:00

Only use it in the absolutely required cases:

1) COW'ing metrics

2) ipv4 PMTU

3) ipv4 redirects

Signed-off-by: David S. Miller

ipv6: fix a bad cast in ip6_dst_lookup_tail()

2012-07-06T07:23:41+00:00

Fix a bug in ip6_dst_lookup_tail(), where typeof(dst) is
"struct dst_entry **", not "struct dst_entry *"

Reported-by: Fengguang Wu 
Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller

ipv6: Store route neighbour in rt6_info struct.

2012-07-05T09:41:58+00:00

This makes for a simplified conversion away from dst_get_neighbour*().

All code outside of ipv6 will use neigh lookups via dst_neigh_lookup*().

Signed-off-by: David S. Miller

net: Do delayed neigh confirmation.

2012-07-05T08:03:06+00:00

When a dst_confirm() happens, mark the confirmation as pending in the
dst.  Then on the next packet out, when we have the neigh in-hand, do
the update.

This removes the dependency in dst_confirm() of dst's having an
attached neigh.

While we're here, remove the explicit 'dst' NULL check, all except 2
or 3 call sites ensure it's not NULL.  So just fix those cases up.

Signed-off-by: David S. Miller

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

2012-06-13T04:59:18+00:00

Conflicts:
	MAINTAINERS
	drivers/net/wireless/iwlwifi/pcie/trans.c

The iwlwifi conflict was resolved by keeping the code added
in 'net' that turns off the buggy chip feature.

The MAINTAINERS conflict was merely overlapping changes, one
change updated all the wireless web site URLs and the other
changed some GIT trees to be Johannes's instead of John's.

Signed-off-by: David S. Miller

net-next: add dev_loopback_xmit() to avoid duplicate code

2012-06-13T01:51:09+00:00

Add dev_loopback_xmit() in order to deduplicate functions
ip_dev_loopback_xmit() (in net/ipv4/ip_output.c) and
ip6_dev_loopback_xmit() (in net/ipv6/ip6_output.c).

I was about to reinvent the wheel when I noticed that
ip_dev_loopback_xmit() and ip6_dev_loopback_xmit() do exactly what I
need and are not IP-only functions, but they were not available to reuse
elsewhere.

ip6_dev_loopback_xmit() does not have line "skb_dst_force(skb);", but I
understand that this is harmless, and should be in dev_loopback_xmit().

Signed-off-by: Michel Machado 
CC: "David S. Miller" 
CC: Alexey Kuznetsov 
CC: James Morris 
CC: Hideaki YOSHIFUJI 
CC: Patrick McHardy 
CC: Eric Dumazet 
CC: Jiri Pirko 
CC: "Michał Mirosław" 
CC: Ben Hutchings 
Signed-off-by: David S. Miller