<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/net/ipv6/ip6_output.c, branch linux-3.7.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>ipv6: fix header length calculation in ip6_append_data()</title>
<updated>2013-02-14T18:48:04+00:00</updated>
<author>
<name>Romain KUNTZ</name>
<email>r.kuntz@ipflavors.com</email>
</author>
<published>2013-01-16T12:47:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=c716ba51a90f43e58aeda9cc7afb937977592ffc'/>
<id>c716ba51a90f43e58aeda9cc7afb937977592ffc</id>
<content type='text'>
[ Upstream commit 7efdba5bd9a2f3e2059beeb45c9fa55eefe1bced ]

Commit 299b0767 (ipv6: Fix IPsec slowpath fragmentation problem)
has introduced a error in the header length calculation that
provokes corrupted packets when non-fragmentable extensions
headers (Destination Option or Routing Header Type 2) are used.

rt-&gt;rt6i_nfheader_len is the length of the non-fragmentable
extension header, and it should be substracted to
rt-&gt;dst.header_len, and not to exthdrlen, as it was done before
commit 299b0767.

This patch reverts to the original and correct behavior. It has
been successfully tested with and without IPsec on packets
that include non-fragmentable extensions headers.

Signed-off-by: Romain Kuntz &lt;r.kuntz@ipflavors.com&gt;
Acked-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 7efdba5bd9a2f3e2059beeb45c9fa55eefe1bced ]

Commit 299b0767 (ipv6: Fix IPsec slowpath fragmentation problem)
has introduced a error in the header length calculation that
provokes corrupted packets when non-fragmentable extensions
headers (Destination Option or Routing Header Type 2) are used.

rt-&gt;rt6i_nfheader_len is the length of the non-fragmentable
extension header, and it should be substracted to
rt-&gt;dst.header_len, and not to exthdrlen, as it was done before
commit 299b0767.

This patch reverts to the original and correct behavior. It has
been successfully tested with and without IPsec on packets
that include non-fragmentable extensions headers.

Signed-off-by: Romain Kuntz &lt;r.kuntz@ipflavors.com&gt;
Acked-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: use a per task frag allocator</title>
<updated>2012-09-24T20:31:37+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2012-09-23T23:04:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=5640f7685831e088fe6c2e1f863a6805962f8e81'/>
<id>5640f7685831e088fe6c2e1f863a6805962f8e81</id>
<content type='text'>
We currently use a per socket order-0 page cache for tcp_sendmsg()
operations.

This page is used to build fragments for skbs.

Its done to increase probability of coalescing small write() into
single segments in skbs still in write queue (not yet sent)

But it wastes a lot of memory for applications handling many mostly
idle sockets, since each socket holds one page in sk-&gt;sk_sndmsg_page

Its also quite inefficient to build TSO 64KB packets, because we need
about 16 pages per skb on arches where PAGE_SIZE = 4096, so we hit
page allocator more than wanted.

This patch adds a per task frag allocator and uses bigger pages,
if available. An automatic fallback is done in case of memory pressure.

(up to 32768 bytes per frag, thats order-3 pages on x86)

This increases TCP stream performance by 20% on loopback device,
but also benefits on other network devices, since 8x less frags are
mapped on transmit and unmapped on tx completion. Alexander Duyck
mentioned a probable performance win on systems with IOMMU enabled.

Its possible some SG enabled hardware cant cope with bigger fragments,
but their ndo_start_xmit() should already handle this, splitting a
fragment in sub fragments, since some arches have PAGE_SIZE=65536

Successfully tested on various ethernet devices.
(ixgbe, igb, bnx2x, tg3, mellanox mlx4)

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Ben Hutchings &lt;bhutchings@solarflare.com&gt;
Cc: Vijay Subramanian &lt;subramanian.vijay@gmail.com&gt;
Cc: Alexander Duyck &lt;alexander.h.duyck@intel.com&gt;
Tested-by: Vijay Subramanian &lt;subramanian.vijay@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We currently use a per socket order-0 page cache for tcp_sendmsg()
operations.

This page is used to build fragments for skbs.

Its done to increase probability of coalescing small write() into
single segments in skbs still in write queue (not yet sent)

But it wastes a lot of memory for applications handling many mostly
idle sockets, since each socket holds one page in sk-&gt;sk_sndmsg_page

Its also quite inefficient to build TSO 64KB packets, because we need
about 16 pages per skb on arches where PAGE_SIZE = 4096, so we hit
page allocator more than wanted.

This patch adds a per task frag allocator and uses bigger pages,
if available. An automatic fallback is done in case of memory pressure.

(up to 32768 bytes per frag, thats order-3 pages on x86)

This increases TCP stream performance by 20% on loopback device,
but also benefits on other network devices, since 8x less frags are
mapped on transmit and unmapped on tx completion. Alexander Duyck
mentioned a probable performance win on systems with IOMMU enabled.

Its possible some SG enabled hardware cant cope with bigger fragments,
but their ndo_start_xmit() should already handle this, splitting a
fragment in sub fragments, since some arches have PAGE_SIZE=65536

Successfully tested on various ethernet devices.
(ixgbe, igb, bnx2x, tg3, mellanox mlx4)

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Ben Hutchings &lt;bhutchings@solarflare.com&gt;
Cc: Vijay Subramanian &lt;subramanian.vijay@gmail.com&gt;
Cc: Alexander Duyck &lt;alexander.h.duyck@intel.com&gt;
Tested-by: Vijay Subramanian &lt;subramanian.vijay@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv6: remove some useless RCU read lock</title>
<updated>2012-09-10T20:31:18+00:00</updated>
<author>
<name>Amerigo Wang</name>
<email>amwang@redhat.com</email>
</author>
<published>2012-09-10T02:48:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=fdd6681d92a70b3db73cdb61c6b4053f2f8003b3'/>
<id>fdd6681d92a70b3db73cdb61c6b4053f2f8003b3</id>
<content type='text'>
After this commit:
	commit 97cac0821af4474ec4ba3a9e7a36b98ed9b6db88
	Author: David S. Miller &lt;davem@davemloft.net&gt;
	Date:   Mon Jul 2 22:43:47 2012 -0700

	    ipv6: Store route neighbour in rt6_info struct.

we no longer use RCU to protect route neighbour.

Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Signed-off-by: Cong Wang &lt;amwang@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
After this commit:
	commit 97cac0821af4474ec4ba3a9e7a36b98ed9b6db88
	Author: David S. Miller &lt;davem@davemloft.net&gt;
	Date:   Mon Jul 2 22:43:47 2012 -0700

	    ipv6: Store route neighbour in rt6_info struct.

we no longer use RCU to protect route neighbour.

Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Signed-off-by: Cong Wang &lt;amwang@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>netfilter: nf_conntrack_ipv6: improve fragmentation handling</title>
<updated>2012-08-30T01:00:10+00:00</updated>
<author>
<name>Patrick McHardy</name>
<email>kaber@trash.net</email>
</author>
<published>2012-08-26T17:13:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=4cdd34084d539c758d00c5dc7bf95db2e4f2bc70'/>
<id>4cdd34084d539c758d00c5dc7bf95db2e4f2bc70</id>
<content type='text'>
The IPv6 conntrack fragmentation currently has a couple of shortcomings.
Fragmentes are collected in PREROUTING/OUTPUT, are defragmented, the
defragmented packet is then passed to conntrack, the resulting conntrack
information is attached to each original fragment and the fragments then
continue their way through the stack.

Helper invocation occurs in the POSTROUTING hook, at which point only
the original fragments are available. The result of this is that
fragmented packets are never passed to helpers.

This patch improves the situation in the following way:

- If a reassembled packet belongs to a connection that has a helper
  assigned, the reassembled packet is passed through the stack instead
  of the original fragments.

- During defragmentation, the largest received fragment size is stored.
  On output, the packet is refragmented if required. If the largest
  received fragment size exceeds the outgoing MTU, a "packet too big"
  message is generated, thus behaving as if the original fragments
  were passed through the stack from an outside point of view.

- The ipv6_helper() hook function can't receive fragments anymore for
  connections using a helper, so it is switched to use ipv6_skip_exthdr()
  instead of the netfilter specific nf_ct_ipv6_skip_exthdr() and the
  reassembled packets are passed to connection tracking helpers.

The result of this is that we can properly track fragmented packets, but
still generate ICMPv6 Packet too big messages if we would have before.

This patch is also required as a precondition for IPv6 NAT, where NAT
helpers might enlarge packets up to a point that they require
fragmentation. In that case we can't generate Packet too big messages
since the proper MTU can't be calculated in all cases (f.i. when
changing textual representation of a variable amount of addresses),
so the packet is transparently fragmented iff the original packet or
fragments would have fit the outgoing MTU.

IPVS parts by Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;.

Signed-off-by: Patrick McHardy &lt;kaber@trash.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The IPv6 conntrack fragmentation currently has a couple of shortcomings.
Fragmentes are collected in PREROUTING/OUTPUT, are defragmented, the
defragmented packet is then passed to conntrack, the resulting conntrack
information is attached to each original fragment and the fragments then
continue their way through the stack.

Helper invocation occurs in the POSTROUTING hook, at which point only
the original fragments are available. The result of this is that
fragmented packets are never passed to helpers.

This patch improves the situation in the following way:

- If a reassembled packet belongs to a connection that has a helper
  assigned, the reassembled packet is passed through the stack instead
  of the original fragments.

- During defragmentation, the largest received fragment size is stored.
  On output, the packet is refragmented if required. If the largest
  received fragment size exceeds the outgoing MTU, a "packet too big"
  message is generated, thus behaving as if the original fragments
  were passed through the stack from an outside point of view.

- The ipv6_helper() hook function can't receive fragments anymore for
  connections using a helper, so it is switched to use ipv6_skip_exthdr()
  instead of the netfilter specific nf_ct_ipv6_skip_exthdr() and the
  reassembled packets are passed to connection tracking helpers.

The result of this is that we can properly track fragmented packets, but
still generate ICMPv6 Packet too big messages if we would have before.

This patch is also required as a precondition for IPv6 NAT, where NAT
helpers might enlarge packets up to a point that they require
fragmentation. In that case we can't generate Packet too big messages
since the proper MTU can't be calculated in all cases (f.i. when
changing textual representation of a variable amount of addresses),
so the packet is transparently fragmented iff the original packet or
fragments would have fit the outgoing MTU.

IPVS parts by Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;.

Signed-off-by: Patrick McHardy &lt;kaber@trash.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>inet: Minimize use of cached route inetpeer.</title>
<updated>2012-07-11T05:40:11+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2012-07-10T10:58:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=1d861aa4b3fb08822055345f480850205ffe6170'/>
<id>1d861aa4b3fb08822055345f480850205ffe6170</id>
<content type='text'>
Only use it in the absolutely required cases:

1) COW'ing metrics

2) ipv4 PMTU

3) ipv4 redirects

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Only use it in the absolutely required cases:

1) COW'ing metrics

2) ipv4 PMTU

3) ipv4 redirects

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv6: fix a bad cast in ip6_dst_lookup_tail()</title>
<updated>2012-07-06T07:23:41+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2012-07-06T07:19:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=c56bf6fe785abbd83751a462f0c7067f7145b97a'/>
<id>c56bf6fe785abbd83751a462f0c7067f7145b97a</id>
<content type='text'>
Fix a bug in ip6_dst_lookup_tail(), where typeof(dst) is
"struct dst_entry **", not "struct dst_entry *"

Reported-by: Fengguang Wu &lt;wfg@linux.intel.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix a bug in ip6_dst_lookup_tail(), where typeof(dst) is
"struct dst_entry **", not "struct dst_entry *"

Reported-by: Fengguang Wu &lt;wfg@linux.intel.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv6: Store route neighbour in rt6_info struct.</title>
<updated>2012-07-05T09:41:58+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2012-07-03T05:43:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=97cac0821af4474ec4ba3a9e7a36b98ed9b6db88'/>
<id>97cac0821af4474ec4ba3a9e7a36b98ed9b6db88</id>
<content type='text'>
This makes for a simplified conversion away from dst_get_neighbour*().

All code outside of ipv6 will use neigh lookups via dst_neigh_lookup*().

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This makes for a simplified conversion away from dst_get_neighbour*().

All code outside of ipv6 will use neigh lookups via dst_neigh_lookup*().

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: Do delayed neigh confirmation.</title>
<updated>2012-07-05T08:03:06+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2012-07-02T09:21:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=5110effee8fde2edfacac9cd12a9960ab2dc39ea'/>
<id>5110effee8fde2edfacac9cd12a9960ab2dc39ea</id>
<content type='text'>
When a dst_confirm() happens, mark the confirmation as pending in the
dst.  Then on the next packet out, when we have the neigh in-hand, do
the update.

This removes the dependency in dst_confirm() of dst's having an
attached neigh.

While we're here, remove the explicit 'dst' NULL check, all except 2
or 3 call sites ensure it's not NULL.  So just fix those cases up.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When a dst_confirm() happens, mark the confirmation as pending in the
dst.  Then on the next packet out, when we have the neigh in-hand, do
the update.

This removes the dependency in dst_confirm() of dst's having an
attached neigh.

While we're here, remove the explicit 'dst' NULL check, all except 2
or 3 call sites ensure it's not NULL.  So just fix those cases up.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2012-06-13T04:59:18+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2012-06-13T04:59:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=43b03f1f6d6832d744918947d185a7aee89d1e0f'/>
<id>43b03f1f6d6832d744918947d185a7aee89d1e0f</id>
<content type='text'>
Conflicts:
	MAINTAINERS
	drivers/net/wireless/iwlwifi/pcie/trans.c

The iwlwifi conflict was resolved by keeping the code added
in 'net' that turns off the buggy chip feature.

The MAINTAINERS conflict was merely overlapping changes, one
change updated all the wireless web site URLs and the other
changed some GIT trees to be Johannes's instead of John's.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Conflicts:
	MAINTAINERS
	drivers/net/wireless/iwlwifi/pcie/trans.c

The iwlwifi conflict was resolved by keeping the code added
in 'net' that turns off the buggy chip feature.

The MAINTAINERS conflict was merely overlapping changes, one
change updated all the wireless web site URLs and the other
changed some GIT trees to be Johannes's instead of John's.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net-next: add dev_loopback_xmit() to avoid duplicate code</title>
<updated>2012-06-13T01:51:09+00:00</updated>
<author>
<name>Michel Machado</name>
<email>michel@digirati.com.br</email>
</author>
<published>2012-06-12T10:16:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=95603e2293de556de7e82221649bfd7fd98b64a3'/>
<id>95603e2293de556de7e82221649bfd7fd98b64a3</id>
<content type='text'>
Add dev_loopback_xmit() in order to deduplicate functions
ip_dev_loopback_xmit() (in net/ipv4/ip_output.c) and
ip6_dev_loopback_xmit() (in net/ipv6/ip6_output.c).

I was about to reinvent the wheel when I noticed that
ip_dev_loopback_xmit() and ip6_dev_loopback_xmit() do exactly what I
need and are not IP-only functions, but they were not available to reuse
elsewhere.

ip6_dev_loopback_xmit() does not have line "skb_dst_force(skb);", but I
understand that this is harmless, and should be in dev_loopback_xmit().

Signed-off-by: Michel Machado &lt;michel@digirati.com.br&gt;
CC: "David S. Miller" &lt;davem@davemloft.net&gt;
CC: Alexey Kuznetsov &lt;kuznet@ms2.inr.ac.ru&gt;
CC: James Morris &lt;jmorris@namei.org&gt;
CC: Hideaki YOSHIFUJI &lt;yoshfuji@linux-ipv6.org&gt;
CC: Patrick McHardy &lt;kaber@trash.net&gt;
CC: Eric Dumazet &lt;edumazet@google.com&gt;
CC: Jiri Pirko &lt;jpirko@redhat.com&gt;
CC: "Michał Mirosław" &lt;mirq-linux@rere.qmqm.pl&gt;
CC: Ben Hutchings &lt;bhutchings@solarflare.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add dev_loopback_xmit() in order to deduplicate functions
ip_dev_loopback_xmit() (in net/ipv4/ip_output.c) and
ip6_dev_loopback_xmit() (in net/ipv6/ip6_output.c).

I was about to reinvent the wheel when I noticed that
ip_dev_loopback_xmit() and ip6_dev_loopback_xmit() do exactly what I
need and are not IP-only functions, but they were not available to reuse
elsewhere.

ip6_dev_loopback_xmit() does not have line "skb_dst_force(skb);", but I
understand that this is harmless, and should be in dev_loopback_xmit().

Signed-off-by: Michel Machado &lt;michel@digirati.com.br&gt;
CC: "David S. Miller" &lt;davem@davemloft.net&gt;
CC: Alexey Kuznetsov &lt;kuznet@ms2.inr.ac.ru&gt;
CC: James Morris &lt;jmorris@namei.org&gt;
CC: Hideaki YOSHIFUJI &lt;yoshfuji@linux-ipv6.org&gt;
CC: Patrick McHardy &lt;kaber@trash.net&gt;
CC: Eric Dumazet &lt;edumazet@google.com&gt;
CC: Jiri Pirko &lt;jpirko@redhat.com&gt;
CC: "Michał Mirosław" &lt;mirq-linux@rere.qmqm.pl&gt;
CC: Ben Hutchings &lt;bhutchings@solarflare.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
