<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/net/core/skbuff.c, branch v2.6.38</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>net: add kmemcheck annotation in __alloc_skb()</title>
<updated>2011-01-27T22:41:06+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2011-01-25T23:18:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c2aa3665cf8510b1665ee2f5a9525cf7be6dec4f'/>
<id>c2aa3665cf8510b1665ee2f5a9525cf7be6dec4f</id>
<content type='text'>
pskb_expand_head() triggers a kmemcheck warning when copy of
skb_shared_info is done in pskb_expand_head()

This is because destructor_arg field is not necessarily initialized at
this point. Add kmemcheck_annotate_variable() call in __alloc_skb() to
instruct kmemcheck this is a normal situation.

Resolves bugzilla.kernel.org 27212

Reference: https://bugzilla.kernel.org/show_bug.cgi?id=27212
Reported-by: Christian Casteyde &lt;casteyde.christian@free.fr&gt;
Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
CC: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
pskb_expand_head() triggers a kmemcheck warning when copy of
skb_shared_info is done in pskb_expand_head()

This is because destructor_arg field is not necessarily initialized at
this point. Add kmemcheck_annotate_variable() call in __alloc_skb() to
instruct kmemcheck this is a normal situation.

Resolves bugzilla.kernel.org 27212

Reference: https://bugzilla.kernel.org/show_bug.cgi?id=27212
Reported-by: Christian Casteyde &lt;casteyde.christian@free.fr&gt;
Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
CC: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>GRO: fix merging a paged skb after non-paged skbs</title>
<updated>2011-01-24T22:27:18+00:00</updated>
<author>
<name>Michal Schmidt</name>
<email>mschmidt@redhat.com</email>
</author>
<published>2011-01-24T12:08:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d1dc7abf2fafa34b0ffcd070fd59405aa9c0a4d8'/>
<id>d1dc7abf2fafa34b0ffcd070fd59405aa9c0a4d8</id>
<content type='text'>
Suppose that several linear skbs of the same flow were received by GRO. They
were thus merged into one skb with a frag_list. Then a new skb of the same flow
arrives, but it is a paged skb with data starting in its frags[].

Before adding the skb to the frag_list skb_gro_receive() will of course adjust
the skb to throw away the headers. It correctly modifies the page_offset and
size of the frag, but it leaves incorrect information in the skb:
 -&gt;data_len is not decreased at all.
 -&gt;len is decreased only by headlen, as if no change were done to the frag.
Later in a receiving process this causes skb_copy_datagram_iovec() to return
-EFAULT and this is seen in userspace as the result of the recv() syscall.

In practice the bug can be reproduced with the sfc driver. By default the
driver uses an adaptive scheme when it switches between using
napi_gro_receive() (with skbs) and napi_gro_frags() (with pages). The bug is
reproduced when under rx load with enough successful GRO merging the driver
decides to switch from the former to the latter.

Manual control is also possible, so reproducing this is easy with netcat:
 - on machine1 (with sfc): nc -l 12345 &gt; /dev/null
 - on machine2: nc machine1 12345 &lt; /dev/zero
 - on machine1:
   echo 1 &gt; /sys/module/sfc/parameters/rx_alloc_method  # use skbs
   echo 2 &gt; /sys/module/sfc/parameters/rx_alloc_method  # use pages
 - See that nc has quit suddenly.

[v2: Modified by Eric Dumazet to avoid advancing skb-&gt;data past the end
     and to use a temporary variable.]

Signed-off-by: Michal Schmidt &lt;mschmidt@redhat.com&gt;
Acked-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Suppose that several linear skbs of the same flow were received by GRO. They
were thus merged into one skb with a frag_list. Then a new skb of the same flow
arrives, but it is a paged skb with data starting in its frags[].

Before adding the skb to the frag_list skb_gro_receive() will of course adjust
the skb to throw away the headers. It correctly modifies the page_offset and
size of the frag, but it leaves incorrect information in the skb:
 -&gt;data_len is not decreased at all.
 -&gt;len is decreased only by headlen, as if no change were done to the frag.
Later in a receiving process this causes skb_copy_datagram_iovec() to return
-EFAULT and this is seen in userspace as the result of the recv() syscall.

In practice the bug can be reproduced with the sfc driver. By default the
driver uses an adaptive scheme when it switches between using
napi_gro_receive() (with skbs) and napi_gro_frags() (with pages). The bug is
reproduced when under rx load with enough successful GRO merging the driver
decides to switch from the former to the latter.

Manual control is also possible, so reproducing this is easy with netcat:
 - on machine1 (with sfc): nc -l 12345 &gt; /dev/null
 - on machine2: nc machine1 12345 &lt; /dev/zero
 - on machine1:
   echo 1 &gt; /sys/module/sfc/parameters/rx_alloc_method  # use skbs
   echo 2 &gt; /sys/module/sfc/parameters/rx_alloc_method  # use pages
 - See that nc has quit suddenly.

[v2: Modified by Eric Dumazet to avoid advancing skb-&gt;data past the end
     and to use a temporary variable.]

Signed-off-by: Michal Schmidt &lt;mschmidt@redhat.com&gt;
Acked-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>netfilter: fix compilation when conntrack is disabled but tproxy is enabled</title>
<updated>2011-01-12T19:25:08+00:00</updated>
<author>
<name>KOVACS Krisztian</name>
<email>hidden@balabit.hu</email>
</author>
<published>2011-01-12T19:25:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2fc72c7b84002ffb3c66918e2a7b0ee607d8b5aa'/>
<id>2fc72c7b84002ffb3c66918e2a7b0ee607d8b5aa</id>
<content type='text'>
The IPv6 tproxy patches split IPv6 defragmentation off of conntrack, but
failed to update the #ifdef stanzas guarding the defragmentation related
fields and code in skbuff and conntrack related code in nf_defrag_ipv6.c.

This patch adds the required #ifdefs so that IPv6 tproxy can truly be used
without connection tracking.

Original report:
http://marc.info/?l=linux-netdev&amp;m=129010118516341&amp;w=2

Reported-by: Randy Dunlap &lt;randy.dunlap@oracle.com&gt;
Acked-by: Randy Dunlap &lt;randy.dunlap@oracle.com&gt;
Signed-off-by: KOVACS Krisztian &lt;hidden@balabit.hu&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The IPv6 tproxy patches split IPv6 defragmentation off of conntrack, but
failed to update the #ifdef stanzas guarding the defragmentation related
fields and code in skbuff and conntrack related code in nf_defrag_ipv6.c.

This patch adds the required #ifdefs so that IPv6 tproxy can truly be used
without connection tracking.

Original report:
http://marc.info/?l=linux-netdev&amp;m=129010118516341&amp;w=2

Reported-by: Randy Dunlap &lt;randy.dunlap@oracle.com&gt;
Acked-by: Randy Dunlap &lt;randy.dunlap@oracle.com&gt;
Signed-off-by: KOVACS Krisztian &lt;hidden@balabit.hu&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: Use skb_checksum_start_offset()</title>
<updated>2010-12-16T22:43:14+00:00</updated>
<author>
<name>Michał Mirosław</name>
<email>mirq-linux@rere.qmqm.pl</email>
</author>
<published>2010-12-14T15:24:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=55508d601dab7df5cbcc7a63f4be8620eface204'/>
<id>55508d601dab7df5cbcc7a63f4be8620eface204</id>
<content type='text'>
Replace skb-&gt;csum_start - skb_headroom(skb) with skb_checksum_start_offset().

Note for usb/smsc95xx: skb-&gt;data - skb-&gt;head == skb_headroom(skb).

Signed-off-by: Michał Mirosław &lt;mirq-linux@rere.qmqm.pl&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Replace skb-&gt;csum_start - skb_headroom(skb) with skb_checksum_start_offset().

Note for usb/smsc95xx: skb-&gt;data - skb-&gt;head == skb_headroom(skb).

Signed-off-by: Michał Mirosław &lt;mirq-linux@rere.qmqm.pl&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: don't reallocate skb-&gt;head unless the current one hasn't the needed extra size or is shared</title>
<updated>2010-12-03T18:59:47+00:00</updated>
<author>
<name>Changli Gao</name>
<email>xiaosuo@gmail.com</email>
</author>
<published>2010-11-29T22:48:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ca44ac386181ba710a9ab6db900d6c1e5451b366'/>
<id>ca44ac386181ba710a9ab6db900d6c1e5451b366</id>
<content type='text'>
skb head being allocated by kmalloc(), it might be larger than what
actually requested because of discrete kmem caches sizes. Before
reallocating a new skb head, check if the current one has the needed
extra size.

Do this check only if skb head is not shared.

Signed-off-by: Changli Gao &lt;xiaosuo@gmail.com&gt;
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Acked-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
skb head being allocated by kmalloc(), it might be larger than what
actually requested because of discrete kmem caches sizes. Before
reallocating a new skb head, check if the current one has the needed
extra size.

Do this check only if skb head is not shared.

Signed-off-by: Changli Gao &lt;xiaosuo@gmail.com&gt;
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Acked-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6</title>
<updated>2010-10-23T18:47:02+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2010-10-23T18:47:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=5f05647dd81c11a6a165ccc8f0c1370b16f3bcb0'/>
<id>5f05647dd81c11a6a165ccc8f0c1370b16f3bcb0</id>
<content type='text'>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1699 commits)
  bnx2/bnx2x: Unsupported Ethtool operations should return -EINVAL.
  vlan: Calling vlan_hwaccel_do_receive() is always valid.
  tproxy: use the interface primary IP address as a default value for --on-ip
  tproxy: added IPv6 support to the socket match
  cxgb3: function namespace cleanup
  tproxy: added IPv6 support to the TPROXY target
  tproxy: added IPv6 socket lookup function to nf_tproxy_core
  be2net: Changes to use only priority codes allowed by f/w
  tproxy: allow non-local binds of IPv6 sockets if IP_TRANSPARENT is enabled
  tproxy: added tproxy sockopt interface in the IPV6 layer
  tproxy: added udp6_lib_lookup function
  tproxy: added const specifiers to udp lookup functions
  tproxy: split off ipv6 defragmentation to a separate module
  l2tp: small cleanup
  nf_nat: restrict ICMP translation for embedded header
  can: mcp251x: fix generation of error frames
  can: mcp251x: fix endless loop in interrupt handler if CANINTF_MERRF is set
  can-raw: add msg_flags to distinguish local traffic
  9p: client code cleanup
  rds: make local functions/variables static
  ...

Fix up conflicts in net/core/dev.c, drivers/net/pcmcia/smc91c92_cs.c and
drivers/net/wireless/ath/ath9k/debug.c as per David
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1699 commits)
  bnx2/bnx2x: Unsupported Ethtool operations should return -EINVAL.
  vlan: Calling vlan_hwaccel_do_receive() is always valid.
  tproxy: use the interface primary IP address as a default value for --on-ip
  tproxy: added IPv6 support to the socket match
  cxgb3: function namespace cleanup
  tproxy: added IPv6 support to the TPROXY target
  tproxy: added IPv6 socket lookup function to nf_tproxy_core
  be2net: Changes to use only priority codes allowed by f/w
  tproxy: allow non-local binds of IPv6 sockets if IP_TRANSPARENT is enabled
  tproxy: added tproxy sockopt interface in the IPV6 layer
  tproxy: added udp6_lib_lookup function
  tproxy: added const specifiers to udp lookup functions
  tproxy: split off ipv6 defragmentation to a separate module
  l2tp: small cleanup
  nf_nat: restrict ICMP translation for embedded header
  can: mcp251x: fix generation of error frames
  can: mcp251x: fix endless loop in interrupt handler if CANINTF_MERRF is set
  can-raw: add msg_flags to distinguish local traffic
  9p: client code cleanup
  rds: make local functions/variables static
  ...

Fix up conflicts in net/core/dev.c, drivers/net/pcmcia/smc91c92_cs.c and
drivers/net/wireless/ath/ath9k/debug.c as per David
</pre>
</div>
</content>
</entry>
<entry>
<title>net: allocate skbs on local node</title>
<updated>2010-10-16T18:13:19+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2010-10-11T19:05:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=564824b0c52c34692d804bb6ea214451615b0b50'/>
<id>564824b0c52c34692d804bb6ea214451615b0b50</id>
<content type='text'>
commit b30973f877 (node-aware skb allocation) spread a wrong habit of
allocating net drivers skbs on a given memory node : The one closest to
the NIC hardware. This is wrong because as soon as we try to scale
network stack, we need to use many cpus to handle traffic and hit
slub/slab management on cross-node allocations/frees when these cpus
have to alloc/free skbs bound to a central node.

skb allocated in RX path are ephemeral, they have a very short
lifetime : Extra cost to maintain NUMA affinity is too expensive. What
appeared as a nice idea four years ago is in fact a bad one.

In 2010, NIC hardwares are multiqueue, or we use RPS to spread the load,
and two 10Gb NIC might deliver more than 28 million packets per second,
needing all the available cpus.

Cost of cross-node handling in network and vm stacks outperforms the
small benefit hardware had when doing its DMA transfert in its 'local'
memory node at RX time. Even trying to differentiate the two allocations
done for one skb (the sk_buff on local node, the data part on NIC
hardware node) is not enough to bring good performance.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Acked-by: Tom Herbert &lt;therbert@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit b30973f877 (node-aware skb allocation) spread a wrong habit of
allocating net drivers skbs on a given memory node : The one closest to
the NIC hardware. This is wrong because as soon as we try to scale
network stack, we need to use many cpus to handle traffic and hit
slub/slab management on cross-node allocations/frees when these cpus
have to alloc/free skbs bound to a central node.

skb allocated in RX path are ephemeral, they have a very short
lifetime : Extra cost to maintain NUMA affinity is too expensive. What
appeared as a nice idea four years ago is in fact a bad one.

In 2010, NIC hardwares are multiqueue, or we use RPS to spread the load,
and two 10Gb NIC might deliver more than 28 million packets per second,
needing all the available cpus.

Cost of cross-node handling in network and vm stacks outperforms the
small benefit hardware had when doing its DMA transfert in its 'local'
memory node at RX time. Even trying to differentiate the two allocations
done for one skb (the sk_buff on local node, the data part on NIC
hardware node) is not enough to bring good performance.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Acked-by: Tom Herbert &lt;therbert@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core</title>
<updated>2010-09-15T08:27:31+00:00</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@elte.hu</email>
</author>
<published>2010-09-15T08:27:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3aabae7d9dfaed60effe93662f02c19bafc18537'/>
<id>3aabae7d9dfaed60effe93662f02c19bafc18537</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6</title>
<updated>2010-09-10T05:27:33+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2010-09-10T05:27:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e548833df83c3554229eff0672900bfe958b45fd'/>
<id>e548833df83c3554229eff0672900bfe958b45fd</id>
<content type='text'>
Conflicts:
	net/mac80211/main.c
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Conflicts:
	net/mac80211/main.c
</pre>
</div>
</content>
</entry>
<entry>
<title>gro: Re-fix different skb headrooms</title>
<updated>2010-09-08T17:32:15+00:00</updated>
<author>
<name>Jarek Poplawski</name>
<email>jarkao2@gmail.com</email>
</author>
<published>2010-09-04T10:34:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=64289c8e6851bca0e589e064c9a5c9fbd6ae5dd4'/>
<id>64289c8e6851bca0e589e064c9a5c9fbd6ae5dd4</id>
<content type='text'>
The patch: "gro: fix different skb headrooms" in its part:
"2) allocate a minimal skb for head of frag_list" is buggy. The copied
skb has p-&gt;data set at the ip header at the moment, and skb_gro_offset
is the length of ip + tcp headers. So, after the change the length of
mac header is skipped. Later skb_set_mac_header() sets it into the
NET_SKB_PAD area (if it's long enough) and ip header is misaligned at
NET_SKB_PAD + NET_IP_ALIGN offset. There is no reason to assume the
original skb was wrongly allocated, so let's copy it as it was.

bugzilla : https://bugzilla.kernel.org/show_bug.cgi?id=16626
fixes commit: 3d3be4333fdf6faa080947b331a6a19bce1a4f57

Reported-by: Plamen Petrov &lt;pvp-lsts@fs.uni-ruse.bg&gt;
Signed-off-by: Jarek Poplawski &lt;jarkao2@gmail.com&gt;
CC: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Acked-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Tested-by: Plamen Petrov &lt;pvp-lsts@fs.uni-ruse.bg&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The patch: "gro: fix different skb headrooms" in its part:
"2) allocate a minimal skb for head of frag_list" is buggy. The copied
skb has p-&gt;data set at the ip header at the moment, and skb_gro_offset
is the length of ip + tcp headers. So, after the change the length of
mac header is skipped. Later skb_set_mac_header() sets it into the
NET_SKB_PAD area (if it's long enough) and ip header is misaligned at
NET_SKB_PAD + NET_IP_ALIGN offset. There is no reason to assume the
original skb was wrongly allocated, so let's copy it as it was.

bugzilla : https://bugzilla.kernel.org/show_bug.cgi?id=16626
fixes commit: 3d3be4333fdf6faa080947b331a6a19bce1a4f57

Reported-by: Plamen Petrov &lt;pvp-lsts@fs.uni-ruse.bg&gt;
Signed-off-by: Jarek Poplawski &lt;jarkao2@gmail.com&gt;
CC: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Acked-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Tested-by: Plamen Petrov &lt;pvp-lsts@fs.uni-ruse.bg&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
