<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/include/net/sock.h, branch v2.6.14</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>[PATCH] gfp flags annotations - part 1</title>
<updated>2005-10-08T22:00:57+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@ftp.linux.org.uk</email>
</author>
<published>2005-10-07T06:46:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=dd0fc66fb33cd610bc1a5db8a5e232d34879b4d7'/>
<id>dd0fc66fb33cd610bc1a5db8a5e232d34879b4d7</id>
<content type='text'>
 - added typedef unsigned int __nocast gfp_t;

 - replaced __nocast uses for gfp flags with gfp_t - it gives exactly
   the same warnings as far as sparse is concerned, doesn't change
   generated code (from gcc point of view we replaced unsigned int with
   typedef) and documents what's going on far better.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
 - added typedef unsigned int __nocast gfp_t;

 - replaced __nocast uses for gfp flags with gfp_t - it gives exactly
   the same warnings as far as sparse is concerned, doesn't change
   generated code (from gcc point of view we replaced unsigned int with
   typedef) and documents what's going on far better.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[INET]: speedup inet (tcp/dccp) lookups</title>
<updated>2005-10-03T21:13:38+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>dada1@cosmosbay.com</email>
</author>
<published>2005-10-03T21:13:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=81c3d5470ecc70564eb9209946730fe2be93ad06'/>
<id>81c3d5470ecc70564eb9209946730fe2be93ad06</id>
<content type='text'>
Arnaldo and I agreed it could be applied now, because I have other
pending patches depending on this one (Thank you Arnaldo)

(The other important patch moves skc_refcnt in a separate cache line,
so that the SMP/NUMA performance doesnt suffer from cache line ping pongs)

1) First some performance data :
--------------------------------

tcp_v4_rcv() wastes a *lot* of time in __inet_lookup_established()

The most time critical code is :

sk_for_each(sk, node, &amp;head-&gt;chain) {
     if (INET_MATCH(sk, acookie, saddr, daddr, ports, dif))
         goto hit; /* You sunk my battleship! */
}

The sk_for_each() does use prefetch() hints but only the begining of
"struct sock" is prefetched.

As INET_MATCH first comparison uses inet_sk(__sk)-&gt;daddr, wich is far
away from the begining of "struct sock", it has to bring into CPU
cache cold cache line. Each iteration has to use at least 2 cache
lines.

This can be problematic if some chains are very long.

2) The goal
-----------

The idea I had is to change things so that INET_MATCH() may return
FALSE in 99% of cases only using the data already in the CPU cache,
using one cache line per iteration.

3) Description of the patch
---------------------------

Adds a new 'unsigned int skc_hash' field in 'struct sock_common',
filling a 32 bits hole on 64 bits platform.

struct sock_common {
	unsigned short		skc_family;
	volatile unsigned char	skc_state;
	unsigned char		skc_reuse;
	int			skc_bound_dev_if;
	struct hlist_node	skc_node;
	struct hlist_node	skc_bind_node;
	atomic_t		skc_refcnt;
+	unsigned int		skc_hash;
	struct proto		*skc_prot;
};

Store in this 32 bits field the full hash, not masked by (ehash_size -
1) Using this full hash as the first comparison done in INET_MATCH
permits us immediatly skip the element without touching a second cache
line in case of a miss.

Suppress the sk_hashent/tw_hashent fields since skc_hash (aliased to
sk_hash and tw_hash) already contains the slot number if we mask with
(ehash_size - 1)

File include/net/inet_hashtables.h

64 bits platforms :
#define INET_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
     (((__sk)-&gt;sk_hash == (__hash))
     ((*((__u64 *)&amp;(inet_sk(__sk)-&gt;daddr)))== (__cookie))   &amp;&amp;  \
     ((*((__u32 *)&amp;(inet_sk(__sk)-&gt;dport))) == (__ports))   &amp;&amp;  \
     (!((__sk)-&gt;sk_bound_dev_if) || ((__sk)-&gt;sk_bound_dev_if == (__dif))))

32bits platforms:
#define TCP_IPV4_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
     (((__sk)-&gt;sk_hash == (__hash))                 &amp;&amp;  \
     (inet_sk(__sk)-&gt;daddr          == (__saddr))   &amp;&amp;  \
     (inet_sk(__sk)-&gt;rcv_saddr      == (__daddr))   &amp;&amp;  \
     (!((__sk)-&gt;sk_bound_dev_if) || ((__sk)-&gt;sk_bound_dev_if == (__dif))))


- Adds a prefetch(head-&gt;chain.first) in 
__inet_lookup_established()/__tcp_v4_check_established() and 
__inet6_lookup_established()/__tcp_v6_check_established() and 
__dccp_v4_check_established() to bring into cache the first element of the 
list, before the {read|write}_lock(&amp;head-&gt;lock);

Signed-off-by: Eric Dumazet &lt;dada1@cosmosbay.com&gt;
Acked-by: Arnaldo Carvalho de Melo &lt;acme@ghostprotocols.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Arnaldo and I agreed it could be applied now, because I have other
pending patches depending on this one (Thank you Arnaldo)

(The other important patch moves skc_refcnt in a separate cache line,
so that the SMP/NUMA performance doesnt suffer from cache line ping pongs)

1) First some performance data :
--------------------------------

tcp_v4_rcv() wastes a *lot* of time in __inet_lookup_established()

The most time critical code is :

sk_for_each(sk, node, &amp;head-&gt;chain) {
     if (INET_MATCH(sk, acookie, saddr, daddr, ports, dif))
         goto hit; /* You sunk my battleship! */
}

The sk_for_each() does use prefetch() hints but only the begining of
"struct sock" is prefetched.

As INET_MATCH first comparison uses inet_sk(__sk)-&gt;daddr, wich is far
away from the begining of "struct sock", it has to bring into CPU
cache cold cache line. Each iteration has to use at least 2 cache
lines.

This can be problematic if some chains are very long.

2) The goal
-----------

The idea I had is to change things so that INET_MATCH() may return
FALSE in 99% of cases only using the data already in the CPU cache,
using one cache line per iteration.

3) Description of the patch
---------------------------

Adds a new 'unsigned int skc_hash' field in 'struct sock_common',
filling a 32 bits hole on 64 bits platform.

struct sock_common {
	unsigned short		skc_family;
	volatile unsigned char	skc_state;
	unsigned char		skc_reuse;
	int			skc_bound_dev_if;
	struct hlist_node	skc_node;
	struct hlist_node	skc_bind_node;
	atomic_t		skc_refcnt;
+	unsigned int		skc_hash;
	struct proto		*skc_prot;
};

Store in this 32 bits field the full hash, not masked by (ehash_size -
1) Using this full hash as the first comparison done in INET_MATCH
permits us immediatly skip the element without touching a second cache
line in case of a miss.

Suppress the sk_hashent/tw_hashent fields since skc_hash (aliased to
sk_hash and tw_hash) already contains the slot number if we mask with
(ehash_size - 1)

File include/net/inet_hashtables.h

64 bits platforms :
#define INET_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
     (((__sk)-&gt;sk_hash == (__hash))
     ((*((__u64 *)&amp;(inet_sk(__sk)-&gt;daddr)))== (__cookie))   &amp;&amp;  \
     ((*((__u32 *)&amp;(inet_sk(__sk)-&gt;dport))) == (__ports))   &amp;&amp;  \
     (!((__sk)-&gt;sk_bound_dev_if) || ((__sk)-&gt;sk_bound_dev_if == (__dif))))

32bits platforms:
#define TCP_IPV4_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
     (((__sk)-&gt;sk_hash == (__hash))                 &amp;&amp;  \
     (inet_sk(__sk)-&gt;daddr          == (__saddr))   &amp;&amp;  \
     (inet_sk(__sk)-&gt;rcv_saddr      == (__daddr))   &amp;&amp;  \
     (!((__sk)-&gt;sk_bound_dev_if) || ((__sk)-&gt;sk_bound_dev_if == (__dif))))


- Adds a prefetch(head-&gt;chain.first) in 
__inet_lookup_established()/__tcp_v4_check_established() and 
__inet6_lookup_established()/__tcp_v6_check_established() and 
__dccp_v4_check_established() to bring into cache the first element of the 
list, before the {read|write}_lock(&amp;head-&gt;lock);

Signed-off-by: Eric Dumazet &lt;dada1@cosmosbay.com&gt;
Acked-by: Arnaldo Carvalho de Melo &lt;acme@ghostprotocols.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[NET]: Do not protect sysctl_optmem_max with CONFIG_SYSCTL</title>
<updated>2005-09-06T01:14:11+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@sunset.davemloft.net</email>
</author>
<published>2005-09-06T01:14:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6baf1f417d092bd2de7c8892cecad456024c993f'/>
<id>6baf1f417d092bd2de7c8892cecad456024c993f</id>
<content type='text'>
The ipv4 and ipv6 protocols need to access it unconditionally.
SYSCTL=n build failure reported by Russell King.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The ipv4 and ipv6 protocols need to access it unconditionally.
SYSCTL=n build failure reported by Russell King.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[TCP]: Fix sk_forward_alloc underflow in tcp_sendmsg</title>
<updated>2005-09-02T00:48:59+00:00</updated>
<author>
<name>Herbert Xu</name>
<email>herbert@gondor.apana.org.au</email>
</author>
<published>2005-09-02T00:48:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ef015786152adaff5a6a8bf0c8ea2f70cee8059d'/>
<id>ef015786152adaff5a6a8bf0c8ea2f70cee8059d</id>
<content type='text'>
I've finally found a potential cause of the sk_forward_alloc underflows
that people have been reporting sporadically.

When tcp_sendmsg tacks on extra bits to an existing TCP_PAGE we don't
check sk_forward_alloc even though a large amount of time may have
elapsed since we allocated the page.  In the mean time someone could've
come along and liberated packets and reclaimed sk_forward_alloc memory.

This patch makes tcp_sendmsg check sk_forward_alloc every time as we
do in do_tcp_sendpages.
 
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
I've finally found a potential cause of the sk_forward_alloc underflows
that people have been reporting sporadically.

When tcp_sendmsg tacks on extra bits to an existing TCP_PAGE we don't
check sk_forward_alloc even though a large amount of time may have
elapsed since we allocated the page.  In the mean time someone could've
come along and liberated packets and reclaimed sk_forward_alloc memory.

This patch makes tcp_sendmsg check sk_forward_alloc every time as we
do in do_tcp_sendpages.
 
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[NET]: Add sk_stream_wmem_schedule</title>
<updated>2005-09-02T00:48:23+00:00</updated>
<author>
<name>Herbert Xu</name>
<email>herbert@gondor.apana.org.au</email>
</author>
<published>2005-09-02T00:48:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d80d99d643090c3cf2b1f9fb3fadd1256f7e384f'/>
<id>d80d99d643090c3cf2b1f9fb3fadd1256f7e384f</id>
<content type='text'>
This patch introduces sk_stream_wmem_schedule as a short-hand for
the sk_forward_alloc checking on egress.

Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch introduces sk_stream_wmem_schedule as a short-hand for
the sk_forward_alloc checking on egress.

Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[NET]: fix PROC_FS=n compile</title>
<updated>2005-08-29T23:02:38+00:00</updated>
<author>
<name>Adrian Bunk</name>
<email>bunk@stusta.de</email>
</author>
<published>2005-08-21T00:14:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8cd25c1fcfbf6460983e99091d278187421c1a1d'/>
<id>8cd25c1fcfbf6460983e99091d278187421c1a1d</id>
<content type='text'>
Signed-off-by: Adrian Bunk &lt;bunk@stusta.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Adrian Bunk &lt;bunk@stusta.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[NET]: Implement SKB fast cloning.</title>
<updated>2005-08-29T23:01:54+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2005-08-17T21:57:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d179cd12928443f3ec29cfbc3567439644bd0afc'/>
<id>d179cd12928443f3ec29cfbc3567439644bd0afc</id>
<content type='text'>
Protocols that make extensive use of SKB cloning,
for example TCP, eat at least 2 allocations per
packet sent as a result.

To cut the kmalloc() count in half, we implement
a pre-allocation scheme wherein we allocate
2 sk_buff objects in advance, then use a simple
reference count to free up the memory at the
correct time.

Based upon an initial patch by Thomas Graf and
suggestions from Herbert Xu.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Protocols that make extensive use of SKB cloning,
for example TCP, eat at least 2 allocations per
packet sent as a result.

To cut the kmalloc() count in half, we implement
a pre-allocation scheme wherein we allocate
2 sk_buff objects in advance, then use a simple
reference count to free up the memory at the
correct time.

Based upon an initial patch by Thomas Graf and
suggestions from Herbert Xu.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[NET]: Fix sparse warnings</title>
<updated>2005-08-29T23:01:32+00:00</updated>
<author>
<name>Arnaldo Carvalho de Melo</name>
<email>acme@mandriva.com</email>
</author>
<published>2005-08-16T05:18:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=20380731bc2897f2952ae055420972ded4cd786e'/>
<id>20380731bc2897f2952ae055420972ded4cd786e</id>
<content type='text'>
Of this type, mostly:

CHECK   net/ipv6/netfilter.c
net/ipv6/netfilter.c:96:12: warning: symbol 'ipv6_netfilter_init' was not declared. Should it be static?
net/ipv6/netfilter.c:101:6: warning: symbol 'ipv6_netfilter_fini' was not declared. Should it be static?

Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@mandriva.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Of this type, mostly:

CHECK   net/ipv6/netfilter.c
net/ipv6/netfilter.c:96:12: warning: symbol 'ipv6_netfilter_init' was not declared. Should it be static?
net/ipv6/netfilter.c:101:6: warning: symbol 'ipv6_netfilter_fini' was not declared. Should it be static?

Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@mandriva.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[NET]: Store skb-&gt;timestamp as offset to a base timestamp</title>
<updated>2005-08-29T22:58:24+00:00</updated>
<author>
<name>Patrick McHardy</name>
<email>kaber@trash.net</email>
</author>
<published>2005-08-15T00:24:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a61bbcf28a8cb0ba56f8193d512f7222e711a294'/>
<id>a61bbcf28a8cb0ba56f8193d512f7222e711a294</id>
<content type='text'>
Reduces skb size by 8 bytes on 64-bit.

Signed-off-by: Patrick McHardy &lt;kaber@trash.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Reduces skb size by 8 bytes on 64-bit.

Signed-off-by: Patrick McHardy &lt;kaber@trash.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[NET]: Make NETDEBUG pure printk wrappers</title>
<updated>2005-08-29T22:56:08+00:00</updated>
<author>
<name>Patrick McHardy</name>
<email>kaber@trash.net</email>
</author>
<published>2005-08-10T03:50:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=64ce207306debd7157f47282be94770407bec01c'/>
<id>64ce207306debd7157f47282be94770407bec01c</id>
<content type='text'>
Signed-off-by: Patrick McHardy &lt;kaber@trash.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Patrick McHardy &lt;kaber@trash.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
