<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/include/net/inet_hashtables.h, branch v2.6.18</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>Don't include linux/config.h from anywhere else in include/</title>
<updated>2006-04-26T11:56:16+00:00</updated>
<author>
<name>David Woodhouse</name>
<email>dwmw2@infradead.org</email>
</author>
<published>2006-04-26T11:56:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=62c4f0a2d5a188f73a94f2cb8ea0dba3e7cf0a7f'/>
<id>62c4f0a2d5a188f73a94f2cb8ea0dba3e7cf0a7f</id>
<content type='text'>
Signed-off-by: David Woodhouse &lt;dwmw2@infradead.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: David Woodhouse &lt;dwmw2@infradead.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[INET_SOCK]: Move struct inet_sock &amp; helper functions to net/inet_sock.h</title>
<updated>2006-01-03T21:11:21+00:00</updated>
<author>
<name>Arnaldo Carvalho de Melo</name>
<email>acme@mandriva.com</email>
</author>
<published>2005-12-27T04:43:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=14c850212ed8f8cbb5972ad6b8812e08a0bc901c'/>
<id>14c850212ed8f8cbb5972ad6b8812e08a0bc901c</id>
<content type='text'>
To help in reducing the number of include dependencies, several files were
touched as they were getting needed headers indirectly for stuff they use.

Thanks also to Alan Menegotto for pointing out that net/dccp/proto.c had
linux/dccp.h include twice.

Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@mandriva.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
To help in reducing the number of include dependencies, several files were
touched as they were getting needed headers indirectly for stuff they use.

Thanks also to Alan Menegotto for pointing out that net/dccp/proto.c had
linux/dccp.h include twice.

Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@mandriva.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[INET]: Generalise tcp_v4_hash_connect</title>
<updated>2006-01-03T21:10:55+00:00</updated>
<author>
<name>Arnaldo Carvalho de Melo</name>
<email>acme@mandriva.com</email>
</author>
<published>2005-12-14T07:25:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a7f5e7f164788a22eb5d3de8e2d3cee1bf58fdca'/>
<id>a7f5e7f164788a22eb5d3de8e2d3cee1bf58fdca</id>
<content type='text'>
Renaming it to inet_hash_connect, making it possible to ditch
dccp_v4_hash_connect and share the same code with TCP instead.

Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@mandriva.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Renaming it to inet_hash_connect, making it possible to ditch
dccp_v4_hash_connect and share the same code with TCP instead.

Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@mandriva.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[TCP/DCCP]: Randomize port selection</title>
<updated>2005-11-05T23:23:15+00:00</updated>
<author>
<name>Stephen Hemminger</name>
<email>shemminger@osdl.org</email>
</author>
<published>2005-11-04T00:33:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6df716340da3a6fdd33d73d7ed4c6f7590ca1c42'/>
<id>6df716340da3a6fdd33d73d7ed4c6f7590ca1c42</id>
<content type='text'>
This patch randomizes the port selected on bind() for connections
to help with possible security attacks. It should also be faster
in most cases because there is no need for a global lock.

Signed-off-by: Stephen Hemminger &lt;shemminger@osdl.org&gt;
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@mandriva.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch randomizes the port selected on bind() for connections
to help with possible security attacks. It should also be faster
in most cases because there is no need for a global lock.

Signed-off-by: Stephen Hemminger &lt;shemminger@osdl.org&gt;
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@mandriva.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[INET]: Shrink struct inet_ehash_bucket on 32 bits UP</title>
<updated>2005-10-04T22:55:51+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>dada1@cosmosbay.com</email>
</author>
<published>2005-10-04T22:55:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6d2553612fa329979e6423a5f2410fd7be5aa902'/>
<id>6d2553612fa329979e6423a5f2410fd7be5aa902</id>
<content type='text'>
No need to align struct inet_ehash_bucket on a 8 bytes boundary.

On 32 bits Uniprocessor, that's a waste of 4 bytes per struct (50 %)

On other platforms, the attribute is useless, natual alignement is already 8.

platform     | Size before | Size after patch
-------------+-------------+------------------
32 bits, UP  |         8   |     4
32 bits, SMP |         8   |     8
64 bits, UP  |         8   |     8
64 bits, SMP |        16   |    16

Signed-off-by: Eric Dumazet &lt;dada1@cosmosbay.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
No need to align struct inet_ehash_bucket on a 8 bytes boundary.

On 32 bits Uniprocessor, that's a waste of 4 bytes per struct (50 %)

On other platforms, the attribute is useless, natual alignement is already 8.

platform     | Size before | Size after patch
-------------+-------------+------------------
32 bits, UP  |         8   |     4
32 bits, SMP |         8   |     8
64 bits, UP  |         8   |     8
64 bits, SMP |        16   |    16

Signed-off-by: Eric Dumazet &lt;dada1@cosmosbay.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[INET]: speedup inet (tcp/dccp) lookups</title>
<updated>2005-10-03T21:13:38+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>dada1@cosmosbay.com</email>
</author>
<published>2005-10-03T21:13:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=81c3d5470ecc70564eb9209946730fe2be93ad06'/>
<id>81c3d5470ecc70564eb9209946730fe2be93ad06</id>
<content type='text'>
Arnaldo and I agreed it could be applied now, because I have other
pending patches depending on this one (Thank you Arnaldo)

(The other important patch moves skc_refcnt in a separate cache line,
so that the SMP/NUMA performance doesnt suffer from cache line ping pongs)

1) First some performance data :
--------------------------------

tcp_v4_rcv() wastes a *lot* of time in __inet_lookup_established()

The most time critical code is :

sk_for_each(sk, node, &amp;head-&gt;chain) {
     if (INET_MATCH(sk, acookie, saddr, daddr, ports, dif))
         goto hit; /* You sunk my battleship! */
}

The sk_for_each() does use prefetch() hints but only the begining of
"struct sock" is prefetched.

As INET_MATCH first comparison uses inet_sk(__sk)-&gt;daddr, wich is far
away from the begining of "struct sock", it has to bring into CPU
cache cold cache line. Each iteration has to use at least 2 cache
lines.

This can be problematic if some chains are very long.

2) The goal
-----------

The idea I had is to change things so that INET_MATCH() may return
FALSE in 99% of cases only using the data already in the CPU cache,
using one cache line per iteration.

3) Description of the patch
---------------------------

Adds a new 'unsigned int skc_hash' field in 'struct sock_common',
filling a 32 bits hole on 64 bits platform.

struct sock_common {
	unsigned short		skc_family;
	volatile unsigned char	skc_state;
	unsigned char		skc_reuse;
	int			skc_bound_dev_if;
	struct hlist_node	skc_node;
	struct hlist_node	skc_bind_node;
	atomic_t		skc_refcnt;
+	unsigned int		skc_hash;
	struct proto		*skc_prot;
};

Store in this 32 bits field the full hash, not masked by (ehash_size -
1) Using this full hash as the first comparison done in INET_MATCH
permits us immediatly skip the element without touching a second cache
line in case of a miss.

Suppress the sk_hashent/tw_hashent fields since skc_hash (aliased to
sk_hash and tw_hash) already contains the slot number if we mask with
(ehash_size - 1)

File include/net/inet_hashtables.h

64 bits platforms :
#define INET_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
     (((__sk)-&gt;sk_hash == (__hash))
     ((*((__u64 *)&amp;(inet_sk(__sk)-&gt;daddr)))== (__cookie))   &amp;&amp;  \
     ((*((__u32 *)&amp;(inet_sk(__sk)-&gt;dport))) == (__ports))   &amp;&amp;  \
     (!((__sk)-&gt;sk_bound_dev_if) || ((__sk)-&gt;sk_bound_dev_if == (__dif))))

32bits platforms:
#define TCP_IPV4_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
     (((__sk)-&gt;sk_hash == (__hash))                 &amp;&amp;  \
     (inet_sk(__sk)-&gt;daddr          == (__saddr))   &amp;&amp;  \
     (inet_sk(__sk)-&gt;rcv_saddr      == (__daddr))   &amp;&amp;  \
     (!((__sk)-&gt;sk_bound_dev_if) || ((__sk)-&gt;sk_bound_dev_if == (__dif))))


- Adds a prefetch(head-&gt;chain.first) in 
__inet_lookup_established()/__tcp_v4_check_established() and 
__inet6_lookup_established()/__tcp_v6_check_established() and 
__dccp_v4_check_established() to bring into cache the first element of the 
list, before the {read|write}_lock(&amp;head-&gt;lock);

Signed-off-by: Eric Dumazet &lt;dada1@cosmosbay.com&gt;
Acked-by: Arnaldo Carvalho de Melo &lt;acme@ghostprotocols.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Arnaldo and I agreed it could be applied now, because I have other
pending patches depending on this one (Thank you Arnaldo)

(The other important patch moves skc_refcnt in a separate cache line,
so that the SMP/NUMA performance doesnt suffer from cache line ping pongs)

1) First some performance data :
--------------------------------

tcp_v4_rcv() wastes a *lot* of time in __inet_lookup_established()

The most time critical code is :

sk_for_each(sk, node, &amp;head-&gt;chain) {
     if (INET_MATCH(sk, acookie, saddr, daddr, ports, dif))
         goto hit; /* You sunk my battleship! */
}

The sk_for_each() does use prefetch() hints but only the begining of
"struct sock" is prefetched.

As INET_MATCH first comparison uses inet_sk(__sk)-&gt;daddr, wich is far
away from the begining of "struct sock", it has to bring into CPU
cache cold cache line. Each iteration has to use at least 2 cache
lines.

This can be problematic if some chains are very long.

2) The goal
-----------

The idea I had is to change things so that INET_MATCH() may return
FALSE in 99% of cases only using the data already in the CPU cache,
using one cache line per iteration.

3) Description of the patch
---------------------------

Adds a new 'unsigned int skc_hash' field in 'struct sock_common',
filling a 32 bits hole on 64 bits platform.

struct sock_common {
	unsigned short		skc_family;
	volatile unsigned char	skc_state;
	unsigned char		skc_reuse;
	int			skc_bound_dev_if;
	struct hlist_node	skc_node;
	struct hlist_node	skc_bind_node;
	atomic_t		skc_refcnt;
+	unsigned int		skc_hash;
	struct proto		*skc_prot;
};

Store in this 32 bits field the full hash, not masked by (ehash_size -
1) Using this full hash as the first comparison done in INET_MATCH
permits us immediatly skip the element without touching a second cache
line in case of a miss.

Suppress the sk_hashent/tw_hashent fields since skc_hash (aliased to
sk_hash and tw_hash) already contains the slot number if we mask with
(ehash_size - 1)

File include/net/inet_hashtables.h

64 bits platforms :
#define INET_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
     (((__sk)-&gt;sk_hash == (__hash))
     ((*((__u64 *)&amp;(inet_sk(__sk)-&gt;daddr)))== (__cookie))   &amp;&amp;  \
     ((*((__u32 *)&amp;(inet_sk(__sk)-&gt;dport))) == (__ports))   &amp;&amp;  \
     (!((__sk)-&gt;sk_bound_dev_if) || ((__sk)-&gt;sk_bound_dev_if == (__dif))))

32bits platforms:
#define TCP_IPV4_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
     (((__sk)-&gt;sk_hash == (__hash))                 &amp;&amp;  \
     (inet_sk(__sk)-&gt;daddr          == (__saddr))   &amp;&amp;  \
     (inet_sk(__sk)-&gt;rcv_saddr      == (__daddr))   &amp;&amp;  \
     (!((__sk)-&gt;sk_bound_dev_if) || ((__sk)-&gt;sk_bound_dev_if == (__dif))))


- Adds a prefetch(head-&gt;chain.first) in 
__inet_lookup_established()/__tcp_v4_check_established() and 
__inet6_lookup_established()/__tcp_v6_check_established() and 
__dccp_v4_check_established() to bring into cache the first element of the 
list, before the {read|write}_lock(&amp;head-&gt;lock);

Signed-off-by: Eric Dumazet &lt;dada1@cosmosbay.com&gt;
Acked-by: Arnaldo Carvalho de Melo &lt;acme@ghostprotocols.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[ICSK]: Generalise tcp_listen_{start,stop}</title>
<updated>2005-08-29T22:49:24+00:00</updated>
<author>
<name>Arnaldo Carvalho de Melo</name>
<email>acme@ghostprotocols.net</email>
</author>
<published>2005-08-10T03:11:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0a5578cf8e5e045aaa68643c17ce885426697c6b'/>
<id>0a5578cf8e5e045aaa68643c17ce885426697c6b</id>
<content type='text'>
This also moved inet_iif from tcp to inet_hashtables.h, as it is
needed by the inet_lookup callers, perhaps this needs a bit of
polishing, but for now seems fine.

Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@ghostprotocols.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This also moved inet_iif from tcp to inet_hashtables.h, as it is
needed by the inet_lookup callers, perhaps this needs a bit of
polishing, but for now seems fine.

Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@ghostprotocols.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[NET]: Introduce inet_connection_sock</title>
<updated>2005-08-29T22:43:19+00:00</updated>
<author>
<name>Arnaldo Carvalho de Melo</name>
<email>acme@ghostprotocols.net</email>
</author>
<published>2005-08-10T03:10:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=463c84b97f24010a67cd871746d6a7e4c925a5f9'/>
<id>463c84b97f24010a67cd871746d6a7e4c925a5f9</id>
<content type='text'>
This creates struct inet_connection_sock, moving members out of struct
tcp_sock that are shareable with other INET connection oriented
protocols, such as DCCP, that in my private tree already uses most of
these members.

The functions that operate on these members were renamed, using a
inet_csk_ prefix while not being moved yet to a new file, so as to
ease the review of these changes.

Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@ghostprotocols.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This creates struct inet_connection_sock, moving members out of struct
tcp_sock that are shareable with other INET connection oriented
protocols, such as DCCP, that in my private tree already uses most of
these members.

The functions that operate on these members were renamed, using a
inet_csk_ prefix while not being moved yet to a new file, so as to
ease the review of these changes.

Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@ghostprotocols.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[INET]: Generalise the TCP sock ID lookup routines</title>
<updated>2005-08-29T22:42:18+00:00</updated>
<author>
<name>Arnaldo Carvalho de Melo</name>
<email>acme@ghostprotocols.net</email>
</author>
<published>2005-08-10T03:09:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e48c414ee61f4ac8d5cff2973e66a7cbc8a93aa5'/>
<id>e48c414ee61f4ac8d5cff2973e66a7cbc8a93aa5</id>
<content type='text'>
And also some TIME_WAIT functions.

[acme@toy net-2.6.14]$ grep built-in /tmp/before.size /tmp/after.size
/tmp/before.size: 282955   13122    9312  305389   4a8ed net/ipv4/built-in.o
/tmp/after.size:  281566   13122    9312  304000   4a380 net/ipv4/built-in.o
[acme@toy net-2.6.14]$

I kept them still inlined, will uninline at some point to see what
would be the performance difference.

Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@ghostprotocols.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
And also some TIME_WAIT functions.

[acme@toy net-2.6.14]$ grep built-in /tmp/before.size /tmp/after.size
/tmp/before.size: 282955   13122    9312  305389   4a8ed net/ipv4/built-in.o
/tmp/after.size:  281566   13122    9312  304000   4a380 net/ipv4/built-in.o
[acme@toy net-2.6.14]$

I kept them still inlined, will uninline at some point to see what
would be the performance difference.

Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@ghostprotocols.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[INET]: Generalise tcp_tw_bucket, aka TIME_WAIT sockets</title>
<updated>2005-08-29T22:42:13+00:00</updated>
<author>
<name>Arnaldo Carvalho de Melo</name>
<email>acme@ghostprotocols.net</email>
</author>
<published>2005-08-10T03:09:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8feaf0c0a5488b3d898a9c207eb6678f44ba3f26'/>
<id>8feaf0c0a5488b3d898a9c207eb6678f44ba3f26</id>
<content type='text'>
This paves the way to generalise the rest of the sock ID lookup
routines and saves some bytes in TCPv4 TIME_WAIT sockets on distro
kernels (where IPv6 is always built as a module):

[root@qemu ~]# grep tw_sock /proc/slabinfo
tw_sock_TCPv6  0  0  128  31  1
tw_sock_TCP    0  0   96  41  1
[root@qemu ~]#

Now if a protocol wants to use the TIME_WAIT generic infrastructure it
only has to set the sk_prot-&gt;twsk_obj_size field with the size of its
inet_timewait_sock derived sock and proto_register will create
sk_prot-&gt;twsk_slab, for now its only for INET sockets, but we can
introduce timewait_sock later if some non INET transport protocolo
wants to use this stuff.

Next changesets will take advantage of this new infrastructure to
generalise even more TCP code.

[acme@toy net-2.6.14]$ grep built-in /tmp/before.size /tmp/after.size
/tmp/before.size: 188646   11764    5068  205478   322a6 net/ipv4/built-in.o
/tmp/after.size:  188144   11764    5068  204976   320b0 net/ipv4/built-in.o
[acme@toy net-2.6.14]$

Tested with both IPv4 &amp; IPv6 (::1 (localhost) &amp; ::ffff:172.20.0.1
(qemu host)).

Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@ghostprotocols.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This paves the way to generalise the rest of the sock ID lookup
routines and saves some bytes in TCPv4 TIME_WAIT sockets on distro
kernels (where IPv6 is always built as a module):

[root@qemu ~]# grep tw_sock /proc/slabinfo
tw_sock_TCPv6  0  0  128  31  1
tw_sock_TCP    0  0   96  41  1
[root@qemu ~]#

Now if a protocol wants to use the TIME_WAIT generic infrastructure it
only has to set the sk_prot-&gt;twsk_obj_size field with the size of its
inet_timewait_sock derived sock and proto_register will create
sk_prot-&gt;twsk_slab, for now its only for INET sockets, but we can
introduce timewait_sock later if some non INET transport protocolo
wants to use this stuff.

Next changesets will take advantage of this new infrastructure to
generalise even more TCP code.

[acme@toy net-2.6.14]$ grep built-in /tmp/before.size /tmp/after.size
/tmp/before.size: 188646   11764    5068  205478   322a6 net/ipv4/built-in.o
/tmp/after.size:  188144   11764    5068  204976   320b0 net/ipv4/built-in.o
[acme@toy net-2.6.14]$

Tested with both IPv4 &amp; IPv6 (::1 (localhost) &amp; ::ffff:172.20.0.1
(qemu host)).

Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@ghostprotocols.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
