<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/net/dccp, branch v2.6.14</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>[DCCP]: Clear the IPCB area</title>
<updated>2005-10-20T16:49:59+00:00</updated>
<author>
<name>Herbert Xu</name>
<email>herbert@gondor.apana.org.au</email>
</author>
<published>2005-10-18T02:03:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=49c5bfaffe8ae6e6440dc4bf78b03800960d93f5'/>
<id>49c5bfaffe8ae6e6440dc4bf78b03800960d93f5</id>
<content type='text'>
Turns out the problem has nothing to do with use-after-free or double-free.
It's just that we're not clearing the CB area and DCCP unlike TCP uses a CB
format that's incompatible with IP.

Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Signed-off-by: Ian McDonald &lt;imcdnzl@gmail.com&gt;
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@mandriva.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Turns out the problem has nothing to do with use-after-free or double-free.
It's just that we're not clearing the CB area and DCCP unlike TCP uses a CB
format that's incompatible with IP.

Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Signed-off-by: Ian McDonald &lt;imcdnzl@gmail.com&gt;
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@mandriva.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[DCCP]: Make dccp_write_xmit always free the packet</title>
<updated>2005-10-20T16:44:29+00:00</updated>
<author>
<name>Herbert Xu</name>
<email>herbert@gondor.apana.org.au</email>
</author>
<published>2005-10-16T11:08:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ffa29347dfbc158d1f47f5925324a6f5713659c1'/>
<id>ffa29347dfbc158d1f47f5925324a6f5713659c1</id>
<content type='text'>
icmp_send doesn't use skb-&gt;sk at all so even if skb-&gt;sk has already
been freed it can't cause crash there (it would've crashed somewhere
else first, e.g., ip_queue_xmit).

I found a double-free on an skb that could explain this though.
dccp_sendmsg and dccp_write_xmit are a little confused as to what
should free the packet when something goes wrong.  Sometimes they
both go for the ball and end up in each other's way.

This patch makes dccp_write_xmit always free the packet no matter
what.  This makes sense since dccp_transmit_skb which in turn comes
from the fact that ip_queue_xmit always frees the packet.

Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@mandriva.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
icmp_send doesn't use skb-&gt;sk at all so even if skb-&gt;sk has already
been freed it can't cause crash there (it would've crashed somewhere
else first, e.g., ip_queue_xmit).

I found a double-free on an skb that could explain this though.
dccp_sendmsg and dccp_write_xmit are a little confused as to what
should free the packet when something goes wrong.  Sometimes they
both go for the ball and end up in each other's way.

This patch makes dccp_write_xmit always free the packet no matter
what.  This makes sense since dccp_transmit_skb which in turn comes
from the fact that ip_queue_xmit always frees the packet.

Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@mandriva.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[DCCP]: Use skb_set_owner_w in dccp_transmit_skb when skb-&gt;sk is NULL</title>
<updated>2005-10-20T16:25:28+00:00</updated>
<author>
<name>Herbert Xu</name>
<email>herbert@gondor.apana.org.au</email>
</author>
<published>2005-10-14T06:38:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=fda0fd6c5b722cc48e904e0daafedca275d332af'/>
<id>fda0fd6c5b722cc48e904e0daafedca275d332af</id>
<content type='text'>
David S. Miller &lt;davem@davemloft.net&gt; wrote:
&gt; One thing you can probably do for this bug is to mark data packets
&gt; explicitly somehow, perhaps in the SKB control block DCCP already
&gt; uses for other data.  Put some boolean in there, set it true for
&gt; data packets.  Then change the test in dccp_transmit_skb() as
&gt; appropriate to test the boolean flag instead of "skb_cloned(skb)".

I agree.  In fact we already have that flag, it's called skb-&gt;sk.
So here is patch to test that instead of skb_cloned().

Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Acked-by: Ian McDonald &lt;imcdnzl@gmail.com&gt;
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@mandriva.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
David S. Miller &lt;davem@davemloft.net&gt; wrote:
&gt; One thing you can probably do for this bug is to mark data packets
&gt; explicitly somehow, perhaps in the SKB control block DCCP already
&gt; uses for other data.  Put some boolean in there, set it true for
&gt; data packets.  Then change the test in dccp_transmit_skb() as
&gt; appropriate to test the boolean flag instead of "skb_cloned(skb)".

I agree.  In fact we already have that flag, it's called skb-&gt;sk.
So here is patch to test that instead of skb_cloned().

Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Acked-by: Ian McDonald &lt;imcdnzl@gmail.com&gt;
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@mandriva.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[DCCP]: Transition from PARTOPEN to OPEN when receiving DATA packets</title>
<updated>2005-10-11T04:25:00+00:00</updated>
<author>
<name>Arnaldo Carvalho de Melo</name>
<email>acme@ghostprotocols.net</email>
</author>
<published>2005-10-11T04:25:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2a9bc9bb4d3a4570a8a48aadf071b91e657adb89'/>
<id>2a9bc9bb4d3a4570a8a48aadf071b91e657adb89</id>
<content type='text'>
Noticed by Andrea Bittau, that provided a patch that was modified to
not transition from RESPOND to OPEN when receiving DATA packets.

Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@ghostprotocols.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Noticed by Andrea Bittau, that provided a patch that was modified to
not transition from RESPOND to OPEN when receiving DATA packets.

Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@ghostprotocols.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[CCID]: Check if ccid is NULL in the hc_[tr]x_exit functions</title>
<updated>2005-10-11T04:24:20+00:00</updated>
<author>
<name>Arnaldo Carvalho de Melo</name>
<email>acme@ghostprotocols.net</email>
</author>
<published>2005-10-11T04:24:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=777b25a2fea7129222eb11fba55c0a67982383ff'/>
<id>777b25a2fea7129222eb11fba55c0a67982383ff</id>
<content type='text'>
For consistency with ccid_exit and to fix a bug when
IP_DCCP_UNLOAD_HACK is enabled as the control sock is not associated
to any CCID.

Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@ghostprotocols.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
For consistency with ccid_exit and to fix a bug when
IP_DCCP_UNLOAD_HACK is enabled as the control sock is not associated
to any CCID.

Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@ghostprotocols.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] gfp flags annotations - part 1</title>
<updated>2005-10-08T22:00:57+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@ftp.linux.org.uk</email>
</author>
<published>2005-10-07T06:46:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=dd0fc66fb33cd610bc1a5db8a5e232d34879b4d7'/>
<id>dd0fc66fb33cd610bc1a5db8a5e232d34879b4d7</id>
<content type='text'>
 - added typedef unsigned int __nocast gfp_t;

 - replaced __nocast uses for gfp flags with gfp_t - it gives exactly
   the same warnings as far as sparse is concerned, doesn't change
   generated code (from gcc point of view we replaced unsigned int with
   typedef) and documents what's going on far better.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
 - added typedef unsigned int __nocast gfp_t;

 - replaced __nocast uses for gfp flags with gfp_t - it gives exactly
   the same warnings as far as sparse is concerned, doesn't change
   generated code (from gcc point of view we replaced unsigned int with
   typedef) and documents what's going on far better.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[INET]: speedup inet (tcp/dccp) lookups</title>
<updated>2005-10-03T21:13:38+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>dada1@cosmosbay.com</email>
</author>
<published>2005-10-03T21:13:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=81c3d5470ecc70564eb9209946730fe2be93ad06'/>
<id>81c3d5470ecc70564eb9209946730fe2be93ad06</id>
<content type='text'>
Arnaldo and I agreed it could be applied now, because I have other
pending patches depending on this one (Thank you Arnaldo)

(The other important patch moves skc_refcnt in a separate cache line,
so that the SMP/NUMA performance doesnt suffer from cache line ping pongs)

1) First some performance data :
--------------------------------

tcp_v4_rcv() wastes a *lot* of time in __inet_lookup_established()

The most time critical code is :

sk_for_each(sk, node, &amp;head-&gt;chain) {
     if (INET_MATCH(sk, acookie, saddr, daddr, ports, dif))
         goto hit; /* You sunk my battleship! */
}

The sk_for_each() does use prefetch() hints but only the begining of
"struct sock" is prefetched.

As INET_MATCH first comparison uses inet_sk(__sk)-&gt;daddr, wich is far
away from the begining of "struct sock", it has to bring into CPU
cache cold cache line. Each iteration has to use at least 2 cache
lines.

This can be problematic if some chains are very long.

2) The goal
-----------

The idea I had is to change things so that INET_MATCH() may return
FALSE in 99% of cases only using the data already in the CPU cache,
using one cache line per iteration.

3) Description of the patch
---------------------------

Adds a new 'unsigned int skc_hash' field in 'struct sock_common',
filling a 32 bits hole on 64 bits platform.

struct sock_common {
	unsigned short		skc_family;
	volatile unsigned char	skc_state;
	unsigned char		skc_reuse;
	int			skc_bound_dev_if;
	struct hlist_node	skc_node;
	struct hlist_node	skc_bind_node;
	atomic_t		skc_refcnt;
+	unsigned int		skc_hash;
	struct proto		*skc_prot;
};

Store in this 32 bits field the full hash, not masked by (ehash_size -
1) Using this full hash as the first comparison done in INET_MATCH
permits us immediatly skip the element without touching a second cache
line in case of a miss.

Suppress the sk_hashent/tw_hashent fields since skc_hash (aliased to
sk_hash and tw_hash) already contains the slot number if we mask with
(ehash_size - 1)

File include/net/inet_hashtables.h

64 bits platforms :
#define INET_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
     (((__sk)-&gt;sk_hash == (__hash))
     ((*((__u64 *)&amp;(inet_sk(__sk)-&gt;daddr)))== (__cookie))   &amp;&amp;  \
     ((*((__u32 *)&amp;(inet_sk(__sk)-&gt;dport))) == (__ports))   &amp;&amp;  \
     (!((__sk)-&gt;sk_bound_dev_if) || ((__sk)-&gt;sk_bound_dev_if == (__dif))))

32bits platforms:
#define TCP_IPV4_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
     (((__sk)-&gt;sk_hash == (__hash))                 &amp;&amp;  \
     (inet_sk(__sk)-&gt;daddr          == (__saddr))   &amp;&amp;  \
     (inet_sk(__sk)-&gt;rcv_saddr      == (__daddr))   &amp;&amp;  \
     (!((__sk)-&gt;sk_bound_dev_if) || ((__sk)-&gt;sk_bound_dev_if == (__dif))))


- Adds a prefetch(head-&gt;chain.first) in 
__inet_lookup_established()/__tcp_v4_check_established() and 
__inet6_lookup_established()/__tcp_v6_check_established() and 
__dccp_v4_check_established() to bring into cache the first element of the 
list, before the {read|write}_lock(&amp;head-&gt;lock);

Signed-off-by: Eric Dumazet &lt;dada1@cosmosbay.com&gt;
Acked-by: Arnaldo Carvalho de Melo &lt;acme@ghostprotocols.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Arnaldo and I agreed it could be applied now, because I have other
pending patches depending on this one (Thank you Arnaldo)

(The other important patch moves skc_refcnt in a separate cache line,
so that the SMP/NUMA performance doesnt suffer from cache line ping pongs)

1) First some performance data :
--------------------------------

tcp_v4_rcv() wastes a *lot* of time in __inet_lookup_established()

The most time critical code is :

sk_for_each(sk, node, &amp;head-&gt;chain) {
     if (INET_MATCH(sk, acookie, saddr, daddr, ports, dif))
         goto hit; /* You sunk my battleship! */
}

The sk_for_each() does use prefetch() hints but only the begining of
"struct sock" is prefetched.

As INET_MATCH first comparison uses inet_sk(__sk)-&gt;daddr, wich is far
away from the begining of "struct sock", it has to bring into CPU
cache cold cache line. Each iteration has to use at least 2 cache
lines.

This can be problematic if some chains are very long.

2) The goal
-----------

The idea I had is to change things so that INET_MATCH() may return
FALSE in 99% of cases only using the data already in the CPU cache,
using one cache line per iteration.

3) Description of the patch
---------------------------

Adds a new 'unsigned int skc_hash' field in 'struct sock_common',
filling a 32 bits hole on 64 bits platform.

struct sock_common {
	unsigned short		skc_family;
	volatile unsigned char	skc_state;
	unsigned char		skc_reuse;
	int			skc_bound_dev_if;
	struct hlist_node	skc_node;
	struct hlist_node	skc_bind_node;
	atomic_t		skc_refcnt;
+	unsigned int		skc_hash;
	struct proto		*skc_prot;
};

Store in this 32 bits field the full hash, not masked by (ehash_size -
1) Using this full hash as the first comparison done in INET_MATCH
permits us immediatly skip the element without touching a second cache
line in case of a miss.

Suppress the sk_hashent/tw_hashent fields since skc_hash (aliased to
sk_hash and tw_hash) already contains the slot number if we mask with
(ehash_size - 1)

File include/net/inet_hashtables.h

64 bits platforms :
#define INET_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
     (((__sk)-&gt;sk_hash == (__hash))
     ((*((__u64 *)&amp;(inet_sk(__sk)-&gt;daddr)))== (__cookie))   &amp;&amp;  \
     ((*((__u32 *)&amp;(inet_sk(__sk)-&gt;dport))) == (__ports))   &amp;&amp;  \
     (!((__sk)-&gt;sk_bound_dev_if) || ((__sk)-&gt;sk_bound_dev_if == (__dif))))

32bits platforms:
#define TCP_IPV4_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
     (((__sk)-&gt;sk_hash == (__hash))                 &amp;&amp;  \
     (inet_sk(__sk)-&gt;daddr          == (__saddr))   &amp;&amp;  \
     (inet_sk(__sk)-&gt;rcv_saddr      == (__daddr))   &amp;&amp;  \
     (!((__sk)-&gt;sk_bound_dev_if) || ((__sk)-&gt;sk_bound_dev_if == (__dif))))


- Adds a prefetch(head-&gt;chain.first) in 
__inet_lookup_established()/__tcp_v4_check_established() and 
__inet6_lookup_established()/__tcp_v6_check_established() and 
__dccp_v4_check_established() to bring into cache the first element of the 
list, before the {read|write}_lock(&amp;head-&gt;lock);

Signed-off-by: Eric Dumazet &lt;dada1@cosmosbay.com&gt;
Acked-by: Arnaldo Carvalho de Melo &lt;acme@ghostprotocols.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[DCCP]: Introduce CCID getsockopt for the CCIDs</title>
<updated>2005-09-18T07:19:32+00:00</updated>
<author>
<name>Arnaldo Carvalho de Melo</name>
<email>acme@mandriva.com</email>
</author>
<published>2005-09-18T07:19:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=88f964db6ef728982734356bf4c406270ea29c1d'/>
<id>88f964db6ef728982734356bf4c406270ea29c1d</id>
<content type='text'>
Allocation for the optnames is similar to the DCCP options, with a
range for rx and tx half connection CCIDs.

Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@mandriva.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Allocation for the optnames is similar to the DCCP options, with a
range for rx and tx half connection CCIDs.

Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@mandriva.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[DCCP]: Don't use necessarily the same CCID for tx and rx</title>
<updated>2005-09-18T07:18:52+00:00</updated>
<author>
<name>Arnaldo Carvalho de Melo</name>
<email>acme@mandriva.com</email>
</author>
<published>2005-09-18T07:18:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=561713cf475de1f671cc89c437927ec008a20209'/>
<id>561713cf475de1f671cc89c437927ec008a20209</id>
<content type='text'>
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@mandriva.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@mandriva.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[CCID3]: Introduce include/linux/tfrc.h</title>
<updated>2005-09-18T07:18:32+00:00</updated>
<author>
<name>Arnaldo Carvalho de Melo</name>
<email>acme@mandriva.com</email>
</author>
<published>2005-09-18T07:18:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=65299d6c3cfb49cc3eee4fc483e7edd23ea7b2ed'/>
<id>65299d6c3cfb49cc3eee4fc483e7edd23ea7b2ed</id>
<content type='text'>
Moving the TFRC sender and receiver variables to separate structs, so
that we can copy these structs to userspace thru getsockopt,
dccp_diag, etc.

Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@mandriva.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Moving the TFRC sender and receiver variables to separate structs, so
that we can copy these structs to userspace thru getsockopt,
dccp_diag, etc.

Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@mandriva.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
