<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/net/ipv4/tcp_timer.c, branch v2.6.26</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>tcp: Revert 'process defer accept as established' changes.</title>
<updated>2008-06-12T23:34:35+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2008-06-12T23:31:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ec0a196626bd12e0ba108d7daa6d95a4fb25c2c5'/>
<id>ec0a196626bd12e0ba108d7daa6d95a4fb25c2c5</id>
<content type='text'>
This reverts two changesets, ec3c0982a2dd1e671bad8e9d26c28dcba0039d87
("[TCP]: TCP_DEFER_ACCEPT updates - process as established") and
the follow-on bug fix 9ae27e0adbf471c7a6b80102e38e1d5a346b3b38
("tcp: Fix slab corruption with ipv6 and tcp6fuzz").

This change causes several problems, first reported by Ingo Molnar
as a distcc-over-loopback regression where connections were getting
stuck.

Ilpo Järvinen first spotted the locking problems.  The new function
added by this code, tcp_defer_accept_check(), only has the
child socket locked, yet it is modifying state of the parent
listening socket.

Fixing that is non-trivial at best, because we can't simply just grab
the parent listening socket lock at this point, because it would
create an ABBA deadlock.  The normal ordering is parent listening
socket --&gt; child socket, but this code path would require the
reverse lock ordering.

Next is a problem noticed by Vitaliy Gusev, who noted:

----------------------------------------
&gt;--- a/net/ipv4/tcp_timer.c
&gt;+++ b/net/ipv4/tcp_timer.c
&gt;@@ -481,6 +481,11 @@ static void tcp_keepalive_timer (unsigned long data)
&gt; 		goto death;
&gt; 	}
&gt;
&gt;+	if (tp-&gt;defer_tcp_accept.request &amp;&amp; sk-&gt;sk_state == TCP_ESTABLISHED) {
&gt;+		tcp_send_active_reset(sk, GFP_ATOMIC);
&gt;+		goto death;

Here socket sk is not attached to listening socket's request queue. tcp_done()
will not call inet_csk_destroy_sock() (and tcp_v4_destroy_sock() which should
release this sk) as socket is not DEAD. Therefore socket sk will be lost for
freeing.
----------------------------------------

Finally, Alexey Kuznetsov argues that there might not even be any
real value or advantage to these new semantics even if we fix all
of the bugs:

----------------------------------------
Hiding from accept() sockets with only out-of-order data only
is the only thing which is impossible with old approach. Is this really
so valuable? My opinion: no, this is nothing but a new loophole
to consume memory without control.
----------------------------------------

So revert this thing for now.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This reverts two changesets, ec3c0982a2dd1e671bad8e9d26c28dcba0039d87
("[TCP]: TCP_DEFER_ACCEPT updates - process as established") and
the follow-on bug fix 9ae27e0adbf471c7a6b80102e38e1d5a346b3b38
("tcp: Fix slab corruption with ipv6 and tcp6fuzz").

This change causes several problems, first reported by Ingo Molnar
as a distcc-over-loopback regression where connections were getting
stuck.

Ilpo Järvinen first spotted the locking problems.  The new function
added by this code, tcp_defer_accept_check(), only has the
child socket locked, yet it is modifying state of the parent
listening socket.

Fixing that is non-trivial at best, because we can't simply just grab
the parent listening socket lock at this point, because it would
create an ABBA deadlock.  The normal ordering is parent listening
socket --&gt; child socket, but this code path would require the
reverse lock ordering.

Next is a problem noticed by Vitaliy Gusev, who noted:

----------------------------------------
&gt;--- a/net/ipv4/tcp_timer.c
&gt;+++ b/net/ipv4/tcp_timer.c
&gt;@@ -481,6 +481,11 @@ static void tcp_keepalive_timer (unsigned long data)
&gt; 		goto death;
&gt; 	}
&gt;
&gt;+	if (tp-&gt;defer_tcp_accept.request &amp;&amp; sk-&gt;sk_state == TCP_ESTABLISHED) {
&gt;+		tcp_send_active_reset(sk, GFP_ATOMIC);
&gt;+		goto death;

Here socket sk is not attached to listening socket's request queue. tcp_done()
will not call inet_csk_destroy_sock() (and tcp_v4_destroy_sock() which should
release this sk) as socket is not DEAD. Therefore socket sk will be lost for
freeing.
----------------------------------------

Finally, Alexey Kuznetsov argues that there might not even be any
real value or advantage to these new semantics even if we fix all
of the bugs:

----------------------------------------
Hiding from accept() sockets with only out-of-order data only
is the only thing which is impossible with old approach. Is this really
so valuable? My opinion: no, this is nothing but a new loophole
to consume memory without control.
----------------------------------------

So revert this thing for now.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[TCP]: Format addresses appropriately in debug messages.</title>
<updated>2008-04-14T11:09:36+00:00</updated>
<author>
<name>YOSHIFUJI Hideaki</name>
<email>yoshfuji@linux-ipv6.org</email>
</author>
<published>2008-04-14T11:09:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=569508c964a8b5235e00998523bc3acd3f6aff01'/>
<id>569508c964a8b5235e00998523bc3acd3f6aff01</id>
<content type='text'>
Signed-off-by: YOSHIFUJI Hideaki &lt;yoshfuji@linux-ipv6.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: YOSHIFUJI Hideaki &lt;yoshfuji@linux-ipv6.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[IPV4]: Use NIPQUAD_FMT to format ipv4 addresses.</title>
<updated>2008-04-14T11:09:00+00:00</updated>
<author>
<name>YOSHIFUJI Hideaki</name>
<email>yoshfuji@linux-ipv6.org</email>
</author>
<published>2008-04-14T11:09:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a7d632b6b4ad1c92746ed409e41f9dc571ec04e2'/>
<id>a7d632b6b4ad1c92746ed409e41f9dc571ec04e2</id>
<content type='text'>
And use %u to format port.

Signed-off-by: YOSHIFUJI Hideaki &lt;yoshfuji@linux-ipv6.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
And use %u to format port.

Signed-off-by: YOSHIFUJI Hideaki &lt;yoshfuji@linux-ipv6.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[TCP]: TCP_DEFER_ACCEPT updates - process as established</title>
<updated>2008-03-21T23:33:01+00:00</updated>
<author>
<name>Patrick McManus</name>
<email>mcmanus@ducksong.com</email>
</author>
<published>2008-03-21T23:33:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ec3c0982a2dd1e671bad8e9d26c28dcba0039d87'/>
<id>ec3c0982a2dd1e671bad8e9d26c28dcba0039d87</id>
<content type='text'>
Change TCP_DEFER_ACCEPT implementation so that it transitions a
connection to ESTABLISHED after handshake is complete instead of
leaving it in SYN-RECV until some data arrives. Place connection in
accept queue when first data packet arrives from slow path.

Benefits:
 - established connection is now reset if it never makes it
   to the accept queue

 - diagnostic state of established matches with the packet traces
   showing completed handshake

 - TCP_DEFER_ACCEPT timeouts are expressed in seconds and can now be
   enforced with reasonable accuracy instead of rounding up to next
   exponential back-off of syn-ack retry.

Signed-off-by: Patrick McManus &lt;mcmanus@ducksong.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Change TCP_DEFER_ACCEPT implementation so that it transitions a
connection to ESTABLISHED after handshake is complete instead of
leaving it in SYN-RECV until some data arrives. Place connection in
accept queue when first data packet arrives from slow path.

Benefits:
 - established connection is now reset if it never makes it
   to the accept queue

 - diagnostic state of established matches with the packet traces
   showing completed handshake

 - TCP_DEFER_ACCEPT timeouts are expressed in seconds and can now be
   enforced with reasonable accuracy instead of rounding up to next
   exponential back-off of syn-ack retry.

Signed-off-by: Patrick McManus &lt;mcmanus@ducksong.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[TCP]: Do not purge sk_forward_alloc entirely in tcp_delack_timer().</title>
<updated>2008-01-28T23:01:42+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2008-01-11T05:56:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9993e7d313e80bdc005d09c7def91903e0068f07'/>
<id>9993e7d313e80bdc005d09c7def91903e0068f07</id>
<content type='text'>
Otherwise we beat heavily on the global tcp_memory atomics
when all of the sockets in the system are slowly sending
periodic packet clumps.

Noticed and suggested by Eric Dumazet.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Otherwise we beat heavily on the global tcp_memory atomics
when all of the sockets in the system are slowly sending
periodic packet clumps.

Noticed and suggested by Eric Dumazet.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[NET] CORE: Introducing new memory accounting interface.</title>
<updated>2008-01-28T23:00:18+00:00</updated>
<author>
<name>Hideo Aoki</name>
<email>haoki@redhat.com</email>
</author>
<published>2007-12-31T08:11:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3ab224be6d69de912ee21302745ea45a99274dbc'/>
<id>3ab224be6d69de912ee21302745ea45a99274dbc</id>
<content type='text'>
This patch introduces new memory accounting functions for each network
protocol. Most of them are renamed from memory accounting functions
for stream protocols. At the same time, some stream memory accounting
functions are removed since other functions do the same thing.

Renaming:
	sk_stream_free_skb()		-&gt;	sk_wmem_free_skb()
	__sk_stream_mem_reclaim()	-&gt;	__sk_mem_reclaim()
	sk_stream_mem_reclaim()		-&gt;	sk_mem_reclaim()
	sk_stream_mem_schedule 		-&gt;    	__sk_mem_schedule()
	sk_stream_pages()      		-&gt;	sk_mem_pages()
	sk_stream_rmem_schedule()	-&gt;	sk_rmem_schedule()
	sk_stream_wmem_schedule()	-&gt;	sk_wmem_schedule()
	sk_charge_skb()			-&gt;	sk_mem_charge()

Removing:
	sk_stream_rfree():	consolidates into sock_rfree()
	sk_stream_set_owner_r(): consolidates into skb_set_owner_r()
	sk_stream_mem_schedule()

The following functions are added.
    	sk_has_account(): check if the protocol supports accounting
	sk_mem_uncharge(): do the opposite of sk_mem_charge()

In addition, to achieve consolidation, updating sk_wmem_queued is
removed from sk_mem_charge().

Next, to consolidate memory accounting functions, this patch adds
memory accounting calls to network core functions. Moreover, the present
memory accounting call is renamed to the new accounting call.

Finally, we replace the present memory accounting calls with the new
interface in TCP and SCTP.

Signed-off-by: Takahiro Yasui &lt;tyasui@redhat.com&gt;
Signed-off-by: Hideo Aoki &lt;haoki@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch introduces new memory accounting functions for each network
protocol. Most of them are renamed from memory accounting functions
for stream protocols. At the same time, some stream memory accounting
functions are removed since other functions do the same thing.

Renaming:
	sk_stream_free_skb()		-&gt;	sk_wmem_free_skb()
	__sk_stream_mem_reclaim()	-&gt;	__sk_mem_reclaim()
	sk_stream_mem_reclaim()		-&gt;	sk_mem_reclaim()
	sk_stream_mem_schedule 		-&gt;    	__sk_mem_schedule()
	sk_stream_pages()      		-&gt;	sk_mem_pages()
	sk_stream_rmem_schedule()	-&gt;	sk_rmem_schedule()
	sk_stream_wmem_schedule()	-&gt;	sk_wmem_schedule()
	sk_charge_skb()			-&gt;	sk_mem_charge()

Removing:
	sk_stream_rfree():	consolidates into sock_rfree()
	sk_stream_set_owner_r(): consolidates into skb_set_owner_r()
	sk_stream_mem_schedule()

The following functions are added.
    	sk_has_account(): check if the protocol supports accounting
	sk_mem_uncharge(): do the opposite of sk_mem_charge()

In addition, to achieve consolidation, updating sk_wmem_queued is
removed from sk_mem_charge().

Next, to consolidate memory accounting functions, this patch adds
memory accounting calls to network core functions. Moreover, the present
memory accounting call is renamed to the new accounting call.

Finally, we replace the present memory accounting calls with the new
interface in TCP and SCTP.

Signed-off-by: Takahiro Yasui &lt;tyasui@redhat.com&gt;
Signed-off-by: Hideo Aoki &lt;haoki@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[TCP]: Avoid a divide in tcp_mtu_probing()</title>
<updated>2008-01-28T23:00:00+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>dada1@cosmosbay.com</email>
</author>
<published>2007-12-21T13:58:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8beb5c5f12c8484c59edf9b691f2c4bb4d31f3a0'/>
<id>8beb5c5f12c8484c59edf9b691f2c4bb4d31f3a0</id>
<content type='text'>
Because tcp_mtu_to_mss() returns a signed value, the compiler might emit
an integer divide to compute tcp_mtu_to_mss()/2.

Using a right shift is OK here and less expensive.

Signed-off-by: Eric Dumazet &lt;dada1@cosmosbay.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Because tcp_mtu_to_mss() returns a signed value, the compiler might emit
an integer divide to compute tcp_mtu_to_mss()/2.

Using a right shift is OK here and less expensive.

Signed-off-by: Eric Dumazet &lt;dada1@cosmosbay.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[TCP]: Move mss variable in tcp_mtu_probing()</title>
<updated>2008-01-28T22:59:59+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@sunset.davemloft.net</email>
</author>
<published>2007-12-21T12:29:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=829942c18704250fce4d5eca787065a3ee7c685d'/>
<id>829942c18704250fce4d5eca787065a3ee7c685d</id>
<content type='text'>
Down into the only scope where it is used.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Down into the only scope where it is used.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[TCP]: tcp_write_timeout.c cleanup</title>
<updated>2008-01-28T22:59:58+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>dada1@cosmosbay.com</email>
</author>
<published>2007-12-21T09:50:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ce55dd3610f7ac29bf8d159c2e2ace9aaf2c3038'/>
<id>ce55dd3610f7ac29bf8d159c2e2ace9aaf2c3038</id>
<content type='text'>
Before submitting a patch to change a divide to a right shift, I felt it
necessary to create a helper function tcp_mtu_probing() to reduce the length
of lines exceeding 100 chars in tcp_write_timeout().

Signed-off-by: Eric Dumazet &lt;dada1@cosmosbay.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Before submitting a patch to change a divide to a right shift, I felt it
necessary to create a helper function tcp_mtu_probing() to reduce the length
of lines exceeding 100 chars in tcp_write_timeout().

Signed-off-by: Eric Dumazet &lt;dada1@cosmosbay.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[TCP]: Move sack_ok access to obviously named funcs &amp; cleanup</title>
<updated>2007-10-10T23:48:00+00:00</updated>
<author>
<name>Ilpo Järvinen</name>
<email>ilpo.jarvinen@helsinki.fi</email>
</author>
<published>2007-08-09T12:14:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e60402d0a909ca2e6e2fbdf9ed004ef0fae36d33'/>
<id>e60402d0a909ca2e6e2fbdf9ed004ef0fae36d33</id>
<content type='text'>
Previously the code had IsReno/IsFack defined as macros that were
local to tcp_input.c, though the sack_ok field has users elsewhere too
for the same purpose. This changes them to static inlines, as
preferred according to the current coding style, and unifies the
access to sack_ok across multiple files. Magic bitops of sack_ok
for FACK and DSACK are also abstracted to functions with
appropriate names.

Note:
- One sack_ok = 1 remains but that's self-explanatory, i.e., it
  enables sack
- A couple of !IsReno cases are changed to tcp_is_sack
- There were no users for IsDSack =&gt; I dropped it

Signed-off-by: Ilpo Järvinen &lt;ilpo.jarvinen@helsinki.fi&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Previously the code had IsReno/IsFack defined as macros that were
local to tcp_input.c, though the sack_ok field has users elsewhere too
for the same purpose. This changes them to static inlines, as
preferred according to the current coding style, and unifies the
access to sack_ok across multiple files. Magic bitops of sack_ok
for FACK and DSACK are also abstracted to functions with
appropriate names.

Note:
- One sack_ok = 1 remains but that's self-explanatory, i.e., it
  enables sack
- A couple of !IsReno cases are changed to tcp_is_sack
- There were no users for IsDSack =&gt; I dropped it

Signed-off-by: Ilpo Järvinen &lt;ilpo.jarvinen@helsinki.fi&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
