<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/net/ipv4/tcp_timer.c, branch v3.7-rc2</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>tcp: TCP Fast Open Server - support TFO listeners</title>
<updated>2012-09-01T00:02:19+00:00</updated>
<author>
<name>Jerry Chu</name>
<email>hkchu@google.com</email>
</author>
<published>2012-08-31T12:29:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8336886f786fdacbc19b719c1f7ea91eb70706d4'/>
<id>8336886f786fdacbc19b719c1f7ea91eb70706d4</id>
<content type='text'>
This patch builds on top of the previous patch to add the support
for TFO listeners. This includes -

1. allocating, properly initializing, and managing the per listener
fastopen_queue structure when TFO is enabled

2. changes to the inet_csk_accept code to support TFO. E.g., the
request_sock can no longer be freed upon accept(), not until 3WHS
finishes

3. allowing a TCP_SYN_RECV socket to properly poll() and sendmsg()
if it's a TFO socket

4. properly closing a TFO listener, and a TFO socket before 3WHS
finishes

5. supporting TCP_FASTOPEN socket option

6. modifying tcp_check_req() so that it can be used to check a TFO
socket as well as a request_sock

7. supporting TCP's TFO cookie option

8. adding a new SYN-ACK retransmit handler to use the timer directly
off the TFO socket rather than the listener socket. Note that TFO
server side will not retransmit anything other than SYN-ACK until
the 3WHS is completed.

The patch also contains an important function
"reqsk_fastopen_remove()" to manage the somewhat complex relation
between a listener, its request_sock, and the corresponding child
socket. See the comment above the function for the detail.

Signed-off-by: H.K. Jerry Chu &lt;hkchu@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Tom Herbert &lt;therbert@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch builds on top of the previous patch to add the support
for TFO listeners. This includes -

1. allocating, properly initializing, and managing the per listener
fastopen_queue structure when TFO is enabled

2. changes to the inet_csk_accept code to support TFO. E.g., the
request_sock can no longer be freed upon accept(), not until 3WHS
finishes

3. allowing a TCP_SYN_RECV socket to properly poll() and sendmsg()
if it's a TFO socket

4. properly closing a TFO listener, and a TFO socket before 3WHS
finishes

5. supporting TCP_FASTOPEN socket option

6. modifying tcp_check_req() so that it can be used to check a TFO
socket as well as a request_sock

7. supporting TCP's TFO cookie option

8. adding a new SYN-ACK retransmit handler to use the timer directly
off the TFO socket rather than the listener socket. Note that TFO
server side will not retransmit anything other than SYN-ACK until
the 3WHS is completed.

The patch also contains an important function
"reqsk_fastopen_remove()" to manage the somewhat complex relation
between a listener, its request_sock, and the corresponding child
socket. See the comment above the function for the detail.

Signed-off-by: H.K. Jerry Chu &lt;hkchu@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Tom Herbert &lt;therbert@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: fix possible socket refcount problem</title>
<updated>2012-08-21T21:42:23+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2012-08-20T00:22:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=144d56e91044181ec0ef67aeca91e9a8b5718348'/>
<id>144d56e91044181ec0ef67aeca91e9a8b5718348</id>
<content type='text'>
Commit 6f458dfb40 (tcp: improve latencies of timer triggered events)
added a bug leading to the following trace:

[ 2866.131281] IPv4: Attempt to release TCP socket in state 1 ffff880019ec0000
[ 2866.131726]
[ 2866.132188] =========================
[ 2866.132281] [ BUG: held lock freed! ]
[ 2866.132281] 3.6.0-rc1+ #622 Not tainted
[ 2866.132281] -------------------------
[ 2866.132281] kworker/0:1/652 is freeing memory ffff880019ec0000-ffff880019ec0a1f, with a lock still held there!
[ 2866.132281]  (sk_lock-AF_INET-RPC){+.+...}, at: [&lt;ffffffff81903619&gt;] tcp_sendmsg+0x29/0xcc6
[ 2866.132281] 4 locks held by kworker/0:1/652:
[ 2866.132281]  #0:  (rpciod){.+.+.+}, at: [&lt;ffffffff81083567&gt;] process_one_work+0x1de/0x47f
[ 2866.132281]  #1:  ((&amp;task-&gt;u.tk_work)){+.+.+.}, at: [&lt;ffffffff81083567&gt;] process_one_work+0x1de/0x47f
[ 2866.132281]  #2:  (sk_lock-AF_INET-RPC){+.+...}, at: [&lt;ffffffff81903619&gt;] tcp_sendmsg+0x29/0xcc6
[ 2866.132281]  #3:  (&amp;icsk-&gt;icsk_retransmit_timer){+.-...}, at: [&lt;ffffffff81078017&gt;] run_timer_softirq+0x1ad/0x35f
[ 2866.132281]
[ 2866.132281] stack backtrace:
[ 2866.132281] Pid: 652, comm: kworker/0:1 Not tainted 3.6.0-rc1+ #622
[ 2866.132281] Call Trace:
[ 2866.132281]  &lt;IRQ&gt;  [&lt;ffffffff810bc527&gt;] debug_check_no_locks_freed+0x112/0x159
[ 2866.132281]  [&lt;ffffffff818a0839&gt;] ? __sk_free+0xfd/0x114
[ 2866.132281]  [&lt;ffffffff811549fa&gt;] kmem_cache_free+0x6b/0x13a
[ 2866.132281]  [&lt;ffffffff818a0839&gt;] __sk_free+0xfd/0x114
[ 2866.132281]  [&lt;ffffffff818a08c0&gt;] sk_free+0x1c/0x1e
[ 2866.132281]  [&lt;ffffffff81911e1c&gt;] tcp_write_timer+0x51/0x56
[ 2866.132281]  [&lt;ffffffff81078082&gt;] run_timer_softirq+0x218/0x35f
[ 2866.132281]  [&lt;ffffffff81078017&gt;] ? run_timer_softirq+0x1ad/0x35f
[ 2866.132281]  [&lt;ffffffff810f5831&gt;] ? rb_commit+0x58/0x85
[ 2866.132281]  [&lt;ffffffff81911dcb&gt;] ? tcp_write_timer_handler+0x148/0x148
[ 2866.132281]  [&lt;ffffffff81070bd6&gt;] __do_softirq+0xcb/0x1f9
[ 2866.132281]  [&lt;ffffffff81a0a00c&gt;] ? _raw_spin_unlock+0x29/0x2e
[ 2866.132281]  [&lt;ffffffff81a1227c&gt;] call_softirq+0x1c/0x30
[ 2866.132281]  [&lt;ffffffff81039f38&gt;] do_softirq+0x4a/0xa6
[ 2866.132281]  [&lt;ffffffff81070f2b&gt;] irq_exit+0x51/0xad
[ 2866.132281]  [&lt;ffffffff81a129cd&gt;] do_IRQ+0x9d/0xb4
[ 2866.132281]  [&lt;ffffffff81a0a3ef&gt;] common_interrupt+0x6f/0x6f
[ 2866.132281]  &lt;EOI&gt;  [&lt;ffffffff8109d006&gt;] ? sched_clock_cpu+0x58/0xd1
[ 2866.132281]  [&lt;ffffffff81a0a172&gt;] ? _raw_spin_unlock_irqrestore+0x4c/0x56
[ 2866.132281]  [&lt;ffffffff81078692&gt;] mod_timer+0x178/0x1a9
[ 2866.132281]  [&lt;ffffffff818a00aa&gt;] sk_reset_timer+0x19/0x26
[ 2866.132281]  [&lt;ffffffff8190b2cc&gt;] tcp_rearm_rto+0x99/0xa4
[ 2866.132281]  [&lt;ffffffff8190dfba&gt;] tcp_event_new_data_sent+0x6e/0x70
[ 2866.132281]  [&lt;ffffffff8190f7ea&gt;] tcp_write_xmit+0x7de/0x8e4
[ 2866.132281]  [&lt;ffffffff818a565d&gt;] ? __alloc_skb+0xa0/0x1a1
[ 2866.132281]  [&lt;ffffffff8190f952&gt;] __tcp_push_pending_frames+0x2e/0x8a
[ 2866.132281]  [&lt;ffffffff81904122&gt;] tcp_sendmsg+0xb32/0xcc6
[ 2866.132281]  [&lt;ffffffff819229c2&gt;] inet_sendmsg+0xaa/0xd5
[ 2866.132281]  [&lt;ffffffff81922918&gt;] ? inet_autobind+0x5f/0x5f
[ 2866.132281]  [&lt;ffffffff810ee7f1&gt;] ? trace_clock_local+0x9/0xb
[ 2866.132281]  [&lt;ffffffff8189adab&gt;] sock_sendmsg+0xa3/0xc4
[ 2866.132281]  [&lt;ffffffff810f5de6&gt;] ? rb_reserve_next_event+0x26f/0x2d5
[ 2866.132281]  [&lt;ffffffff8103e6a9&gt;] ? native_sched_clock+0x29/0x6f
[ 2866.132281]  [&lt;ffffffff8103e6f8&gt;] ? sched_clock+0x9/0xd
[ 2866.132281]  [&lt;ffffffff810ee7f1&gt;] ? trace_clock_local+0x9/0xb
[ 2866.132281]  [&lt;ffffffff8189ae03&gt;] kernel_sendmsg+0x37/0x43
[ 2866.132281]  [&lt;ffffffff8199ce49&gt;] xs_send_kvec+0x77/0x80
[ 2866.132281]  [&lt;ffffffff8199cec1&gt;] xs_sendpages+0x6f/0x1a0
[ 2866.132281]  [&lt;ffffffff8107826d&gt;] ? try_to_del_timer_sync+0x55/0x61
[ 2866.132281]  [&lt;ffffffff8199d0d2&gt;] xs_tcp_send_request+0x55/0xf1
[ 2866.132281]  [&lt;ffffffff8199bb90&gt;] xprt_transmit+0x89/0x1db
[ 2866.132281]  [&lt;ffffffff81999bcd&gt;] ? call_connect+0x3c/0x3c
[ 2866.132281]  [&lt;ffffffff81999d92&gt;] call_transmit+0x1c5/0x20e
[ 2866.132281]  [&lt;ffffffff819a0d55&gt;] __rpc_execute+0x6f/0x225
[ 2866.132281]  [&lt;ffffffff81999bcd&gt;] ? call_connect+0x3c/0x3c
[ 2866.132281]  [&lt;ffffffff819a0f33&gt;] rpc_async_schedule+0x28/0x34
[ 2866.132281]  [&lt;ffffffff810835d6&gt;] process_one_work+0x24d/0x47f
[ 2866.132281]  [&lt;ffffffff81083567&gt;] ? process_one_work+0x1de/0x47f
[ 2866.132281]  [&lt;ffffffff819a0f0b&gt;] ? __rpc_execute+0x225/0x225
[ 2866.132281]  [&lt;ffffffff81083a6d&gt;] worker_thread+0x236/0x317
[ 2866.132281]  [&lt;ffffffff81083837&gt;] ? process_scheduled_works+0x2f/0x2f
[ 2866.132281]  [&lt;ffffffff8108b7b8&gt;] kthread+0x9a/0xa2
[ 2866.132281]  [&lt;ffffffff81a12184&gt;] kernel_thread_helper+0x4/0x10
[ 2866.132281]  [&lt;ffffffff81a0a4b0&gt;] ? retint_restore_args+0x13/0x13
[ 2866.132281]  [&lt;ffffffff8108b71e&gt;] ? __init_kthread_worker+0x5a/0x5a
[ 2866.132281]  [&lt;ffffffff81a12180&gt;] ? gs_change+0x13/0x13
[ 2866.308506] IPv4: Attempt to release TCP socket in state 1 ffff880019ec0000
[ 2866.309689] =============================================================================
[ 2866.310254] BUG TCP (Not tainted): Object already free
[ 2866.310254] -----------------------------------------------------------------------------
[ 2866.310254]

The bug comes from the fact that the timer set in sk_reset_timer() can run
before we actually do the sock_hold(). The socket refcount reaches zero and
we free the socket too soon.

The timer handler is not allowed to reduce the socket refcount if the socket
is owned by the user, or we need to change the sk_reset_timer() implementation.

We should take a reference on the socket when the TCP_WRITE_TIMER_DEFERRED
or TCP_DELACK_TIMER_DEFERRED bit is set in tsq_flags.

Also fix a typo in tcp_delack_timer(), where TCP_WRITE_TIMER_DEFERRED
was used instead of TCP_DELACK_TIMER_DEFERRED.

For consistency, use same socket refcount change for TCP_MTU_REDUCED_DEFERRED,
even if not fired from a timer.
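
The refcount rule above can be sketched as a small toy model. This is
not kernel code: Sock, timer_fires(), release_cb() and race() are
hypothetical names, and the model assumes a single deferred flag and an
owner that holds exactly one reference.

```python
# Toy model of the refcount rule described above. This is NOT kernel code:
# Sock, timer_fires, release_cb and race are hypothetical names.
WRITE_TIMER_DEFERRED = "write_timer_deferred"


class Sock:
    def __init__(self):
        self.refcnt = 1             # reference held by the socket owner
        self.owned_by_user = True   # the user currently owns the socket
        self.tsq_flags = set()      # stand-in for the tsq_flags bit field
        self.handled = 0            # number of times the real handler ran
        self.freed = False

    def hold(self):
        self.refcnt += 1

    def put(self):
        self.refcnt -= 1
        if self.refcnt == 0:
            self.freed = True


def timer_fires(sk, take_ref_on_defer):
    """Timer callback; it always ends by dropping one reference."""
    if sk.owned_by_user:
        if WRITE_TIMER_DEFERRED not in sk.tsq_flags:
            sk.tsq_flags.add(WRITE_TIMER_DEFERRED)
            if take_ref_on_defer:
                sk.hold()           # keep the socket alive for release_cb()
    else:
        sk.handled += 1
    sk.put()


def release_cb(sk):
    """Run by the owner just before it releases the socket."""
    if WRITE_TIMER_DEFERRED in sk.tsq_flags:
        sk.tsq_flags.discard(WRITE_TIMER_DEFERRED)
        sk.handled += 1
        sk.put()                    # drop the reference taken when deferring


def race(take_ref_on_defer):
    """The timer runs before the arming code has done its own hold()."""
    sk = Sock()
    timer_fires(sk, take_ref_on_defer)
    return sk
```

In this model race(False) frees the socket while the user still owns it,
whereas race(True) leaves the owner's reference intact until release_cb()
has run the deferred work.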

Reported-by: Fengguang Wu &lt;fengguang.wu@intel.com&gt;
Tested-by: Fengguang Wu &lt;fengguang.wu@intel.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 6f458dfb40 (tcp: improve latencies of timer triggered events)
added a bug leading to the following trace:

[ 2866.131281] IPv4: Attempt to release TCP socket in state 1 ffff880019ec0000
[ 2866.131726]
[ 2866.132188] =========================
[ 2866.132281] [ BUG: held lock freed! ]
[ 2866.132281] 3.6.0-rc1+ #622 Not tainted
[ 2866.132281] -------------------------
[ 2866.132281] kworker/0:1/652 is freeing memory ffff880019ec0000-ffff880019ec0a1f, with a lock still held there!
[ 2866.132281]  (sk_lock-AF_INET-RPC){+.+...}, at: [&lt;ffffffff81903619&gt;] tcp_sendmsg+0x29/0xcc6
[ 2866.132281] 4 locks held by kworker/0:1/652:
[ 2866.132281]  #0:  (rpciod){.+.+.+}, at: [&lt;ffffffff81083567&gt;] process_one_work+0x1de/0x47f
[ 2866.132281]  #1:  ((&amp;task-&gt;u.tk_work)){+.+.+.}, at: [&lt;ffffffff81083567&gt;] process_one_work+0x1de/0x47f
[ 2866.132281]  #2:  (sk_lock-AF_INET-RPC){+.+...}, at: [&lt;ffffffff81903619&gt;] tcp_sendmsg+0x29/0xcc6
[ 2866.132281]  #3:  (&amp;icsk-&gt;icsk_retransmit_timer){+.-...}, at: [&lt;ffffffff81078017&gt;] run_timer_softirq+0x1ad/0x35f
[ 2866.132281]
[ 2866.132281] stack backtrace:
[ 2866.132281] Pid: 652, comm: kworker/0:1 Not tainted 3.6.0-rc1+ #622
[ 2866.132281] Call Trace:
[ 2866.132281]  &lt;IRQ&gt;  [&lt;ffffffff810bc527&gt;] debug_check_no_locks_freed+0x112/0x159
[ 2866.132281]  [&lt;ffffffff818a0839&gt;] ? __sk_free+0xfd/0x114
[ 2866.132281]  [&lt;ffffffff811549fa&gt;] kmem_cache_free+0x6b/0x13a
[ 2866.132281]  [&lt;ffffffff818a0839&gt;] __sk_free+0xfd/0x114
[ 2866.132281]  [&lt;ffffffff818a08c0&gt;] sk_free+0x1c/0x1e
[ 2866.132281]  [&lt;ffffffff81911e1c&gt;] tcp_write_timer+0x51/0x56
[ 2866.132281]  [&lt;ffffffff81078082&gt;] run_timer_softirq+0x218/0x35f
[ 2866.132281]  [&lt;ffffffff81078017&gt;] ? run_timer_softirq+0x1ad/0x35f
[ 2866.132281]  [&lt;ffffffff810f5831&gt;] ? rb_commit+0x58/0x85
[ 2866.132281]  [&lt;ffffffff81911dcb&gt;] ? tcp_write_timer_handler+0x148/0x148
[ 2866.132281]  [&lt;ffffffff81070bd6&gt;] __do_softirq+0xcb/0x1f9
[ 2866.132281]  [&lt;ffffffff81a0a00c&gt;] ? _raw_spin_unlock+0x29/0x2e
[ 2866.132281]  [&lt;ffffffff81a1227c&gt;] call_softirq+0x1c/0x30
[ 2866.132281]  [&lt;ffffffff81039f38&gt;] do_softirq+0x4a/0xa6
[ 2866.132281]  [&lt;ffffffff81070f2b&gt;] irq_exit+0x51/0xad
[ 2866.132281]  [&lt;ffffffff81a129cd&gt;] do_IRQ+0x9d/0xb4
[ 2866.132281]  [&lt;ffffffff81a0a3ef&gt;] common_interrupt+0x6f/0x6f
[ 2866.132281]  &lt;EOI&gt;  [&lt;ffffffff8109d006&gt;] ? sched_clock_cpu+0x58/0xd1
[ 2866.132281]  [&lt;ffffffff81a0a172&gt;] ? _raw_spin_unlock_irqrestore+0x4c/0x56
[ 2866.132281]  [&lt;ffffffff81078692&gt;] mod_timer+0x178/0x1a9
[ 2866.132281]  [&lt;ffffffff818a00aa&gt;] sk_reset_timer+0x19/0x26
[ 2866.132281]  [&lt;ffffffff8190b2cc&gt;] tcp_rearm_rto+0x99/0xa4
[ 2866.132281]  [&lt;ffffffff8190dfba&gt;] tcp_event_new_data_sent+0x6e/0x70
[ 2866.132281]  [&lt;ffffffff8190f7ea&gt;] tcp_write_xmit+0x7de/0x8e4
[ 2866.132281]  [&lt;ffffffff818a565d&gt;] ? __alloc_skb+0xa0/0x1a1
[ 2866.132281]  [&lt;ffffffff8190f952&gt;] __tcp_push_pending_frames+0x2e/0x8a
[ 2866.132281]  [&lt;ffffffff81904122&gt;] tcp_sendmsg+0xb32/0xcc6
[ 2866.132281]  [&lt;ffffffff819229c2&gt;] inet_sendmsg+0xaa/0xd5
[ 2866.132281]  [&lt;ffffffff81922918&gt;] ? inet_autobind+0x5f/0x5f
[ 2866.132281]  [&lt;ffffffff810ee7f1&gt;] ? trace_clock_local+0x9/0xb
[ 2866.132281]  [&lt;ffffffff8189adab&gt;] sock_sendmsg+0xa3/0xc4
[ 2866.132281]  [&lt;ffffffff810f5de6&gt;] ? rb_reserve_next_event+0x26f/0x2d5
[ 2866.132281]  [&lt;ffffffff8103e6a9&gt;] ? native_sched_clock+0x29/0x6f
[ 2866.132281]  [&lt;ffffffff8103e6f8&gt;] ? sched_clock+0x9/0xd
[ 2866.132281]  [&lt;ffffffff810ee7f1&gt;] ? trace_clock_local+0x9/0xb
[ 2866.132281]  [&lt;ffffffff8189ae03&gt;] kernel_sendmsg+0x37/0x43
[ 2866.132281]  [&lt;ffffffff8199ce49&gt;] xs_send_kvec+0x77/0x80
[ 2866.132281]  [&lt;ffffffff8199cec1&gt;] xs_sendpages+0x6f/0x1a0
[ 2866.132281]  [&lt;ffffffff8107826d&gt;] ? try_to_del_timer_sync+0x55/0x61
[ 2866.132281]  [&lt;ffffffff8199d0d2&gt;] xs_tcp_send_request+0x55/0xf1
[ 2866.132281]  [&lt;ffffffff8199bb90&gt;] xprt_transmit+0x89/0x1db
[ 2866.132281]  [&lt;ffffffff81999bcd&gt;] ? call_connect+0x3c/0x3c
[ 2866.132281]  [&lt;ffffffff81999d92&gt;] call_transmit+0x1c5/0x20e
[ 2866.132281]  [&lt;ffffffff819a0d55&gt;] __rpc_execute+0x6f/0x225
[ 2866.132281]  [&lt;ffffffff81999bcd&gt;] ? call_connect+0x3c/0x3c
[ 2866.132281]  [&lt;ffffffff819a0f33&gt;] rpc_async_schedule+0x28/0x34
[ 2866.132281]  [&lt;ffffffff810835d6&gt;] process_one_work+0x24d/0x47f
[ 2866.132281]  [&lt;ffffffff81083567&gt;] ? process_one_work+0x1de/0x47f
[ 2866.132281]  [&lt;ffffffff819a0f0b&gt;] ? __rpc_execute+0x225/0x225
[ 2866.132281]  [&lt;ffffffff81083a6d&gt;] worker_thread+0x236/0x317
[ 2866.132281]  [&lt;ffffffff81083837&gt;] ? process_scheduled_works+0x2f/0x2f
[ 2866.132281]  [&lt;ffffffff8108b7b8&gt;] kthread+0x9a/0xa2
[ 2866.132281]  [&lt;ffffffff81a12184&gt;] kernel_thread_helper+0x4/0x10
[ 2866.132281]  [&lt;ffffffff81a0a4b0&gt;] ? retint_restore_args+0x13/0x13
[ 2866.132281]  [&lt;ffffffff8108b71e&gt;] ? __init_kthread_worker+0x5a/0x5a
[ 2866.132281]  [&lt;ffffffff81a12180&gt;] ? gs_change+0x13/0x13
[ 2866.308506] IPv4: Attempt to release TCP socket in state 1 ffff880019ec0000
[ 2866.309689] =============================================================================
[ 2866.310254] BUG TCP (Not tainted): Object already free
[ 2866.310254] -----------------------------------------------------------------------------
[ 2866.310254]

The bug comes from the fact that the timer set in sk_reset_timer() can run
before we actually do the sock_hold(). The socket refcount reaches zero and
we free the socket too soon.

The timer handler is not allowed to reduce the socket refcount if the socket
is owned by the user, or we need to change the sk_reset_timer() implementation.

We should take a reference on the socket when the TCP_WRITE_TIMER_DEFERRED
or TCP_DELACK_TIMER_DEFERRED bit is set in tsq_flags.

Also fix a typo in tcp_delack_timer(), where TCP_WRITE_TIMER_DEFERRED
was used instead of TCP_DELACK_TIMER_DEFERRED.

For consistency, use same socket refcount change for TCP_MTU_REDUCED_DEFERRED,
even if not fired from a timer.

Reported-by: Fengguang Wu &lt;fengguang.wu@intel.com&gt;
Tested-by: Fengguang Wu &lt;fengguang.wu@intel.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: improve latencies of timer triggered events</title>
<updated>2012-07-20T17:59:41+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2012-07-20T05:45:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6f458dfb409272082c9bfa412f77ff2fc21c626f'/>
<id>6f458dfb409272082c9bfa412f77ff2fc21c626f</id>
<content type='text'>
The modern TCP stack depends heavily on tcp_write_timer() having a small
latency, but the current implementation doesn't exactly meet
expectations.

When a timer fires but finds the socket is owned by the user, it rearms
itself for an additional delay, hoping the next run will be more
successful.

tcp_write_timer(), for example, uses a 50ms delay for the next try, which
defeats many attempts to get predictable TCP behavior in terms of
latency.

Use the recently introduced tcp_release_cb(), so that the user owning
the socket will call various handlers right before socket release.

This will permit us to post a followup patch to address the
tcp_tso_should_defer() syndrome (some deferred packets have to wait for
the RTO timer to be transmitted, while cwnd should allow us to send them
sooner).

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Tom Herbert &lt;therbert@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Nandita Dukkipati &lt;nanditad@google.com&gt;
Cc: H.K. Jerry Chu &lt;hkchu@google.com&gt;
Cc: John Heffner &lt;johnwheffner@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The modern TCP stack depends heavily on tcp_write_timer() having a small
latency, but the current implementation doesn't exactly meet
expectations.

When a timer fires but finds the socket is owned by the user, it rearms
itself for an additional delay, hoping the next run will be more
successful.

tcp_write_timer(), for example, uses a 50ms delay for the next try, which
defeats many attempts to get predictable TCP behavior in terms of
latency.

Use the recently introduced tcp_release_cb(), so that the user owning
the socket will call various handlers right before socket release.

This will permit us to post a followup patch to address the
tcp_tso_should_defer() syndrome (some deferred packets have to wait for
the RTO timer to be transmitted, while cwnd should allow us to send them
sooner).

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Tom Herbert &lt;therbert@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Nandita Dukkipati &lt;nanditad@google.com&gt;
Cc: H.K. Jerry Chu &lt;hkchu@google.com&gt;
Cc: John Heffner &lt;johnwheffner@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: early retransmit: delayed fast retransmit</title>
<updated>2012-05-03T00:56:10+00:00</updated>
<author>
<name>Yuchung Cheng</name>
<email>ycheng@google.com</email>
</author>
<published>2012-05-02T13:30:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=750ea2bafa55aaed208b2583470ecd7122225634'/>
<id>750ea2bafa55aaed208b2583470ecd7122225634</id>
<content type='text'>
Implementing the advanced early retransmit (sysctl_tcp_early_retrans==2).
Delays the fast retransmit by an interval of RTT/4. We borrow the
RTO timer to implement the delay. If we receive another ACK or send
a new packet, the timer is cancelled and restored to original RTO
value offset by time elapsed.  When the delayed-ER timer fires,
we enter fast recovery and perform fast retransmit.
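
The timing described above can be sketched as plain arithmetic. This is
only an illustration in milliseconds: the function names are
hypothetical, and clamping the restored value at zero is an assumption
of the sketch, not something stated in this message.

```python
# Illustrative arithmetic only; the function names are hypothetical.
def delayed_er_timeout_ms(rtt_ms):
    """Delay applied to the fast retransmit: a quarter of the RTT."""
    return rtt_ms / 4


def restored_rto_ms(rto_ms, elapsed_ms):
    """RTO re-armed after the delayed-ER timer is cancelled,
    offset by the time already elapsed (clamped at zero here)."""
    return max(rto_ms - elapsed_ms, 0)
```

With an RTT of 80 ms, delayed_er_timeout_ms(80) gives a 20 ms delay; if
the timer is cancelled 30 ms into a 200 ms RTO, restored_rto_ms(200, 30)
leaves 170 ms.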

Signed-off-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Implementing the advanced early retransmit (sysctl_tcp_early_retrans==2).
Delays the fast retransmit by an interval of RTT/4. We borrow the
RTO timer to implement the delay. If we receive another ACK or send
a new packet, the timer is cancelled and restored to original RTO
value offset by time elapsed.  When the delayed-ER timer fires,
we enter fast recovery and perform fast retransmit.

Signed-off-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: ipv4: Standardize prefixes for message logging</title>
<updated>2012-03-13T00:05:21+00:00</updated>
<author>
<name>Joe Perches</name>
<email>joe@perches.com</email>
</author>
<published>2012-03-12T07:03:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=afd465030acb4098abcb6b965a5aebc7ea2209e0'/>
<id>afd465030acb4098abcb6b965a5aebc7ea2209e0</id>
<content type='text'>
Add #define pr_fmt(fmt) as appropriate.

Add "IPv4: ", "TCP: ", and "IPsec: " to appropriate files.
Standardize on "UDPLite: " for appropriate uses.
Some prefixes were previously "UDPLITE: " and "UDP-Lite: ".

Add KBUILD_MODNAME ": " to icmp and gre.
Remove embedded prefixes as appropriate.

Add missing "\n" to pr_info in gre.c.

Signed-off-by: Joe Perches &lt;joe@perches.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add #define pr_fmt(fmt) as appropriate.

Add "IPv4: ", "TCP: ", and "IPsec: " to appropriate files.
Standardize on "UDPLite: " for appropriate uses.
Some prefixes were previously "UDPLITE: " and "UDP-Lite: ".

Add KBUILD_MODNAME ": " to icmp and gre.
Remove embedded prefixes as appropriate.

Add missing "\n" to pr_info in gre.c.

Signed-off-by: Joe Perches &lt;joe@perches.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: Disambiguate kernel message</title>
<updated>2012-02-01T19:41:50+00:00</updated>
<author>
<name>Arun Sharma</name>
<email>asharma@fb.com</email>
</author>
<published>2012-01-30T22:16:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=efcdbf24fd5daa88060869e51ed49f68b7ac8708'/>
<id>efcdbf24fd5daa88060869e51ed49f68b7ac8708</id>
<content type='text'>
Some of our machines were reporting:

TCP: too many of orphaned sockets

even when the number of orphaned sockets was well below the
limit.

We print a different message depending on whether we're out
of TCP memory or there are too many orphaned sockets.

Also move the check out of line and clean up the messages
that were printed.

Signed-off-by: Arun Sharma &lt;asharma@fb.com&gt;
Suggested-by: Mohan Srinivasan &lt;mohan@fb.com&gt;
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: David Miller &lt;davem@davemloft.net&gt;
Cc: Glauber Costa &lt;glommer@parallels.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Joe Perches &lt;joe@perches.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Some of our machines were reporting:

TCP: too many of orphaned sockets

even when the number of orphaned sockets was well below the
limit.

We print a different message depending on whether we're out
of TCP memory or there are too many orphaned sockets.

Also move the check out of line and clean up the messages
that were printed.

Signed-off-by: Arun Sharma &lt;asharma@fb.com&gt;
Suggested-by: Mohan Srinivasan &lt;mohan@fb.com&gt;
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: David Miller &lt;davem@davemloft.net&gt;
Cc: Glauber Costa &lt;glommer@parallels.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Joe Perches &lt;joe@perches.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: fix assignment of 0/1 to bool variables.</title>
<updated>2011-12-20T03:27:29+00:00</updated>
<author>
<name>Rusty Russell</name>
<email>rusty@rustcorp.com.au</email>
</author>
<published>2011-12-19T13:56:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3db1cd5c05f35fb43eb134df6f321de4e63141f2'/>
<id>3db1cd5c05f35fb43eb134df6f321de4e63141f2</id>
<content type='text'>
DaveM said:
   Please, this kind of stuff rots forever and not using bool properly
   drives me crazy.

Joe Perches &lt;joe@perches.com&gt; gave me the spatch script:

	@@
	bool b;
	@@
	-b = 0
	+b = false
	@@
	bool b;
	@@
	-b = 1
	+b = true

I merely installed coccinelle, read the documentation and took credit.

Signed-off-by: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
DaveM said:
   Please, this kind of stuff rots forever and not using bool properly
   drives me crazy.

Joe Perches &lt;joe@perches.com&gt; gave me the spatch script:

	@@
	bool b;
	@@
	-b = 0
	+b = false
	@@
	bool b;
	@@
	-b = 1
	+b = true

I merely installed coccinelle, read the documentation and took credit.

Signed-off-by: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>foundations of per-cgroup memory pressure controlling.</title>
<updated>2011-12-13T00:04:10+00:00</updated>
<author>
<name>Glauber Costa</name>
<email>glommer@parallels.com</email>
</author>
<published>2011-12-11T21:47:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=180d8cd942ce336b2c869d324855c40c5db478ad'/>
<id>180d8cd942ce336b2c869d324855c40c5db478ad</id>
<content type='text'>
This patch replaces all uses of the struct sock fields memory_pressure,
memory_allocated, sockets_allocated, and sysctl_mem with accessor
macros. Those macros can receive either a socket argument or a mem_cgroup
argument, depending on the context they live in.

Since we're only doing a macro wrapping here, no performance impact at all is
expected in the case where cgroups are disabled.

Signed-off-by: Glauber Costa &lt;glommer@parallels.com&gt;
Reviewed-by: Hiroyouki Kamezawa &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
CC: David S. Miller &lt;davem@davemloft.net&gt;
CC: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
CC: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch replaces all uses of the struct sock fields memory_pressure,
memory_allocated, sockets_allocated, and sysctl_mem with accessor
macros. Those macros can receive either a socket argument or a mem_cgroup
argument, depending on the context they live in.

Since we're only doing a macro wrapping here, no performance impact at all is
expected in the case where cgroups are disabled.

Signed-off-by: Glauber Costa &lt;glommer@parallels.com&gt;
Reviewed-by: Hiroyouki Kamezawa &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
CC: David S. Miller &lt;davem@davemloft.net&gt;
CC: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
CC: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: use IS_ENABLED(CONFIG_IPV6)</title>
<updated>2011-12-11T23:25:16+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2011-12-10T09:48:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=dfd56b8b38fff3586f36232db58e1e9f7885a605'/>
<id>dfd56b8b38fff3586f36232db58e1e9f7885a605</id>
<content type='text'>
Instead of testing defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Instead of testing defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>TCP: remove TCP_DEBUG</title>
<updated>2011-10-24T21:36:08+00:00</updated>
<author>
<name>Flavio Leitner</name>
<email>fbl@redhat.com</email>
</author>
<published>2011-10-24T08:15:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=78d81d15b74246c7cedf84894434890b33da3907'/>
<id>78d81d15b74246c7cedf84894434890b33da3907</id>
<content type='text'>
It was enabled by default and the messages guarded
by the define are useful.

Signed-off-by: Flavio Leitner &lt;fbl@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It was enabled by default and the messages guarded
by the define are useful.

Signed-off-by: Flavio Leitner &lt;fbl@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
