<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/Documentation/networking/ip-sysctl.txt, branch linux-3.3.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>tcp: change tcp_adv_win_scale and tcp_rmem[2]</title>
<updated>2012-05-12T16:32:19+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2012-05-02T02:28:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=4f3f1e28d64084bae113cf9d3ca516bdc43046ad'/>
<id>4f3f1e28d64084bae113cf9d3ca516bdc43046ad</id>
<content type='text'>
[ Upstream commit b49960a05e32121d29316cfdf653894b88ac9190 ]

tcp_adv_win_scale default value is 2, meaning we expect a good citizen
skb to have skb-&gt;len / skb-&gt;truesize ratio of 75% (3/4)

In 2.6 kernels we (mis)accounted for typical MSS=1460 frame :
1536 + 64 + 256 = 1856 'estimated truesize', and 1856 * 3/4 = 1392.
So these skbs were considered as not bloated.

With recent truesize fixes, a typical MSS=1460 frame truesize is now the
more precise :
2048 + 256 = 2304. But 2304 * 3/4 = 1728.
So these skb are not good citizen anymore, because 1460 &lt; 1728

(GRO can escape this problem because it build skbs with a too low
truesize.)

This also means tcp advertises a too optimistic window for a given
allocated rcvspace : When receiving frames, sk_rmem_alloc can hit
sk_rcvbuf limit and we call tcp_prune_queue()/tcp_collapse() too often,
especially when application is slow to drain its receive queue or in
case of losses (netperf is fast, scp is slow). This is a major latency
source.

We should adjust the len/truesize ratio to 50% instead of 75%

This patch :

1) changes tcp_adv_win_scale default to 1 instead of 2

2) increase tcp_rmem[2] limit from 4MB to 6MB to take into account
better truesize tracking and to allow autotuning tcp receive window to
reach same value than before. Note that same amount of kernel memory is
consumed compared to 2.6 kernels.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Tom Herbert &lt;therbert@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit b49960a05e32121d29316cfdf653894b88ac9190 ]

tcp_adv_win_scale default value is 2, meaning we expect a good citizen
skb to have skb-&gt;len / skb-&gt;truesize ratio of 75% (3/4)

In 2.6 kernels we (mis)accounted for typical MSS=1460 frame :
1536 + 64 + 256 = 1856 'estimated truesize', and 1856 * 3/4 = 1392.
So these skbs were considered as not bloated.

With recent truesize fixes, a typical MSS=1460 frame truesize is now the
more precise :
2048 + 256 = 2304. But 2304 * 3/4 = 1728.
So these skb are not good citizen anymore, because 1460 &lt; 1728

(GRO can escape this problem because it build skbs with a too low
truesize.)

This also means tcp advertises a too optimistic window for a given
allocated rcvspace : When receiving frames, sk_rmem_alloc can hit
sk_rcvbuf limit and we call tcp_prune_queue()/tcp_collapse() too often,
especially when application is slow to drain its receive queue or in
case of losses (netperf is fast, scp is slow). This is a major latency
source.

We should adjust the len/truesize ratio to 50% instead of 75%

This patch :

1) changes tcp_adv_win_scale default to 1 instead of 2

2) increase tcp_rmem[2] limit from 4MB to 6MB to take into account
better truesize tracking and to allow autotuning tcp receive window to
reach same value than before. Note that same amount of kernel memory is
consumed compared to 2.6 kernels.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Tom Herbert &lt;therbert@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2011-12-07T02:10:05+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2011-12-07T02:10:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=959327c7842e8621e28b89acea7d57ff02b60972'/>
<id>959327c7842e8621e28b89acea7d57ff02b60972</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv4:correct description for tcp_max_syn_backlog</title>
<updated>2011-12-06T18:02:28+00:00</updated>
<author>
<name>Peter Pan(潘卫平)</name>
<email>panweiping3@gmail.com</email>
</author>
<published>2011-12-05T21:39:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=99b53bdd810611cc178e1a86bc112d8f4f56a1e9'/>
<id>99b53bdd810611cc178e1a86bc112d8f4f56a1e9</id>
<content type='text'>
Since commit c5ed63d66f24(tcp: fix three tcp sysctls tuning),
sysctl_max_syn_backlog is determined by tcp_hashinfo-&gt;ehash_mask,
and the minimal value is 128, and it will increase in proportion to the
memory of machine.
The original description for tcp_max_syn_backlog and sysctl_max_syn_backlog
are out of date.

Changelog:
V2: update description for sysctl_max_syn_backlog

Signed-off-by: Weiping Pan &lt;panweiping3@gmail.com&gt;
Reviewed-by: Shan Wei &lt;shanwei88@gmail.com&gt;
Acked-by: Neil Horman &lt;nhorman@tuxdriver.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since commit c5ed63d66f24(tcp: fix three tcp sysctls tuning),
sysctl_max_syn_backlog is determined by tcp_hashinfo-&gt;ehash_mask,
and the minimal value is 128, and it will increase in proportion to the
memory of machine.
The original description for tcp_max_syn_backlog and sysctl_max_syn_backlog
are out of date.

Changelog:
V2: update description for sysctl_max_syn_backlog

Signed-off-by: Weiping Pan &lt;panweiping3@gmail.com&gt;
Reviewed-by: Shan Wei &lt;shanwei88@gmail.com&gt;
Acked-by: Neil Horman &lt;nhorman@tuxdriver.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: inherit listener congestion control for passive cnx</title>
<updated>2011-11-30T21:55:26+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2011-11-30T01:02:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d8a6e65f8b6b6b0142ebab578472906d89d63657'/>
<id>d8a6e65f8b6b6b0142ebab578472906d89d63657</id>
<content type='text'>
Rick Jones reported that TCP_CONGESTION sockopt performed on a listener
was ignored for its children sockets : right after accept() the
congestion control for new socket is the system default one.

This seems an oversight of the initial design (quoted from Stephen)

Based on prior investigation and patch from Rick.

Reported-by: Rick Jones &lt;rick.jones2@hp.com&gt;
Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
CC: Stephen Hemminger &lt;shemminger@vyatta.com&gt;
CC: Yuchung Cheng &lt;ycheng@google.com&gt;
Tested-by: Rick Jones &lt;rick.jones2@hp.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Rick Jones reported that TCP_CONGESTION sockopt performed on a listener
was ignored for its children sockets : right after accept() the
congestion control for new socket is the system default one.

This seems an oversight of the initial design (quoted from Stephen)

Based on prior investigation and patch from Rick.

Reported-by: Rick Jones &lt;rick.jones2@hp.com&gt;
Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
CC: Stephen Hemminger &lt;shemminger@vyatta.com&gt;
CC: Yuchung Cheng &lt;ycheng@google.com&gt;
Tested-by: Rick Jones &lt;rick.jones2@hp.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>neigh: new unresolved queue limits</title>
<updated>2011-11-14T05:47:54+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2011-11-09T12:07:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=8b5c171bb3dc0686b2647a84e990199c5faa9ef8'/>
<id>8b5c171bb3dc0686b2647a84e990199c5faa9ef8</id>
<content type='text'>
Le mercredi 09 novembre 2011 à 16:21 -0500, David Miller a écrit :
&gt; From: David Miller &lt;davem@davemloft.net&gt;
&gt; Date: Wed, 09 Nov 2011 16:16:44 -0500 (EST)
&gt;
&gt; &gt; From: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
&gt; &gt; Date: Wed, 09 Nov 2011 12:14:09 +0100
&gt; &gt;
&gt; &gt;&gt; unres_qlen is the number of frames we are able to queue per unresolved
&gt; &gt;&gt; neighbour. Its default value (3) was never changed and is responsible
&gt; &gt;&gt; for strange drops, especially if IP fragments are used, or multiple
&gt; &gt;&gt; sessions start in parallel. Even a single tcp flow can hit this limit.
&gt; &gt;  ...
&gt; &gt;
&gt; &gt; Ok, I've applied this, let's see what happens :-)
&gt;
&gt; Early answer, build fails.
&gt;
&gt; Please test build this patch with DECNET enabled and resubmit.  The
&gt; decnet neigh layer still refers to the removed -&gt;queue_len member.
&gt;
&gt; Thanks.

Ouch, this was fixed on one machine yesterday, but not the other one I
used this morning, sorry.

[PATCH V5 net-next] neigh: new unresolved queue limits

unres_qlen is the number of frames we are able to queue per unresolved
neighbour. Its default value (3) was never changed and is responsible
for strange drops, especially if IP fragments are used, or multiple
sessions start in parallel. Even a single tcp flow can hit this limit.

$ arp -d 192.168.20.108 ; ping -c 2 -s 8000 192.168.20.108
PING 192.168.20.108 (192.168.20.108) 8000(8028) bytes of data.
8008 bytes from 192.168.20.108: icmp_seq=2 ttl=64 time=0.322 ms

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Le mercredi 09 novembre 2011 à 16:21 -0500, David Miller a écrit :
&gt; From: David Miller &lt;davem@davemloft.net&gt;
&gt; Date: Wed, 09 Nov 2011 16:16:44 -0500 (EST)
&gt;
&gt; &gt; From: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
&gt; &gt; Date: Wed, 09 Nov 2011 12:14:09 +0100
&gt; &gt;
&gt; &gt;&gt; unres_qlen is the number of frames we are able to queue per unresolved
&gt; &gt;&gt; neighbour. Its default value (3) was never changed and is responsible
&gt; &gt;&gt; for strange drops, especially if IP fragments are used, or multiple
&gt; &gt;&gt; sessions start in parallel. Even a single tcp flow can hit this limit.
&gt; &gt;  ...
&gt; &gt;
&gt; &gt; Ok, I've applied this, let's see what happens :-)
&gt;
&gt; Early answer, build fails.
&gt;
&gt; Please test build this patch with DECNET enabled and resubmit.  The
&gt; decnet neigh layer still refers to the removed -&gt;queue_len member.
&gt;
&gt; Thanks.

Ouch, this was fixed on one machine yesterday, but not the other one I
used this morning, sorry.

[PATCH V5 net-next] neigh: new unresolved queue limits

unres_qlen is the number of frames we are able to queue per unresolved
neighbour. Its default value (3) was never changed and is responsible
for strange drops, especially if IP fragments are used, or multiple
sessions start in parallel. Even a single tcp flow can hit this limit.

$ arp -d 192.168.20.108 ; ping -c 2 -s 8000 192.168.20.108
PING 192.168.20.108 (192.168.20.108) 8000(8028) bytes of data.
8008 bytes from 192.168.20.108: icmp_seq=2 ttl=64 time=0.322 ms

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: min_pmtu default is 552</title>
<updated>2011-11-08T19:21:44+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2011-11-08T19:21:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=20db93c34095553a01a9c31136658917bf1fa5d5'/>
<id>20db93c34095553a01a9c31136658917bf1fa5d5</id>
<content type='text'>
Small fix in Documentation, since min_pmtu is 512 + 20 + 20 = 552

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Small fix in Documentation, since min_pmtu is 512 + 20 + 20 = 552

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'master' of github.com:davem330/net</title>
<updated>2011-10-07T17:38:43+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2011-10-07T17:38:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=88c5100c28b02c4b2b2c6f6fafbbd76d90f698b9'/>
<id>88c5100c28b02c4b2b2c6f6fafbbd76d90f698b9</id>
<content type='text'>
Conflicts:
	net/batman-adv/soft-interface.c
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Conflicts:
	net/batman-adv/soft-interface.c
</pre>
</div>
</content>
</entry>
<entry>
<title>net: Documentation: Fix type of variables</title>
<updated>2011-09-29T18:57:19+00:00</updated>
<author>
<name>Roy.Li</name>
<email>rongqing.li@windriver.com</email>
</author>
<published>2011-09-28T19:51:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=605b91c8f619047ed6d88e24c531d4bdddad46f4'/>
<id>605b91c8f619047ed6d88e24c531d4bdddad46f4</id>
<content type='text'>
Signed-off-by: Roy.Li &lt;rongqing.li@windriver.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Roy.Li &lt;rongqing.li@windriver.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'master' of github.com:davem330/net</title>
<updated>2011-09-22T07:23:13+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2011-09-22T07:23:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=8decf868790b48a727d7e7ca164f2bcd3c1389c0'/>
<id>8decf868790b48a727d7e7ca164f2bcd3c1389c0</id>
<content type='text'>
Conflicts:
	MAINTAINERS
	drivers/net/Kconfig
	drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c
	drivers/net/ethernet/broadcom/tg3.c
	drivers/net/wireless/iwlwifi/iwl-pci.c
	drivers/net/wireless/iwlwifi/iwl-trans-tx-pcie.c
	drivers/net/wireless/rt2x00/rt2800usb.c
	drivers/net/wireless/wl12xx/main.c
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Conflicts:
	MAINTAINERS
	drivers/net/Kconfig
	drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c
	drivers/net/ethernet/broadcom/tg3.c
	drivers/net/wireless/iwlwifi/iwl-pci.c
	drivers/net/wireless/iwlwifi/iwl-trans-tx-pcie.c
	drivers/net/wireless/rt2x00/rt2800usb.c
	drivers/net/wireless/wl12xx/main.c
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv6: Send ICMPv6 RSes only when RAs are accepted</title>
<updated>2011-09-16T23:14:41+00:00</updated>
<author>
<name>Tore Anderson</name>
<email>tore@fud.no</email>
</author>
<published>2011-08-28T23:47:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=026359bc6eddfdc2d2e684bf0b51691649b90f33'/>
<id>026359bc6eddfdc2d2e684bf0b51691649b90f33</id>
<content type='text'>
This patch improves the logic determining when to send ICMPv6 Router
Solicitations, so that they are 1) always sent when the kernel is
accepting Router Advertisements, and 2) never sent when the kernel is
not accepting RAs. In other words, the operational setting of the
"accept_ra" sysctl is used.

The change also makes the special "Hybrid Router" forwarding mode
("forwarding" sysctl set to 2) operate exactly the same as the standard
Router mode (forwarding=1). The only difference between the two was
that RSes was being sent in the Hybrid Router mode only. The sysctl
documentation describing the special Hybrid Router mode has therefore
been removed.

Rationale for the change:

Currently, the value of forwarding sysctl is the only thing determining
whether or not to send RSes. If it has the value 0 or 2, they are sent,
otherwise they are not. This leads to inconsistent behaviour in the
following cases:

* accept_ra=0, forwarding=0
* accept_ra=0, forwarding=2
* accept_ra=1, forwarding=2
* accept_ra=2, forwarding=1

In the first three cases, the kernel will send RSes, even though it will
not accept any RAs received in reply. In the last case, it will not send
any RSes, even though it will accept and process any RAs received. (Most
routers will send unsolicited RAs periodically, so suppressing RSes in
the last case will merely delay auto-configuration, not prevent it.)

Also, it is my opinion that having the forwarding sysctl control RS
sending behaviour (completely independent of whether RAs are being
accepted or not) is simply not what most users would intuitively expect
to be the case.

Signed-off-by: Tore Anderson &lt;tore@fud.no&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch improves the logic determining when to send ICMPv6 Router
Solicitations, so that they are 1) always sent when the kernel is
accepting Router Advertisements, and 2) never sent when the kernel is
not accepting RAs. In other words, the operational setting of the
"accept_ra" sysctl is used.

The change also makes the special "Hybrid Router" forwarding mode
("forwarding" sysctl set to 2) operate exactly the same as the standard
Router mode (forwarding=1). The only difference between the two was
that RSes was being sent in the Hybrid Router mode only. The sysctl
documentation describing the special Hybrid Router mode has therefore
been removed.

Rationale for the change:

Currently, the value of forwarding sysctl is the only thing determining
whether or not to send RSes. If it has the value 0 or 2, they are sent,
otherwise they are not. This leads to inconsistent behaviour in the
following cases:

* accept_ra=0, forwarding=0
* accept_ra=0, forwarding=2
* accept_ra=1, forwarding=2
* accept_ra=2, forwarding=1

In the first three cases, the kernel will send RSes, even though it will
not accept any RAs received in reply. In the last case, it will not send
any RSes, even though it will accept and process any RAs received. (Most
routers will send unsolicited RAs periodically, so suppressing RSes in
the last case will merely delay auto-configuration, not prevent it.)

Also, it is my opinion that having the forwarding sysctl control RS
sending behaviour (completely independent of whether RAs are being
accepted or not) is simply not what most users would intuitively expect
to be the case.

Signed-off-by: Tore Anderson &lt;tore@fud.no&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
