<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/net/ipv4/inetpeer.c, branch v2.6.39</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>inetpeer: reduce stack usage</title>
<updated>2011-04-12T20:58:33+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2011-04-11T22:39:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=66944e1c5797562cebe2d1857d46dff60bf9a69e'/>
<id>66944e1c5797562cebe2d1857d46dff60bf9a69e</id>
<content type='text'>
On 64bit arches, we use 752 bytes of stack when cleanup_once() is called
from inet_getpeer().

Lets share the avl stack to save ~376 bytes.

Before patch :

# objdump -d net/ipv4/inetpeer.o | scripts/checkstack.pl

0x000006c3 unlink_from_pool [inetpeer.o]:		376
0x00000721 unlink_from_pool [inetpeer.o]:		376
0x00000cb1 inet_getpeer [inetpeer.o]:			376
0x00000e6d inet_getpeer [inetpeer.o]:			376
0x0004 inet_initpeers [inetpeer.o]:			112
# size net/ipv4/inetpeer.o
   text	   data	    bss	    dec	    hex	filename
   5320	    432	     21	   5773	   168d	net/ipv4/inetpeer.o

After patch :

objdump -d net/ipv4/inetpeer.o | scripts/checkstack.pl
0x00000c11 inet_getpeer [inetpeer.o]:			376
0x00000dcd inet_getpeer [inetpeer.o]:			376
0x00000ab9 peer_check_expire [inetpeer.o]:		328
0x00000b7f peer_check_expire [inetpeer.o]:		328
0x0004 inet_initpeers [inetpeer.o]:			112
# size net/ipv4/inetpeer.o
   text	   data	    bss	    dec	    hex	filename
   5163	    432	     21	   5616	   15f0	net/ipv4/inetpeer.o

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: Scot Doyle &lt;lkml@scotdoyle.com&gt;
Cc: Stephen Hemminger &lt;shemminger@vyatta.com&gt;
Cc: Hiroaki SHIMODA &lt;shimoda.hiroaki@gmail.com&gt;
Reviewed-by: Hiroaki SHIMODA &lt;shimoda.hiroaki@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
On 64bit arches, we use 752 bytes of stack when cleanup_once() is called
from inet_getpeer().

Lets share the avl stack to save ~376 bytes.

Before patch :

# objdump -d net/ipv4/inetpeer.o | scripts/checkstack.pl

0x000006c3 unlink_from_pool [inetpeer.o]:		376
0x00000721 unlink_from_pool [inetpeer.o]:		376
0x00000cb1 inet_getpeer [inetpeer.o]:			376
0x00000e6d inet_getpeer [inetpeer.o]:			376
0x0004 inet_initpeers [inetpeer.o]:			112
# size net/ipv4/inetpeer.o
   text	   data	    bss	    dec	    hex	filename
   5320	    432	     21	   5773	   168d	net/ipv4/inetpeer.o

After patch :

objdump -d net/ipv4/inetpeer.o | scripts/checkstack.pl
0x00000c11 inet_getpeer [inetpeer.o]:			376
0x00000dcd inet_getpeer [inetpeer.o]:			376
0x00000ab9 peer_check_expire [inetpeer.o]:		328
0x00000b7f peer_check_expire [inetpeer.o]:		328
0x0004 inet_initpeers [inetpeer.o]:			112
# size net/ipv4/inetpeer.o
   text	   data	    bss	    dec	    hex	filename
   5163	    432	     21	   5616	   15f0	net/ipv4/inetpeer.o

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: Scot Doyle &lt;lkml@scotdoyle.com&gt;
Cc: Stephen Hemminger &lt;shemminger@vyatta.com&gt;
Cc: Hiroaki SHIMODA &lt;shimoda.hiroaki@gmail.com&gt;
Reviewed-by: Hiroaki SHIMODA &lt;shimoda.hiroaki@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>inetpeer: should use call_rcu() variant</title>
<updated>2011-03-14T06:22:23+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2011-03-14T06:22:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4e75db2e8ff2c97762e87f61f54d7cdeaab1a6b0'/>
<id>4e75db2e8ff2c97762e87f61f54d7cdeaab1a6b0</id>
<content type='text'>
After commit 7b46ac4e77f3224a (inetpeer: Don't disable BH for initial
fast RCU lookup.), we should use call_rcu() to wait proper RCU grace
period.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
After commit 7b46ac4e77f3224a (inetpeer: Don't disable BH for initial
fast RCU lookup.), we should use call_rcu() to wait proper RCU grace
period.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv4: Fix PMTU update.</title>
<updated>2011-03-14T01:37:49+00:00</updated>
<author>
<name>Hiroaki SHIMODA</name>
<email>shimoda.hiroaki@gmail.com</email>
</author>
<published>2011-03-09T20:09:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=46af31800b6916c92fffa529dc3c357008da957d'/>
<id>46af31800b6916c92fffa529dc3c357008da957d</id>
<content type='text'>
On current net-next-2.6, when Linux receives ICMP Type: 3, Code: 4
(Destination unreachable (Fragmentation needed)),

  icmp_unreach
    -&gt; ip_rt_frag_needed
         (peer-&gt;pmtu_expires is set here)
    -&gt; tcp_v4_err
         -&gt; do_pmtu_discovery
              -&gt; ip_rt_update_pmtu
                   (peer-&gt;pmtu_expires is already set,
                    so check_peer_pmtu is skipped.)
                   -&gt; check_peer_pmtu

check_peer_pmtu is skipped and MTU is not updated.

To fix this, let check_peer_pmtu execute unconditionally.
And some minor fixes
1) Avoid potential peer-&gt;pmtu_expires set to be zero.
2) In check_peer_pmtu, argument of time_before is reversed.
3) check_peer_pmtu expects peer-&gt;pmtu_orig is initialized as zero,
   but not initialized.

Signed-off-by: Hiroaki SHIMODA &lt;shimoda.hiroaki@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
On current net-next-2.6, when Linux receives ICMP Type: 3, Code: 4
(Destination unreachable (Fragmentation needed)),

  icmp_unreach
    -&gt; ip_rt_frag_needed
         (peer-&gt;pmtu_expires is set here)
    -&gt; tcp_v4_err
         -&gt; do_pmtu_discovery
              -&gt; ip_rt_update_pmtu
                   (peer-&gt;pmtu_expires is already set,
                    so check_peer_pmtu is skipped.)
                   -&gt; check_peer_pmtu

check_peer_pmtu is skipped and MTU is not updated.

To fix this, let check_peer_pmtu execute unconditionally.
And some minor fixes
1) Avoid potential peer-&gt;pmtu_expires set to be zero.
2) In check_peer_pmtu, argument of time_before is reversed.
3) check_peer_pmtu expects peer-&gt;pmtu_orig is initialized as zero,
   but not initialized.

Signed-off-by: Hiroaki SHIMODA &lt;shimoda.hiroaki@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>inetpeer: Don't disable BH for initial fast RCU lookup.</title>
<updated>2011-03-08T22:59:28+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2011-03-08T22:59:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7b46ac4e77f3224a1befe032c77f1df31d1b42c4'/>
<id>7b46ac4e77f3224a1befe032c77f1df31d1b42c4</id>
<content type='text'>
If modifications on other cpus are ok, then modifications to
the tree during lookup done by the local cpu are ok too.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If modifications on other cpus are ok, then modifications to
the tree during lookup done by the local cpu are ok too.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>inetpeer: seqlock optimization</title>
<updated>2011-03-04T22:33:59+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2011-03-04T22:33:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=65e8354ec13a45414045084166cb340c0d7ffe8a'/>
<id>65e8354ec13a45414045084166cb340c0d7ffe8a</id>
<content type='text'>
David noticed :

------------------
Eric, I was profiling the non-routing-cache case and something that
stuck out is the case of calling inet_getpeer() with create==0.

If an entry is not found, we have to redo the lookup under a spinlock
to make certain that a concurrent writer rebalancing the tree does
not "hide" an existing entry from us.

This makes the case of a create==0 lookup for a not-present entry
really expensive.  It is on the order of 600 cpu cycles on my
Niagara2.

I added a hack to not do the relookup under the lock when create==0
and it now costs less than 300 cycles.

This is now a pretty common operation with the way we handle COW'd
metrics, so I think it's definitely worth optimizing.
-----------------

One solution is to use a seqlock instead of a spinlock to protect struct
inet_peer_base.

After a failed avl tree lookup, we can easily detect if a writer did
some changes during our lookup. Taking the lock and redo the lookup is
only necessary in this case.

Note: Add one private rcu_deref_locked() macro to place in one spot the
access to spinlock included in seqlock.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
David noticed :

------------------
Eric, I was profiling the non-routing-cache case and something that
stuck out is the case of calling inet_getpeer() with create==0.

If an entry is not found, we have to redo the lookup under a spinlock
to make certain that a concurrent writer rebalancing the tree does
not "hide" an existing entry from us.

This makes the case of a create==0 lookup for a not-present entry
really expensive.  It is on the order of 600 cpu cycles on my
Niagara2.

I added a hack to not do the relookup under the lock when create==0
and it now costs less than 300 cycles.

This is now a pretty common operation with the way we handle COW'd
metrics, so I think it's definitely worth optimizing.
-----------------

One solution is to use a seqlock instead of a spinlock to protect struct
inet_peer_base.

After a failed avl tree lookup, we can easily detect if a writer did
some changes during our lookup. Taking the lock and redo the lookup is
only necessary in this case.

Note: Add one private rcu_deref_locked() macro to place in one spot the
access to spinlock included in seqlock.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>inetpeer: Add redirect and PMTU discovery cached info.</title>
<updated>2011-02-10T21:29:30+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2011-02-09T23:36:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ddd4aa424b866a08ceba7ddf38e61542c91b93a0'/>
<id>ddd4aa424b866a08ceba7ddf38e61542c91b93a0</id>
<content type='text'>
Validity of the cached PMTU information is indicated by it's
expiration value being non-zero, just as per dst-&gt;expires.

The scheme we will use is that we will remember the pre-ICMP value
held in the metrics or route entry, and then at expiration time
we will restore that value.

In this way PMTU expiration does not kill off the cached route as is
done currently.

Redirect information is permanent, or at least until another redirect
is received.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Validity of the cached PMTU information is indicated by it's
expiration value being non-zero, just as per dst-&gt;expires.

The scheme we will use is that we will remember the pre-ICMP value
held in the metrics or route entry, and then at expiration time
we will restore that value.

In this way PMTU expiration does not kill off the cached route as is
done currently.

Redirect information is permanent, or at least until another redirect
is received.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>inetpeer: Abstract address representation further.</title>
<updated>2011-02-10T21:22:28+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2011-02-09T22:30:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7a71ed899e77cc822abb863e24a422dcf7e9fa33'/>
<id>7a71ed899e77cc822abb863e24a422dcf7e9fa33</id>
<content type='text'>
Future changes will add caching information, and some of
these new elements will be addresses.

Since the family is implicit via the -&gt;daddr.family member,
replicating the family in ever address we store is entirely
redundant.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Future changes will add caching information, and some of
these new elements will be addresses.

Since the family is implicit via the -&gt;daddr.family member,
replicating the family in ever address we store is entirely
redundant.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>inetpeer: Move ICMP rate limiting state into inet_peer entries.</title>
<updated>2011-02-04T23:59:53+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2011-02-04T23:55:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=92d8682926342d2b6aa5b2ecc02221e00e1573a0'/>
<id>92d8682926342d2b6aa5b2ecc02221e00e1573a0</id>
<content type='text'>
Like metrics, the ICMP rate limiting bits are cached state about
a destination.  So move it into the inet_peer entries.

If an inet_peer cannot be bound (the reason is memory allocation
failure or similar), the policy is to allow.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Like metrics, the ICMP rate limiting bits are cached state about
a destination.  So move it into the inet_peer entries.

If an inet_peer cannot be bound (the reason is memory allocation
failure or similar), the policy is to allow.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>inetpeer: Mark metrics as "new" in fresh inetpeer entries.</title>
<updated>2011-01-27T21:52:16+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2011-01-27T21:52:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=144001bddcb4db62c2261f1d703d835851031577'/>
<id>144001bddcb4db62c2261f1d703d835851031577</id>
<content type='text'>
Set the RTAX_LOCKED metric to INETPEER_METRICS_NEW (basically,
all ones) on fresh inetpeer entries.

This way code can determine if default metrics have been loaded
in from a routing table entry already.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Set the RTAX_LOCKED metric to INETPEER_METRICS_NEW (basically,
all ones) on fresh inetpeer entries.

This way code can determine if default metrics have been loaded
in from a routing table entry already.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>inetpeer: Use correct AVL tree base pointer in inet_getpeer().</title>
<updated>2011-01-24T22:38:09+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2011-01-24T22:37:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3408404a4c2a4eead9d73b0bbbfe3f225b65f492'/>
<id>3408404a4c2a4eead9d73b0bbbfe3f225b65f492</id>
<content type='text'>
Family was hard-coded to AF_INET but should be daddr-&gt;family.

This fixes crashes when unlinking ipv6 peer entries, since the
unlink code was looking up the base pointer properly.

Reported-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Family was hard-coded to AF_INET but should be daddr-&gt;family.

This fixes crashes when unlinking ipv6 peer entries, since the
unlink code was looking up the base pointer properly.

Reported-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
