<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/net/core, branch v2.6.31</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>net: sk_free() should be allowed right after sk_alloc()</title>
<updated>2009-09-02T00:49:00+00:00</updated>
<author>
<name>Jarek Poplawski</name>
<email>jarkao2@gmail.com</email>
</author>
<published>2009-08-30T23:15:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d66ee0587c3927aea5178a822976c7c853d815fe'/>
<id>d66ee0587c3927aea5178a822976c7c853d815fe</id>
<content type='text'>
After commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
(net: No more expensive sock_hold()/sock_put() on each tx)
sk_free() frees socks conditionally and depends
on sk_wmem_alloc being set e.g. in sock_init_data(). But in some
cases sk_free() is called earlier, usually after other alloc errors.

Fix is to move sk_wmem_alloc initialization from sock_init_data()
to sk_alloc() itself.

Signed-off-by: Jarek Poplawski &lt;jarkao2@gmail.com&gt;
Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
After commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
(net: No more expensive sock_hold()/sock_put() on each tx)
sk_free() frees socks conditionally and depends
on sk_wmem_alloc being set e.g. in sock_init_data(). But in some
cases sk_free() is called earlier, usually after other alloc errors.

Fix is to move sk_wmem_alloc initialization from sock_init_data()
to sk_alloc() itself.

Signed-off-by: Jarek Poplawski &lt;jarkao2@gmail.com&gt;
Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>netpoll: warning for ndo_start_xmit returns with interrupts enabled</title>
<updated>2009-08-24T02:50:59+00:00</updated>
<author>
<name>Dongdong Deng</name>
<email>dongdong.deng@windriver.com</email>
</author>
<published>2009-08-21T03:33:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=79b1bee888d43b14cf0c08fb8e5aa6cb161e48f8'/>
<id>79b1bee888d43b14cf0c08fb8e5aa6cb161e48f8</id>
<content type='text'>
Add a WARN_ONCE for when ndo_start_xmit() enables interrupts in netpoll_send_skb(),
because the NETPOLL API requires that interrupts remain disabled in
netpoll_send_skb().

Signed-off-by: Dongdong Deng &lt;dongdong.deng@windriver.com&gt;
Acked-by: Matt Mackall &lt;mpm@selenic.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add a WARN_ONCE for when ndo_start_xmit() enables interrupts in netpoll_send_skb(),
because the NETPOLL API requires that interrupts remain disabled in
netpoll_send_skb().

Signed-off-by: Dongdong Deng &lt;dongdong.deng@windriver.com&gt;
Acked-by: Matt Mackall &lt;mpm@selenic.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: restore gnet_stats_basic to previous definition</title>
<updated>2009-08-18T04:33:49+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2009-08-16T09:36:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c1a8f1f1c8e01eab5862c8db39b49ace814e6c66'/>
<id>c1a8f1f1c8e01eab5862c8db39b49ace814e6c66</id>
<content type='text'>
In 5e140dfc1fe87eae27846f193086724806b33c7d "net: reorder struct Qdisc
for better SMP performance" the definition of struct gnet_stats_basic
changed incompatibly, as copies of this struct are shipped to
userland via netlink.

Restoring the old behavior is not welcome, for performance reasons.

Fix is to use a private structure for kernel, and
teach gnet_stats_copy_basic() to convert from kernel to user land,
using legacy structure (struct gnet_stats_basic)

Based on a report and initial patch from Michael Spang.

Reported-by: Michael Spang &lt;mspang@csclub.uwaterloo.ca&gt;
Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In 5e140dfc1fe87eae27846f193086724806b33c7d "net: reorder struct Qdisc
for better SMP performance" the definition of struct gnet_stats_basic
changed incompatibly, as copies of this struct are shipped to
userland via netlink.

Restoring the old behavior is not welcome, for performance reasons.

Fix is to use a private structure for kernel, and
teach gnet_stats_copy_basic() to convert from kernel to user land,
using legacy structure (struct gnet_stats_basic)

Based on a report and initial patch from Michael Spang.

Reported-by: Michael Spang &lt;mspang@csclub.uwaterloo.ca&gt;
Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: Fix spinlock use in alloc_netdev_mq()</title>
<updated>2009-08-05T15:35:11+00:00</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@elte.hu</email>
</author>
<published>2009-08-04T21:16:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0bf52b981770cbf006323bab5177f2858a196766'/>
<id>0bf52b981770cbf006323bab5177f2858a196766</id>
<content type='text'>
-tip testing found this lockdep warning:

[    2.272010] calling  net_dev_init+0x0/0x164 @ 1
[    2.276033] device class 'net': registering
[    2.280191] INFO: trying to register non-static key.
[    2.284005] the code is fine but needs lockdep annotation.
[    2.284005] turning off the locking correctness validator.
[    2.284005] Pid: 1, comm: swapper Not tainted 2.6.31-rc5-tip #1145
[    2.284005] Call Trace:
[    2.284005]  [&lt;7958eb4e&gt;] ? printk+0xf/0x11
[    2.284005]  [&lt;7904f83c&gt;] __lock_acquire+0x11b/0x622
[    2.284005]  [&lt;7908c9b7&gt;] ? alloc_debug_processing+0xf9/0x144
[    2.284005]  [&lt;7904e2be&gt;] ? mark_held_locks+0x3a/0x52
[    2.284005]  [&lt;7908dbc4&gt;] ? kmem_cache_alloc+0xa8/0x13f
[    2.284005]  [&lt;7904e475&gt;] ? trace_hardirqs_on_caller+0xa2/0xc3
[    2.284005]  [&lt;7904fdf6&gt;] lock_acquire+0xb3/0xd0
[    2.284005]  [&lt;79489678&gt;] ? alloc_netdev_mq+0xf5/0x1ad
[    2.284005]  [&lt;79591514&gt;] _spin_lock_bh+0x2d/0x5d
[    2.284005]  [&lt;79489678&gt;] ? alloc_netdev_mq+0xf5/0x1ad
[    2.284005]  [&lt;79489678&gt;] alloc_netdev_mq+0xf5/0x1ad
[    2.284005]  [&lt;793a38f2&gt;] ? loopback_setup+0x0/0x74
[    2.284005]  [&lt;798eecd0&gt;] loopback_net_init+0x20/0x5d
[    2.284005]  [&lt;79483efb&gt;] register_pernet_device+0x23/0x4b
[    2.284005]  [&lt;798f5c9f&gt;] net_dev_init+0x115/0x164
[    2.284005]  [&lt;7900104f&gt;] do_one_initcall+0x4a/0x11a
[    2.284005]  [&lt;798f5b8a&gt;] ? net_dev_init+0x0/0x164
[    2.284005]  [&lt;79066f6d&gt;] ? register_irq_proc+0x8c/0xa8
[    2.284005]  [&lt;798cc29a&gt;] do_basic_setup+0x42/0x52
[    2.284005]  [&lt;798cc30a&gt;] kernel_init+0x60/0xa1
[    2.284005]  [&lt;798cc2aa&gt;] ? kernel_init+0x0/0xa1
[    2.284005]  [&lt;79003e03&gt;] kernel_thread_helper+0x7/0x10
[    2.284078] device: 'lo': device_add
[    2.288248] initcall net_dev_init+0x0/0x164 returned 0 after 11718 usecs
[    2.292010] calling  neigh_init+0x0/0x66 @ 1
[    2.296010] initcall neigh_init+0x0/0x66 returned 0 after 0 usecs

It's using a zero-initialized spinlock. This is a side-effect of:

        dev_unicast_init(dev);

in alloc_netdev_mq() making use of dev-&gt;addr_list_lock.

The device has just been allocated freshly, it's not accessible
anywhere yet so no locking is needed at all - in fact it's wrong
to lock it here (the lock isn't initialized yet).

This bug was introduced via:

| commit a6ac65db2329e7685299666f5f7b6093c7b0f3a0
| Date:   Thu Jul 30 01:06:12 2009 +0000
|
|     net: restore the original spinlock to protect unicast list

Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Acked-by: Jiri Pirko &lt;jpirko@redhat.com&gt;
Tested-by: Mark Brown &lt;broonie@opensource.wolfsonmicro.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
-tip testing found this lockdep warning:

[    2.272010] calling  net_dev_init+0x0/0x164 @ 1
[    2.276033] device class 'net': registering
[    2.280191] INFO: trying to register non-static key.
[    2.284005] the code is fine but needs lockdep annotation.
[    2.284005] turning off the locking correctness validator.
[    2.284005] Pid: 1, comm: swapper Not tainted 2.6.31-rc5-tip #1145
[    2.284005] Call Trace:
[    2.284005]  [&lt;7958eb4e&gt;] ? printk+0xf/0x11
[    2.284005]  [&lt;7904f83c&gt;] __lock_acquire+0x11b/0x622
[    2.284005]  [&lt;7908c9b7&gt;] ? alloc_debug_processing+0xf9/0x144
[    2.284005]  [&lt;7904e2be&gt;] ? mark_held_locks+0x3a/0x52
[    2.284005]  [&lt;7908dbc4&gt;] ? kmem_cache_alloc+0xa8/0x13f
[    2.284005]  [&lt;7904e475&gt;] ? trace_hardirqs_on_caller+0xa2/0xc3
[    2.284005]  [&lt;7904fdf6&gt;] lock_acquire+0xb3/0xd0
[    2.284005]  [&lt;79489678&gt;] ? alloc_netdev_mq+0xf5/0x1ad
[    2.284005]  [&lt;79591514&gt;] _spin_lock_bh+0x2d/0x5d
[    2.284005]  [&lt;79489678&gt;] ? alloc_netdev_mq+0xf5/0x1ad
[    2.284005]  [&lt;79489678&gt;] alloc_netdev_mq+0xf5/0x1ad
[    2.284005]  [&lt;793a38f2&gt;] ? loopback_setup+0x0/0x74
[    2.284005]  [&lt;798eecd0&gt;] loopback_net_init+0x20/0x5d
[    2.284005]  [&lt;79483efb&gt;] register_pernet_device+0x23/0x4b
[    2.284005]  [&lt;798f5c9f&gt;] net_dev_init+0x115/0x164
[    2.284005]  [&lt;7900104f&gt;] do_one_initcall+0x4a/0x11a
[    2.284005]  [&lt;798f5b8a&gt;] ? net_dev_init+0x0/0x164
[    2.284005]  [&lt;79066f6d&gt;] ? register_irq_proc+0x8c/0xa8
[    2.284005]  [&lt;798cc29a&gt;] do_basic_setup+0x42/0x52
[    2.284005]  [&lt;798cc30a&gt;] kernel_init+0x60/0xa1
[    2.284005]  [&lt;798cc2aa&gt;] ? kernel_init+0x0/0xa1
[    2.284005]  [&lt;79003e03&gt;] kernel_thread_helper+0x7/0x10
[    2.284078] device: 'lo': device_add
[    2.288248] initcall net_dev_init+0x0/0x164 returned 0 after 11718 usecs
[    2.292010] calling  neigh_init+0x0/0x66 @ 1
[    2.296010] initcall neigh_init+0x0/0x66 returned 0 after 0 usecs

It's using a zero-initialized spinlock. This is a side-effect of:

        dev_unicast_init(dev);

in alloc_netdev_mq() making use of dev-&gt;addr_list_lock.

The device has just been allocated freshly, it's not accessible
anywhere yet so no locking is needed at all - in fact it's wrong
to lock it here (the lock isn't initialized yet).

This bug was introduced via:

| commit a6ac65db2329e7685299666f5f7b6093c7b0f3a0
| Date:   Thu Jul 30 01:06:12 2009 +0000
|
|     net: restore the original spinlock to protect unicast list

Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Acked-by: Jiri Pirko &lt;jpirko@redhat.com&gt;
Tested-by: Mark Brown &lt;broonie@opensource.wolfsonmicro.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: restore the original spinlock to protect unicast list</title>
<updated>2009-08-02T19:20:46+00:00</updated>
<author>
<name>Jiri Pirko</name>
<email>jpirko@redhat.com</email>
</author>
<published>2009-07-30T01:06:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a6ac65db2329e7685299666f5f7b6093c7b0f3a0'/>
<id>a6ac65db2329e7685299666f5f7b6093c7b0f3a0</id>
<content type='text'>
There is a path where an assertion in dev_unicast_sync() triggers:

igmp6_group_added -&gt; dev_mc_add -&gt; __dev_set_rx_mode -&gt;
-&gt; vlan_dev_set_rx_mode -&gt; dev_unicast_sync

Therefore we cannot protect this list with rtnl. This patch restores the
original protection of this list with a spinlock.

Signed-off-by: Jiri Pirko &lt;jpirko@redhat.com&gt;
Tested-by: Meelis Roos &lt;mroos@linux.ee&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There is a path where an assertion in dev_unicast_sync() triggers:

igmp6_group_added -&gt; dev_mc_add -&gt; __dev_set_rx_mode -&gt;
-&gt; vlan_dev_set_rx_mode -&gt; dev_unicast_sync

Therefore we cannot protect this list with rtnl. This patch restores the
original protection of this list with a spinlock.

Signed-off-by: Jiri Pirko &lt;jpirko@redhat.com&gt;
Tested-by: Meelis Roos &lt;mroos@linux.ee&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: net_assign_generic() fix</title>
<updated>2009-08-02T19:20:36+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2009-07-28T02:36:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=144586301f6af5ae5943a002f030d8c626fa4fdd'/>
<id>144586301f6af5ae5943a002f030d8c626fa4fdd</id>
<content type='text'>
memcpy() should take into account size of pointers,
not only number of pointers to copy.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Acked-by: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
memcpy() should take into account size of pointers,
not only number of pointers to copy.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Acked-by: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Fix error return for setsockopt(SO_TIMESTAMPING)</title>
<updated>2009-07-20T15:23:36+00:00</updated>
<author>
<name>Rémi Denis-Courmont</name>
<email>remi.denis-courmont@nokia.com</email>
</author>
<published>2009-07-20T00:47:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f249fb783092471a4808e5fc5bda071d2724810d'/>
<id>f249fb783092471a4808e5fc5bda071d2724810d</id>
<content type='text'>
I guess it should be -EINVAL rather than EINVAL. I have not checked
when the bug came in. Perhaps a candidate for -stable?

Signed-off-by: Rémi Denis-Courmont &lt;remi.denis-courmont@nokia.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
I guess it should be -EINVAL rather than EINVAL. I have not checked
when the bug came in. Perhaps a candidate for -stable?

Signed-off-by: Rémi Denis-Courmont &lt;remi.denis-courmont@nokia.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: sock_copy() fixes</title>
<updated>2009-07-17T01:05:26+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2009-07-15T23:13:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4dc6dc7162c08b9965163c9ab3f9375d4adff2c7'/>
<id>4dc6dc7162c08b9965163c9ab3f9375d4adff2c7</id>
<content type='text'>
Commit e912b1142be8f1e2c71c71001dc992c6e5eb2ec1
(net: sk_prot_alloc() should not blindly overwrite memory)
took care of not zeroing whole new socket at allocation time.

sock_copy() is another spot where we should be very careful.
We should not set refcnt to a non-zero value until
we are sure other fields are correctly set up, or
a lockless reader could catch this socket by mistake,
while not fully (re)initialized.

This patch puts sk_node &amp; sk_refcnt to the very beginning
of struct sock to ease sock_copy() &amp; sk_prot_alloc() job.

We add appropriate smp_wmb() before sk_refcnt initializations
to match our RCU requirements (changes to sock keys should
be committed to memory before sk_refcnt setting)

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit e912b1142be8f1e2c71c71001dc992c6e5eb2ec1
(net: sk_prot_alloc() should not blindly overwrite memory)
took care of not zeroing whole new socket at allocation time.

sock_copy() is another spot where we should be very careful.
We should not set refcnt to a non-zero value until
we are sure other fields are correctly set up, or
a lockless reader could catch this socket by mistake,
while not fully (re)initialized.

This patch puts sk_node &amp; sk_refcnt to the very beginning
of struct sock to ease sock_copy() &amp; sk_prot_alloc() job.

We add appropriate smp_wmb() before sk_refcnt initializations
to match our RCU requirements (changes to sock keys should
be committed to memory before sk_refcnt setting)

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: sk_prot_alloc() should not blindly overwrite memory</title>
<updated>2009-07-12T03:26:19+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2009-07-08T19:36:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e912b1142be8f1e2c71c71001dc992c6e5eb2ec1'/>
<id>e912b1142be8f1e2c71c71001dc992c6e5eb2ec1</id>
<content type='text'>
Some sockets use SLAB_DESTROY_BY_RCU, and our RCU code correctness
depends on sk-&gt;sk_nulls_node.next being always valid. A NULL
value is not allowed as it might fault a lockless reader.

Current sk_prot_alloc() implementation doesn't respect this hypothesis,
calling kmem_cache_alloc() with __GFP_ZERO. Just call memset() around
the forbidden field.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Some sockets use SLAB_DESTROY_BY_RCU, and our RCU code correctness
depends on sk-&gt;sk_nulls_node.next being always valid. A NULL
value is not allowed as it might fault a lockless reader.

Current sk_prot_alloc() implementation doesn't respect this hypothesis,
calling kmem_cache_alloc() with __GFP_ZERO. Just call memset() around
the forbidden field.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: adding memory barrier to the poll and receive callbacks</title>
<updated>2009-07-10T00:06:57+00:00</updated>
<author>
<name>Jiri Olsa</name>
<email>jolsa@redhat.com</email>
</author>
<published>2009-07-08T12:09:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a57de0b4336e48db2811a2030bb68dba8dd09d88'/>
<id>a57de0b4336e48db2811a2030bb68dba8dd09d88</id>
<content type='text'>
Adding memory barrier after the poll_wait function, paired with
receive callbacks. Adding functions sock_poll_wait and sk_has_sleeper
to wrap the memory barrier.

Without the memory barrier, the following race can happen.
The race fires when the following code paths meet, and the tp-&gt;rcv_nxt
and __add_wait_queue updates stay in CPU caches.

CPU1                         CPU2

sys_select                   receive packet
  ...                        ...
  __add_wait_queue           update tp-&gt;rcv_nxt
  ...                        ...
  tp-&gt;rcv_nxt check          sock_def_readable
  ...                        {
  schedule                      ...
                                if (sk-&gt;sk_sleep &amp;&amp; waitqueue_active(sk-&gt;sk_sleep))
                                        wake_up_interruptible(sk-&gt;sk_sleep)
                                ...
                             }

If there was no cache the code would work ok, since the wait_queue and
rcv_nxt are opposite to each other.

Meaning that once tp-&gt;rcv_nxt is updated by CPU2, the CPU1 either already
passed the tp-&gt;rcv_nxt check and sleeps, or will get the new value for
tp-&gt;rcv_nxt and will return with new data mask.
In both cases the process (CPU1) is being added to the wait queue, so the
waitqueue_active (CPU2) call cannot miss and will wake up CPU1.

The bad case is when the __add_wait_queue changes done by CPU1 stay in its
cache, and so does the tp-&gt;rcv_nxt update on CPU2 side.  The CPU1 will then
end up calling schedule and sleep forever if there is no more data on the
socket.

Calls to poll_wait in the following modules were omitted:
	net/bluetooth/af_bluetooth.c
	net/irda/af_irda.c
	net/irda/irnet/irnet_ppp.c
	net/mac80211/rc80211_pid_debugfs.c
	net/phonet/socket.c
	net/rds/af_rds.c
	net/rfkill/core.c
	net/sunrpc/cache.c
	net/sunrpc/rpc_pipe.c
	net/tipc/socket.c

Signed-off-by: Jiri Olsa &lt;jolsa@redhat.com&gt;
Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Adding memory barrier after the poll_wait function, paired with
receive callbacks. Adding functions sock_poll_wait and sk_has_sleeper
to wrap the memory barrier.

Without the memory barrier, the following race can happen.
The race fires when the following code paths meet, and the tp-&gt;rcv_nxt
and __add_wait_queue updates stay in CPU caches.

CPU1                         CPU2

sys_select                   receive packet
  ...                        ...
  __add_wait_queue           update tp-&gt;rcv_nxt
  ...                        ...
  tp-&gt;rcv_nxt check          sock_def_readable
  ...                        {
  schedule                      ...
                                if (sk-&gt;sk_sleep &amp;&amp; waitqueue_active(sk-&gt;sk_sleep))
                                        wake_up_interruptible(sk-&gt;sk_sleep)
                                ...
                             }

If there was no cache the code would work ok, since the wait_queue and
rcv_nxt are opposite to each other.

Meaning that once tp-&gt;rcv_nxt is updated by CPU2, the CPU1 either already
passed the tp-&gt;rcv_nxt check and sleeps, or will get the new value for
tp-&gt;rcv_nxt and will return with new data mask.
In both cases the process (CPU1) is being added to the wait queue, so the
waitqueue_active (CPU2) call cannot miss and will wake up CPU1.

The bad case is when the __add_wait_queue changes done by CPU1 stay in its
cache, and so does the tp-&gt;rcv_nxt update on CPU2 side.  The CPU1 will then
end up calling schedule and sleep forever if there is no more data on the
socket.

Calls to poll_wait in the following modules were omitted:
	net/bluetooth/af_bluetooth.c
	net/irda/af_irda.c
	net/irda/irnet/irnet_ppp.c
	net/mac80211/rc80211_pid_debugfs.c
	net/phonet/socket.c
	net/rds/af_rds.c
	net/rfkill/core.c
	net/sunrpc/cache.c
	net/sunrpc/rpc_pipe.c
	net/tipc/socket.c

Signed-off-by: Jiri Olsa &lt;jolsa@redhat.com&gt;
Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
