<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/eventpoll.c, branch v2.6.22</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>epoll: move kfree inside ep_free</title>
<updated>2007-05-15T15:54:00+00:00</updated>
<author>
<name>Davide Libenzi</name>
<email>davidel@xmailserver.org</email>
</author>
<published>2007-05-15T08:40:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f0ee9aabb0520adea5937855a9575c08a97b16e7'/>
<id>f0ee9aabb0520adea5937855a9575c08a97b16e7</id>
<content type='text'>
Move the kfree() call inside the ep_free() function.

Signed-off-by: Davide Libenzi &lt;davidel@xmailserver.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Move the kfree() call inside the ep_free() function.

Signed-off-by: Davide Libenzi &lt;davidel@xmailserver.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>epoll: fix some comments</title>
<updated>2007-05-15T15:54:00+00:00</updated>
<author>
<name>Davide Libenzi</name>
<email>davidel@xmailserver.org</email>
</author>
<published>2007-05-15T08:40:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=67647d0fb8bc03609d045a9cce85f7ef6d763036'/>
<id>67647d0fb8bc03609d045a9cce85f7ef6d763036</id>
<content type='text'>
Fixes some epoll code comments.

Signed-off-by: Davide Libenzi &lt;davidel@xmailserver.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fixes some epoll code comments.

Signed-off-by: Davide Libenzi &lt;davidel@xmailserver.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>epoll locks changes and cleanups</title>
<updated>2007-05-15T15:53:59+00:00</updated>
<author>
<name>Davide Libenzi</name>
<email>davidel@xmailserver.org</email>
</author>
<published>2007-05-15T08:40:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c7ea76302547f81e4583d0d7c52a1c37c6747f5d'/>
<id>c7ea76302547f81e4583d0d7c52a1c37c6747f5d</id>
<content type='text'>
Changes the rwlock to a spinlock, and drops the use-count variable.
Operations are always bound by the mutex now, so the use-count is no more
needed.  For the same reason, the rwlock can become a simple spinlock.

Signed-off-by: Davide Libenzi &lt;davidel@xmailserver.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Changes the rwlock to a spinlock, and drops the use-count variable.
Operations are always bound by the mutex now, so the use-count is no more
needed.  For the same reason, the rwlock can become a simple spinlock.

Signed-off-by: Davide Libenzi &lt;davidel@xmailserver.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fix epoll single pass code and add wait-exclusive flag</title>
<updated>2007-05-15T15:53:59+00:00</updated>
<author>
<name>Davide Libenzi</name>
<email>davidel@xmailserver.org</email>
</author>
<published>2007-05-15T08:40:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d47de16c7221968d3eab899d7540efa5ba77af5a'/>
<id>d47de16c7221968d3eab899d7540efa5ba77af5a</id>
<content type='text'>
Fixes the epoll single pass code.  During the unlocked event delivery (to
userspace) code, the poll callback can re-issue new events, and we must
receive them correctly.  Since we loop in a lockless fashion, we want to be
O(nready), and we don't want to flash on/off the spinlock for every event, we
have the poll callback to use a secondary list to queue events while we're
inside the event delivery loop.  The rw_semaphore has been turned into a
mutex.  This patch also adds the wait-exclusive flag, as suggested by Davi
Arnaut.

Signed-off-by: Davide Libenzi &lt;davidel@xmailserver.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fixes the epoll single pass code.  During the unlocked event delivery (to
userspace) code, the poll callback can re-issue new events, and we must
receive them correctly.  Since we loop in a lockless fashion, we want to be
O(nready), and we don't want to flash on/off the spinlock for every event, we
have the poll callback to use a secondary list to queue events while we're
inside the event delivery loop.  The rw_semaphore has been turned into a
mutex.  This patch also adds the wait-exclusive flag, as suggested by Davi
Arnaut.

Signed-off-by: Davide Libenzi &lt;davidel@xmailserver.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>epoll cleanups: epoll remove static pre-declarations and akpm-ize the code</title>
<updated>2007-05-11T15:29:37+00:00</updated>
<author>
<name>Davide Libenzi</name>
<email>davidel@xmailserver.org</email>
</author>
<published>2007-05-11T05:23:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7699acd1341c63fdbcfc56994fb2989ec59d2a43'/>
<id>7699acd1341c63fdbcfc56994fb2989ec59d2a43</id>
<content type='text'>
Re-arrange epoll code to avoid static functions pre-declarations, and apply
akpm-filter on it.

Signed-off-by: Davide Libenzi &lt;davidel@xmailserver.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Re-arrange epoll code to avoid static functions pre-declarations, and apply
akpm-filter on it.

Signed-off-by: Davide Libenzi &lt;davidel@xmailserver.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>epoll cleanups: epoll no module</title>
<updated>2007-05-11T15:29:37+00:00</updated>
<author>
<name>Davide Libenzi</name>
<email>davidel@xmailserver.org</email>
</author>
<published>2007-05-11T05:23:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=cea69241870e55638156a026814551d6c575fd7f'/>
<id>cea69241870e55638156a026814551d6c575fd7f</id>
<content type='text'>
Epoll is either compiled it, or not (if EMBEDDED). Remove the module code
and use fs_initcall().

Signed-off-by: Davide Libenzi &lt;davidel@xmailserver.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Epoll is either compiled it, or not (if EMBEDDED). Remove the module code
and use fs_initcall().

Signed-off-by: Davide Libenzi &lt;davidel@xmailserver.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>epoll: use anonymous inodes</title>
<updated>2007-05-11T15:29:37+00:00</updated>
<author>
<name>Davide Libenzi</name>
<email>davidel@xmailserver.org</email>
</author>
<published>2007-05-11T05:23:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=da66f7cb0f69ab27dbf5b9d0b85c4b97716c44d1'/>
<id>da66f7cb0f69ab27dbf5b9d0b85c4b97716c44d1</id>
<content type='text'>
Cut out lots of code from epoll, by reusing the anonymous inode source
patch (fs/anon_inodes.c).

Signed-off-by: Davide Libenzi &lt;davidel@xmailserver.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Cut out lots of code from epoll, by reusing the anonymous inode source
patch (fs/anon_inodes.c).

Signed-off-by: Davide Libenzi &lt;davidel@xmailserver.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Introduce a handy list_first_entry macro</title>
<updated>2007-05-08T18:15:11+00:00</updated>
<author>
<name>Pavel Emelianov</name>
<email>xemul@sw.ru</email>
</author>
<published>2007-05-08T07:30:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b5e618181a927210f8be1d3d2249d31904ba358d'/>
<id>b5e618181a927210f8be1d3d2249d31904ba358d</id>
<content type='text'>
There are many places in the kernel where the construction like

   foo = list_entry(head-&gt;next, struct foo_struct, list);

are used.
The code might look more descriptive and neat if using the macro

   list_first_entry(head, type, member) \
             list_entry((head)-&gt;next, type, member)

Here is the macro itself and the examples of its usage in the generic code.
 If it will turn out to be useful, I can prepare the set of patches to
inject in into arch-specific code, drivers, networking, etc.

Signed-off-by: Pavel Emelianov &lt;xemul@openvz.org&gt;
Signed-off-by: Kirill Korotaev &lt;dev@openvz.org&gt;
Cc: Randy Dunlap &lt;randy.dunlap@oracle.com&gt;
Cc: Andi Kleen &lt;andi@firstfloor.org&gt;
Cc: Zach Brown &lt;zach.brown@oracle.com&gt;
Cc: Davide Libenzi &lt;davidel@xmailserver.org&gt;
Cc: John McCutchan &lt;ttb@tentacle.dhs.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: john stultz &lt;johnstul@us.ibm.com&gt;
Cc: Ram Pai &lt;linuxram@us.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There are many places in the kernel where the construction like

   foo = list_entry(head-&gt;next, struct foo_struct, list);

are used.
The code might look more descriptive and neat if using the macro

   list_first_entry(head, type, member) \
             list_entry((head)-&gt;next, type, member)

Here is the macro itself and the examples of its usage in the generic code.
 If it will turn out to be useful, I can prepare the set of patches to
inject in into arch-specific code, drivers, networking, etc.

Signed-off-by: Pavel Emelianov &lt;xemul@openvz.org&gt;
Signed-off-by: Kirill Korotaev &lt;dev@openvz.org&gt;
Cc: Randy Dunlap &lt;randy.dunlap@oracle.com&gt;
Cc: Andi Kleen &lt;andi@firstfloor.org&gt;
Cc: Zach Brown &lt;zach.brown@oracle.com&gt;
Cc: Davide Libenzi &lt;davidel@xmailserver.org&gt;
Cc: John McCutchan &lt;ttb@tentacle.dhs.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: john stultz &lt;johnstul@us.ibm.com&gt;
Cc: Ram Pai &lt;linuxram@us.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>header cleaning: don't include smp_lock.h when not used</title>
<updated>2007-05-08T18:15:07+00:00</updated>
<author>
<name>Randy Dunlap</name>
<email>randy.dunlap@oracle.com</email>
</author>
<published>2007-05-08T07:28:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e63340ae6b6205fef26b40a75673d1c9c0c8bb90'/>
<id>e63340ae6b6205fef26b40a75673d1c9c0c8bb90</id>
<content type='text'>
Remove includes of &lt;linux/smp_lock.h&gt; where it is not used/needed.
Suggested by Al Viro.

Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).

Signed-off-by: Randy Dunlap &lt;randy.dunlap@oracle.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Remove includes of &lt;linux/smp_lock.h&gt; where it is not used/needed.
Suggested by Al Viro.

Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).

Signed-off-by: Randy Dunlap &lt;randy.dunlap@oracle.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>epoll: optimizations and cleanups</title>
<updated>2007-05-08T18:15:01+00:00</updated>
<author>
<name>Davide Libenzi</name>
<email>davidel@xmailserver.org</email>
</author>
<published>2007-05-08T07:25:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6192bd536f96c6a0d969081bc71ae24f9319bfdc'/>
<id>6192bd536f96c6a0d969081bc71ae24f9319bfdc</id>
<content type='text'>
Epoll is doing multiple passes over the ready set at the moment, because of
the constraints over the f_op-&gt;poll() call.  Looking at the code again, I
noticed that we already hold the epoll semaphore in read, and this
(together with other locking conditions that hold while doing an
epoll_wait()) can lead to a smarter way [1] to "ship" events to userspace
(in a single pass).

This is a stress application that can be used to test the new code.  It
spwans multiple thread and call epoll_wait() and epoll_ctl() from many
threads.  Stress tested on my dual Opteron 254 w/out any problems.

http://www.xmailserver.org/totalmess.c

This is not a benchmark, just something that tries to stress and exploit
possible problems with the new code.
Also, I made a stupid micro-benchmark:

http://www.xmailserver.org/epwbench.c

[1] Considering that epoll must be thread-safe, there are five ways we can
    be hit during an epoll_wait() transfer loop (ep_send_events()):

    1) The epoll fd going away and calling ep_free
       This just can't happen, since we did an fget() in sys_epoll_wait

    2) An epoll_ctl(EPOLL_CTL_DEL)
       This can't happen because epoll_ctl() gets ep-&gt;sem in write, and
       we're holding it in read during ep_send_events()

    3) An fd stored inside the epoll fd going away
       This can't happen because in eventpoll_release_file() we get
       ep-&gt;sem in write, and we're holding it in read during
       ep_send_events()

    4) Another epoll_wait() happening on another thread
       They both can be inside ep_send_events() at the same time, we get
       (splice) the ready-list under the spinlock, so each one will get
       its own ready list. Note that an fd cannot be at the same time
       inside more than one ready list, because ep_poll_callback() will
       not re-queue it if it sees it already linked:

       if (ep_is_linked(&amp;epi-&gt;rdllink))
                goto is_linked;

       Another case that can happen, is two concurrent epoll_wait(),
       coming in with a userspace event buffer of size, say, ten.
       Suppose there are 50 event ready in the list. The first
       epoll_wait() will "steal" the whole list, while the second, seeing
       no events, will go to sleep. But at the end of ep_send_events() in
       the first epoll_wait(), we will re-inject surplus ready fds, and we
       will trigger the proper wake_up to the second epoll_wait().

    5) ep_poll_callback() hitting us asyncronously
       This is the tricky part. As I said above, the ep_is_linked() test
       done inside ep_poll_callback(), will guarantee us that until the
       item will result linked to a list, ep_poll_callback() will not try
       to re-queue it again (read, write data on any of its members). When
       we do a list_del() in ep_send_events(), the item will still satisfy
       the ep_is_linked() test (whatever data is written in prev/next,
       it'll never be its own pointer), so ep_poll_callback() will still
       leave us alone. It's only after the eventual smp_mb()+INIT_LIST_HEAD(&amp;epi-&gt;rdllink)
       that it'll become visible to ep_poll_callback(), but at the point
       we're already past it.

[akpm@osdl.org: 80 cols]
Signed-off-by: Davide Libenzi &lt;davidel@xmailserver.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Epoll is doing multiple passes over the ready set at the moment, because of
the constraints over the f_op-&gt;poll() call.  Looking at the code again, I
noticed that we already hold the epoll semaphore in read, and this
(together with other locking conditions that hold while doing an
epoll_wait()) can lead to a smarter way [1] to "ship" events to userspace
(in a single pass).

This is a stress application that can be used to test the new code.  It
spwans multiple thread and call epoll_wait() and epoll_ctl() from many
threads.  Stress tested on my dual Opteron 254 w/out any problems.

http://www.xmailserver.org/totalmess.c

This is not a benchmark, just something that tries to stress and exploit
possible problems with the new code.
Also, I made a stupid micro-benchmark:

http://www.xmailserver.org/epwbench.c

[1] Considering that epoll must be thread-safe, there are five ways we can
    be hit during an epoll_wait() transfer loop (ep_send_events()):

    1) The epoll fd going away and calling ep_free
       This just can't happen, since we did an fget() in sys_epoll_wait

    2) An epoll_ctl(EPOLL_CTL_DEL)
       This can't happen because epoll_ctl() gets ep-&gt;sem in write, and
       we're holding it in read during ep_send_events()

    3) An fd stored inside the epoll fd going away
       This can't happen because in eventpoll_release_file() we get
       ep-&gt;sem in write, and we're holding it in read during
       ep_send_events()

    4) Another epoll_wait() happening on another thread
       They both can be inside ep_send_events() at the same time, we get
       (splice) the ready-list under the spinlock, so each one will get
       its own ready list. Note that an fd cannot be at the same time
       inside more than one ready list, because ep_poll_callback() will
       not re-queue it if it sees it already linked:

       if (ep_is_linked(&amp;epi-&gt;rdllink))
                goto is_linked;

       Another case that can happen, is two concurrent epoll_wait(),
       coming in with a userspace event buffer of size, say, ten.
       Suppose there are 50 event ready in the list. The first
       epoll_wait() will "steal" the whole list, while the second, seeing
       no events, will go to sleep. But at the end of ep_send_events() in
       the first epoll_wait(), we will re-inject surplus ready fds, and we
       will trigger the proper wake_up to the second epoll_wait().

    5) ep_poll_callback() hitting us asyncronously
       This is the tricky part. As I said above, the ep_is_linked() test
       done inside ep_poll_callback(), will guarantee us that until the
       item will result linked to a list, ep_poll_callback() will not try
       to re-queue it again (read, write data on any of its members). When
       we do a list_del() in ep_send_events(), the item will still satisfy
       the ep_is_linked() test (whatever data is written in prev/next,
       it'll never be its own pointer), so ep_poll_callback() will still
       leave us alone. It's only after the eventual smp_mb()+INIT_LIST_HEAD(&amp;epi-&gt;rdllink)
       that it'll become visible to ep_poll_callback(), but at the point
       we're already past it.

[akpm@osdl.org: 80 cols]
Signed-off-by: Davide Libenzi &lt;davidel@xmailserver.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
