<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/aio.c, branch v2.6.22</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>signal/timer/event: KAIO eventfd support example</title>
<updated>2007-05-11T15:29:37+00:00</updated>
<author>
<name>Davide Libenzi</name>
<email>davidel@xmailserver.org</email>
</author>
<published>2007-05-11T05:23:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9c3060bedd84144653a2ad7bea32389f65598d40'/>
<id>9c3060bedd84144653a2ad7bea32389f65598d40</id>
<content type='text'>
This is an example of how to add eventfd support to the current KAIO code,
in order to enable KAIO to post readiness events to a pollable fd (hence
compatible with POSIX select/poll).  The KAIO code simply signals the eventfd
fd when events are ready, and this triggers a POLLIN on the fd.  This patch
uses a reserved-for-future-use member of the struct iocb to pass an eventfd
file descriptor, which KAIO will use to post events every time a request
completes.  At that point, an io_getevents() will return the completed result
to a struct io_event.  I made a quick test program to verify the patch, and it
runs fine here:

http://www.xmailserver.org/eventfd-aio-test.c

The test program uses poll(2), but it'd, of course, work with select and epoll
too.

This makes it possible to schedule both block I/O and requests to other
pollable devices, and wait for results using select/poll/epoll.  In a typical
scenario, an application would submit KAIO requests using io_submit(), and
would also use epoll_ctl() on the whole other class of devices (which, with
the addition of signals, timers and user events, is now pretty much complete),
and then would:

	epoll_wait(...);
	for_each_event {
		if (curr_event_is_kaiofd) {
			io_getevents();
			dispatch_aio_events();
		} else {
			dispatch_epoll_event();
		}
	}
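
<![CDATA[
A minimal userspace sketch of the readiness path described above, assuming a
kernel and C library that provide eventfd(2) and epoll.  The function name
poll_eventfd_once() is hypothetical, and a plain write() to the eventfd stands
in for the signal KAIO would post when a request completes; no AIO request is
actually submitted here.

```c
#include <stdint.h>
#include <unistd.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>

/*
 * Hypothetical helper: returns the eventfd counter observed after epoll
 * reports readiness, or -1 on any error.  'completions' stands in for the
 * number of finished requests.
 */
long long poll_eventfd_once(uint64_t completions)
{
	long long result = -1;
	int efd = eventfd(0, 0);
	int epfd = epoll_create1(0);
	struct epoll_event ev = { .events = EPOLLIN }, out;
	uint64_t count = 0;

	if (efd < 0 || epfd < 0)
		goto done;

	/* Register the eventfd like any other pollable descriptor. */
	ev.data.fd = efd;
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, efd, &ev) < 0)
		goto done;

	/* Stand-in for the kernel signalling the eventfd on completion. */
	if (write(efd, &completions, sizeof(completions)) != sizeof(completions))
		goto done;

	/* Wait up to one second for the eventfd to become readable. */
	if (epoll_wait(epfd, &out, 1, 1000) != 1 || out.data.fd != efd)
		goto done;

	/* Reading returns the accumulated counter and resets it to zero. */
	if (read(efd, &count, sizeof(count)) != sizeof(count))
		goto done;
	result = (long long)count;
done:
	if (efd >= 0)
		close(efd);
	if (epfd >= 0)
		close(epfd);
	return result;
}
```

In a real event loop, the point where this sketch reads the counter is where
the application would go on to call io_getevents() and dispatch the results.
]]>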

Signed-off-by: Davide Libenzi &lt;davidel@xmailserver.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This is an example of how to add eventfd support to the current KAIO code,
in order to enable KAIO to post readiness events to a pollable fd (hence
compatible with POSIX select/poll).  The KAIO code simply signals the eventfd
fd when events are ready, and this triggers a POLLIN on the fd.  This patch
uses a reserved-for-future-use member of the struct iocb to pass an eventfd
file descriptor, which KAIO will use to post events every time a request
completes.  At that point, an io_getevents() will return the completed result
to a struct io_event.  I made a quick test program to verify the patch, and it
runs fine here:

http://www.xmailserver.org/eventfd-aio-test.c

The test program uses poll(2), but it'd, of course, work with select and epoll
too.

This makes it possible to schedule both block I/O and requests to other
pollable devices, and wait for results using select/poll/epoll.  In a typical
scenario, an application would submit KAIO requests using io_submit(), and
would also use epoll_ctl() on the whole other class of devices (which, with
the addition of signals, timers and user events, is now pretty much complete),
and then would:

	epoll_wait(...);
	for_each_event {
		if (curr_event_is_kaiofd) {
			io_getevents();
			dispatch_aio_events();
		} else {
			dispatch_epoll_event();
		}
	}

Signed-off-by: Davide Libenzi &lt;davidel@xmailserver.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>unify flush_work/flush_work_keventd and rename it to cancel_work_sync</title>
<updated>2007-05-09T19:30:53+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@tv-sign.ru</email>
</author>
<published>2007-05-09T09:34:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=28e53bddf814485699a4142bc056fd37d4e11dd4'/>
<id>28e53bddf814485699a4142bc056fd37d4e11dd4</id>
<content type='text'>
flush_work(wq, work) doesn't need the first parameter, we can use cwq-&gt;wq
(this was possible from the very beginning, I missed this).  So we can unify
flush_work_keventd and flush_work.

Also, rename flush_work() to cancel_work_sync() and fix all callers.
Perhaps this is not the best name, but "flush_work" is really bad.

(akpm: this is why the earlier patches bypassed maintainers)

Signed-off-by: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Cc: Jeff Garzik &lt;jeff@garzik.org&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Jens Axboe &lt;jens.axboe@oracle.com&gt;
Cc: Tejun Heo &lt;htejun@gmail.com&gt;
Cc: Auke Kok &lt;auke-jan.h.kok@intel.com&gt;,
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
flush_work(wq, work) doesn't need the first parameter, we can use cwq-&gt;wq
(this was possible from the very beginning, I missed this).  So we can unify
flush_work_keventd and flush_work.

Also, rename flush_work() to cancel_work_sync() and fix all callers.
Perhaps this is not the best name, but "flush_work" is really bad.

(akpm: this is why the earlier patches bypassed maintainers)

Signed-off-by: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Cc: Jeff Garzik &lt;jeff@garzik.org&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Jens Axboe &lt;jens.axboe@oracle.com&gt;
Cc: Tejun Heo &lt;htejun@gmail.com&gt;
Cc: Auke Kok &lt;auke-jan.h.kok@intel.com&gt;,
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>aio: use flush_work()</title>
<updated>2007-05-09T19:30:51+00:00</updated>
<author>
<name>Andrew Morton</name>
<email>akpm@osdl.org</email>
</author>
<published>2007-05-09T09:33:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a9df62c7585e6caa1e7d2425b2b14460ec3afc20'/>
<id>a9df62c7585e6caa1e7d2425b2b14460ec3afc20</id>
<content type='text'>
Migrate AIO over to use flush_work().

Cc: "Maciej W. Rozycki" &lt;macro@linux-mips.org&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Zach Brown &lt;zach.brown@oracle.com&gt;
Cc: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
Cc: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Migrate AIO over to use flush_work().

Cc: "Maciej W. Rozycki" &lt;macro@linux-mips.org&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Zach Brown &lt;zach.brown@oracle.com&gt;
Cc: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
Cc: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>KMEM_CACHE(): simplify slab cache creation</title>
<updated>2007-05-07T19:12:55+00:00</updated>
<author>
<name>Christoph Lameter</name>
<email>clameter@sgi.com</email>
</author>
<published>2007-05-06T21:49:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0a31bd5f2bbb6473ef9d24f0063ca91cfa678b64'/>
<id>0a31bd5f2bbb6473ef9d24f0063ca91cfa678b64</id>
<content type='text'>
This patch provides a new macro

KMEM_CACHE(&lt;struct&gt;, &lt;flags&gt;)

to simplify slab creation. KMEM_CACHE creates a slab with the name of the
struct, with the size of the struct and with the alignment of the struct.
Additional slab flags may be specified if necessary.

Example

struct test_slab {
	int a,b,c;
	struct list_head list;
} __cacheline_aligned_in_smp;

test_slab_cache = KMEM_CACHE(test_slab, SLAB_PANIC);

will create a new slab named "test_slab" of the size sizeof(struct
test_slab) and aligned to the alignment of struct test_slab.  If it fails
then we panic.
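
<![CDATA[
A userspace sketch of the idea behind such a macro, assuming a C11 compiler.
It is not the kernel definition: CACHE_DESC() and struct cache_desc are
hypothetical names, and the sketch only shows how one macro argument can
yield the name string, the size and the alignment of a struct.

```c
#include <stddef.h>

/* Hypothetical descriptor holding what the macro derives from the type. */
struct cache_desc {
	const char *name;
	size_t size;
	size_t align;
};

/*
 * Stringify the struct tag for the name, and take the size and the
 * alignment from the struct type itself.
 */
#define CACHE_DESC(type_name) \
	((struct cache_desc){ #type_name, sizeof(struct type_name), \
			      _Alignof(struct type_name) })

/* Example type; 'link' is a placeholder for a list member. */
struct test_slab {
	int a, b, c;
	void *link;
};

/* Builds the descriptor for struct test_slab from its tag alone. */
struct cache_desc describe_test_slab(void)
{
	return CACHE_DESC(test_slab);
}
```

The caller names the struct once, so the cache name, size and alignment
cannot drift apart when the struct changes.
]]>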

Signed-off-by: Christoph Lameter &lt;clameter@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch provides a new macro

KMEM_CACHE(&lt;struct&gt;, &lt;flags&gt;)

to simplify slab creation. KMEM_CACHE creates a slab with the name of the
struct, with the size of the struct and with the alignment of the struct.
Additional slab flags may be specified if necessary.

Example

struct test_slab {
	int a,b,c;
	struct list_head list;
} __cacheline_aligned_in_smp;

test_slab_cache = KMEM_CACHE(test_slab, SLAB_PANIC);

will create a new slab named "test_slab" of the size sizeof(struct
test_slab) and aligned to the alignment of struct test_slab.  If it fails
then we panic.

Signed-off-by: Christoph Lameter &lt;clameter@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] aio: remove bare user-triggerable error printk</title>
<updated>2007-03-28T00:53:25+00:00</updated>
<author>
<name>Zach Brown</name>
<email>zach.brown@oracle.com</email>
</author>
<published>2007-03-27T22:44:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=28defbea64622f69d65a6079bf800cedb9915a5f'/>
<id>28defbea64622f69d65a6079bf800cedb9915a5f</id>
<content type='text'>
The user can generate console output if they cause do_mmap() to fail
during sys_io_setup().  This was seen in a regression test that does
exactly that by calling mmap() in a loop until it gets -ENOMEM before
calling io_setup().

We don't need this printk at all, so just remove it.

Signed-off-by: Zach Brown &lt;zach.brown@oracle.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The user can generate console output if they cause do_mmap() to fail
during sys_io_setup().  This was seen in a regression test that does
exactly that by calling mmap() in a loop until it gets -ENOMEM before
calling io_setup().

We don't need this printk at all, so just remove it.

Signed-off-by: Zach Brown &lt;zach.brown@oracle.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] Transform kmem_cache_alloc()+memset(0) -&gt; kmem_cache_zalloc().</title>
<updated>2007-02-11T18:51:27+00:00</updated>
<author>
<name>Robert P. J. Day</name>
<email>rpjday@mindspring.com</email>
</author>
<published>2007-02-10T09:45:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c376222960ae91d5ffb9197ee36771aaed1d9f90'/>
<id>c376222960ae91d5ffb9197ee36771aaed1d9f90</id>
<content type='text'>
Replace appropriate pairs of "kmem_cache_alloc()" + "memset(0)" with the
corresponding "kmem_cache_zalloc()" call.
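
<![CDATA[
A userspace analogue of the transformation, as a sketch only: malloc() plus
memset() stands in for kmem_cache_alloc() plus memset(0), and calloc() stands
in for kmem_cache_zalloc().  struct item and both function names are
hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical object type used only for illustration. */
struct item {
	int id;
	char tag[8];
};

/* Before: allocate, then clear the object in a separate step. */
struct item *item_alloc_memset(void)
{
	struct item *p = malloc(sizeof(*p));

	if (p)
		memset(p, 0, sizeof(*p));
	return p;
}

/* After: a single call that returns zeroed memory. */
struct item *item_alloc_zeroed(void)
{
	return calloc(1, sizeof(struct item));
}
```

Both functions hand back a fully zeroed object or NULL; the second form
leaves no window in which the clearing step can be forgotten.
]]>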

Signed-off-by: Robert P. J. Day &lt;rpjday@mindspring.com&gt;
Cc: "Luck, Tony" &lt;tony.luck@intel.com&gt;
Cc: Andi Kleen &lt;ak@muc.de&gt;
Cc: Roland McGrath &lt;roland@redhat.com&gt;
Cc: James Bottomley &lt;James.Bottomley@steeleye.com&gt;
Cc: Greg KH &lt;greg@kroah.com&gt;
Acked-by: Joel Becker &lt;Joel.Becker@oracle.com&gt;
Cc: Steven Whitehouse &lt;swhiteho@redhat.com&gt;
Cc: Jan Kara &lt;jack@ucw.cz&gt;
Cc: Michael Halcrow &lt;mhalcrow@us.ibm.com&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Stephen Smalley &lt;sds@tycho.nsa.gov&gt;
Cc: James Morris &lt;jmorris@namei.org&gt;
Cc: Chris Wright &lt;chrisw@sous-sol.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Replace appropriate pairs of "kmem_cache_alloc()" + "memset(0)" with the
corresponding "kmem_cache_zalloc()" call.

Signed-off-by: Robert P. J. Day &lt;rpjday@mindspring.com&gt;
Cc: "Luck, Tony" &lt;tony.luck@intel.com&gt;
Cc: Andi Kleen &lt;ak@muc.de&gt;
Cc: Roland McGrath &lt;roland@redhat.com&gt;
Cc: James Bottomley &lt;James.Bottomley@steeleye.com&gt;
Cc: Greg KH &lt;greg@kroah.com&gt;
Acked-by: Joel Becker &lt;Joel.Becker@oracle.com&gt;
Cc: Steven Whitehouse &lt;swhiteho@redhat.com&gt;
Cc: Jan Kara &lt;jack@ucw.cz&gt;
Cc: Michael Halcrow &lt;mhalcrow@us.ibm.com&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Stephen Smalley &lt;sds@tycho.nsa.gov&gt;
Cc: James Morris &lt;jmorris@namei.org&gt;
Cc: Chris Wright &lt;chrisw@sous-sol.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] Remove final references to deprecated "MAP_ANON" page protection flag</title>
<updated>2007-02-11T18:51:17+00:00</updated>
<author>
<name>Robert P. J. Day</name>
<email>rpjday@mindspring.com</email>
</author>
<published>2007-02-10T09:42:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e10a4437cb37c85f2df95432025b392d98aac2aa'/>
<id>e10a4437cb37c85f2df95432025b392d98aac2aa</id>
<content type='text'>
Remove the last vestiges of the long-deprecated "MAP_ANON" page protection
flag: use "MAP_ANONYMOUS" instead.

Signed-off-by: Robert P. J. Day &lt;rpjday@mindspring.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Remove the last vestiges of the long-deprecated "MAP_ANON" page protection
flag: use "MAP_ANONYMOUS" instead.

Signed-off-by: Robert P. J. Day &lt;rpjday@mindspring.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] aio: fix buggy put_ioctx call in aio_complete - v2</title>
<updated>2007-02-03T19:26:06+00:00</updated>
<author>
<name>Ken Chen</name>
<email>kenchen@google.com</email>
</author>
<published>2007-02-03T09:13:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=dee11c2364f51cac53df17d742a0c69097e29a4e'/>
<id>dee11c2364f51cac53df17d742a0c69097e29a4e</id>
<content type='text'>
An AIO bug was reported in which a sleeping function is called in softirq
context:

BUG: warning at kernel/mutex.c:132/__mutex_lock_common()
Call Trace:
     [&lt;a000000100577b00&gt;] __mutex_lock_slowpath+0x640/0x6c0
     [&lt;a000000100577ba0&gt;] mutex_lock+0x20/0x40
     [&lt;a0000001000a25b0&gt;] flush_workqueue+0xb0/0x1a0
     [&lt;a00000010018c0c0&gt;] __put_ioctx+0xc0/0x240
     [&lt;a00000010018d470&gt;] aio_complete+0x2f0/0x420
     [&lt;a00000010019cc80&gt;] finished_one_bio+0x200/0x2a0
     [&lt;a00000010019d1c0&gt;] dio_bio_complete+0x1c0/0x200
     [&lt;a00000010019d260&gt;] dio_bio_end_aio+0x60/0x80
     [&lt;a00000010014acd0&gt;] bio_endio+0x110/0x1c0
     [&lt;a0000001002770e0&gt;] __end_that_request_first+0x180/0xba0
     [&lt;a000000100277b90&gt;] end_that_request_chunk+0x30/0x60
     [&lt;a0000002073c0c70&gt;] scsi_end_request+0x50/0x300 [scsi_mod]
     [&lt;a0000002073c1240&gt;] scsi_io_completion+0x200/0x8a0 [scsi_mod]
     [&lt;a0000002074729b0&gt;] sd_rw_intr+0x330/0x860 [sd_mod]
     [&lt;a0000002073b3ac0&gt;] scsi_finish_command+0x100/0x1c0 [scsi_mod]
     [&lt;a0000002073c2910&gt;] scsi_softirq_done+0x230/0x300 [scsi_mod]
     [&lt;a000000100277d20&gt;] blk_done_softirq+0x160/0x1c0
     [&lt;a000000100083e00&gt;] __do_softirq+0x200/0x240
     [&lt;a000000100083eb0&gt;] do_softirq+0x70/0xc0

See report: http://marc.theaimsgroup.com/?l=linux-kernel&amp;m=116599593200888&amp;w=2

flush_workqueue() is not allowed to be called in softirq context.
However, aio_complete(), called from an I/O interrupt, can potentially call
put_ioctx() with the last ref count on the ioctx and trigger the bug.  It is
simply incorrect to perform ioctx freeing from aio_complete().

The bug is trigger-able from a race between io_destroy() and aio_complete().
A possible scenario:

cpu0                               cpu1
io_destroy                         aio_complete
  wait_for_all_aios {                __aio_put_req
     ...                                 ctx-&gt;reqs_active--;
     if (!ctx-&gt;reqs_active)
        return;
  }
  ...
  put_ioctx(ioctx)

                                     put_ioctx(ctx);
                                        __put_ioctx
                                          bam! Bug trigger!

The real problem is that the condition check of ctx-&gt;reqs_active in
wait_for_all_aios() is incorrect: access to reqs_active is not properly
protected by the spin lock.

This patch adds that protective spin lock, and at the same time removes
all duplicate ref counting for each kiocb, as reqs_active is already used
as a ref count for each active ioctx.  This also ensures that the buggy call
to flush_workqueue() in softirq context is eliminated.
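
<![CDATA[
A userspace sketch of the locking rule described above, not the kernel code:
a pthread mutex plays the role of the spin lock, and struct ctx,
ctx_complete_one() and ctx_is_idle() are hypothetical names.  The point is
only that the teardown-side check and the completion-side decrement take the
same lock.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the AIO context. */
struct ctx {
	pthread_mutex_t lock;	/* plays the role of the ctx spin lock */
	int reqs_active;	/* number of in-flight requests */
};

/* Completion side: drop one request while holding the lock. */
void ctx_complete_one(struct ctx *c)
{
	pthread_mutex_lock(&c->lock);
	c->reqs_active--;
	pthread_mutex_unlock(&c->lock);
}

/*
 * Teardown side: sample the count under the same lock, so the check
 * waits for a completion that is still inside its critical section.
 */
bool ctx_is_idle(struct ctx *c)
{
	bool idle;

	pthread_mutex_lock(&c->lock);
	idle = (c->reqs_active == 0);
	pthread_mutex_unlock(&c->lock);
	return idle;
}
```

With an unlocked check, the teardown path could see zero while the completer
is still using the context, which is the window the scenario above shows.
]]>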

Signed-off-by: "Ken Chen" &lt;kenchen@google.com&gt;
Cc: Zach Brown &lt;zach.brown@oracle.com&gt;
Cc: Suparna Bhattacharya &lt;suparna@in.ibm.com&gt;
Cc: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
Cc: Badari Pulavarty &lt;pbadari@us.ibm.com&gt;
Cc: &lt;stable@kernel.org&gt;
Acked-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
An AIO bug was reported in which a sleeping function is called in softirq
context:

BUG: warning at kernel/mutex.c:132/__mutex_lock_common()
Call Trace:
     [&lt;a000000100577b00&gt;] __mutex_lock_slowpath+0x640/0x6c0
     [&lt;a000000100577ba0&gt;] mutex_lock+0x20/0x40
     [&lt;a0000001000a25b0&gt;] flush_workqueue+0xb0/0x1a0
     [&lt;a00000010018c0c0&gt;] __put_ioctx+0xc0/0x240
     [&lt;a00000010018d470&gt;] aio_complete+0x2f0/0x420
     [&lt;a00000010019cc80&gt;] finished_one_bio+0x200/0x2a0
     [&lt;a00000010019d1c0&gt;] dio_bio_complete+0x1c0/0x200
     [&lt;a00000010019d260&gt;] dio_bio_end_aio+0x60/0x80
     [&lt;a00000010014acd0&gt;] bio_endio+0x110/0x1c0
     [&lt;a0000001002770e0&gt;] __end_that_request_first+0x180/0xba0
     [&lt;a000000100277b90&gt;] end_that_request_chunk+0x30/0x60
     [&lt;a0000002073c0c70&gt;] scsi_end_request+0x50/0x300 [scsi_mod]
     [&lt;a0000002073c1240&gt;] scsi_io_completion+0x200/0x8a0 [scsi_mod]
     [&lt;a0000002074729b0&gt;] sd_rw_intr+0x330/0x860 [sd_mod]
     [&lt;a0000002073b3ac0&gt;] scsi_finish_command+0x100/0x1c0 [scsi_mod]
     [&lt;a0000002073c2910&gt;] scsi_softirq_done+0x230/0x300 [scsi_mod]
     [&lt;a000000100277d20&gt;] blk_done_softirq+0x160/0x1c0
     [&lt;a000000100083e00&gt;] __do_softirq+0x200/0x240
     [&lt;a000000100083eb0&gt;] do_softirq+0x70/0xc0

See report: http://marc.theaimsgroup.com/?l=linux-kernel&amp;m=116599593200888&amp;w=2

flush_workqueue() is not allowed to be called in softirq context.
However, aio_complete(), called from an I/O interrupt, can potentially call
put_ioctx() with the last ref count on the ioctx and trigger the bug.  It is
simply incorrect to perform ioctx freeing from aio_complete().

The bug is trigger-able from a race between io_destroy() and aio_complete().
A possible scenario:

cpu0                               cpu1
io_destroy                         aio_complete
  wait_for_all_aios {                __aio_put_req
     ...                                 ctx-&gt;reqs_active--;
     if (!ctx-&gt;reqs_active)
        return;
  }
  ...
  put_ioctx(ioctx)

                                     put_ioctx(ctx);
                                        __put_ioctx
                                          bam! Bug trigger!

The real problem is that the condition check of ctx-&gt;reqs_active in
wait_for_all_aios() is incorrect: access to reqs_active is not properly
protected by the spin lock.

This patch adds that protective spin lock, and at the same time removes
all duplicate ref counting for each kiocb, as reqs_active is already used
as a ref count for each active ioctx.  This also ensures that the buggy call
to flush_workqueue() in softirq context is eliminated.

Signed-off-by: "Ken Chen" &lt;kenchen@google.com&gt;
Cc: Zach Brown &lt;zach.brown@oracle.com&gt;
Cc: Suparna Bhattacharya &lt;suparna@in.ibm.com&gt;
Cc: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
Cc: Badari Pulavarty &lt;pbadari@us.ibm.com&gt;
Cc: &lt;stable@kernel.org&gt;
Acked-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] Fix lock inversion aio_kick_handler()</title>
<updated>2006-12-30T18:55:54+00:00</updated>
<author>
<name>Zach Brown</name>
<email>zach.brown@oracle.com</email>
</author>
<published>2006-12-30T00:47:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1ebb1101c556b1915ff041655e629a072e64dcda'/>
<id>1ebb1101c556b1915ff041655e629a072e64dcda</id>
<content type='text'>
lockdep found an AB BC CA lock inversion in retry-based AIO:

1) The task struct's alloc_lock (A) is acquired in process context with
   interrupts enabled.  An interrupt might arrive and call wake_up() which
   grabs the wait queue's q-&gt;lock (B).

2) When performing retry-based AIO the AIO core registers
   aio_wake_function() as the wake function for iocb-&gt;ki_wait.  It is called
   with the wait queue's q-&gt;lock (B) held and then tries to add the iocb to
   the run list after acquiring the ctx_lock (C).

3) aio_kick_handler() holds the ctx_lock (C) while acquiring the
   alloc_lock (A) via lock_task() and unuse_mm().  Lockdep emits a warning
   saying that we're trying to connect the irq-safe q-&gt;lock to the
   irq-unsafe alloc_lock via ctx_lock.

This fixes the inversion by calling unuse_mm() in the AIO kick handling path
after we've released the ctx_lock.  As Ben LaHaise pointed out, __put_ioctx
could set ctx-&gt;mm to NULL, so we must only access ctx-&gt;mm while we have the
lock.

Signed-off-by: Zach Brown &lt;zach.brown@oracle.com&gt;
Signed-off-by: Suparna Bhattacharya &lt;suparna@in.ibm.com&gt;
Acked-by: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
Cc: "Chen, Kenneth W" &lt;kenneth.w.chen@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
lockdep found an AB BC CA lock inversion in retry-based AIO:

1) The task struct's alloc_lock (A) is acquired in process context with
   interrupts enabled.  An interrupt might arrive and call wake_up() which
   grabs the wait queue's q-&gt;lock (B).

2) When performing retry-based AIO the AIO core registers
   aio_wake_function() as the wake function for iocb-&gt;ki_wait.  It is called
   with the wait queue's q-&gt;lock (B) held and then tries to add the iocb to
   the run list after acquiring the ctx_lock (C).

3) aio_kick_handler() holds the ctx_lock (C) while acquiring the
   alloc_lock (A) via lock_task() and unuse_mm().  Lockdep emits a warning
   saying that we're trying to connect the irq-safe q-&gt;lock to the
   irq-unsafe alloc_lock via ctx_lock.

This fixes the inversion by calling unuse_mm() in the AIO kick handling path
after we've released the ctx_lock.  As Ben LaHaise pointed out, __put_ioctx
could set ctx-&gt;mm to NULL, so we must only access ctx-&gt;mm while we have the
lock.

Signed-off-by: Zach Brown &lt;zach.brown@oracle.com&gt;
Signed-off-by: Suparna Bhattacharya &lt;suparna@in.ibm.com&gt;
Acked-by: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
Cc: "Chen, Kenneth W" &lt;kenneth.w.chen@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] Use activate_mm() in fs/aio.c:use_mm()</title>
<updated>2006-12-13T17:05:51+00:00</updated>
<author>
<name>Jeremy Fitzhardinge</name>
<email>jeremy@goop.org</email>
</author>
<published>2006-12-13T08:34:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=90aef12e6dd609e1ad7fb70044eedc78ca55ee5e'/>
<id>90aef12e6dd609e1ad7fb70044eedc78ca55ee5e</id>
<content type='text'>
activate_mm() is not the right thing to be using in use_mm().  It should be
switch_mm().

On normal x86, they're synonymous, but for the Xen patches I'm adding a
hook which assumes that activate_mm is only used the first time a new mm
is used after creation (I have another hook for dealing with dup_mm).  I
think this use of activate_mm() is the only place where it could be used
a second time on an mm.

From a quick look at the other architectures I think this is OK (most
simply implement one in terms of the other), but some are doing some
subtly different stuff between the two.

Acked-by: David Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
activate_mm() is not the right thing to be using in use_mm().  It should be
switch_mm().

On normal x86, they're synonymous, but for the Xen patches I'm adding a
hook which assumes that activate_mm is only used the first time a new mm
is used after creation (I have another hook for dealing with dup_mm).  I
think this use of activate_mm() is the only place where it could be used
a second time on an mm.

From a quick look at the other architectures I think this is OK (most
simply implement one in terms of the other), but some are doing some
subtly different stuff between the two.

Acked-by: David Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
