linux.git/fs/aio.c, branch v2.6.22

signal/timer/event: KAIO eventfd support example

2007-05-11T15:29:37+00:00

This is an example of how to add eventfd support to the current KAIO code,
so that KAIO can post readiness events to a pollable fd (and hence be
compatible with POSIX select/poll).  The KAIO code simply signals the
eventfd when events are ready, and this triggers a POLLIN on the fd.  This
patch uses a member of struct iocb that was reserved for future use to pass
an eventfd file descriptor, which KAIO will use to post an event every time
a request completes.  At that point, an io_getevents() will return the
completed result in a struct io_event.  I made a quick test program to
verify the patch, and it runs fine here:

http://www.xmailserver.org/eventfd-aio-test.c

The test program uses poll(2), but it'd, of course, work with select and epoll
too.

This makes it possible to schedule requests for both block I/O and other
pollable devices, and to wait for the results using select/poll/epoll.  In a
typical scenario, an application would submit KAIO requests using
io_submit(), use epoll_ctl() for the whole other class of devices (which,
with the addition of signals, timers and user events, is now pretty much
complete), and then would:

	epoll_wait(...);
	for_each_event {
		if (curr_event_is_kaiofd) {
			io_getevents();
			dispatch_aio_events();
		} else {
			dispatch_epoll_event();
		}
	}
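
As an illustration only (not part of the patch), the loop above can be
sketched in user space.  Assumptions: a plain eventfd stands in for the
KAIO completion fd, a pipe stands in for "some other pollable device",
two write() calls stand in for two completions each adding 1 to the
counter, and the helper names (watch_fd, dispatch_once, demo) are
hypothetical.  No real AIO request is submitted.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* What one pass over the ready set found. */
struct dispatch_counts {
	uint64_t aio_completions;	/* summed eventfd counter values */
	int other_events;		/* ready fds other than the eventfd */
};

/* Hypothetical helper: watch fd for readability, tagged by its number. */
static int watch_fd(int epfd, int fd)
{
	struct epoll_event ev = { .events = EPOLLIN, .data = { .fd = fd } };

	return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* One epoll_wait() pass that sorts events as in the pseudocode above. */
static int dispatch_once(int epfd, int kaio_efd, struct dispatch_counts *out)
{
	struct epoll_event evs[8];
	int n;

	assert(out != NULL);
	out->aio_completions = 0;
	out->other_events = 0;

	n = epoll_wait(epfd, evs, 8, 1000 /* ms */);
	if (n < 0)
		return -1;

	for (int i = 0; i < n; i++) {
		if (evs[i].data.fd == kaio_efd) {
			uint64_t cnt;

			/* Reading returns the counter and resets it to 0. */
			if (read(kaio_efd, &cnt, sizeof(cnt)) !=
			    (ssize_t)sizeof(cnt))
				return -1;
			out->aio_completions += cnt;
			/* A real program would call io_getevents() here. */
		} else {
			out->other_events++;
		}
	}
	return 0;
}

/* Self-check: returns 0 on success and fills *out. */
static int demo(struct dispatch_counts *out)
{
	const uint64_t one = 1;
	int pipefd[2];
	int efd = eventfd(0, 0);
	int epfd = epoll_create1(0);
	int ret = -1;

	if (efd < 0 || epfd < 0 || pipe(pipefd) < 0)
		return -1;	/* sketch: descriptors leak on this path */

	if (watch_fd(epfd, efd) == 0 && watch_fd(epfd, pipefd[0]) == 0 &&
	    write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) &&
	    write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) &&
	    write(pipefd[1], "x", 1) == 1)
		ret = dispatch_once(epfd, efd, out);

	close(pipefd[0]);
	close(pipefd[1]);
	close(efd);
	close(epfd);
	return ret;
}
```

Calling demo() from a main() should report two completions and one other
event.  Because the eventfd counter accumulates, a single wakeup can cover
several completed requests, so the AIO branch should drain all available
events rather than assume one per wakeup.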

Signed-off-by: Davide Libenzi 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

unify flush_work/flush_work_keventd and rename it to cancel_work_sync

2007-05-09T19:30:53+00:00

flush_work(wq, work) doesn't need the first parameter, we can use cwq->wq
(this was possible from the very beginning, I missed this).  So we can unify
flush_work_keventd and flush_work.

Also, rename flush_work() to cancel_work_sync() and fix all callers.
Perhaps this is not the best name, but "flush_work" is really bad.

(akpm: this is why the earlier patches bypassed maintainers)

Signed-off-by: Oleg Nesterov 
Cc: Jeff Garzik 
Cc: "David S. Miller" 
Cc: Jens Axboe 
Cc: Tejun Heo 
Cc: Auke Kok 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

aio: use flush_work()

2007-05-09T19:30:51+00:00

Migrate AIO over to use flush_work().

Cc: "Maciej W. Rozycki" 
Cc: David Howells 
Cc: Zach Brown 
Cc: Benjamin LaHaise 
Cc: Oleg Nesterov 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

KMEM_CACHE(): simplify slab cache creation

2007-05-07T19:12:55+00:00

This patch provides a new macro

KMEM_CACHE(<struct>, <flags>)

to simplify slab creation. KMEM_CACHE creates a slab with the name of the
struct, with the size of the struct and with the alignment of the struct.
Additional slab flags may be specified if necessary.

Example

struct test_slab {
	int a, b, c;
	struct list_head list;
} __cacheline_aligned_in_smp;

test_slab_cache = KMEM_CACHE(test_slab, SLAB_PANIC);

will create a new slab cache named "test_slab" of size sizeof(struct
test_slab), aligned to the alignment of struct test_slab.  If creation
fails then we panic.
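
To make the derivation of name, size and alignment concrete, here is a
user-space mock (not the kernel definition).  Assumptions: struct
kmem_cache, mock_cache_create() and the SLAB_PANIC value are stand-ins
invented for this sketch, aligned(64) stands in for
__cacheline_aligned_in_smp, and __alignof__ is the GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SLAB_PANIC 0x1UL	/* placeholder value, not the kernel's */

/* Mock cache descriptor: records only what KMEM_CACHE() derives. */
struct kmem_cache {
	const char *name;
	size_t size;
	size_t align;
	unsigned long flags;
};

/* Mock constructor; the kernel function takes more arguments. */
static struct kmem_cache *mock_cache_create(const char *name, size_t size,
					    size_t align, unsigned long flags)
{
	struct kmem_cache *c;

	assert(name != NULL && size > 0);
	c = malloc(sizeof(*c));
	if (!c) {
		if (flags & SLAB_PANIC)
			abort();	/* stand-in for panic() */
		return NULL;
	}
	c->name = name;
	c->size = size;
	c->align = align;
	c->flags = flags;
	return c;
}

/* Name, size and alignment all come from the struct type itself. */
#define KMEM_CACHE(type, flags) \
	mock_cache_create(#type, sizeof(struct type), \
			  __alignof__(struct type), (flags))

struct list_head {
	struct list_head *next, *prev;
};

struct test_slab {
	int a, b, c;
	struct list_head list;
} __attribute__((aligned(64)));	/* stand-in for __cacheline_aligned_in_smp */

static struct kmem_cache *make_test_slab_cache(void)
{
	return KMEM_CACHE(test_slab, SLAB_PANIC);
}
```

The point of the macro form is that the cache name can never drift from the
struct name, and a change to the struct's alignment attribute is picked up
without touching the call site.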

Signed-off-by: Christoph Lameter 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] aio: remove bare user-triggerable error printk

2007-03-28T00:53:25+00:00

The user can generate console output if they cause do_mmap() to fail
during sys_io_setup().  This was seen in a regression test that does
exactly that by calling mmap() in a loop until it gets -ENOMEM before
calling io_setup().

We don't need this printk at all, just remove it.

Signed-off-by: Zach Brown 
Signed-off-by: Linus Torvalds

[PATCH] Transform kmem_cache_alloc()+memset(0) -> kmem_cache_zalloc().

2007-02-11T18:51:27+00:00

Replace appropriate pairs of "kmem_cache_alloc()" + "memset(0)" with the
corresponding "kmem_cache_zalloc()" call.
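
The shape of the transformation, shown with user-space stand-ins.
Assumptions: struct mock_cache, mock_cache_alloc() and mock_cache_zalloc()
are hypothetical wrappers around malloc()/calloc() and only imitate the
slab calls named above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct mock_cache {
	size_t size;		/* object size served by this cache */
};

struct req {
	int id;
	void *data;
};

/* Stand-in for kmem_cache_alloc(): memory is NOT initialised. */
static void *mock_cache_alloc(const struct mock_cache *c)
{
	assert(c != NULL && c->size > 0);
	return malloc(c->size);
}

/* Stand-in for kmem_cache_zalloc(): memory is zero-filled. */
static void *mock_cache_zalloc(const struct mock_cache *c)
{
	assert(c != NULL && c->size > 0);
	return calloc(1, c->size);
}

/* Before: allocate, then clear by hand. */
static struct req *req_new_old(const struct mock_cache *c)
{
	struct req *r = mock_cache_alloc(c);

	if (r)
		memset(r, 0, sizeof(*r));
	return r;
}

/* After: one call, same resulting object. */
static struct req *req_new(const struct mock_cache *c)
{
	return mock_cache_zalloc(c);
}
```

Only pairs where the memset() covers the whole object and immediately
follows the allocation are candidates for this replacement.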

Signed-off-by: Robert P. J. Day 
Cc: "Luck, Tony" 
Cc: Andi Kleen 
Cc: Roland McGrath 
Cc: James Bottomley 
Cc: Greg KH 
Acked-by: Joel Becker 
Cc: Steven Whitehouse 
Cc: Jan Kara 
Cc: Michael Halcrow 
Cc: "David S. Miller" 
Cc: Stephen Smalley 
Cc: James Morris 
Cc: Chris Wright 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] Remove final references to deprecated "MAP_ANON" page protection flag

2007-02-11T18:51:17+00:00

Remove the last vestiges of the long-deprecated "MAP_ANON" page protection
flag: use "MAP_ANONYMOUS" instead.
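
For reference, the same flag as seen from user space.  This is only an
illustration of the MAP_ANONYMOUS spelling with mmap(2); the patch itself
touches in-kernel callers, and map_anon() is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Map len bytes of private, zero-filled memory with no backing file. */
static unsigned char *map_anon(size_t len)
{
	void *p;

	assert(len > 0);
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}
```

The region is released with munmap(p, len) when no longer needed.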

Signed-off-by: Robert P. J. Day 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] aio: fix buggy put_ioctx call in aio_complete - v2

2007-02-03T19:26:06+00:00

An AIO bug was reported in which a sleeping function is called in softirq
context:

BUG: warning at kernel/mutex.c:132/__mutex_lock_common()
Call Trace:
     [] __mutex_lock_slowpath+0x640/0x6c0
     [] mutex_lock+0x20/0x40
     [] flush_workqueue+0xb0/0x1a0
     [] __put_ioctx+0xc0/0x240
     [] aio_complete+0x2f0/0x420
     [] finished_one_bio+0x200/0x2a0
     [] dio_bio_complete+0x1c0/0x200
     [] dio_bio_end_aio+0x60/0x80
     [] bio_endio+0x110/0x1c0
     [] __end_that_request_first+0x180/0xba0
     [] end_that_request_chunk+0x30/0x60
     [] scsi_end_request+0x50/0x300 [scsi_mod]
     [] scsi_io_completion+0x200/0x8a0 [scsi_mod]
     [] sd_rw_intr+0x330/0x860 [sd_mod]
     [] scsi_finish_command+0x100/0x1c0 [scsi_mod]
     [] scsi_softirq_done+0x230/0x300 [scsi_mod]
     [] blk_done_softirq+0x160/0x1c0
     [] __do_softirq+0x200/0x240
     [] do_softirq+0x70/0xc0

See report: http://marc.theaimsgroup.com/?l=linux-kernel&m=116599593200888&w=2

flush_workqueue() must not be called in softirq context.  However,
aio_complete(), called from the I/O interrupt path, can potentially call
put_ioctx() with the last reference on the ioctx and trigger the bug.  It is
simply incorrect to free the ioctx from aio_complete().

The bug can be triggered by a race between io_destroy() and aio_complete().
A possible scenario:

cpu0                               cpu1
io_destroy                         aio_complete
  wait_for_all_aios {                __aio_put_req
     ...                                 ctx->reqs_active--;
     if (!ctx->reqs_active)
        return;
  }
  ...
  put_ioctx(ioctx)

                                     put_ioctx(ctx);
                                        __put_ioctx
                                          bam! Bug trigger!

The real problem is that the check of ctx->reqs_active in
wait_for_all_aios() is incorrect: access to reqs_active is not properly
protected by the spin lock.

This patch adds that protective spin lock, and at the same time removes
all duplicate ref counting for each kiocb, as reqs_active is already used
as a ref count for each active ioctx.  This also ensures that the buggy call
to flush_workqueue() in softirq context is eliminated.
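
The locking rule can be sketched in user space as follows.  Assumptions:
struct ioctx_sketch and its helpers are hypothetical, and a C11 atomic_flag
spin loop stands in for ctx->ctx_lock; this is not the kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct ioctx_sketch {
	atomic_flag lock;	/* stand-in for ctx->ctx_lock */
	int reqs_active;	/* only read or written with lock held */
};

static void ctx_init(struct ioctx_sketch *ctx)
{
	atomic_flag_clear(&ctx->lock);
	ctx->reqs_active = 0;
}

static void ctx_lock(struct ioctx_sketch *ctx)
{
	while (atomic_flag_test_and_set_explicit(&ctx->lock,
						 memory_order_acquire))
		;	/* spin */
}

static void ctx_unlock(struct ioctx_sketch *ctx)
{
	atomic_flag_clear_explicit(&ctx->lock, memory_order_release);
}

static void submit_one(struct ioctx_sketch *ctx)
{
	ctx_lock(ctx);
	ctx->reqs_active++;
	ctx_unlock(ctx);
}

/* Completion side: the decrement happens under the lock. */
static void complete_one(struct ioctx_sketch *ctx)
{
	ctx_lock(ctx);
	assert(ctx->reqs_active > 0);
	ctx->reqs_active--;
	ctx_unlock(ctx);
}

/*
 * Destroy side: the check takes the same lock, so it cannot observe zero
 * until the completer that made it zero has dropped the lock.
 */
static bool all_reqs_done(struct ioctx_sketch *ctx)
{
	bool done;

	ctx_lock(ctx);
	done = (ctx->reqs_active == 0);
	ctx_unlock(ctx);
	return done;
}
```

With the check serialised against the decrement, the destroy side is the
only one left to drop the final reference, which keeps the freeing out of
the completion path.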

Signed-off-by: "Ken Chen" 
Cc: Zach Brown 
Cc: Suparna Bhattacharya 
Cc: Benjamin LaHaise 
Cc: Badari Pulavarty 
Cc: 
Acked-by: Jeff Moyer 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] Fix lock inversion aio_kick_handler()

2006-12-30T18:55:54+00:00

lockdep found an AB BC CA lock inversion in retry-based AIO:

1) The task struct's alloc_lock (A) is acquired in process context with
   interrupts enabled.  An interrupt might arrive and call wake_up() which
   grabs the wait queue's q->lock (B).

2) When performing retry-based AIO the AIO core registers
   aio_wake_function() as the wake function for iocb->ki_wait.  It is called
   with the wait queue's q->lock (B) held and then tries to add the iocb to
   the run list after acquiring the ctx_lock (C).

3) aio_kick_handler() holds the ctx_lock (C) while acquiring the
   alloc_lock (A) via task_lock() in unuse_mm().  Lockdep emits a warning
   saying that we're trying to connect the irq-safe q->lock to the
   irq-unsafe alloc_lock via ctx_lock.

This fixes the inversion by calling unuse_mm() in the AIO kick handling path
after we've released the ctx_lock.  As Ben LaHaise pointed out, __put_ioctx
could set ctx->mm to NULL, so we must only access ctx->mm while we have the
lock.
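
The resulting ordering, modelled single-threaded in user space.
Assumptions: every name here is hypothetical, a boolean models "ctx_lock
(C) is held", and mock_unuse_mm() only records whether it ran under C in
place of really taking the alloc_lock (A).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mm_sketch {
	int users;
};

struct kick_ctx {
	bool ctx_lock_held;	/* models ctx->ctx_lock (C) */
	struct mm_sketch *mm;	/* may be changed by others under C */
};

/* Set if the mock unuse_mm() ever runs while C is held. */
static bool unuse_ran_under_ctx_lock;

/* Mock: the real unuse_mm() takes the task's alloc_lock (A). */
static void mock_unuse_mm(const struct kick_ctx *ctx, struct mm_sketch *mm)
{
	assert(mm != NULL);
	if (ctx->ctx_lock_held)
		unuse_ran_under_ctx_lock = true;
	mm->users--;
}

static void kick_handler_sketch(struct kick_ctx *ctx)
{
	struct mm_sketch *mm;

	ctx->ctx_lock_held = true;	/* lock C */
	/* ... run the queued iocbs here ... */
	mm = ctx->mm;			/* snapshot while C is held */
	ctx->ctx_lock_held = false;	/* unlock C */

	mock_unuse_mm(ctx, mm);		/* A is taken only after C is dropped */
}
```

Taking the snapshot under the lock honours the ctx->mm access rule, while
deferring the call until after the unlock removes the C -> A edge from the
lock graph.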

Signed-off-by: Zach Brown 
Signed-off-by: Suparna Bhattacharya 
Acked-by: Benjamin LaHaise 
Cc: "Chen, Kenneth W" 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] Use activate_mm() in fs/aio.c:use_mm()

2006-12-13T17:05:51+00:00

activate_mm() is not the right thing to be using in use_mm().  It should be
switch_mm().

On normal x86, they're synonymous, but for the Xen patches I'm adding a
hook which assumes that activate_mm is only used the first time a new mm
is used after creation (I have another hook for dealing with dup_mm).  I
think this use of activate_mm() is the only place where it could be used
a second time on an mm.

From a quick look at the other architectures I think this is OK (most
simply implement one in terms of the other), but some are doing some
subtly different stuff between the two.

Acked-by: David Miller 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds