linux.git/fs/aio.c, branch v2.6.37

new helper: ihold()

2010-10-26T01:26:11+00:00

Clones an existing reference to inode; caller must already hold one.

Signed-off-by: Al Viro

aio: bump i_count instead of using igrab

2010-10-26T01:18:23+00:00

The aio batching code is using igrab to get an extra reference on the
inode so it can safely batch.  igrab will go ahead and take the global
inode spinlock, which can be a bottleneck on large machines doing lots
of AIO.

In this case, igrab isn't required because we already have a reference
on the file handle.  It is safe to just bump the i_count directly
on the inode.
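
As a rough illustration of why the direct bump is cheap, here is a minimal userspace sketch. The names toy_inode and toy_ihold are hypothetical stand-ins, not kernel code, and C11 atomics replace the kernel's atomic_t. The point is only that cloning a reference the caller already holds needs one atomic increment and no global lock.

```c
#include <stdatomic.h>

/* Hypothetical userspace stand-in for the kernel's struct inode. */
struct toy_inode {
    atomic_int i_count;   /* reference count */
};

/*
 * Clone a reference the caller already holds. Because the caller's own
 * reference keeps the object alive, a bare atomic increment is enough:
 * no lock is taken and no liveness check is made.
 */
static void toy_ihold(struct toy_inode *inode)
{
    atomic_fetch_add(&inode->i_count, 1);
}
```

Calling toy_ihold() on an object whose count may already be zero would be a bug in this model, which is the same precondition the helper above states.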

Benchmarking shows this patch brings IOP/s on tons of flash up by about
2.5X.

Signed-off-by: Chris Mason

aio: do not return ERESTARTSYS as a result of AIO

2010-09-23T00:22:39+00:00

OCFS2 can return ERESTARTSYS from its write function when the process is
signalled while waiting for a cluster lock (and the filesystem is mounted
with intr mount option).  Generally, it seems reasonable to allow
filesystems to return this error code from their IO functions.  As we must
not leak ERESTARTSYS (and similar error codes) to userspace as a result of
an AIO operation, we have to properly convert it to EINTR inside AIO code
(restarting the syscall isn't really an option because other AIO could
have been already submitted by the same io_submit syscall).
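
The conversion described above could look roughly like the sketch below. It is a userspace illustration, not the patch itself: toy_fixup_aio_result is a hypothetical name, and the restart codes are redefined locally because they are kernel-internal and absent from userspace <errno.h>. Which "similar error codes" the real patch covers is an assumption here.

```c
#include <errno.h>

/*
 * Kernel-internal restart codes, redefined for illustration only.
 * The set handled here is an assumption about "similar error codes".
 */
#define TOY_ERESTARTSYS           512
#define TOY_ERESTARTNOINTR        513
#define TOY_ERESTARTNOHAND        514
#define TOY_ERESTART_RESTARTBLOCK 516

/*
 * Map any restart request from the filesystem to -EINTR so that it is
 * never reported to userspace as the result of an AIO operation.
 * All other results (byte counts, ordinary errors) pass through.
 */
static long toy_fixup_aio_result(long ret)
{
    switch (ret) {
    case -TOY_ERESTARTSYS:
    case -TOY_ERESTARTNOINTR:
    case -TOY_ERESTARTNOHAND:
    case -TOY_ERESTART_RESTARTBLOCK:
        return -EINTR;
    default:
        return ret;
    }
}
```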

Signed-off-by: Jan Kara 
Reviewed-by: Jeff Moyer 
Cc: Christoph Hellwig 
Cc: Zach Brown 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

aio: check for multiplication overflow in do_io_submit

2010-09-15T00:02:37+00:00

Tavis Ormandy pointed out that do_io_submit does not do proper bounds
checking on the passed-in iocb array:

       if (unlikely(nr < 0))
               return -EINVAL;

       if (unlikely(!access_ok(VERIFY_READ, iocbpp, (nr*sizeof(*iocbpp)))))
               return -EFAULT;                       ^^^^^^^^^^^^^^^^^^

The attached patch checks for overflow, and if it is detected, the
number of iocbs submitted is scaled down to a number that will fit in
the long.  This is an ok thing to do, as sys_io_submit is documented as
returning the number of iocbs submitted, so callers should handle a
return value of less than the 'nr' argument passed in.
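
A minimal sketch of the scaling-down idea, written as a standalone userspace function. The name toy_clamp_nr is hypothetical, and the function assumes the caller has already rejected negative nr and passes a nonzero element size.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Clamp nr so that nr * elem_size cannot overflow a long.
 * Assumes nr >= 0 (the caller rejects negative values first)
 * and elem_size > 0.
 */
static long toy_clamp_nr(long nr, size_t elem_size)
{
    long max = (long)(LONG_MAX / elem_size);

    if (nr > max)
        nr = max;
    return nr;
}
```

After the clamp, the product passed to the access check stays within range, and the caller simply sees fewer iocbs submitted than requested.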

Reported-by: Tavis Ormandy 
Signed-off-by: Jeff Moyer 
Signed-off-by: Linus Torvalds

aio: fix wrong subsystem comments

2010-08-05T20:21:23+00:00

 - sys_io_destroy(): actually return -EINVAL if the context pointed to
   is invalid.
 - sys_io_getevents(): An argument specifying timeout is not `when',
   but `timeout'.
 - sys_io_getevents(): Should describe what is returned if this syscall
   succeeds.

Signed-off-by: Satoru Takeuchi 
Signed-off-by: Randy Dunlap 
Reviewed-by: Jeff Moyer 
Signed-off-by: Linus Torvalds

get rid of the magic around f_count in aio

2010-05-28T02:03:07+00:00

__aio_put_req() plays sick games with file refcount.  What
it wants is fput() from atomic context; it's almost always
done with f_count > 1, so they only have to deal with delayed
work in rare cases when their reference happens to be the
last one.  Current code decrements f_count and if it hasn't
hit 0, everything is fine.  Otherwise it keeps a pointer
to struct file (with zero f_count!) around and has delayed
work do __fput() on it.

Better way to do it: use atomic_long_add_unless( , -1, 1)
instead of !atomic_long_dec_and_test().  IOW, decrement it
only if it's not the last reference, leave refcount alone
if it was.  And use normal fput() in delayed work.

I've made that atomic_long_add_unless call a new helper -
fput_atomic().  Drops a reference to file if it's safe to
do in atomic (i.e. if that's not the last one), tells if
it had been able to do that.  aio.c converted to it, __fput()
use is gone.  req->ki_file *always* contributes to refcount
now.  And __fput() became static.
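
The "decrement unless it is the last reference" idea can be modelled in userspace as below. This is a sketch under assumptions: toy_add_unless and toy_fput_atomic are hypothetical names, C11 atomics stand in for the kernel's atomic_long_t, and the counter passed in is assumed to be the file's reference count.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Model of an add-unless primitive: add `a` to *v unless *v == u.
 * Returns true if the addition happened.
 */
static bool toy_add_unless(atomic_long *v, long a, long u)
{
    long cur = atomic_load(v);

    while (cur != u) {
        /* On failure, cur is refreshed with the current value. */
        if (atomic_compare_exchange_weak(v, &cur, cur + a))
            return true;
    }
    return false;
}

/*
 * Drop a reference only if it is not the last one. Returns true if the
 * reference was dropped; false means the caller still holds the final
 * reference and must release it from a context that may sleep.
 */
static bool toy_fput_atomic(atomic_long *f_count)
{
    return toy_add_unless(f_count, -1, 1);
}
```

In this model the count never reaches zero in atomic context, so no pointer to an object with a zero refcount is ever kept around.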

Signed-off-by: Al Viro

aio: fix the compat vectored operations

2010-05-27T16:12:53+00:00

The aio compat code was not converting the struct iovecs from 32bit to
64bit pointers, causing either EINVAL to be returned from io_getevents, or
EFAULT as the result of the I/O.  This patch passes a compat flag to
io_submit to signal that pointer conversion is necessary for a given iocb
array.
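
For illustration, the pointer conversion amounts to widening each 32-bit iovec entry into the native layout. The sketch below is a userspace approximation: toy_compat_iovec and toy_widen_iovecs are hypothetical names, and the 32-bit layout (two 32-bit fields) is an assumption about what a 32-bit caller passes.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Assumed layout of an iovec as seen by a 32-bit caller. */
struct toy_compat_iovec {
    uint32_t iov_base;   /* 32-bit user pointer */
    uint32_t iov_len;    /* 32-bit length */
};

/* Widen n 32-bit entries into native struct iovec entries. */
static void toy_widen_iovecs(const struct toy_compat_iovec *in,
                             struct iovec *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i].iov_base = (void *)(uintptr_t)in[i].iov_base;
        out[i].iov_len = in[i].iov_len;
    }
}
```

Reading the 32-bit array as if it were native would pair each base with the wrong length on a 64-bit kernel, which is consistent with the EINVAL and EFAULT symptoms described above.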

A variant of this was tested by Michael Tokarev.  I have also updated the
libaio test harness to exercise this code path with good success.
Further, I grabbed a copy of ltp and ran the
testcases/kernel/syscall/readv and writev tests there (compiled with -m32
on my 64bit system).  All seems happy, but extra eyes on this would be
welcome.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix CONFIG_COMPAT=n build]
Signed-off-by: Jeff Moyer 
Reported-by: Michael Tokarev 
Cc: Zach Brown 
Cc: 		[2.6.35.1]
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

aio: remove unused field

2009-12-16T15:20:13+00:00

Don't know the reason, but it appears the ki_wait field of iocb never gets used.

Signed-off-by: Shaohua Li 
Cc: Jeff Moyer 
Cc: Benjamin LaHaise 
Cc: Zach Brown 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

block: move bdi/address_space unplug functions to backing-dev.h

2009-10-29T12:59:26+00:00

There's nothing block related about them, the backing device
is used by things like NFS etc as well. This gets rid of the
need to protect such calls by CONFIG_BLOCK.

Signed-off-by: Jens Axboe

aio: implement request batching

2009-10-28T08:29:25+00:00

Some workloads issue batches of small I/O, and the performance is poor
due to the call to blk_run_address_space for every single iocb.  Nathan
Roberts pointed this out, and suggested that by deferring this call
until all I/Os in the iocb array are submitted to the block layer, we
can realize some impressive performance gains (up to 30% for sequential
4k reads in batches of 16).
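
The deferral can be pictured with the small model below. It is only a sketch: toy_batch, toy_batch_add and toy_batch_finish are hypothetical names, a fixed array replaces whatever lookup structure the real code uses, and a counter stands in for the call to blk_run_address_space.

```c
#include <stddef.h>

#define TOY_BATCH_MAX 16

/* Tracks which mappings have I/O pending in the current batch. */
struct toy_batch {
    const void *mappings[TOY_BATCH_MAX];
    size_t count;
    unsigned long kicks;   /* how many times a device was kicked */
};

/*
 * Record that `mapping` has pending I/O. Duplicates are ignored.
 * Returns 0 on success, -1 if the table is full.
 */
static int toy_batch_add(struct toy_batch *b, const void *mapping)
{
    for (size_t i = 0; i < b->count; i++)
        if (b->mappings[i] == mapping)
            return 0;
    if (b->count == TOY_BATCH_MAX)
        return -1;
    b->mappings[b->count++] = mapping;
    return 0;
}

/* Kick each recorded mapping once, after all I/O has been submitted. */
static void toy_batch_finish(struct toy_batch *b)
{
    for (size_t i = 0; i < b->count; i++)
        b->kicks++;   /* stands in for blk_run_address_space() */
    b->count = 0;
}
```

With 16 iocbs spread over two files, this model kicks twice instead of 16 times.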

Signed-off-by: Jeff Moyer 
Signed-off-by: Jens Axboe