<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/aio.c, branch v2.6.37</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>new helper: ihold()</title>
<updated>2010-10-26T01:26:11+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2010-10-23T15:11:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7de9c6ee3ecffd99e1628e81a5ea5468f7581a1f'/>
<id>7de9c6ee3ecffd99e1628e81a5ea5468f7581a1f</id>
<content type='text'>
Clones an existing reference to inode; caller must already hold one.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Clones an existing reference to inode; caller must already hold one.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>aio: bump i_count instead of using igrab</title>
<updated>2010-10-26T01:18:23+00:00</updated>
<author>
<name>Chris Mason</name>
<email>chris.mason@oracle.com</email>
</author>
<published>2010-08-23T14:47:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=306fb0979443419288594446a348155a8027dcf2'/>
<id>306fb0979443419288594446a348155a8027dcf2</id>
<content type='text'>
The aio batching code is using igrab to get an extra reference on the
inode so it can safely batch.  igrab will go ahead and take the global
inode spinlock, which can be a bottleneck on large machines doing lots
of AIO.

In this case, igrab isn't required because we already have a reference
on the file handle.  It is safe to just bump the i_count directly
on the inode.

Benchmarking shows this patch brings IOP/s on tons of flash up by about
2.5X.

Signed-off-by: Chris Mason &lt;chris.mason@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The aio batching code is using igrab to get an extra reference on the
inode so it can safely batch.  igrab will go ahead and take the global
inode spinlock, which can be a bottleneck on large machines doing lots
of AIO.

In this case, igrab isn't required because we already have a reference
on the file handle.  It is safe to just bump the i_count directly
on the inode.

Benchmarking shows this patch brings IOP/s on tons of flash up by about
2.5X.

Signed-off-by: Chris Mason &lt;chris.mason@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>aio: do not return ERESTARTSYS as a result of AIO</title>
<updated>2010-09-23T00:22:39+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2010-09-22T20:05:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a0c42bac79731276c9b2f28d54f9e658fcf843a2'/>
<id>a0c42bac79731276c9b2f28d54f9e658fcf843a2</id>
<content type='text'>
OCFS2 can return ERESTARTSYS from its write function when the process is
signalled while waiting for a cluster lock (and the filesystem is mounted
with intr mount option).  Generally, it seems reasonable to allow
filesystems to return this error code from its IO functions.  As we must
not leak ERESTARTSYS (and similar error codes) to userspace as a result of
an AIO operation, we have to properly convert it to EINTR inside AIO code
(restarting the syscall isn't really an option because other AIO could
have been already submitted by the same io_submit syscall).

Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Zach Brown &lt;zach.brown@oracle.com&gt;
Cc: &lt;stable@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
OCFS2 can return ERESTARTSYS from its write function when the process is
signalled while waiting for a cluster lock (and the filesystem is mounted
with intr mount option).  Generally, it seems reasonable to allow
filesystems to return this error code from its IO functions.  As we must
not leak ERESTARTSYS (and similar error codes) to userspace as a result of
an AIO operation, we have to properly convert it to EINTR inside AIO code
(restarting the syscall isn't really an option because other AIO could
have been already submitted by the same io_submit syscall).

Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Zach Brown &lt;zach.brown@oracle.com&gt;
Cc: &lt;stable@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>aio: check for multiplication overflow in do_io_submit</title>
<updated>2010-09-15T00:02:37+00:00</updated>
<author>
<name>Jeff Moyer</name>
<email>jmoyer@redhat.com</email>
</author>
<published>2010-09-10T21:16:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=75e1c70fc31490ef8a373ea2a4bea2524099b478'/>
<id>75e1c70fc31490ef8a373ea2a4bea2524099b478</id>
<content type='text'>
Tavis Ormandy pointed out that do_io_submit does not do proper bounds
checking on the passed-in iocb array:

       if (unlikely(nr &lt; 0))
               return -EINVAL;

       if (unlikely(!access_ok(VERIFY_READ, iocbpp, (nr*sizeof(iocbpp)))))
               return -EFAULT;                      ^^^^^^^^^^^^^^^^^^

The attached patch checks for overflow, and if it is detected, the
number of iocbs submitted is scaled down to a number that will fit in
the long.  This is an ok thing to do, as sys_io_submit is documented as
returning the number of iocbs submitted, so callers should handle a
return value of less than the 'nr' argument passed in.

Reported-by: Tavis Ormandy &lt;taviso@cmpxchg8b.com&gt;
Signed-off-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Tavis Ormandy pointed out that do_io_submit does not do proper bounds
checking on the passed-in iocb array:

       if (unlikely(nr &lt; 0))
               return -EINVAL;

       if (unlikely(!access_ok(VERIFY_READ, iocbpp, (nr*sizeof(iocbpp)))))
               return -EFAULT;                      ^^^^^^^^^^^^^^^^^^

The attached patch checks for overflow, and if it is detected, the
number of iocbs submitted is scaled down to a number that will fit in
the long.  This is an ok thing to do, as sys_io_submit is documented as
returning the number of iocbs submitted, so callers should handle a
return value of less than the 'nr' argument passed in.

Reported-by: Tavis Ormandy &lt;taviso@cmpxchg8b.com&gt;
Signed-off-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>aio: fix wrong subsystem comments</title>
<updated>2010-08-05T20:21:23+00:00</updated>
<author>
<name>Satoru Takeuchi</name>
<email>takeuchi_satoru@jp.fujitsu.com</email>
</author>
<published>2010-08-05T18:23:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=642b5123ac5ec40a28575e930a3e2ff595473e9d'/>
<id>642b5123ac5ec40a28575e930a3e2ff595473e9d</id>
<content type='text'>
 - sys_io_destroy(): acutually return -EINVAL if the context pointed to
   is invalidIndex: linux-2.6.33-rc4/fs/aio.c
 - sys_io_getevents(): An argument specifying timeout is not `when',
   but `timeout'.
 - sys_io_getevents(): Should describe what is returned if this syscall
   succeeds.

Signed-off-by: Satoru Takeuchi &lt;takeuchi_satoru@jp.fujitsu.com&gt;
Signed-off-by: Randy Dunlap &lt;randy.dunlap@oracle.com&gt;
Reviewed-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
 - sys_io_destroy(): acutually return -EINVAL if the context pointed to
   is invalidIndex: linux-2.6.33-rc4/fs/aio.c
 - sys_io_getevents(): An argument specifying timeout is not `when',
   but `timeout'.
 - sys_io_getevents(): Should describe what is returned if this syscall
   succeeds.

Signed-off-by: Satoru Takeuchi &lt;takeuchi_satoru@jp.fujitsu.com&gt;
Signed-off-by: Randy Dunlap &lt;randy.dunlap@oracle.com&gt;
Reviewed-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>get rid of the magic around f_count in aio</title>
<updated>2010-05-28T02:03:07+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2010-05-26T19:13:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d7065da038227a4d09a244e6014e0186a6bd21d0'/>
<id>d7065da038227a4d09a244e6014e0186a6bd21d0</id>
<content type='text'>
__aio_put_req() plays sick games with file refcount.  What
it wants is fput() from atomic context; it's almost always
done with f_count &gt; 1, so they only have to deal with delayed
work in rare cases when their reference happens to be the
last one.  Current code decrements f_count and if it hasn't
hit 0, everything is fine.  Otherwise it keeps a pointer
to struct file (with zero f_count!) around and has delayed
work do __fput() on it.

Better way to do it: use atomic_long_add_unless( , -1, 1)
instead of !atomic_long_dec_and_test().  IOW, decrement it
only if it's not the last reference, leave refcount alone
if it was.  And use normal fput() in delayed work.

I've made that atomic_long_add_unless call a new helper -
fput_atomic().  Drops a reference to file if it's safe to
do in atomic (i.e. if that's not the last one), tells if
it had been able to do that.  aio.c converted to it, __fput()
use is gone.  req-&gt;ki_file *always* contributes to refcount
now.  And __fput() became static.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
__aio_put_req() plays sick games with file refcount.  What
it wants is fput() from atomic context; it's almost always
done with f_count &gt; 1, so they only have to deal with delayed
work in rare cases when their reference happens to be the
last one.  Current code decrements f_count and if it hasn't
hit 0, everything is fine.  Otherwise it keeps a pointer
to struct file (with zero f_count!) around and has delayed
work do __fput() on it.

Better way to do it: use atomic_long_add_unless( , -1, 1)
instead of !atomic_long_dec_and_test().  IOW, decrement it
only if it's not the last reference, leave refcount alone
if it was.  And use normal fput() in delayed work.

I've made that atomic_long_add_unless call a new helper -
fput_atomic().  Drops a reference to file if it's safe to
do in atomic (i.e. if that's not the last one), tells if
it had been able to do that.  aio.c converted to it, __fput()
use is gone.  req-&gt;ki_file *always* contributes to refcount
now.  And __fput() became static.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>aio: fix the compat vectored operations</title>
<updated>2010-05-27T16:12:53+00:00</updated>
<author>
<name>Jeff Moyer</name>
<email>jmoyer@redhat.com</email>
</author>
<published>2010-05-26T21:44:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9d85cba718efeef9ca00ce3f7f34f5880737aa9b'/>
<id>9d85cba718efeef9ca00ce3f7f34f5880737aa9b</id>
<content type='text'>
The aio compat code was not converting the struct iovecs from 32bit to
64bit pointers, causing either EINVAL to be returned from io_getevents, or
EFAULT as the result of the I/O.  This patch passes a compat flag to
io_submit to signal that pointer conversion is necessary for a given iocb
array.

A variant of this was tested by Michael Tokarev.  I have also updated the
libaio test harness to exercise this code path with good success.
Further, I grabbed a copy of ltp and ran the
testcases/kernel/syscall/readv and writev tests there (compiled with -m32
on my 64bit system).  All seems happy, but extra eyes on this would be
welcome.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix CONFIG_COMPAT=n build]
Signed-off-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Reported-by: Michael Tokarev &lt;mjt@tls.msk.ru&gt;
Cc: Zach Brown &lt;zach.brown@oracle.com&gt;
Cc: &lt;stable@kernel.org&gt;		[2.6.35.1]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The aio compat code was not converting the struct iovecs from 32bit to
64bit pointers, causing either EINVAL to be returned from io_getevents, or
EFAULT as the result of the I/O.  This patch passes a compat flag to
io_submit to signal that pointer conversion is necessary for a given iocb
array.

A variant of this was tested by Michael Tokarev.  I have also updated the
libaio test harness to exercise this code path with good success.
Further, I grabbed a copy of ltp and ran the
testcases/kernel/syscall/readv and writev tests there (compiled with -m32
on my 64bit system).  All seems happy, but extra eyes on this would be
welcome.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix CONFIG_COMPAT=n build]
Signed-off-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Reported-by: Michael Tokarev &lt;mjt@tls.msk.ru&gt;
Cc: Zach Brown &lt;zach.brown@oracle.com&gt;
Cc: &lt;stable@kernel.org&gt;		[2.6.35.1]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>aio: remove unused field</title>
<updated>2009-12-16T15:20:13+00:00</updated>
<author>
<name>Shaohua Li</name>
<email>shaohua.li@intel.com</email>
</author>
<published>2009-12-16T00:47:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=fac046ad0b1ee2c4244ebf43a26433ef0ea29ae4'/>
<id>fac046ad0b1ee2c4244ebf43a26433ef0ea29ae4</id>
<content type='text'>
Don't know the reason, but it appears ki_wait field of iocb never gets used.

Signed-off-by: Shaohua Li &lt;shaohua.li@intel.com&gt;
Cc: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Cc: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
Cc: Zach Brown &lt;zach.brown@oracle.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Don't know the reason, but it appears ki_wait field of iocb never gets used.

Signed-off-by: Shaohua Li &lt;shaohua.li@intel.com&gt;
Cc: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Cc: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
Cc: Zach Brown &lt;zach.brown@oracle.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: move bdi/address_space unplug functions to backing-dev.h</title>
<updated>2009-10-29T12:59:26+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>jens.axboe@oracle.com</email>
</author>
<published>2009-10-29T12:59:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b9d128f1088ea5245109dfc9bbceb128b6371a77'/>
<id>b9d128f1088ea5245109dfc9bbceb128b6371a77</id>
<content type='text'>
There's nothing block related about them, the backing device
is used by things like NFS etc as well. This gets rid of the
need to protect such calls by CONFIG_BLOCK.

Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There's nothing block related about them, the backing device
is used by things like NFS etc as well. This gets rid of the
need to protect such calls by CONFIG_BLOCK.

Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>aio: implement request batching</title>
<updated>2009-10-28T08:29:25+00:00</updated>
<author>
<name>Jeff Moyer</name>
<email>jmoyer@redhat.com</email>
</author>
<published>2009-10-02T22:57:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=cfb1e33eed48165763edc7a4a067cf5f74898d0b'/>
<id>cfb1e33eed48165763edc7a4a067cf5f74898d0b</id>
<content type='text'>
Hi,

Some workloads issue batches of small I/O, and the performance is poor
due to the call to blk_run_address_space for every single iocb.  Nathan
Roberts pointed this out, and suggested that by deferring this call
until all I/Os in the iocb array are submitted to the block layer, we
can realize some impressive performance gains (up to 30% for sequential
4k reads in batches of 16).

Signed-off-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Hi,

Some workloads issue batches of small I/O, and the performance is poor
due to the call to blk_run_address_space for every single iocb.  Nathan
Roberts pointed this out, and suggested that by deferring this call
until all I/Os in the iocb array are submitted to the block layer, we
can realize some impressive performance gains (up to 30% for sequential
4k reads in batches of 16).

Signed-off-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
