<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/include/linux/fs.h, branch linux-2.6.34.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>mm: prevent concurrent unmap_mapping_range() on the same inode</title>
<updated>2012-05-17T15:21:08+00:00</updated>
<author>
<name>Miklos Szeredi</name>
<email>mszeredi@suse.cz</email>
</author>
<published>2011-02-23T12:49:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=6fb7c40b505e5a2fd57164a7d5bd7f9e2a6a5ede'/>
<id>6fb7c40b505e5a2fd57164a7d5bd7f9e2a6a5ede</id>
<content type='text'>
commit 2aa15890f3c191326678f1bd68af61ec6b8753ec upstream.

Michael Leun reported that running parallel opens on a fuse filesystem
can trigger a "kernel BUG at mm/truncate.c:475"

Gurudas Pai reported the same bug on NFS.

The reason is, unmap_mapping_range() is not prepared for more than
one concurrent invocation per inode.  For example:

  thread1: going through a big range, stops in the middle of a vma and
     stores the restart address in vm_truncate_count.

  thread2: comes in with a small (e.g. single page) unmap request on
     the same vma, somewhere before restart_address, finds that the
     vma was already unmapped up to the restart address and happily
     returns without doing anything.

Another scenario would be two big unmap requests, both having to
restart the unmapping and each one setting vm_truncate_count to its
own value.  This could go on forever without any of them being able to
finish.

Truncate and hole punching already serialize with i_mutex.  Other
callers of unmap_mapping_range() do not, and it's difficult to get
i_mutex protection for all callers.  In particular -&gt;d_revalidate(),
which calls invalidate_inode_pages2_range() in fuse, may be called
with or without i_mutex.

This patch adds a new mutex to 'struct address_space' to prevent
running multiple concurrent unmap_mapping_range() on the same mapping.

[ We'll hopefully get rid of all this with the upcoming mm
  preemptibility series by Peter Zijlstra, the "mm: Remove i_mmap_mutex
  lockbreak" patch in particular.  But that is for 2.6.39 ]

Signed-off-by: Miklos Szeredi &lt;mszeredi@suse.cz&gt;
Reported-by: Michael Leun &lt;lkml20101129@newton.leun.net&gt;
Reported-by: Gurudas Pai &lt;gurudas.pai@oracle.com&gt;
Tested-by: Gurudas Pai &lt;gurudas.pai@oracle.com&gt;
Acked-by: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
[PG: Some chunks dropped, since no ebdfed4dc5 in 34; came in at 2.6.37]
Signed-off-by: Paul Gortmaker &lt;paul.gortmaker@windriver.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 2aa15890f3c191326678f1bd68af61ec6b8753ec upstream.

Michael Leun reported that running parallel opens on a fuse filesystem
can trigger a "kernel BUG at mm/truncate.c:475"

Gurudas Pai reported the same bug on NFS.

The reason is, unmap_mapping_range() is not prepared for more than
one concurrent invocation per inode.  For example:

  thread1: going through a big range, stops in the middle of a vma and
     stores the restart address in vm_truncate_count.

  thread2: comes in with a small (e.g. single page) unmap request on
     the same vma, somewhere before restart_address, finds that the
     vma was already unmapped up to the restart address and happily
     returns without doing anything.

Another scenario would be two big unmap requests, both having to
restart the unmapping and each one setting vm_truncate_count to its
own value.  This could go on forever without any of them being able to
finish.

Truncate and hole punching already serialize with i_mutex.  Other
callers of unmap_mapping_range() do not, and it's difficult to get
i_mutex protection for all callers.  In particular -&gt;d_revalidate(),
which calls invalidate_inode_pages2_range() in fuse, may be called
with or without i_mutex.

This patch adds a new mutex to 'struct address_space' to prevent
running multiple concurrent unmap_mapping_range() on the same mapping.

[ We'll hopefully get rid of all this with the upcoming mm
  preemptibility series by Peter Zijlstra, the "mm: Remove i_mmap_mutex
  lockbreak" patch in particular.  But that is for 2.6.39 ]

Signed-off-by: Miklos Szeredi &lt;mszeredi@suse.cz&gt;
Reported-by: Michael Leun &lt;lkml20101129@newton.leun.net&gt;
Reported-by: Gurudas Pai &lt;gurudas.pai@oracle.com&gt;
Tested-by: Gurudas Pai &lt;gurudas.pai@oracle.com&gt;
Acked-by: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
[PG: Some chunks dropped, since no ebdfed4dc5 in 34; came in at 2.6.37]
Signed-off-by: Paul Gortmaker &lt;paul.gortmaker@windriver.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Fix sget() race with failing mount</title>
<updated>2011-04-17T20:15:34+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2010-08-09T16:05:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=0068c623f6fd0630176e224bf929dff4852b5580'/>
<id>0068c623f6fd0630176e224bf929dff4852b5580</id>
<content type='text'>
commit 7a4dec53897ecd3367efb1e12fe8a4edc47dc0e9 upstream.

If sget() finds a matching superblock being set up, it'll
grab an active reference to it and grab s_umount.  That's
fine - we'll wait for completion of foofs_get_sb() that way.
However, if said foofs_get_sb() fails we'll end up holding
the halfway-created superblock.  deactivate_locked_super()
called by foofs_get_sb() will just unlock the sucker since
we are holding another active reference to it.

What we need is a way to tell if superblock has been successfully
set up.  Unfortunately, neither -&gt;s_root nor the check for
MS_ACTIVE quite fit.  Cheap and easy way, suitable for backport:
new flag set by the (only) caller of -&gt;get_sb().  If that flag
isn't present by the time sget() grabbed s_umount on preexisting
superblock it has found, it's seeing a stillborn and should
just bury it with deactivate_locked_super() (and repeat the search).

Longer term we want to set that flag in -&gt;get_sb() instances (and
check for it to distinguish between "sget() found us a live sb"
and "sget() has allocated an sb, we need to set it up" in there,
instead of checking -&gt;s_root as we do now).

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Paul Gortmaker &lt;paul.gortmaker@windriver.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 7a4dec53897ecd3367efb1e12fe8a4edc47dc0e9 upstream.

If sget() finds a matching superblock being set up, it'll
grab an active reference to it and grab s_umount.  That's
fine - we'll wait for completion of foofs_get_sb() that way.
However, if said foofs_get_sb() fails we'll end up holding
the halfway-created superblock.  deactivate_locked_super()
called by foofs_get_sb() will just unlock the sucker since
we are holding another active reference to it.

What we need is a way to tell if superblock has been successfully
set up.  Unfortunately, neither -&gt;s_root nor the check for
MS_ACTIVE quite fit.  Cheap and easy way, suitable for backport:
new flag set by the (only) caller of -&gt;get_sb().  If that flag
isn't present by the time sget() grabbed s_umount on preexisting
superblock it has found, it's seeing a stillborn and should
just bury it with deactivate_locked_super() (and repeat the search).

Longer term we want to set that flag in -&gt;get_sb() instances (and
check for it to distinguish between "sget() found us a live sb"
and "sget() has allocated an sb, we need to set it up" in there,
instead of checking -&gt;s_root as we do now).

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Paul Gortmaker &lt;paul.gortmaker@windriver.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bio, fs: update RWA_MASK, READA and SWRITE to match the corresponding BIO_RW_* bits</title>
<updated>2010-08-13T20:27:23+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2010-08-03T11:14:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e720aace59112a58fcdbb95234fd61cba2843fe2'/>
<id>e720aace59112a58fcdbb95234fd61cba2843fe2</id>
<content type='text'>
commit aca27ba9618276dd2f777bcd5a1419589ccf1ca8 upstream.

Commit a82afdf (block: use the same failfast bits for bio and request)
moved BIO_RW_* bits around such that they match up with REQ_* bits.
Unfortunately, fs.h hard coded RW_MASK, RWA_MASK, READ, WRITE, READA
and SWRITE as 0, 1, 2 and 3, and expected them to match with BIO_RW_*
bits.  READ/WRITE didn't change but BIO_RW_AHEAD was moved to bit 4
instead of bit 1, breaking RWA_MASK, READA and SWRITE.

This patch updates RWA_MASK, READA and SWRITE such that they match the
BIO_RW_* bits again.  A follow up patch will update the definitions to
directly use BIO_RW_* bits so that this kind of breakage won't happen
again.

Neil also spotted missing RWA_MASK conversion.

Stable: The offending commit a82afdf was released with v2.6.32, so
this patch should be applied to all kernels since then but it must
_NOT_ be applied to kernels earlier than that.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-and-bisected-by: Vladislav Bolkhovitin &lt;vst@vlnb.net&gt;
Root-caused-by: Neil Brown &lt;neilb@suse.de&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit aca27ba9618276dd2f777bcd5a1419589ccf1ca8 upstream.

Commit a82afdf (block: use the same failfast bits for bio and request)
moved BIO_RW_* bits around such that they match up with REQ_* bits.
Unfortunately, fs.h hard coded RW_MASK, RWA_MASK, READ, WRITE, READA
and SWRITE as 0, 1, 2 and 3, and expected them to match with BIO_RW_*
bits.  READ/WRITE didn't change but BIO_RW_AHEAD was moved to bit 4
instead of bit 1, breaking RWA_MASK, READA and SWRITE.

This patch updates RWA_MASK, READA and SWRITE such that they match the
BIO_RW_* bits again.  A follow up patch will update the definitions to
directly use BIO_RW_* bits so that this kind of breakage won't happen
again.

Neil also spotted missing RWA_MASK conversion.

Stable: The offending commit a82afdf was released with v2.6.32, so
this patch should be applied to all kernels since then but it must
_NOT_ be applied to kernels earlier than that.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-and-bisected-by: Vladislav Bolkhovitin &lt;vst@vlnb.net&gt;
Root-caused-by: Neil Brown &lt;neilb@suse.de&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>wrong type for 'magic' argument in simple_fill_super()</title>
<updated>2010-07-05T18:22:50+00:00</updated>
<author>
<name>Roberto Sassu</name>
<email>roberto.sassu@polito.it</email>
</author>
<published>2010-06-03T09:58:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a293c6c6ad7bf4e99a16d43e80cf4e91c8fb2a41'/>
<id>a293c6c6ad7bf4e99a16d43e80cf4e91c8fb2a41</id>
<content type='text'>
commit 7d683a09990ff095a91b6e724ecee0ff8733274a upstream.

It's used to superblock -&gt;s_magic, which is unsigned long.

Signed-off-by: Roberto Sassu &lt;roberto.sassu@polito.it&gt;
Reviewed-by: Mimi Zohar &lt;zohar@us.ibm.com&gt;
Signed-off-by: Eric Paris &lt;eparis@redhat.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 7d683a09990ff095a91b6e724ecee0ff8733274a upstream.

It's used to superblock -&gt;s_magic, which is unsigned long.

Signed-off-by: Roberto Sassu &lt;roberto.sassu@polito.it&gt;
Reviewed-by: Mimi Zohar &lt;zohar@us.ibm.com&gt;
Signed-off-by: Eric Paris &lt;eparis@redhat.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>Cleanup generic block based fiemap</title>
<updated>2010-04-23T17:39:48+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>josef@redhat.com</email>
</author>
<published>2010-04-23T16:17:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=3a3076f4d6e2fa31338a0b007df42a3b32f079e0'/>
<id>3a3076f4d6e2fa31338a0b007df42a3b32f079e0</id>
<content type='text'>
This cleans up a few of the complaints of __generic_block_fiemap.  I've
fixed all the typing stuff, used inline functions instead of macros,
gotten rid of a couple of variables, and made sure the size and block
requests are all block aligned.  It also fixes a problem where sometimes
FIEMAP_EXTENT_LAST wasn't being set properly.

Signed-off-by: Josef Bacik &lt;josef@redhat.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This cleans up a few of the complaints of __generic_block_fiemap.  I've
fixed all the typing stuff, used inline functions instead of macros,
gotten rid of a couple of variables, and made sure the size and block
requests are all block aligned.  It also fixes a problem where sometimes
FIEMAP_EXTENT_LAST wasn't being set properly.

Signed-off-by: Josef Bacik &lt;josef@redhat.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vfs: rename block_fsync() to blkdev_fsync()</title>
<updated>2010-04-07T15:38:04+00:00</updated>
<author>
<name>Andrew Morton</name>
<email>akpm@linux-foundation.org</email>
</author>
<published>2010-04-06T21:35:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b1dd3b2843b3b73b7fc2ee47d96310cd1c051371'/>
<id>b1dd3b2843b3b73b7fc2ee47d96310cd1c051371</id>
<content type='text'>
Requested by hch, for consistency now it is exported.

Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Anton Blanchard &lt;anton@samba.org&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Cc: Jens Axboe &lt;jens.axboe@oracle.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Requested by hch, for consistency now it is exported.

Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Anton Blanchard &lt;anton@samba.org&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Cc: Jens Axboe &lt;jens.axboe@oracle.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>raw: fsync method is now required</title>
<updated>2010-04-07T15:38:04+00:00</updated>
<author>
<name>Anton Blanchard</name>
<email>anton@samba.org</email>
</author>
<published>2010-04-06T21:34:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=55ab3a1ff843e3f0e24d2da44e71bffa5d853010'/>
<id>55ab3a1ff843e3f0e24d2da44e71bffa5d853010</id>
<content type='text'>
Commit 148f948ba877f4d3cdef036b1ff6d9f68986706a (vfs: Introduce new
helpers for syncing after writing to O_SYNC file or IS_SYNC inode) broke
the raw driver.

We now call through generic_file_aio_write -&gt; generic_write_sync -&gt;
vfs_fsync_range.  vfs_fsync_range has:

        if (!fop || !fop-&gt;fsync) {
                ret = -EINVAL;
                goto out;
        }

But drivers/char/raw.c doesn't set an fsync method.

We have two options: fix it or remove the raw driver completely.  I'm
happy to do either, the fact this has been broken for so long suggests it
is rarely used.

The patch below adds an fsync method to the raw driver.  My knowledge of
the block layer is pretty sketchy so this could do with a once over.

If we instead decide to remove the raw driver, this patch might still be
useful as a backport to 2.6.33 and 2.6.32.

Signed-off-by: Anton Blanchard &lt;anton@samba.org&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Jens Axboe &lt;jens.axboe@oracle.com&gt;
Reviewed-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Tested-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Cc: &lt;stable@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 148f948ba877f4d3cdef036b1ff6d9f68986706a (vfs: Introduce new
helpers for syncing after writing to O_SYNC file or IS_SYNC inode) broke
the raw driver.

We now call through generic_file_aio_write -&gt; generic_write_sync -&gt;
vfs_fsync_range.  vfs_fsync_range has:

        if (!fop || !fop-&gt;fsync) {
                ret = -EINVAL;
                goto out;
        }

But drivers/char/raw.c doesn't set an fsync method.

We have two options: fix it or remove the raw driver completely.  I'm
happy to do either, the fact this has been broken for so long suggests it
is rarely used.

The patch below adds an fsync method to the raw driver.  My knowledge of
the block layer is pretty sketchy so this could do with a once over.

If we instead decide to remove the raw driver, this patch might still be
useful as a backport to 2.6.33 and 2.6.32.

Signed-off-by: Anton Blanchard &lt;anton@samba.org&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Jens Axboe &lt;jens.axboe@oracle.com&gt;
Reviewed-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Tested-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Cc: &lt;stable@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>include/linux/fs.h: convert FMODE_* constants to hex</title>
<updated>2010-03-06T19:26:25+00:00</updated>
<author>
<name>Andrew Morton</name>
<email>akpm@linux-foundation.org</email>
</author>
<published>2010-03-05T21:42:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=19adf9c5d5793657118f2002237c0ee49c3b6185'/>
<id>19adf9c5d5793657118f2002237c0ee49c3b6185</id>
<content type='text'>
It was tolerable until Eric went and added 8388608.

Cc: Eric Paris &lt;eparis@redhat.com&gt;
Cc: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It was tolerable until Eric went and added 8388608.

Cc: Eric Paris &lt;eparis@redhat.com&gt;
Cc: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>readahead: introduce FMODE_RANDOM for POSIX_FADV_RANDOM</title>
<updated>2010-03-06T19:26:25+00:00</updated>
<author>
<name>Wu Fengguang</name>
<email>fengguang.wu@intel.com</email>
</author>
<published>2010-03-05T21:42:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=0141450f66c3c12a3aaa869748caa64241885cdf'/>
<id>0141450f66c3c12a3aaa869748caa64241885cdf</id>
<content type='text'>
This fixes inefficient page-by-page reads on POSIX_FADV_RANDOM.

POSIX_FADV_RANDOM used to set ra_pages=0, which leads to poor performance:
a 16K read will be carried out in 4 _sync_ 1-page reads.

In other places, ra_pages==0 means
- it's ramfs/tmpfs/hugetlbfs/sysfs/configfs
- some IO error happened
where multi-page read IO won't help or should be avoided.

POSIX_FADV_RANDOM actually want a different semantics: to disable the
*heuristic* readahead algorithm, and to use a dumb one which faithfully
submit read IO for whatever application requests.

So introduce a flag FMODE_RANDOM for POSIX_FADV_RANDOM.

Note that the random hint is not likely to help random reads performance
noticeably.  And it may be too permissive on huge request size (its IO
size is not limited by read_ahead_kb).

In Quentin's report (http://lkml.org/lkml/2009/12/24/145), the overall
(NFS read) performance of the application increased by 313%!

Tested-by: Quentin Barnes &lt;qbarnes+nfs@yahoo-inc.com&gt;
Signed-off-by: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Cc: Nick Piggin &lt;npiggin@suse.de&gt;
Cc: Andi Kleen &lt;andi@firstfloor.org&gt;
Cc: Steven Whitehouse &lt;swhiteho@redhat.com&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
Cc: Chuck Lever &lt;chuck.lever@oracle.com&gt;
Cc: &lt;stable@kernel.org&gt;			[2.6.33.x]
Cc: &lt;qbarnes+nfs@yahoo-inc.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This fixes inefficient page-by-page reads on POSIX_FADV_RANDOM.

POSIX_FADV_RANDOM used to set ra_pages=0, which leads to poor performance:
a 16K read will be carried out in 4 _sync_ 1-page reads.

In other places, ra_pages==0 means
- it's ramfs/tmpfs/hugetlbfs/sysfs/configfs
- some IO error happened
where multi-page read IO won't help or should be avoided.

POSIX_FADV_RANDOM actually want a different semantics: to disable the
*heuristic* readahead algorithm, and to use a dumb one which faithfully
submit read IO for whatever application requests.

So introduce a flag FMODE_RANDOM for POSIX_FADV_RANDOM.

Note that the random hint is not likely to help random reads performance
noticeably.  And it may be too permissive on huge request size (its IO
size is not limited by read_ahead_kb).

In Quentin's report (http://lkml.org/lkml/2009/12/24/145), the overall
(NFS read) performance of the application increased by 313%!

Tested-by: Quentin Barnes &lt;qbarnes+nfs@yahoo-inc.com&gt;
Signed-off-by: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Cc: Nick Piggin &lt;npiggin@suse.de&gt;
Cc: Andi Kleen &lt;andi@firstfloor.org&gt;
Cc: Steven Whitehouse &lt;swhiteho@redhat.com&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
Cc: Chuck Lever &lt;chuck.lever@oracle.com&gt;
Cc: &lt;stable@kernel.org&gt;			[2.6.33.x]
Cc: &lt;qbarnes+nfs@yahoo-inc.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>pass writeback_control to -&gt;write_inode</title>
<updated>2010-03-05T18:25:52+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2010-03-05T08:21:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a9185b41a4f84971b930c519f0c63bd450c4810d'/>
<id>a9185b41a4f84971b930c519f0c63bd450c4810d</id>
<content type='text'>
This gives the filesystem more information about the writeback that
is happening.  Trond requested this for the NFS unstable write handling,
and other filesystems might benefit from this too by beeing able to
distinguish between the different callers in more detail.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This gives the filesystem more information about the writeback that
is happening.  Trond requested this for the NFS unstable write handling,
and other filesystems might benefit from this too by beeing able to
distinguish between the different callers in more detail.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
</feed>
