linux-stable.git/include/linux/fs.h, branch linux-2.6.34.y

mm: prevent concurrent unmap_mapping_range() on the same inode

2012-05-17T15:21:08+00:00

commit 2aa15890f3c191326678f1bd68af61ec6b8753ec upstream.

Michael Leun reported that running parallel opens on a fuse filesystem
can trigger a "kernel BUG at mm/truncate.c:475".

Gurudas Pai reported the same bug on NFS.

The reason is, unmap_mapping_range() is not prepared for more than
one concurrent invocation per inode.  For example:

  thread1: going through a big range, stops in the middle of a vma and
     stores the restart address in vm_truncate_count.

  thread2: comes in with a small (e.g. single page) unmap request on
     the same vma, somewhere before restart_address, finds that the
     vma was already unmapped up to the restart address and happily
     returns without doing anything.

Another scenario would be two big unmap requests, both having to
restart the unmapping and each one setting vm_truncate_count to its
own value.  This could go on forever without any of them being able to
finish.

Truncate and hole punching already serialize with i_mutex.  Other
callers of unmap_mapping_range() do not, and it's difficult to get
i_mutex protection for all callers.  In particular ->d_revalidate(),
which calls invalidate_inode_pages2_range() in fuse, may be called
with or without i_mutex.

This patch adds a new mutex to 'struct address_space' to prevent
running multiple concurrent unmap_mapping_range() on the same mapping.
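
The locking rule can be illustrated outside the kernel.  What follows is
a minimal user-space sketch, not the kernel patch itself: struct
mapping_sketch, restart_addr and unmap_range_sketch() are hypothetical
stand-ins, and a pthread mutex plays the role of the kernel mutex.  It
only shows that holding one per-mapping lock across the whole
restartable walk keeps a second caller from seeing or overwriting the
saved restart position.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for 'struct address_space'. */
struct mapping_sketch {
	pthread_mutex_t unmap_mutex;	/* the new per-mapping lock */
	size_t restart_addr;		/* plays the role of vm_truncate_count */
	size_t unmapped_pages;		/* bookkeeping for the demonstration */
};

static void mapping_sketch_init(struct mapping_sketch *m)
{
	pthread_mutex_init(&m->unmap_mutex, NULL);
	m->restart_addr = 0;
	m->unmapped_pages = 0;
}

/*
 * Unmap the page range [start, end) one page at a time, saving the
 * restart position after every page.  The mutex is held for the whole
 * walk, so no other caller can read or clobber restart_addr midway.
 */
static void unmap_range_sketch(struct mapping_sketch *m,
			       size_t start, size_t end)
{
	pthread_mutex_lock(&m->unmap_mutex);
	m->restart_addr = start;
	while (m->restart_addr < end) {
		m->unmapped_pages++;	/* "unmap" one page */
		m->restart_addr++;	/* progress saved for a restart */
	}
	m->restart_addr = 0;		/* walk complete, nothing to resume */
	pthread_mutex_unlock(&m->unmap_mutex);
}
```

In this sketch a small request that arrives while a large one is in
flight simply waits for the mutex and then does its own full walk,
instead of trusting a restart address left behind by someone else.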

[ We'll hopefully get rid of all this with the upcoming mm
  preemptibility series by Peter Zijlstra, the "mm: Remove i_mmap_mutex
  lockbreak" patch in particular.  But that is for 2.6.39 ]

Signed-off-by: Miklos Szeredi 
Reported-by: Michael Leun 
Reported-by: Gurudas Pai 
Tested-by: Gurudas Pai 
Acked-by: Hugh Dickins 
Signed-off-by: Linus Torvalds 
[PG: Some chunks dropped, since no ebdfed4dc5 in 34; came in at 2.6.37]
Signed-off-by: Paul Gortmaker

Fix sget() race with failing mount

2011-04-17T20:15:34+00:00

commit 7a4dec53897ecd3367efb1e12fe8a4edc47dc0e9 upstream.

If sget() finds a matching superblock being set up, it'll
grab an active reference to it and grab s_umount.  That's
fine - we'll wait for completion of foofs_get_sb() that way.
However, if said foofs_get_sb() fails we'll end up holding
the halfway-created superblock.  deactivate_locked_super()
called by foofs_get_sb() will just unlock the sucker since
we are holding another active reference to it.

What we need is a way to tell if superblock has been successfully
set up.  Unfortunately, neither ->s_root nor the check for
MS_ACTIVE quite fit.  Cheap and easy way, suitable for backport:
new flag set by the (only) caller of ->get_sb().  If that flag
isn't present by the time sget() grabbed s_umount on preexisting
superblock it has found, it's seeing a stillborn and should
just bury it with deactivate_locked_super() (and repeat the search).
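
The check described above can be sketched in user space as follows.
All names here (struct sb_sketch, SB_BORN_SKETCH, mark_born(),
sget_check_found()) are hypothetical and the reference counting is
heavily simplified; this is an illustration of the idea, not the
kernel code.

```c
#include <stdbool.h>

/* Hypothetical name for the "setup succeeded" flag. */
#define SB_BORN_SKETCH	(1u << 0)

/* Hypothetical stand-in for 'struct super_block'. */
struct sb_sketch {
	int s_active;		/* active references */
	unsigned int s_flags;
};

/* Done by the caller of ->get_sb() once setup has succeeded. */
static void mark_born(struct sb_sketch *sb)
{
	sb->s_flags |= SB_BORN_SKETCH;
}

/*
 * What the lookup does with a preexisting superblock after waiting
 * for its setup to finish.  Returns true if the superblock is usable;
 * returns false after dropping the reference again, in which case the
 * caller should repeat the search.
 */
static bool sget_check_found(struct sb_sketch *sb)
{
	sb->s_active++;			/* grab an active reference */
	if (!(sb->s_flags & SB_BORN_SKETCH)) {
		sb->s_active--;		/* stand-in for deactivate_locked_super() */
		return false;		/* stillborn: bury it and retry */
	}
	return true;
}
```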

Longer term we want to set that flag in ->get_sb() instances (and
check for it to distinguish between "sget() found us a live sb"
and "sget() has allocated an sb, we need to set it up" in there,
instead of checking ->s_root as we do now).

Signed-off-by: Al Viro 
Signed-off-by: Paul Gortmaker

bio, fs: update RWA_MASK, READA and SWRITE to match the corresponding BIO_RW_* bits

2010-08-13T20:27:23+00:00

commit aca27ba9618276dd2f777bcd5a1419589ccf1ca8 upstream.

Commit a82afdf (block: use the same failfast bits for bio and request)
moved BIO_RW_* bits around such that they match up with REQ_* bits.
Unfortunately, fs.h hard coded RW_MASK, RWA_MASK, READ, WRITE, READA
and SWRITE as 0, 1, 2 and 3, and expected them to match with BIO_RW_*
bits.  READ/WRITE didn't change but BIO_RW_AHEAD was moved to bit 4
instead of bit 1, breaking RWA_MASK, READA and SWRITE.

This patch updates RWA_MASK, READA and SWRITE such that they match the
BIO_RW_* bits again.  A follow up patch will update the definitions to
directly use BIO_RW_* bits so that this kind of breakage won't happen
again.
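
As a rough illustration of the arithmetic, the sketch below derives the
values from bit positions instead of hard coding them.  The *_SKETCH
names are hypothetical, the bit numbers are taken from the description
above, and SWRITE is assumed to keep its old composition (write bit
plus read-ahead bit, previously 1 | 2 = 3).

```c
/* Assumed bit positions, taken from the description above. */
#define BIO_RW_SKETCH		0	/* direction bit, unchanged */
#define BIO_RW_AHEAD_SKETCH	4	/* read-ahead bit, moved from bit 1 */

#define RW_MASK		(1 << BIO_RW_SKETCH)		/* 1 */
#define RWA_MASK	(1 << BIO_RW_AHEAD_SKETCH)	/* 16, was 2 */

#define READ		0
#define WRITE		RW_MASK				/* 1 */
#define READA		RWA_MASK			/* 16, was 2 */
#define SWRITE		(WRITE | READA)			/* 17, was 3 */

/* Nonzero if the request carries the read-ahead bit. */
static int is_readahead(unsigned long rw)
{
	return (rw & RWA_MASK) != 0;
}
```

With the stale value READA == 2, is_readahead(2) is false under the new
layout, which is the kind of mismatch this patch repairs.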

Neil also spotted missing RWA_MASK conversion.

Stable: The offending commit a82afdf was released with v2.6.32, so
this patch should be applied to all kernels since then but it must
_NOT_ be applied to kernels earlier than that.

Signed-off-by: Tejun Heo 
Reported-and-bisected-by: Vladislav Bolkhovitin 
Root-caused-by: Neil Brown 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman

wrong type for 'magic' argument in simple_fill_super()

2010-07-05T18:22:50+00:00

commit 7d683a09990ff095a91b6e724ecee0ff8733274a upstream.

It's used to set the superblock's ->s_magic, which is unsigned long.
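
A small user-space sketch of why the parameter type matters.  The names
(struct magic_sb_sketch, fill_super_int(), fill_super_ulong()) and the
magic value are hypothetical.  It assumes a platform where unsigned
long is wider than int (e.g. LP64) and a compiler that wraps
out-of-range conversions to int, as gcc and clang do.

```c
/* Hypothetical stand-in for the one superblock field involved. */
struct magic_sb_sketch {
	unsigned long s_magic;
};

/* Old-style prototype: the magic number travels through an int. */
static void fill_super_int(struct magic_sb_sketch *sb, int magic)
{
	sb->s_magic = magic;	/* a negative int is sign-extended here */
}

/* New-style prototype: the parameter matches the field's type. */
static void fill_super_ulong(struct magic_sb_sketch *sb, unsigned long magic)
{
	sb->s_magic = magic;	/* stored unchanged */
}
```

Passing a magic number with bit 31 set, such as 0x80000001, through the
int variant stores a sign-extended value on such platforms, while the
unsigned long variant stores it as given.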

Signed-off-by: Roberto Sassu 
Reviewed-by: Mimi Zohar 
Signed-off-by: Eric Paris 
Signed-off-by: Al Viro 
Signed-off-by: Greg Kroah-Hartman

Cleanup generic block based fiemap

2010-04-23T17:39:48+00:00

This cleans up a few of the complaints about __generic_block_fiemap.  I've
fixed all the typing stuff, used inline functions instead of macros,
gotten rid of a couple of variables, and made sure the size and block
requests are all block aligned.  It also fixes a problem where sometimes
FIEMAP_EXTENT_LAST wasn't being set properly.

Signed-off-by: Josef Bacik 
Signed-off-by: Linus Torvalds

vfs: rename block_fsync() to blkdev_fsync()

2010-04-07T15:38:04+00:00

Requested by hch, for consistency now that it is exported.

Cc: Alexander Viro 
Cc: Anton Blanchard 
Cc: Christoph Hellwig 
Cc: Jan Kara 
Cc: Jeff Moyer 
Cc: Jens Axboe 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

raw: fsync method is now required

2010-04-07T15:38:04+00:00

Commit 148f948ba877f4d3cdef036b1ff6d9f68986706a (vfs: Introduce new
helpers for syncing after writing to O_SYNC file or IS_SYNC inode) broke
the raw driver.

We now call through generic_file_aio_write -> generic_write_sync ->
vfs_fsync_range.  vfs_fsync_range has:

        if (!fop || !fop->fsync) {
                ret = -EINVAL;
                goto out;
        }

But drivers/char/raw.c doesn't set an fsync method.

We have two options: fix it or remove the raw driver completely.  I'm
happy to do either, the fact this has been broken for so long suggests it
is rarely used.

The patch below adds an fsync method to the raw driver.  My knowledge of
the block layer is pretty sketchy so this could do with a once over.
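
The failure mode and the shape of the fix can be modelled in user space
as below.  This is only a sketch: struct file_ops_sketch,
fsync_range_sketch() and blk_fsync_sketch() are hypothetical names, and
the fsync callback takes no arguments for brevity.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for 'struct file_operations'. */
struct file_ops_sketch {
	int (*fsync)(void);
};

/* Mirrors the check quoted above: no fsync method means -EINVAL. */
static int fsync_range_sketch(const struct file_ops_sketch *fop)
{
	if (!fop || !fop->fsync)
		return -EINVAL;
	return fop->fsync();
}

/* Placeholder for a block-device fsync helper; always succeeds here. */
static int blk_fsync_sketch(void)
{
	return 0;
}

/* Before the fix: the raw driver's table has no fsync entry. */
static const struct file_ops_sketch raw_fops_before = { .fsync = NULL };

/* After the fix: the table points at an fsync helper. */
static const struct file_ops_sketch raw_fops_after = {
	.fsync = blk_fsync_sketch,
};
```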

If we instead decide to remove the raw driver, this patch might still be
useful as a backport to 2.6.33 and 2.6.32.

Signed-off-by: Anton Blanchard 
Reviewed-by: Jan Kara 
Cc: Christoph Hellwig 
Cc: Alexander Viro 
Cc: Jens Axboe 
Reviewed-by: Jeff Moyer 
Tested-by: Jeff Moyer 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

include/linux/fs.h: convert FMODE_* constants to hex

2010-03-06T19:26:25+00:00

It was tolerable until Eric went and added 8388608.
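
For illustration, the same flag value written both ways.  The macro
names are hypothetical; only the number comes from the message above.

```c
/* The same single-bit flag value in decimal and in hex. */
#define FLAG_DECIMAL_SKETCH	8388608u
#define FLAG_HEX_SKETCH		0x800000u	/* bit 23 */

/* Compile-time check that both spellings denote the same value. */
_Static_assert(FLAG_DECIMAL_SKETCH == FLAG_HEX_SKETCH,
	       "decimal and hex spellings must match");
```

In hex it is visible at a glance that exactly one bit is set and which
one it is.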

Cc: Eric Paris 
Cc: Wu Fengguang 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

readahead: introduce FMODE_RANDOM for POSIX_FADV_RANDOM

2010-03-06T19:26:25+00:00

This fixes inefficient page-by-page reads on POSIX_FADV_RANDOM.

POSIX_FADV_RANDOM used to set ra_pages=0, which leads to poor performance:
a 16K read will be carried out in 4 _sync_ 1-page reads.

In other places, ra_pages==0 means
- it's ramfs/tmpfs/hugetlbfs/sysfs/configfs
- some IO error happened
where multi-page read IO won't help or should be avoided.

POSIX_FADV_RANDOM actually wants different semantics: to disable the
*heuristic* readahead algorithm, and to use a dumb one which faithfully
submits read IO for whatever the application requests.

So introduce a flag FMODE_RANDOM for POSIX_FADV_RANDOM.

Note that the random hint is not likely to help random read performance
noticeably.  And it may be too permissive with huge request sizes (its IO
size is not limited by read_ahead_kb).
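
The 16K example can be worked through with a small sketch.  The
function names and the 4K page size are assumptions made for the
illustration; the functions only count read IOs and do not model the
real readahead code.

```c
#include <stddef.h>

#define PAGE_SIZE_SKETCH	4096u	/* assumed page size */

/* Old behaviour (ra_pages == 0): one synchronous read per page. */
static size_t reads_with_ra_disabled(size_t bytes)
{
	return (bytes + PAGE_SIZE_SKETCH - 1) / PAGE_SIZE_SKETCH;
}

/* New behaviour: one read IO covering exactly what was requested. */
static size_t reads_with_random_flag(size_t bytes)
{
	return bytes ? 1 : 0;
}
```

For a 16K request, reads_with_ra_disabled() gives 4 and
reads_with_random_flag() gives 1.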

In Quentin's report (http://lkml.org/lkml/2009/12/24/145), the overall
(NFS read) performance of the application increased by 313%!

Tested-by: Quentin Barnes 
Signed-off-by: Wu Fengguang 
Cc: Nick Piggin 
Cc: Andi Kleen 
Cc: Steven Whitehouse 
Cc: David Howells 
Cc: Jonathan Corbet 
Cc: Al Viro 
Cc: Christoph Hellwig 
Cc: Trond Myklebust 
Cc: Chuck Lever 
Cc: [2.6.33.x]
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

pass writeback_control to ->write_inode

2010-03-05T18:25:52+00:00

This gives the filesystem more information about the writeback that
is happening.  Trond requested this for the NFS unstable write handling,
and other filesystems might benefit from this too by being able to
distinguish between the different callers in more detail.
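
A sketch of how a filesystem might use the extra information.  The
types below are user-space mocks with hypothetical names (struct
wbc_sketch, struct inode_sketch, write_inode_sketch()); WB_SYNC_NONE
and WB_SYNC_ALL follow the kernel's sync mode names, but everything
else is an assumption made for the illustration.

```c
/* Mock of the writeback sync modes. */
enum sync_mode_sketch {
	WB_SYNC_NONE,	/* don't wait for the write to complete */
	WB_SYNC_ALL	/* data integrity writeback: wait */
};

/* Mock of the one writeback_control field used here. */
struct wbc_sketch {
	enum sync_mode_sketch sync_mode;
};

/* Mock inode that records what the method did. */
struct inode_sketch {
	int dirty;
	int waited;
};

/* A ->write_inode style method that derives "wait" from the control. */
static int write_inode_sketch(struct inode_sketch *inode,
			      const struct wbc_sketch *wbc)
{
	int wait = (wbc->sync_mode == WB_SYNC_ALL);

	inode->dirty = 0;	/* pretend the inode was written out */
	inode->waited = wait;	/* only integrity writeback waits */
	return 0;
}
```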

Signed-off-by: Christoph Hellwig 
Signed-off-by: Al Viro