linux-stable.git/include/linux/fs.h, branch linux-2.6.37.y

Fix over-zealous flush_disk when changing device size.

2011-03-07T23:05:12+00:00

commit 93b270f76e7ef3b81001576860c2701931cdc78b upstream.

There are two cases when we call flush_disk.
In one, the device has disappeared (check_disk_change) so any
data will hold becomes irrelevant.
In the oter, the device has changed size (check_disk_size_change)
so data we hold may be irrelevant.

In both cases it makes sense to discard any 'clean' buffers,
so they will be read back from the device if needed.

In the former case it makes sense to discard 'dirty' buffers
as there will never be anywhere safe to write the data.  In the
second case it *does*not* make sense to discard dirty buffers
as that will lead to file system corruption when you simply enlarge
the containing devices.

flush_disk calls __invalidate_devices.
__invalidate_device calls both invalidate_inodes and invalidate_bdev.

invalidate_inodes *does* discard I_DIRTY inodes and this does lead
to fs corruption.

invalidate_bev *does*not* discard dirty pages, but I don't really care
about that at present.

So this patch adds a flag to __invalidate_device (calling it
__invalidate_device2) to indicate whether dirty buffers should be
killed, and this is passed to invalidate_inodes which can choose to
skip dirty inodes.

flusk_disk then passes true from check_disk_change and false from
check_disk_size_change.

dm avoids tripping over this problem by calling i_size_write directly
rathher than using check_disk_size_change.

md does use check_disk_size_change and so is affected.

This regression was introduced by commit 608aeef17a which causes
check_disk_size_change to call flush_disk, so it is suitable for any
kernel since 2.6.27.

Acked-by: Jeff Moyer 
Cc: Andrew Patterson 
Cc: Jens Axboe 
Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

mm: prevent concurrent unmap_mapping_range() on the same inode

2011-03-07T23:05:08+00:00

commit 2aa15890f3c191326678f1bd68af61ec6b8753ec upstream.

Michael Leun reported that running parallel opens on a fuse filesystem
can trigger a "kernel BUG at mm/truncate.c:475"

Gurudas Pai reported the same bug on NFS.

The reason is, unmap_mapping_range() is not prepared for more than
one concurrent invocation per inode.  For example:

  thread1: going through a big range, stops in the middle of a vma and
     stores the restart address in vm_truncate_count.

  thread2: comes in with a small (e.g. single page) unmap request on
     the same vma, somewhere before restart_address, finds that the
     vma was already unmapped up to the restart address and happily
     returns without doing anything.

Another scenario would be two big unmap requests, both having to
restart the unmapping and each one setting vm_truncate_count to its
own value.  This could go on forever without any of them being able to
finish.

Truncate and hole punching already serialize with i_mutex.  Other
callers of unmap_mapping_range() do not, and it's difficult to get
i_mutex protection for all callers.  In particular ->d_revalidate(),
which calls invalidate_inode_pages2_range() in fuse, may be called
with or without i_mutex.

This patch adds a new mutex to 'struct address_space' to prevent
running multiple concurrent unmap_mapping_range() on the same mapping.

[ We'll hopefully get rid of all this with the upcoming mm
  preemptibility series by Peter Zijlstra, the "mm: Remove i_mmap_mutex
  lockbreak" patch in particular.  But that is for 2.6.39 ]

Signed-off-by: Miklos Szeredi 
Reported-by: Michael Leun 
Reported-by: Gurudas Pai 
Tested-by: Gurudas Pai 
Acked-by: Hugh Dickins 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

Call the filesystem back whenever a page is removed from the page cache

2010-12-02T14:55:21+00:00

NFS needs to be able to release objects that are stored in the page
cache once the page itself is no longer visible from the page cache.

This patch adds a callback to the address space operations that allows
filesystems to perform page cleanups once the page has been removed
from the page cache.

Original patch by: Linus Torvalds 
[trondmy: cover the cases of invalidate_inode_pages2() and
          truncate_inode_pages()]
Signed-off-by: Trond Myklebust

include/linux/fs.h: fix userspace build

2010-11-24T21:50:38+00:00

dpkg uses fiemap but didn't particularly need to include stdint.h so far.
Since 367a51a33902 ("fs: Add FITRIM ioctl"), build of linux/fs.h failed in
dpkg with:

  In file included from ../../src/filesdb.c:27:0:
  /usr/include/linux/fs.h:37:2: error: expected specifier-qualifier-list before 'uint64_t'

Use exportable type __u64 to avoid the dependency on stdint.h.

b31d42a5af18 ("Fix compile brekage with !CONFIG_BLOCK") fixed only the
kernel build by including linux/types.h, but this also fixed "make
headers_check", so don't revert it.

Signed-off-by: Loïc Minier 
Tested-by: Arnd Bergmann 
Cc: Lukas Czerner 
Cc: Dmitry Monakhov 
Cc: Theodore Ts'o 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

fs: Do not dispatch FITRIM through separate super_operation

2010-11-20T02:18:35+00:00

There was concern that FITRIM ioctl is not common enough to be included
in core vfs ioctl, as Christoph Hellwig pointed out there's no real point
in dispatching this out to a separate vector instead of just through
->ioctl.

So this commit removes ioctl_fstrim() from vfs ioctl and trim_fs
from super_operation structure.

Signed-off-by: Lukas Czerner 
Signed-off-by: "Theodore Ts'o"

locks: remove fl_copy_lock lock_manager operation

2010-10-31T13:35:15+00:00

This one was only used for a nasty hack in nfsd, which has recently
been removed.

Signed-off-by: Christoph Hellwig 
Signed-off-by: Linus Torvalds

locks: fix setlease methods to free passed-in lock

2010-10-31T01:08:15+00:00

We modified setlease to require the caller to allocate the new lease in
the case of creating a new lease, but forgot to fix up the filesystem
methods.

Cc: Steven Whitehouse 
Cc: Steve French 
Cc: Trond Myklebust 
Signed-off-by: J. Bruce Fields 
Acked-by: Arnd Bergmann 
Signed-off-by: Linus Torvalds

readv/writev: do the same MAX_RW_COUNT truncation that read/write does

2010-10-29T17:36:49+00:00

We used to protect against overflow, but rather than return an error, do
what read/write does, namely to limit the total size to MAX_RW_COUNT.
This is not only more consistent, but it also means that any broken
low-level read/write routine that still keeps counts in 'int' can't
break.

Signed-off-by: Linus Torvalds

switch get_sb_ns() users

2010-10-29T08:17:03+00:00

Signed-off-by: Al Viro

convert get_sb_pseudo() users

2010-10-29T08:16:33+00:00

Signed-off-by: Al Viro