linux.git/fs, branch v3.1-rc5

Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs

2011-09-02T15:25:23+00:00

* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: fix ->write_inode return values
  xfs: fix xfs_mark_inode_dirty during umount
  xfs: deprecate the nodelaylog mount option

xfs: fix ->write_inode return values

2011-09-01T14:46:11+00:00

Currently we always redirty an inode that we attempted to write out
synchronously but that has already been cleaned by an internal AIL push,
which is rather bogus.  Fix that by doing the i_update_core check early on
and returning 0 in that case.  Also cover async calls, as doing any work
for those is just as pointless.  While we're at it, also fix the sign of
the EIO return in case of a filesystem shutdown, and fix the completely
nonsensical locking around xfs_log_inode.
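
The control flow described above can be sketched in user-space C.  This is
only an illustration under stated assumptions, not the XFS code: the
struct demo_inode type, its fields, and demo_write_inode() are hypothetical
names, and the order of the two checks is assumed.  It shows the two points
the message makes: an already clean inode returns 0 without further work,
and errors are reported as negative errno values.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for in-core inode state; not the real xfs_inode. */
struct demo_inode {
    bool update_core;   /* models i_update_core: core fields still need logging */
    bool fs_shut_down;  /* models a forced filesystem shutdown */
};

/*
 * Sketch of a ->write_inode style callback.
 * Returns 0 on success or a negative errno, never a positive one.
 */
int demo_write_inode(struct demo_inode *ip)
{
    /* Assumption: the shutdown check comes first and reports -EIO. */
    if (ip->fs_shut_down)
        return -EIO;

    /* Early exit: something else already cleaned the inode, nothing to do. */
    if (!ip->update_core)
        return 0;

    /* Placeholder for logging the inode core; here we only clear the flag. */
    ip->update_core = false;
    return 0;
}
```

A caller that checks for "ret < 0" would miss a positive EIO, which is why
the sign of the return value matters.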

Signed-off-by: Christoph Hellwig 
Reviewed-by: Dave Chinner 
Signed-off-by: Alex Elder 
(cherry picked from commit 297db93bb74cf687510313eb235a7aec14d67e97)

Signed-off-by: Alex Elder

xfs: fix xfs_mark_inode_dirty during umount

2011-08-31T22:59:39+00:00

During umount we do not add a dirty inode to the lru and wait for it to
become clean first, but force writeback of data and metadata with
I_WILL_FREE set.  Currently there is no way for XFS to detect that the
inode has been redirtied for metadata operations, as we skip the
mark_inode_dirty call during teardown.  Fix this by setting i_update_core
manually in that case, so that the inode gets flushed during inode reclaim.

Alternatively we could enable calling mark_inode_dirty for inodes in
I_WILL_FREE state, and let the VFS dirty tracking handle this.  I decided
against this as we will get better I/O patterns from reclaim compared to
the synchronous writeout in write_inode_now, and always marking the inode
dirty in some way from xfs_mark_inode_dirty is a better safety net in
either case.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Dave Chinner 
Signed-off-by: Alex Elder 
(cherry picked from commit da6742a5a4cc844a9982fdd936ddb537c0747856)

Signed-off-by: Alex Elder

Merge tag 'for_linus-20110831' of git://github.com/tytso/ext4

2011-08-31T22:08:19+00:00

* tag 'for_linus-20110831' of git://github.com/tytso/ext4:
  ext4: remove i_mutex lock in ext4_evict_inode to fix lockdep complaining

ext4: remove i_mutex lock in ext4_evict_inode to fix lockdep complaining

2011-08-31T15:50:51+00:00

The i_mutex lock and flush_completed_IO() added by commit 2581fdc810
in ext4_evict_inode() cause lockdep to complain about potential
deadlocks in several places.  Most if not all of these lockdep
complaints look like false positives, since many of the potential
circular locking cases can't take place by the time
ext4_evict_inode() is called; but since at the very least they may mask
real problems, we need to address this.

This change removes the flush_completed_IO() and i_mutex lock in
ext4_evict_inode().  Instead, we take a different approach to resolve
the software lockup that commit 2581fdc810 intends to fix.  Rather
than having the ext4-dio-unwritten thread wait to grab the i_mutex
lock of an inode, we use mutex_trylock() instead, and simply requeue
the work item if we fail to grab the inode's i_mutex lock.

This should speed up work queue processing in general and also
prevent the following deadlock scenario: During a page fault,
shrink_icache_memory is called, which in turn evicts another inode B.
Inode B has some pending io_end work, so it calls ext4_ioend_wait(),
which waits for inode B's i_ioend_count to become zero.  However, inode
B's ioend work was queued behind some of inode A's ioend work on the
same cpu's ext4-dio-unwritten workqueue.  As the ext4-dio-unwritten
thread on that cpu is processing inode A's ioend work, it tries to
grab inode A's i_mutex lock.  Since the i_mutex lock of inode A has
been held since before the page fault happened, we enter a deadlock.
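
The trylock-and-requeue idea can be sketched in user-space C.  This is a
minimal illustration, not the ext4 code: all demo_* names are hypothetical,
and a C11 atomic_flag stands in for the kernel's mutex_trylock() on i_mutex.
The point is that the worker never blocks on the lock; if the lock is busy
it reports that the item must be requeued, so later items on the same queue
can still make progress.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inode's i_mutex. */
struct demo_inode_lock {
    atomic_flag held;
};

/* Non-blocking acquire: returns true if we got the lock. */
static bool demo_trylock(struct demo_inode_lock *l)
{
    return !atomic_flag_test_and_set(&l->held);
}

static void demo_unlock(struct demo_inode_lock *l)
{
    atomic_flag_clear(&l->held);
}

/* Hypothetical work item; the counters exist only to observe behaviour. */
struct demo_work {
    struct demo_inode_lock *lock;
    int completed;
    int requeues;
};

/*
 * Process one ioend-style work item.
 * Returns true if the work ran, false if the caller should requeue it.
 */
bool demo_process_ioend(struct demo_work *w)
{
    if (!demo_trylock(w->lock)) {
        /* Lock is busy: do not wait, let the caller requeue the item. */
        w->requeues++;
        return false;
    }
    w->completed++;          /* placeholder for the real conversion work */
    demo_unlock(w->lock);
    return true;
}
```

In the deadlock scenario above, the worker would return false for inode A's
item instead of sleeping, and inode B's item behind it could then run.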

Signed-off-by: Jiaying Zhang 
Signed-off-by: "Theodore Ts'o"

All Arch: remove linkage for sys_nfsservctl system call

2011-08-26T22:09:58+00:00

The nfsservctl system call is now gone, so we should remove all
linkage for it.

Signed-off-by: NeilBrown 
Signed-off-by: J. Bruce Fields 
Signed-off-by: Linus Torvalds

lockdep: Add helper function for dir vs file i_mutex annotation

2011-08-25T17:50:18+00:00

Purely in-memory filesystems do not use the inode hash as the dcache
tells us if an entry already exists.  As a result, they do not call
unlock_new_inode, and thus directory inodes do not get put into a
different lockdep class for i_mutex.

We need the different lockdep classes, because the locking order for
i_mutex is different for directory inodes and regular inodes.  Directory
inodes can do "readdir()", which takes i_mutex *before* possibly taking
mm->mmap_sem (due to a page fault while copying the directory entry to
user space).

In contrast, regular inodes can be mmap'ed, which takes mm->mmap_sem
before accessing i_mutex.

The two cases can never happen for the same inode, so no real deadlock
can occur, but without the different lockdep classes, lockdep cannot
understand that.  As a result, if CONFIG_DEBUG_LOCK_ALLOC is set, this
can lead to false positives from lockdep like below:

    find/645 is trying to acquire lock:
     (&mm->mmap_sem){++++++}, at: [] might_fault+0x5c/0xac

    but task is already holding lock:
     (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [] vfs_readdir+0x5b/0xb4

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (&sb->s_type->i_mutex_key#15){+.+.+.}:
          [] lock_acquire+0xbf/0x103
          [] __mutex_lock_common+0x4c/0x361
          [] mutex_lock_nested+0x40/0x45
          [] hugetlbfs_file_mmap+0x82/0x110
          [] mmap_region+0x258/0x432
          [] do_mmap_pgoff+0x2ac/0x306
          [] sys_mmap_pgoff+0x118/0x16a
          [] sys_mmap+0x22/0x24
          [] system_call_fastpath+0x16/0x1b

    -> #0 (&mm->mmap_sem){++++++}:
          [] __lock_acquire+0xa1a/0xcf7
          [] lock_acquire+0xbf/0x103
          [] might_fault+0x89/0xac
          [] filldir+0x6f/0xc7
          [] dcache_readdir+0x67/0x205
          [] vfs_readdir+0x7b/0xb4
          [] sys_getdents+0x7e/0xd1
          [] system_call_fastpath+0x16/0x1b

This patch moves the directory vs file lockdep annotation into a helper
function that can be called by in-memory filesystems and has hugetlbfs
call it.

Signed-off-by: Josh Boyer 
Acked-by: Peter Zijlstra 
Signed-off-by: Linus Torvalds

xfs: deprecate the nodelaylog mount option

2011-08-25T15:30:05+00:00

Signed-off-by: Christoph Hellwig 
Reviewed-by: Dave Chinner 
Signed-off-by: Alex Elder

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

2011-08-24T16:14:42+00:00

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: check size of FUSE_NOTIFY_INVAL_ENTRY message
  fuse: mark pages accessed when written to
  fuse: delete dead .write_begin and .write_end aops
  fuse: fix flock
  fuse: fix non-ANSI void function notation

fuse: check size of FUSE_NOTIFY_INVAL_ENTRY message

2011-08-24T08:20:17+00:00

FUSE_NOTIFY_INVAL_ENTRY didn't check the length of the write so the
message processing could overrun and result in a "kernel BUG at
fs/fuse/dev.c:629!"
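
A length check of this kind can be sketched in user-space C.  This is an
illustration under stated assumptions, not the fuse code: the header
layout, DEMO_NAME_MAX, and demo_check_inval_entry() are hypothetical, and
the rule that the message is exactly a header plus a NUL-terminated name is
assumed for the example.  The idea is to compare the size actually written
with the size the header claims before touching the payload.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical message header: a parent id followed by the name length. */
struct demo_inval_entry_hdr {
    uint64_t parent;
    uint32_t namelen;
    uint32_t padding;
};

#define DEMO_NAME_MAX 1024  /* assumed upper bound, for illustration only */

/*
 * Validate a message of `size` bytes before processing it.
 * Returns 0 if the size matches the header, a negative errno otherwise.
 */
int demo_check_inval_entry(const unsigned char *msg, size_t size)
{
    struct demo_inval_entry_hdr hdr;

    /* The message must at least contain the fixed-size header. */
    if (size < sizeof(hdr))
        return -EINVAL;
    memcpy(&hdr, msg, sizeof(hdr));

    if (hdr.namelen > DEMO_NAME_MAX)
        return -ENAMETOOLONG;

    /* Header + name + terminating NUL must account for every byte. */
    if (size != sizeof(hdr) + hdr.namelen + 1)
        return -EINVAL;

    return 0;
}
```

Without the final comparison, a short write with a large namelen would let
later processing read past the end of the supplied data.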

Reported-by: Han-Wen Nienhuys 
Signed-off-by: Miklos Szeredi 
CC: stable@kernel.org