linux-stable.git/fs/dcache.c, branch linux-3.16.y

fs/dcache: move security_d_instantiate() behind attaching dentry to inode

2019-12-19T15:59:03+00:00

The backport of 1e2e547a93a "do d_instantiate/unlock_new_inode
combinations safely" got the order of attaching the dentry to the inode
and calling security_d_instantiate() wrong.

Before commit ce23e640133 "->getxattr(): pass dentry and inode as
separate arguments" and b96809173e9 "security_d_instantiate(): move to
the point prior to attaching dentry to inode", security_d_instantiate()
must be called after __d_instantiate().  Otherwise it triggers the
problem below when CONFIG_SECURITY_SMACK is enabled on ext4, because
d_inode(dentry), which ->getxattr() uses, is still NULL until
__d_instantiate() attaches the inode.

[   31.858026] BUG: unable to handle kernel paging request at ffffffffffffff70
...
[   31.882024] Call Trace:
[   31.882378]  [] ext4_xattr_get+0x8c/0x3e0
[   31.883195]  [] ext4_xattr_security_get+0x24/0x40
[   31.884086]  [] generic_getxattr+0x5b/0x90
[   31.884907]  [] smk_fetch+0xb4/0x150
[   31.885634]  [] smack_d_instantiate+0x1c2/0x550
[   31.886508]  [] security_d_instantiate+0x3a/0x80
[   31.887389]  [] d_instantiate_new+0x36/0x130
[   31.888223]  [] ext4_mkdir+0x4af/0x6a0
[   31.888928]  [] vfs_mkdir+0x100/0x280
[   31.889536]  [] SyS_mkdir+0xb6/0x170
[   31.890255]  [] ? trace_do_page_fault+0x95/0x2b0
[   31.891134]  [] entry_SYSCALL_64_fastpath+0x18/0x73
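
The ordering problem can be illustrated with a small user-space model.
This is only a sketch: the struct layouts and the model_* and
instantiate_* names are hypothetical stand-ins, not kernel code, and the
hook returns -1 at the point where the 3.16 kernel would dereference a
NULL d_inode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; not the kernel's structures. */
struct inode { int label; };
struct dentry { struct inode *d_inode; };

/*
 * Models an LSM hook that, like ->getxattr() in 3.16, reaches the inode
 * through the dentry.  Returns -1 where the kernel would oops.
 */
static int model_security_hook(const struct dentry *dentry)
{
	if (dentry->d_inode == NULL)
		return -1;
	return dentry->d_inode->label;
}

/* Models __d_instantiate(): attach the inode to the dentry. */
static void model_attach(struct dentry *dentry, struct inode *inode)
{
	assert(dentry != NULL && inode != NULL);
	dentry->d_inode = inode;
}

/* Order used by the broken backport: hook first, attach second. */
int instantiate_hook_first(struct dentry *dentry, struct inode *inode)
{
	int ret = model_security_hook(dentry);

	model_attach(dentry, inode);
	return ret;
}

/* Order needed on 3.16: attach first, hook second. */
int instantiate_attach_first(struct dentry *dentry, struct inode *inode)
{
	model_attach(dentry, inode);
	return model_security_hook(dentry);
}
```

In this model instantiate_hook_first() reports the failure and
instantiate_attach_first() returns the inode's label.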

Cc:  # 3.16, 4.4
Signed-off-by: zhangyi (F) 
Signed-off-by: Ben Hutchings

fs/dcache: Fix incorrect nr_dentry_unused accounting in shrink_dcache_sb()

2019-05-02T20:41:33+00:00

commit 1dbd449c9943e3145148cc893c2461b72ba6fef0 upstream.

The nr_dentry_unused per-cpu counter tracks dentries in both the LRU
lists and the shrink lists where the DCACHE_LRU_LIST bit is set.

The shrink_dcache_sb() function moves dentries from the LRU list to a
shrink list and subtracts the dentry count from nr_dentry_unused.  This
is incorrect as the nr_dentry_unused count will also be decremented in
shrink_dentry_list() via d_shrink_del().

To fix this double decrement, the decrement in the shrink_dcache_sb()
function is taken out.
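
A minimal user-space sketch of the accounting, assuming a model that
tracks only the counter and the list sizes (the dcache_model struct and
the model_* functions are hypothetical, not kernel code):

```c
#include <assert.h>

/* Hypothetical model: only the counter and list sizes are tracked. */
struct dcache_model {
	long nr_dentry_unused;	/* dentries on LRU lists plus shrink lists */
	long on_lru;
	long on_shrink;
};

/*
 * Models moving n dentries from the LRU list to a shrink list.
 * A non-zero double_decrement reproduces the pre-fix behaviour.
 */
void model_isolate(struct dcache_model *m, long n, int double_decrement)
{
	assert(n >= 0 && n <= m->on_lru);
	m->on_lru -= n;
	m->on_shrink += n;
	if (double_decrement)
		m->nr_dentry_unused -= n;	/* the subtraction the fix removes */
}

/* Models shrink_dentry_list(): each d_shrink_del() decrements once. */
void model_shrink_list(struct dcache_model *m)
{
	while (m->on_shrink > 0) {
		m->on_shrink--;
		m->nr_dentry_unused--;
	}
}

/* Returns the final counter value after shrinking 'total' dentries. */
long model_run(long total, int double_decrement)
{
	struct dcache_model m = { total, total, 0 };

	model_isolate(&m, total, double_decrement);
	model_shrink_list(&m);
	return m.nr_dentry_unused;
}
```

With the extra subtraction the counter ends up negative by the number of
dentries shrunk; without it the counter returns to zero.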

Fixes: 4e717f5c1083 ("list_lru: remove special case function list_lru_dispose_all.")
Signed-off-by: Waiman Long 
Reviewed-by: Dave Chinner 
Signed-off-by: Linus Torvalds 
Signed-off-by: Ben Hutchings

make sure that __dentry_kill() always invalidates d_seq, unhashed or not

2018-11-20T18:05:56+00:00

commit 4c0d7cd5c8416b1ef41534d19163cb07ffaa03ab upstream.

RCU pathwalk relies upon the assumption that anything that changes
->d_inode of a dentry will invalidate its ->d_seq.  That's almost
true - the one exception is that the final dput() of already unhashed
dentry does *not* touch ->d_seq at all.  Unhashing does, though,
so for anything we'd found by RCU dcache lookup we are fine.
Unfortunately, we can *start* with an unhashed dentry or jump into
it.

We could try and be careful in the (few) places where that could
happen.  Or we could just make the final dput() invalidate the damn
thing, unhashed or not.  The latter is much simpler and easier to
backport, so let's do it that way.

Reported-by: "Dae R. Jeong" 
Signed-off-by: Al Viro 
Signed-off-by: Ben Hutchings

unify dentry_iput() and dentry_unlink_inode()

2018-11-20T18:05:56+00:00

commit 550dce01dd606c88a837138aa448ccd367fb0cbb upstream.

There is a lot of duplication between dentry_unlink_inode() and dentry_iput().
The only real difference is that dentry_unlink_inode() bumps ->d_seq and
dentry_iput() doesn't.  The argument of the latter is known to have been
unhashed, so anybody who might've found it in RCU lookup would already be
doomed to a ->d_seq mismatch.  And we want to avoid pointless smp_rmb() there.

This patch makes dentry_unlink_inode() bump ->d_seq only for hashed dentries.
It's safe (d_delete() calls that sucker only if we are holding the only
reference to dentry, so rehash is not going to happen) and it lets us
use dentry_unlink_inode() in __dentry_kill() and get rid of dentry_iput().

The interesting question here is profiling; it *is* a hot path, and extra
conditional jumps in there might or might not be painful.

Signed-off-by: Al Viro 
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings

use ->d_seq to get coherency between ->d_inode and ->d_flags

2018-11-20T18:05:56+00:00

commit a528aca7f359f4b0b1d72ae406097e491a5ba9ea upstream.

Games with ordering and barriers are way too brittle.  Just
bump ->d_seq before and after updating ->d_inode and ->d_flags
type bits, so that verifying ->d_seq would guarantee they are
coherent.
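
The idea can be sketched with a single-threaded user-space model of a
sequence counter.  This is an illustration under stated assumptions: the
model_dentry struct and model_* helpers are hypothetical, the model has
no memory barriers, and unlike the kernel's seqcount readers it reports
an in-progress write as invalid instead of waiting for it to finish.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_dentry {
	unsigned seq;		/* stands in for ->d_seq */
	const void *inode;	/* stands in for ->d_inode */
	unsigned type;		/* stands in for the type bits in ->d_flags */
};

/* Bump before the update: the count becomes odd. */
void model_write_begin(struct model_dentry *d) { d->seq++; }

/* Bump after the update: the count becomes even again. */
void model_write_end(struct model_dentry *d) { d->seq++; }

unsigned model_read_begin(const struct model_dentry *d)
{
	assert(d != NULL);
	return d->seq;
}

/*
 * A snapshot is trustworthy only if no writer was active when it began
 * (even count) and the count has not moved since.
 */
bool model_read_valid(const struct model_dentry *d, unsigned start)
{
	return (start & 1) == 0 && d->seq == start;
}

/* Update inode and type together, bracketed by the two bumps. */
void model_set_inode(struct model_dentry *d, const void *inode, unsigned type)
{
	model_write_begin(d);
	d->inode = inode;
	d->type = type;
	model_write_end(d);
}
```

A reader that sampled the counter before model_set_inode() fails
validation afterwards, so it never acts on a mixed inode/type pair.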

Signed-off-by: Al Viro 
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings

VFS: Impose ordering on accesses of d_inode and d_flags

2018-11-20T18:05:55+00:00

commit 4bf46a272647d89e780126b52eda04737defd9f4 upstream.

Impose ordering on accesses of d_inode and d_flags to avoid the need to do
this:

	if (!dentry->d_inode || d_is_negative(dentry)) {

when this:

	if (d_is_negative(dentry)) {

should suffice.

This check is especially problematic if a dentry can have its type field set
to something other than DENTRY_MISS_TYPE when d_inode is NULL (as in
unionmount).

What we really need to do is stick a write barrier between setting d_inode and
setting d_flags and a read barrier between reading d_flags and reading
d_inode.
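
A rough user-space sketch of that pairing, using C11 fences in place of
the kernel's write and read barriers.  The names are hypothetical and
C11 release/acquire fences are stronger than the kernel barriers they
stand in for, so treat this as an analogy rather than the patch itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MODEL_MISS_TYPE		0u	/* negative dentry */
#define MODEL_REGULAR_TYPE	1u

struct model_dentry {
	_Atomic(void *) d_inode;
	atomic_uint d_flags;
};

/* Writer: set d_inode, write barrier, then set the type in d_flags. */
void model_publish(struct model_dentry *d, void *inode, unsigned type)
{
	assert(d != NULL);
	atomic_store_explicit(&d->d_inode, inode, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);	/* write barrier */
	atomic_store_explicit(&d->d_flags, type, memory_order_relaxed);
}

/* Reader: read d_flags, read barrier, then read d_inode. */
void *model_inode_if_positive(struct model_dentry *d)
{
	unsigned flags;

	assert(d != NULL);
	flags = atomic_load_explicit(&d->d_flags, memory_order_relaxed);
	if (flags == MODEL_MISS_TYPE)
		return NULL;	/* the type check alone decides negativity */
	atomic_thread_fence(memory_order_acquire);	/* read barrier */
	return atomic_load_explicit(&d->d_inode, memory_order_relaxed);
}
```

Because the flags are published after the inode, a reader that sees a
non-miss type and then crosses the read barrier also sees the inode, so
no separate NULL check on d_inode is needed in this model.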

Signed-off-by: David Howells 
Signed-off-by: Al Viro 
[bwh: Backported to 3.16:
 - Use ACCESS_ONCE() instead of {READ,WRITE}_ONCE()
 - There's no DCACHE_FALLTHRU flag]
Signed-off-by: Ben Hutchings

root dentries need RCU-delayed freeing

2018-11-20T18:05:54+00:00

commit 90bad5e05bcdb0308cfa3d3a60f5c0b9c8e2efb3 upstream.

Since mountpoint crossing can happen without leaving lazy mode,
root dentries do need the same protection against having their
memory freed without RCU delay as everything else in the tree.

It's partially hidden by RCU delay between detaching from the
mount tree and dropping the vfsmount reference, but the starting
point of pathwalk can be on an already detached mount, in which
case umount-caused RCU delay has already passed by the time the
lazy pathwalk grabs rcu_read_lock().  If the starting point
happens to be at the root of that vfsmount *and* that vfsmount
covers the entire filesystem, we get trouble.

Fixes: 48a066e72d97 ("RCU'd vfsmounts")
Signed-off-by: Al Viro 
Signed-off-by: Ben Hutchings

do d_instantiate/unlock_new_inode combinations safely

2018-10-21T07:46:03+00:00

commit 1e2e547a93a00ebc21582c06ca3c6cfea2a309ee upstream.

For anything NFS-exported we do _not_ want to unlock new inode
before it has grown an alias; original set of fixes got the
ordering right, but missed the nasty complication in case of
lockdep being enabled - unlock_new_inode() does
	lockdep_annotate_inode_mutex_key(inode)
which can only be done before anyone gets a chance to touch
->i_mutex.  Unfortunately, flipping the order and doing
unlock_new_inode() before d_instantiate() opens a window when
mkdir can race with open-by-fhandle on a guessed fhandle, leading
to multiple aliases for a directory inode and all the breakage
that follows from that.

	Correct solution: a new primitive (d_instantiate_new())
combining these two in the right order - lockdep annotate, then
d_instantiate(), then the rest of unlock_new_inode().  All
combinations of d_instantiate() with unlock_new_inode() should
be converted to that.
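
The difference between the two orders can be sketched in user space.
Everything here is a hypothetical model (model_inode, MODEL_I_NEW and the
model_* functions are not kernel code); model_window_open() stands for
the instant at which a racing open-by-fhandle would find an unlocked
inode with no alias and attach one of its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_I_NEW 0x1u

struct model_inode {
	unsigned state;
	bool lock_class_set;	/* lockdep annotation done */
	int nr_aliases;
};

struct model_dentry { struct model_inode *inode; };

/* True if a racing lookup could add a second alias right now. */
static bool model_window_open(const struct model_inode *i)
{
	return !(i->state & MODEL_I_NEW) && i->nr_aliases == 0;
}

/* Flipped order: unlock_new_inode() first, d_instantiate() second. */
bool model_unlock_then_instantiate(struct model_dentry *d,
				   struct model_inode *i)
{
	bool window;

	assert(d->inode == NULL && (i->state & MODEL_I_NEW));
	i->lock_class_set = true;
	i->state &= ~MODEL_I_NEW;
	window = model_window_open(i);	/* unlocked and alias-free */
	d->inode = i;
	i->nr_aliases++;
	return window;
}

/* Combined primitive: annotate, attach, then clear the new-inode flag. */
bool model_d_instantiate_new(struct model_dentry *d, struct model_inode *i)
{
	bool window;

	assert(d->inode == NULL && (i->state & MODEL_I_NEW));
	i->lock_class_set = true;	/* nobody else can touch the lock yet */
	d->inode = i;
	i->nr_aliases++;
	window = model_window_open(i);	/* still new, already has an alias */
	i->state &= ~MODEL_I_NEW;
	return window || model_window_open(i);
}
```

In this model only the flipped order ever reports an open window.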

Tested-by: Mike Marshall 
Reviewed-by: Andreas Dilger 
Signed-off-by: Al Viro 
[bwh: Backported to 3.16:
 - Drop changes in orangefs
 - Apply similar change to ext3
 - Adjust context]
Signed-off-by: Ben Hutchings

lock_parent() needs to recheck if dentry got __dentry_kill'ed under it

2018-06-16T21:22:24+00:00

commit 3b821409632ab778d46e807516b457dfa72736ed upstream.

When the dentry passed to lock_parent() is protected from freeing only
by the fact that it's on a shrink list and the trylock of the parent
fails, we could get hit by __dentry_kill() (and subsequent
dentry_kill(parent)) between unlocking dentry and locking presumed
parent.  We need to recheck that dentry is alive once we lock both it
and parent *and* postpone rcu_read_unlock() until after that point.
Otherwise we could return a pointer to struct dentry that already is
rcu-scheduled for freeing, with ->d_lock held on it; caller's subsequent
attempt to unlock it can end up with memory corruption.

Signed-off-by: Al Viro 
Signed-off-by: Ben Hutchings

fs/dcache.c: fix spin lockup issue on nlru->lock

2017-10-12T14:28:03+00:00

commit b17c070fb624cf10162cf92ea5e1ec25cd8ac176 upstream.

__list_lru_walk_one() holds the nlru spin lock (nlru->lock) for longer
the more items there are in the lru list.  With the current code, it
can hold the spin lock for up to UINT_MAX entries at a time.  So with
a large number of items in the lru list, "BUG: spinlock lockup
suspected" is observed in the path below:

  spin_bug+0x90
  do_raw_spin_lock+0xfc
  _raw_spin_lock+0x28
  list_lru_add+0x28
  dput+0x1c8
  path_put+0x20
  terminate_walk+0x3c
  path_lookupat+0x100
  filename_lookup+0x6c
  user_path_at_empty+0x54
  SyS_faccessat+0xd0
  el0_svc_naked+0x24

The nlru->lock is held by another CPU in this path:

  d_lru_shrink_move+0x34
  dentry_lru_isolate_shrink+0x48
  __list_lru_walk_one.isra.10+0x94
  list_lru_walk_node+0x40
  shrink_dcache_sb+0x60
  do_remount_sb+0xbc
  do_emergency_remount+0xb0
  process_one_work+0x228
  worker_thread+0x2e0
  kthread+0xf4
  ret_from_fork+0x10

Fix this lockup by limiting the number of entries shrunk from the lru
list to 1024 at a time.  Also, add cond_resched() before processing the
lru list again.
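
The batching can be sketched with a user-space model.  The model_lru
struct and model_* functions are hypothetical; nr_yields merely counts
the points where the kernel code would call cond_resched(), and max_held
records the largest number of entries handled under one lock hold.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_BATCH 1024L

struct model_lru {
	long nr_items;	/* entries still on the lru list */
	long max_held;	/* most entries processed in one lock hold */
	int nr_yields;	/* stands in for cond_resched() calls */
};

/* Models one lru walk: isolate at most 'limit' entries under the lock. */
long model_walk(struct model_lru *lru, long limit)
{
	long n;

	assert(lru != NULL && limit > 0);
	n = lru->nr_items < limit ? lru->nr_items : limit;
	lru->nr_items -= n;
	if (n > lru->max_held)
		lru->max_held = n;
	return n;
}

/* Shrink in bounded batches, yielding between them. */
void model_shrink_sb(struct model_lru *lru)
{
	do {
		model_walk(lru, MODEL_BATCH);
		/* the isolated batch would be disposed of here, lock dropped */
		lru->nr_yields++;
	} while (lru->nr_items > 0);
}
```

However long the list is, no single walk in this model covers more than
MODEL_BATCH entries, which bounds how long another CPU can spin on the
lock.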

Link: http://marc.info/?t=149722864900001&r=1&w=2
Link: http://lkml.kernel.org/r/1498707575-2472-1-git-send-email-stummala@codeaurora.org
Signed-off-by: Sahitya Tummala 
Suggested-by: Jan Kara 
Suggested-by: Vladimir Davydov 
Acked-by: Vladimir Davydov 
Cc: Alexander Polakov 
Cc: Al Viro 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Ben Hutchings