linux-stable.git/fs/dcache.c, branch linux-3.4.y

dcache: Handle escaped paths in prepend_path

2015-10-22T01:20:08+00:00

commit cde93be45a8a90d8c264c776fab63487b5038a65 upstream.

A rename can result in a dentry that by walking up d_parent
will never reach it's mnt_root.  For lack of a better term
I call this an escaped path.

prepend_path is called by four different functions __d_path,
d_absolute_path, d_path, and getcwd.

__d_path only wants to see paths are connected to the root it passes
in.  So __d_path needs prepend_path to return an error.

d_absolute_path similarly wants to see paths that are connected to
some root.  Escaped paths are not connected to any mnt_root so
d_absolute_path needs prepend_path to return an error greater
than 1.  So escaped paths will be treated like paths on lazily
unmounted mounts.

getcwd needs to prepend "(unreachable)" so getcwd also needs
prepend_path to return an error.

d_path is the interesting hold out.  d_path just wants to print
something, and does not care about the weird cases.  Which raises
the question what should be printed?

Given that / should result in -ENOENT I
believe it is desirable for escaped paths to be printed as empty
paths.  As there are not really any meaninful path components when
considered from the perspective of a mount tree.

So tweak prepend_path to return an empty path with an new error
code of 3 when it encounters an escaped path.

Signed-off-by: "Eric W. Biederman" 
Signed-off-by: Al Viro 
Signed-off-by: Zefan Li

d_walk() might skip too much

2015-09-18T01:20:43+00:00

commit 2159184ea01e4ae7d15f2017e296d4bc82d5aeb0 upstream.

when we find that a child has died while we'd been trying to ascend,
we should go into the first live sibling itself, rather than its sibling.

Off-by-one in question had been introduced in "deal with deadlock in
d_walk()" and the fix needs to be backported to all branches this one
has been backported to.

Signed-off-by: Al Viro 
Signed-off-by: Zefan Li

deal with deadlock in d_walk()

2015-04-14T09:33:58+00:00

commit ca5358ef75fc69fee5322a38a340f5739d997c10 upstream.

... by not hitting rename_retry for reasons other than rename having
happened.  In other words, do _not_ restart when finding that
between unlocking the child and locking the parent the former got
into __dentry_kill().  Skip the killed siblings instead...

Signed-off-by: Al Viro 
[bwh: Backported to 3.2:
 - As we only have try_to_ascend() and not d_walk(), apply this
   change to all callers of try_to_ascend()
 - Adjust context to make __dentry_kill() apply to d_kill()]
Signed-off-by: Ben Hutchings 
[lizf: Backported to 3.4: fold the fix 2d5a2e6775fa in 3.2.y into this patch]
Signed-off-by: Zefan Li

move d_rcu from overlapping d_child to overlapping d_alias

2015-04-14T09:33:58+00:00

commit 946e51f2bf37f1656916eb75bd0742ba33983c28 upstream.

Signed-off-by: Al Viro 
[bwh: Backported to 3.2:
 - Apply name changes in all the different places we use d_alias and d_child
 - Move the WARN_ON() in __d_free() to d_free() as we don't have dentry_free()]
Signed-off-by: Ben Hutchings 
[lizf: Backported to 3.4:
 - adjust context
 - need one more name change in debugfs]

Nest rename_lock inside vfsmount_lock

2013-11-29T18:50:33+00:00

commit 7ea600b5314529f9d1b9d6d3c41cb26fce6a7a4a upstream.

... lest we get livelocks between path_is_under() and d_path() and friends.

The thing is, wrt fairness lglocks are more similar to rwsems than to rwlocks;
it is possible to have thread B spin on attempt to take lock shared while thread
A is already holding it shared, if B is on lower-numbered CPU than A and there's
a thread C spinning on attempt to take the same lock exclusive.

As the result, we need consistent ordering between vfsmount_lock (lglock) and
rename_lock (seq_lock), even though everything that takes both is going to take
vfsmount_lock only shared.

Spotted-by: Brad Spengler 
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro 
[ zhj: backport to 3.4:
  - Adjust context
  - s/&vfsmount_lock/vfsmount_lock/]
Signed-off-by: Zhao Hongjiang 
Signed-off-by: Greg Kroah-Hartman

vfs: d_obtain_alias() needs to use "/" as default name.

2013-08-15T05:57:08+00:00

commit b911a6bdeef5848c468597d040e3407e0aee04ce upstream.

NFS appears to use d_obtain_alias() to create the root dentry rather than
d_make_root.  This can cause 'prepend_path()' to complain that the root
has a weird name if an NFS filesystem is lazily unmounted.  e.g.  if
"/mnt" is an NFS mount then

 { cd /mnt; umount -l /mnt ; ls -l /proc/self/cwd; }

will cause a WARN message like
   WARNING: at /home/git/linux/fs/dcache.c:2624 prepend_path+0x1d7/0x1e0()
   ...
   Root dentry has weird name <>

to appear in kernel logs.

So change d_obtain_alias() to use "/" rather than "" as the anonymous
name.

Signed-off-by: NeilBrown 
Cc: Trond Myklebust 
Cc: Al Viro 
Signed-off-by: Andrew Morton 
Signed-off-by: Al Viro 
[bwh: Backported to 3.2: use named initialisers instead of QSTR_INIT()]
Signed-off-by: Ben Hutchings 
Signed-off-by: Greg Kroah-Hartman

fs/dcache.c: add cond_resched() to shrink_dcache_parent()

2013-05-08T02:51:56+00:00

commit 421348f1ca0bf17769dee0aed4d991845ae0536d upstream.

Call cond_resched() in shrink_dcache_parent() to maintain interactivity.

Before this patch:

	void shrink_dcache_parent(struct dentry * parent)
	{
		while ((found = select_parent(parent, &dispose)) != 0)
			shrink_dentry_list(&dispose);
	}

select_parent() populates the dispose list with dentries which
shrink_dentry_list() then deletes.  select_parent() carefully uses
need_resched() to avoid doing too much work at once.  But neither
shrink_dcache_parent() nor its called functions call cond_resched().  So
once need_resched() is set select_parent() will return single dentry
dispose list which is then deleted by shrink_dentry_list().  This is
inefficient when there are a lot of dentry to process.  This can cause
softlockup and hurts interactivity on non preemptable kernels.

This change adds cond_resched() in shrink_dcache_parent().  The benefit
of this is that need_resched() is quickly cleared so that future calls
to select_parent() are able to efficiently return a big batch of dentry.

These additional cond_resched() do not seem to impact performance, at
least for the workload below.

Here is a program which can cause soft lockup if other system activity
sets need_resched().

	int main()
	{
	        struct rlimit rlim;
	        int i;
	        int f[100000];
	        char buf[20];
	        struct timeval t1, t2;
	        double diff;

	        /* cleanup past run */
	        system("rm -rf x");

	        /* boost nfile rlimit */
	        rlim.rlim_cur = 200000;
	        rlim.rlim_max = 200000;
	        if (setrlimit(RLIMIT_NOFILE, &rlim))
	                err(1, "setrlimit");

	        /* make directory for files */
	        if (mkdir("x", 0700))
	                err(1, "mkdir");

	        if (gettimeofday(&t1, NULL))
	                err(1, "gettimeofday");

	        /* populate directory with open files */
	        for (i = 0; i < 100000; i++) {
	                snprintf(buf, sizeof(buf), "x/%d", i);
	                f[i] = open(buf, O_CREAT);
	                if (f[i] == -1)
	                        err(1, "open");
	        }

	        /* close some of the files */
	        for (i = 0; i < 85000; i++)
	                close(f[i]);

	        /* unlink all files, even open ones */
	        system("rm -rf x");

	        if (gettimeofday(&t2, NULL))
	                err(1, "gettimeofday");

	        diff = (((double)t2.tv_sec * 1000000 + t2.tv_usec) -
	                ((double)t1.tv_sec * 1000000 + t1.tv_usec));

	        printf("done: %g elapsed\n", diff/1e6);
	        return 0;
	}

Signed-off-by: Greg Thelen 
Signed-off-by: Dave Chinner 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

vfs: dcache: fix deadlock in tree traversal

2012-10-07T15:32:22+00:00

commit 8110e16d42d587997bcaee0c864179e6d93603fe upstream.

IBM reported a deadlock in select_parent().  This was found to be caused
by taking rename_lock when already locked when restarting the tree
traversal.

There are two cases when the traversal needs to be restarted:

 1) concurrent d_move(); this can only happen when not already locked,
    since taking rename_lock protects against concurrent d_move().

 2) racing with final d_put() on child just at the moment of ascending
    to parent; rename_lock doesn't protect against this rare race, so it
    can happen when already locked.

Because of case 2, we need to be able to handle restarting the traversal
when rename_lock is already held.  This patch fixes all three callers of
try_to_ascend().

IBM reported that the deadlock is gone with this patch.

[ I rewrote the patch to be smaller and just do the "goto again" if the
  lock was already held, but credit goes to Miklos for the real work.
   - Linus ]

Signed-off-by: Miklos Szeredi 
Cc: Al Viro 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

vfs: dcache: use DCACHE_DENTRY_KILLED instead of DCACHE_DISCONNECTED in d_kill()

2012-10-02T17:29:51+00:00

commit b161dfa6937ae46d50adce8a7c6b12233e96e7bd upstream.

IBM reported a soft lockup after applying the fix for the rename_lock
deadlock.  Commit c83ce989cb5f ("VFS: Fix the nfs sillyrename regression
in kernel 2.6.38") was found to be the culprit.

The nfs sillyrename fix used DCACHE_DISCONNECTED to indicate that the
dentry was killed.  This flag can be set on non-killed dentries too,
which results in infinite retries when trying to traverse the dentry
tree.

This patch introduces a separate flag: DCACHE_DENTRY_KILLED, which is
only set in d_kill() and makes try_to_ascend() test only this flag.

IBM reported successful test results with this patch.

Signed-off-by: Miklos Szeredi 
Cc: Trond Myklebust 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

vfs: make word-at-a-time accesses handle a non-existing page

2012-05-03T21:01:40+00:00

It turns out that there are more cases than CONFIG_DEBUG_PAGEALLOC that
can have holes in the kernel address space: it seems to happen easily
with Xen, and it looks like the AMD gart64 code will also punch holes
dynamically.

Actually hitting that case is still very unlikely, so just do the
access, and take an exception and fix it up for the very unlikely case
of it being a page-crosser with no next page.

And hey, this abstraction might even help other architectures that have
other issues with unaligned word accesses than the possible missing next
page.  IOW, this could do the byte order magic too.

Peter Anvin fixed a thinko in the shifting for the exception case.

Reported-and-tested-by: Jana Saout 
Cc:  Peter Anvin 
Signed-off-by: Linus Torvalds