linux-stable.git/fs/xfs, branch linux-3.17.y

xfs: track bulkstat progress by agino

2014-11-14T18:10:41+00:00

commit 002758992693ae63c04122603ea9261a0a58d728 upstream.

The bulkstat main loop progress is tracked by the "lastino"
variable, which is a full 64 bit inode. However, the loop actually
works on agno/agino pairs, and so there's a significant disconnect
between the rest of the loop and the main cursor. Convert this to
use the agino, and pass the agino into the chunk formatting function
and convert it too.

This gets rid of the inconsistency in the loop processing, and
finally makes it simple for us to skip inodes at any point in the
loop simply by incrementing the agino cursor.

Signed-off-by: Dave Chinner 
Reviewed-by: Brian Foster 
Signed-off-by: Dave Chinner 
Signed-off-by: Greg Kroah-Hartman

xfs: bulkstat error handling is broken

2014-11-14T18:10:40+00:00

commit febe3cbe38b0bc0a925906dc90e8d59048851f87 upstream.

The error propagation is a horror - xfs_bulkstat() returns
a rval variable which is only set if there are formatter errors. Any
sort of btree walk error or corruption will cause the bulkstat walk
to terminate but will not pass an error back to userspace. Worse
is the fact that formatter errors will also be ignored if any inodes
were correctly formatted into the user buffer.

Hence bulkstat can fail badly yet still report success to userspace.
This causes significant issues with xfsdump not dumping everything
in the filesystem yet reporting success. It's not until a restore
fails that there is any indication that the dump was bad and tha
bulkstat failed. This patch now triggers xfsdump to fail with
bulkstat errors rather than silently missing files in the dump.

This now causes bulkstat to fail when the lastino cookie does not
fall inside an existing inode chunk. The pre-3.17 code tolerated
that error by allowing the code to move to the next inode chunk
as the agino target is guaranteed to fall into the next btree
record.

With the fixes up to this point in the series, xfsdump now passes on
the troublesome filesystem image that exposes all these bugs.

Signed-off-by: Dave Chinner 
Reviewed-by: Brian Foster 
Signed-off-by: Greg Kroah-Hartman

xfs: bulkstat main loop logic is a mess

2014-11-14T18:10:40+00:00

commit 6e57c542cb7e0e580eb53ae76a77875c7d92b4b1 upstream.

There are a bunch of variables tha tare more wildy scoped than they
need to be, obfuscated user buffer checks and tortured "next inode"
tracking. This all needs cleaning up to expose the real issues that
need fixing.

Signed-off-by: Dave Chinner 
Reviewed-by: Brian Foster 
Signed-off-by: Dave Chinner 
Signed-off-by: Greg Kroah-Hartman

xfs: bulkstat chunk-formatter has issues

2014-11-14T18:10:40+00:00

commit 2b831ac6bc87d3cbcbb1a8816827b6923403e461 upstream.

The loop construct has issues:
	- clustidx is completely unused, so remove it.
	- the loop tries to be smart by terminating when the
	  "freecount" tells it that all inodes are free. Just drop
	  it as in most cases we have to scan all inodes in the
	  chunk anyway.
	- move the "user buffer left" condition check to the only
	  point where we consume space int eh user buffer.
	- move the initialisation of agino out of the loop, leaving
	  just a simple loop control logic using the clusteridx.

Also, double handling of the user buffer variables leads to problems
tracking the current state - use the cursor variables directly
rather than keeping local copies and then having to update the
cursor before returning.

Signed-off-by: Dave Chinner 
Reviewed-by: Brian Foster 
Signed-off-by: Dave Chinner 
Signed-off-by: Greg Kroah-Hartman

xfs: bulkstat chunk formatting cursor is broken

2014-11-14T18:10:40+00:00

commit bf4a5af20d25ecc8876978ad34b8db83b4235f3c upstream.

The xfs_bulkstat_agichunk formatting cursor takes buffer values from
the main loop and passes them via the structure to the chunk
formatter, and the writes the changed values back into the main loop
local variables. Unfortunately, this complex dance is full of corner
cases that aren't handled correctly.

The biggest problem is that it is double handling the information in
both the main loop and the chunk formatting function, leading to
inconsistent updates and endless loops where progress is not made.

To fix this, push the struct xfs_bulkstat_agichunk outwards to be
the primary holder of user buffer information. this removes the
double handling in the main loop.

Also, pass the last inode processed by the chunk formatter as a
separate parameter as it purely an output variable and is not
related to the user buffer consumption cursor.

Finally, the chunk formatting code is not shared by anyone, so make
it local to xfs_itable.c.

Signed-off-by: Dave Chinner 
Reviewed-by: Brian Foster 
Signed-off-by: Dave Chinner 
Signed-off-by: Greg Kroah-Hartman

xfs: bulkstat btree walk doesn't terminate

2014-11-14T18:10:40+00:00

commit afa947cb52a8e73fe71915a0b0af6fcf98dfbe1a upstream.

The bulkstat code has several different ways of detecting the end of
an AG when doing a walk. They are not consistently detected, and the
code that checks for the end of AG conditions is not consistently
coded. Hence the are conditions where the walk code can get stuck in
an endless loop making no progress and not triggering any
termination conditions.

Convert all the "tmp/i" status return codes from btree operations
to a common name (stat) and apply end-of-ag detection to these
operations consistently.

Signed-off-by: Dave Chinner 
Reviewed-by: Brian Foster 
Signed-off-by: Dave Chinner 
Signed-off-by: Greg Kroah-Hartman

xfs: Check error during inode btree iteration in xfs_bulkstat()

2014-11-14T18:10:40+00:00

commit 7a19dee116c8fae7ba7a778043c245194289f5a2 upstream.

xfs_bulkstat() doesn't check error return from xfs_btree_increment(). In
case of specific fs corruption that could result in xfs_bulkstat()
entering an infinite loop because we would be looping over the same
chunk over and over again. Fix the problem by checking the return value
and terminating the loop properly.

Coverity-id: 1231338
Signed-off-by: Jan Kara 
Reviewed-by: Jie Liu 
Signed-off-by: Dave Chinner 
Signed-off-by: Greg Kroah-Hartman

xfs: bulkstat doesn't release AGI buffer on error

2014-11-14T18:10:40+00:00

commit a6bbce54efa9145dbcf3029c885549f7ebc40a3b upstream.

The recent refactoring of the bulkstat code left a small landmine in
the code. If a inobt read fails, then the tree walk is aborted and
returns without releasing the AGI buffer or freeing the cursor. This
can lead to a subsequent bulkstat call hanging trying to grab the
AGI buffer again.

Signed-off-by: Dave Chinner 
Reviewed-by: Brian Foster 
Reviewed-by: Eric Sandeen 
Signed-off-by: Dave Chinner 
Signed-off-by: Greg Kroah-Hartman

xfs: fix agno increment in xfs_inumbers() loop

2014-10-30T16:43:13+00:00

commit a8b1ee8bafc765ebf029d03c5479a69aebff9693 upstream.

caused a regression in xfs_inumbers, which in turn broke
xfsdump, causing incomplete dumps.

The loop in xfs_inumbers() needs to fill the user-supplied
buffers, and iterates via xfs_btree_increment, reading new
ags as needed.

But the first time through the loop, if xfs_btree_increment()
succeeds, we continue, which triggers the ++agno at the bottom
of the loop, and we skip to soon to the next ag - without
the proper setup under next_ag to read the next ag.

Fix this by removing the agno increment from the loop conditional,
and only increment agno if we have actually hit the code under
the next_ag: target.

Signed-off-by: Eric Sandeen 
Reviewed-by: Dave Chinner 
Signed-off-by: Dave Chinner 
Signed-off-by: Greg Kroah-Hartman

xfs: ensure WB_SYNC_ALL writeback handles partial pages correctly

2014-10-30T16:43:13+00:00

commit 0d085a529b427d97710e6a41f8a4f23e1757cd12 upstream.

XFS has been having trouble with stray delayed allocation extents
beyond EOF for a long time. Recent changes to the collapse range
code has triggered erroneous EBUSY errors on page invalidtion for
block size smaller than page size filesystems. These
have been caused by dirty buffers beyond EOF on a partial page which
do not get written to disk during a sync.

The issue is that write-ahead in xfs_cluster_write() finds such a
partial page and handles it by leaving the page dirty but pushing it
into a writeback state. This used to work just fine, as the
write_cache_pages() code would then find the dirty partial page in
the next mapping tree lookup as the dirty tag is still set.

Unfortunately, when we moved to a mark and sweep approach to
writeback to fix other writeback sync issues, we broken this. THe
act of marking the page as under writeback now clears the TOWRITE
tag in the radix tree, even though the page is still dirty. This
causes the TOWRITE tag to be cleared, and hence the next lookup on
the mapping tree does not find the dirty partial page and so doesn't
try to write it again.

This same writeback bug was found recently in ext4 and fixed in
commit 1c8349a ("ext4: fix data integrity sync in ordered mode")
without communication to the wider filesystem community. We can use
exactly the same fix here so the TOWRITE flag is not cleared on
partial page writes.

cc: stable@vger.kernel.org # dependent on 1c8349a17137b93f0a83f276c764a6df1b9a116e
Root-cause-found-by: Brian Foster 
Signed-off-by: Dave Chinner 
Reviewed-by: Brian Foster 
Signed-off-by: Dave Chinner 
Signed-off-by: Greg Kroah-Hartman