linux-stable.git/fs/btrfs/ordered-data.c, branch linux-5.4.y

Btrfs: fix btrfs_wait_ordered_range() so that it waits for all ordered extents

2020-02-28T16:22:24+00:00

commit e75fd33b3f744f644061a4f9662bd63f5434f806 upstream.

In btrfs_wait_ordered_range() once we find an ordered extent that has
finished with an error we exit the loop and don't wait for any other
ordered extents that might be still in progress.

All the users of btrfs_wait_ordered_range() expect that there are no more
ordered extents in progress after that function returns. So past fixes
such like the ones from the two following commits:

  ff612ba7849964 ("btrfs: fix panic during relocation after ENOSPC before
                   writeback happens")

  28aeeac1dd3080 ("Btrfs: fix panic when starting bg cache writeout after
                   IO error")

don't work when there are multiple ordered extents in the range.

Fix that by making btrfs_wait_ordered_range() wait for all ordered extents
even after it finds one that had an error.

Link: https://github.com/kdave/btrfs-progs/issues/228#issuecomment-569777554
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Qu Wenruo 
Reviewed-by: Josef Bacik 
Signed-off-by: Filipe Manana 
Signed-off-by: David Sterba 
Signed-off-by: Greg Kroah-Hartman

btrfs: get rid of unique workqueue helper functions

2020-01-09T09:20:06+00:00

[ Upstream commit a0cac0ec961f0d42828eeef196ac2246a2f07659 ]

Commit 9e0af2376434 ("Btrfs: fix task hang under heavy compressed
write") worked around the issue that a recycled work item could get a
false dependency on the original work item due to how the workqueue code
guarantees non-reentrancy. It did so by giving different work functions
to different types of work.

However, the fixes in the previous few patches are more complete, as
they prevent a work item from being recycled at all (except for a tiny
window that the kernel workqueue code handles for us). This obsoletes
the previous fix, so we don't need the unique helpers for correctness.
The only other reason to keep them would be so they show up in stack
traces, but they always seem to be optimized to a tail call, so they
don't show up anyways. So, let's just get rid of the extra indirection.

While we're here, rename normal_work_helper() to the more informative
btrfs_work_helper().

Reviewed-by: Nikolay Borisov 
Reviewed-by: Filipe Manana 
Signed-off-by: Omar Sandoval 
Reviewed-by: David Sterba 
Signed-off-by: David Sterba 
Signed-off-by: Sasha Levin

btrfs: move cond_wake_up functions out of ctree

2019-09-09T12:59:15+00:00

The file ctree.h serves as a header for everything and has become quite
bloated. Split some helpers that are generic and create a new file that
should be the catch-all for code that's not btrfs-specific.

Reviewed-by: Johannes Thumshirn 
Signed-off-by: David Sterba

btrfs: fix extent_state leak in btrfs_lock_and_flush_ordered_range

2019-07-26T10:21:22+00:00

btrfs_lock_and_flush_ordered_range() loads given "*cached_state" into
cachedp, which, in general, is NULL. Then, lock_extent_bits() updates
"cachedp", but it never goes backs to the caller. Thus the caller still
see its "cached_state" to be NULL and never free the state allocated
under btrfs_lock_and_flush_ordered_range(). As a result, we will
see massive state leak with e.g. fstests btrfs/005. Fix this bug by
properly handling the pointers.

Fixes: bd80d94efb83 ("btrfs: Always use a cached extent_state in btrfs_lock_and_flush_ordered_range")
Reviewed-by: Nikolay Borisov 
Signed-off-by: Naohiro Aota 
Signed-off-by: David Sterba

btrfs: migrate the delalloc space stuff to it's own home

2019-07-04T15:26:17+00:00

We have code for data and metadata reservations for delalloc.  There's
quite a bit of code here, and it's used in a lot of places so I've
separated it out to it's own file.  inode.c and file.c are already
pretty large, and this code is complicated enough to live in its own
space.

Signed-off-by: Josef Bacik 
Signed-off-by: David Sterba

btrfs: don't assume ordered sums to be 4 bytes

2019-07-01T11:35:00+00:00

BTRFS has the implicit assumption that a checksum in btrfs_orderd_sums
is 4 bytes. While this is true for CRC32C, it is not for any other
checksum.

Change the data type to be a byte array and adjust loop index
calculation accordingly.

This includes moving the adjustment of 'index' by 'ins_size' in
btrfs_csum_file_blocks() before dividing 'ins_size' by the checksum
size, because before this patch the 'sums' member of 'struct
btrfs_ordered_sum' was 4 Bytes in size and afterwards it is only one
byte.

Reviewed-by: Nikolay Borisov 
Signed-off-by: Johannes Thumshirn 
Reviewed-by: David Sterba 
Signed-off-by: David Sterba

btrfs: Always use a cached extent_state in btrfs_lock_and_flush_ordered_range

2019-07-01T11:34:59+00:00

In case no cached_state argument is passed to
btrfs_lock_and_flush_ordered_range use one locally in the function. This
optimises the case when an ordered extent is found since the unlock
function will be able to unlock that state directly without searching
for it again.

Reviewed-by: Josef Bacik 
Signed-off-by: Nikolay Borisov 
Reviewed-by: David Sterba 
Signed-off-by: David Sterba

btrfs: add new helper btrfs_lock_and_flush_ordered_range

2019-07-01T11:34:59+00:00

There is a certain idiom used in multiple places in btrfs' codebase,
dealing with flushing an ordered range. Factor this in a separate
function that can be reused. Future patches will replace the existing
code with that function.

Reviewed-by: Josef Bacik 
Signed-off-by: Nikolay Borisov 
Reviewed-by: David Sterba 
Signed-off-by: David Sterba

btrfs: track DIO bytes in flight

2019-04-29T17:25:37+00:00

When diagnosing a slowdown of generic/224 I noticed we were not doing
anything when calling into shrink_delalloc().  This is because all
writes in 224 are O_DIRECT, not delalloc, and thus our delalloc_bytes
counter is 0, which short circuits most of the work inside of
shrink_delalloc().  However O_DIRECT writes still consume metadata
resources and generate ordered extents, which we can still wait on.

Fix this by tracking outstanding DIO write bytes, and use this as well
as the delalloc bytes counter to decide if we need to lookup and wait on
any ordered extents.  If we have more DIO writes than delalloc bytes
we'll go ahead and wait on any ordered extents regardless of our flush
state as flushing delalloc is likely to not gain us anything.

Signed-off-by: Josef Bacik 
[ use dio instead of odirect in identifiers ]
Signed-off-by: David Sterba

btrfs: Remove redundant inode argument from btrfs_add_ordered_sum

2019-04-29T17:02:40+00:00

Ordered csums are keyed off of a btrfs_ordered_extent, which already has
a reference to the inode. This implies that an explicit inode argument
is redundant. So remove it.

Reviewed-by: Johannes Thumshirn 
Signed-off-by: Nikolay Borisov 
Reviewed-by: David Sterba 
Signed-off-by: David Sterba