linux.git/fs/btrfs/relocation.c, branch v5.14

btrfs: ensure relocation never runs while we have send operations running

2021-06-22T12:11:58+00:00

Relocation and send do not play well together because while send is
running a block group can be relocated, a transaction committed and
the respective disk extents get re-allocated and written to or discarded
while send is about to do something with the extents.

This was explained in commit 9e967495e0e0ae ("Btrfs: prevent send failures
and crashes due to concurrent relocation"), which prevented balance and
send from running in parallel but it did not address one remaining case
where chunk relocation can happen: shrinking a device (and device deletion
which shrinks a device's size to 0 before deleting the device).

We also have now one more case where relocation is triggered: on zoned
filesystems partially used block groups get relocated by a background
thread, introduced in commit 18bb8bbf13c183 ("btrfs: zoned: automatically
reclaim zones").

So make sure that instead of preventing balance from running when there
are ongoing send operations, we prevent relocation from happening.
This uses the infrastructure recently added by a patch that has the
subject: "btrfs: add cancellable chunk relocation support".

Also it adds a spinlock used exclusively for the exclusivity between
send and relocation, as before fs_info->balance_mutex was used, which
would make an attempt to run send to block waiting for balance to
finish, which can take a lot of time on large filesystems.

Signed-off-by: Filipe Manana 
Reviewed-by: David Sterba 
Signed-off-by: David Sterba

btrfs: add cancellable chunk relocation support

2021-06-21T13:19:07+00:00

Add support code that will allow canceling relocation on the chunk
granularity. This is different and independent of balance, that also
uses relocation but is a higher level operation and manages it's own
state and pause/cancellation requests.

Relocation is used for resize (shrink) and device deletion so this will
be a common point to implement cancellation for both. The context is
entirely in btrfs_relocate_block_group and btrfs_recover_relocation,
enclosing one chunk relocation. The status bit is set and unset between
the chunks. As relocation can take long, the effects may not be
immediate and the request and actual action can slightly race.

The fs_info::reloc_cancel_req is only supposed to be increased and does
not pair with decrement like fs_info::balance_cancel_req.

Reviewed-by: Filipe Manana 
Signed-off-by: David Sterba

btrfs: check return value of btrfs_commit_transaction in relocation

2021-04-19T15:25:22+00:00

There are a few places where we don't check the return value of
btrfs_commit_transaction in relocation.c.  Thankfully all these places
have straightforward error handling, so simply change all of the sites
at once.

Reviewed-by: Qu Wenruo 
Signed-off-by: Josef Bacik 
Reviewed-by: David Sterba 
Signed-off-by: David Sterba

btrfs: do proper error handling in merge_reloc_roots

2021-04-19T15:25:22+00:00

We have a BUG_ON() if we get an error back from btrfs_get_fs_root().
This honestly should never fail, as at this point we have a solid
coordination of fs root to reloc root, and these roots will all be in
memory.  But in the name of killing BUG_ON()'s remove these and handle
the error condition properly, ASSERT()'ing for developers.

Signed-off-by: Josef Bacik 
Reviewed-by: David Sterba 
Signed-off-by: David Sterba

btrfs: handle extent corruption with select_one_root properly

2021-04-19T15:25:22+00:00

In corruption cases we could have paths from a block up to no root at
all, and thus we'll BUG_ON(!root) in select_one_root.  Handle this by
adding an ASSERT() for developers, and returning an error for normal
users.

Signed-off-by: Josef Bacik 
Reviewed-by: David Sterba 
Signed-off-by: David Sterba

btrfs: cleanup error handling in prepare_to_merge

2021-04-19T15:25:22+00:00

This probably can't happen even with a corrupt file system, because we
would have failed much earlier on than here.  However there's no reason
we can't just check and bail out as appropriate, so do that and convert
the correctness BUG_ON() to an ASSERT().

Reviewed-by: Qu Wenruo 
Signed-off-by: Josef Bacik 
Reviewed-by: David Sterba 
[ add comment ]
Signed-off-by: David Sterba

btrfs: do not panic in __add_reloc_root

2021-04-19T15:25:22+00:00

If we have a duplicate entry for a reloc root then we could have fs
corruption that resulted in a double allocation.  Since this shouldn't
happen unless there is corruption, add an ASSERT(ret != -EEXIST) to all
of the callers of __add_reloc_root() to catch any logic mistakes for
developers, otherwise normal error handling will happen for normal
users.

Signed-off-by: Josef Bacik 
Reviewed-by: David Sterba 
Signed-off-by: David Sterba

btrfs: handle __add_reloc_root failures in btrfs_recover_relocation

2021-04-19T15:25:22+00:00

We can already handle errors appropriately from this function, deal with
an error coming from __add_reloc_root appropriately.

Reviewed-by: Qu Wenruo 
Signed-off-by: Josef Bacik 
Reviewed-by: David Sterba 
[ add comment ]
Signed-off-by: David Sterba

btrfs: do proper error handling in create_reloc_inode

2021-04-19T15:25:21+00:00

We already handle some errors in this function, and the callers do the
correct error handling, so clean up the rest of the function to do the
appropriate error handling.

There's a little extra work that needs to be done here, as we create the
inode item before we create the orphan item.  We could potentially add
the orphan item, but if we failed to create the inode item we would have
to abort the transaction.

Instead add a helper to delete the inode item we created in the case
that we're unable to look up the inode (this would likely be caused by
an ENOMEM), which if it succeeds means we can avoid a transaction abort
in this particular error case.

Signed-off-by: Josef Bacik 
Reviewed-by: David Sterba 
Signed-off-by: David Sterba

btrfs: remove the extent item sanity checks in relocate_block_group

2021-04-19T15:25:21+00:00

These checks are all taken care of for us by the tree checker code:

- the flags don't change or are updated consistently
- the v0 extent item format is invalid and caught in many other places
  too

Reviewed-by: Qu Wenruo 
Signed-off-by: Josef Bacik 
Reviewed-by: David Sterba 
[ update changelog ]
Signed-off-by: David Sterba