<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/btrfs/tests, branch v5.7</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>btrfs: kill the subvol_srcu</title>
<updated>2020-03-23T16:02:00+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>josef@toxicpanda.com</email>
</author>
<published>2020-02-14T21:11:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c75e839414d3610e6487ae3145199c500d55f7f7'/>
<id>c75e839414d3610e6487ae3145199c500d55f7f7</id>
<content type='text'>
Now that we have proper root ref counting everywhere we can kill the
subvol_srcu.

* removal of fs_info::subvol_srcu reduces size of fs_info by 1176 bytes

* the refcount_t used for the references checks for accidental 0-&gt;1
  in cases where the root lifetime would not be properly protected

* there's a leak detector for roots to catch unfreed roots at umount
  time

* SRCU served us well over the years but is was not a proper
  synchronization mechanism for some cases

Signed-off-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
[ update changelog ]
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Now that we have proper root ref counting everywhere we can kill the
subvol_srcu.

* removal of fs_info::subvol_srcu reduces size of fs_info by 1176 bytes

* the refcount_t used for the references checks for accidental 0-&gt;1
  in cases where the root lifetime would not be properly protected

* there's a leak detector for roots to catch unfreed roots at umount
  time

* SRCU served us well over the years but is was not a proper
  synchronization mechanism for some cases

Signed-off-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
[ update changelog ]
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: don't take an extra root ref at allocation time</title>
<updated>2020-03-23T16:01:59+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>josef@toxicpanda.com</email>
</author>
<published>2020-02-14T21:11:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4785e24fa5d23b05bc0f579e10aeb44c4a0a2a3d'/>
<id>4785e24fa5d23b05bc0f579e10aeb44c4a0a2a3d</id>
<content type='text'>
Now that all the users of roots take references for them we can drop the
extra root ref we've been taking.  Before we had roots at 2 refs for the
life of the file system, one for the radix tree, and one simply for
existing.  Now that we have proper ref accounting in all places that use
roots we can drop this extra ref simply for existing as we no longer
need it.

Signed-off-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Now that all the users of roots take references for them we can drop the
extra root ref we've been taking.  Before we had roots at 2 refs for the
life of the file system, one for the radix tree, and one simply for
existing.  Now that we have proper ref accounting in all places that use
roots we can drop this extra ref simply for existing as we no longer
need it.

Signed-off-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: move the root freeing stuff into btrfs_put_root</title>
<updated>2020-03-23T16:01:59+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>josef@toxicpanda.com</email>
</author>
<published>2020-02-14T21:11:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8c38938c7bb096313ad00c2bafa82af37636b0ec'/>
<id>8c38938c7bb096313ad00c2bafa82af37636b0ec</id>
<content type='text'>
There are a few different ways to free roots, either you allocated them
yourself and you just do

free_extent_buffer(root-&gt;node);
free_extent_buffer(root-&gt;commit_node);
btrfs_put_root(root);

Which is the pattern for log roots.  Or for snapshots/subvolumes that
are being dropped you simply call btrfs_free_fs_root() which does all
the cleanup for you.

Unify this all into btrfs_put_root(), so that we don't free up things
associated with the root until the last reference is dropped.  This
makes the root freeing code much more significant.

The only caveat is at close_ctree() time we have to free the extent
buffers for all of our main roots (extent_root, chunk_root, etc) because
we have to drop the btree_inode and we'll run into issues if we hold
onto those nodes until -&gt;kill_sb() time.  This will be addressed in the
future when we kill the btree_inode.

Signed-off-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There are a few different ways to free roots, either you allocated them
yourself and you just do

free_extent_buffer(root-&gt;node);
free_extent_buffer(root-&gt;commit_node);
btrfs_put_root(root);

Which is the pattern for log roots.  Or for snapshots/subvolumes that
are being dropped you simply call btrfs_free_fs_root() which does all
the cleanup for you.

Unify this all into btrfs_put_root(), so that we don't free up things
associated with the root until the last reference is dropped.  This
makes the root freeing code much more significant.

The only caveat is at close_ctree() time we have to free the extent
buffers for all of our main roots (extent_root, chunk_root, etc) because
we have to drop the btree_inode and we'll run into issues if we hold
onto those nodes until -&gt;kill_sb() time.  This will be addressed in the
future when we kill the btree_inode.

Signed-off-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: rename btrfs_put_fs_root and btrfs_grab_fs_root</title>
<updated>2020-03-23T16:01:33+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>josef@toxicpanda.com</email>
</author>
<published>2020-01-24T14:33:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0024652895e3479cd0d372f63b57d9581a0bdd38'/>
<id>0024652895e3479cd0d372f63b57d9581a0bdd38</id>
<content type='text'>
We are now using these for all roots, rename them to btrfs_put_root()
and btrfs_grab_root();

Reviewed-by: Nikolay Borisov &lt;nborisov@suse.com&gt;
Signed-off-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We are now using these for all roots, rename them to btrfs_put_root()
and btrfs_grab_root();

Reviewed-by: Nikolay Borisov &lt;nborisov@suse.com&gt;
Signed-off-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: add a leak check for roots</title>
<updated>2020-03-23T16:01:33+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>josef@toxicpanda.com</email>
</author>
<published>2020-01-24T14:33:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=bd647ce385ec110fe7796267b6555873e48e44eb'/>
<id>bd647ce385ec110fe7796267b6555873e48e44eb</id>
<content type='text'>
Now that we're going to start relying on getting ref counting right for
roots, add a list to track allocated roots and print out any roots that
aren't freed up at free_fs_info time.

Hide this behind CONFIG_BTRFS_DEBUG because this will just be used for
developers to verify they aren't breaking things.

Reviewed-by: Nikolay Borisov &lt;nborisov@suse.com&gt;
Signed-off-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Now that we're going to start relying on getting ref counting right for
roots, add a list to track allocated roots and print out any roots that
aren't freed up at free_fs_info time.

Hide this behind CONFIG_BTRFS_DEBUG because this will just be used for
developers to verify they aren't breaking things.

Reviewed-by: Nikolay Borisov &lt;nborisov@suse.com&gt;
Signed-off-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: make the init of static elements in fs_info separate</title>
<updated>2020-03-23T16:01:33+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>josef@toxicpanda.com</email>
</author>
<published>2020-01-24T14:32:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8260edba67a2e6bd5e709d32188e23aa22cb4a38'/>
<id>8260edba67a2e6bd5e709d32188e23aa22cb4a38</id>
<content type='text'>
In adding things like eb leak checking and root leak checking there were
a lot of weird corner cases that come from the fact that

  1) We do not init the fs_info until we get to open_ctree time in the
     normal case and

  2) The test infrastructure half-init's the fs_info for things that it
     needs.

This makes it really annoying to make changes because you have to add
init in two different places, have special cases for testing fs_info's
that may not have certain things initialized, and cases for fs_info's
that didn't make it to open_ctree and thus are not fully set up.

Fix this by extracting out the non-allocating init of the fs info into
it's own public function and use that to make sure we're all getting
consistent views of an allocated fs_info.

Signed-off-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In adding things like eb leak checking and root leak checking there were
a lot of weird corner cases that come from the fact that

  1) We do not init the fs_info until we get to open_ctree time in the
     normal case and

  2) The test infrastructure half-init's the fs_info for things that it
     needs.

This makes it really annoying to make changes because you have to add
init in two different places, have special cases for testing fs_info's
that may not have certain things initialized, and cases for fs_info's
that didn't make it to open_ctree and thus are not fully set up.

Fix this by extracting out the non-allocating init of the fs info into
it's own public function and use that to make sure we're all getting
consistent views of an allocated fs_info.

Signed-off-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: use btrfs_put_fs_root to free roots always</title>
<updated>2020-03-23T16:01:32+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>josef@toxicpanda.com</email>
</author>
<published>2020-01-24T14:32:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=81f096edf0472864879e1e5a7f0fdf87ea90fe75'/>
<id>81f096edf0472864879e1e5a7f0fdf87ea90fe75</id>
<content type='text'>
If we are going to track leaked roots we need to free them all the same
way, so don't kfree() roots directly, use btrfs_put_fs_root.

Reviewed-by: Nikolay Borisov &lt;nborisov@suse.com&gt;
Signed-off-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If we are going to track leaked roots we need to free them all the same
way, so don't kfree() roots directly, use btrfs_put_fs_root.

Reviewed-by: Nikolay Borisov &lt;nborisov@suse.com&gt;
Signed-off-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: Correctly handle empty trees in find_first_clear_extent_bit</title>
<updated>2020-01-31T13:01:29+00:00</updated>
<author>
<name>Nikolay Borisov</name>
<email>nborisov@suse.com</email>
</author>
<published>2020-01-27T09:59:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=5750c37523a2c8cbb450b9ef31e21c2ba876b05e'/>
<id>5750c37523a2c8cbb450b9ef31e21c2ba876b05e</id>
<content type='text'>
Raviu reported that running his regular fs_trim segfaulted with the
following backtrace:

[  237.525947] assertion failed: prev, in ../fs/btrfs/extent_io.c:1595
[  237.525984] ------------[ cut here ]------------
[  237.525985] kernel BUG at ../fs/btrfs/ctree.h:3117!
[  237.525992] invalid opcode: 0000 [#1] SMP PTI
[  237.525998] CPU: 4 PID: 4423 Comm: fstrim Tainted: G     U     OE     5.4.14-8-vanilla #1
[  237.526001] Hardware name: ASUSTeK COMPUTER INC.
[  237.526044] RIP: 0010:assfail.constprop.58+0x18/0x1a [btrfs]
[  237.526079] Call Trace:
[  237.526120]  find_first_clear_extent_bit+0x13d/0x150 [btrfs]
[  237.526148]  btrfs_trim_fs+0x211/0x3f0 [btrfs]
[  237.526184]  btrfs_ioctl_fitrim+0x103/0x170 [btrfs]
[  237.526219]  btrfs_ioctl+0x129a/0x2ed0 [btrfs]
[  237.526227]  ? filemap_map_pages+0x190/0x3d0
[  237.526232]  ? do_filp_open+0xaf/0x110
[  237.526238]  ? _copy_to_user+0x22/0x30
[  237.526242]  ? cp_new_stat+0x150/0x180
[  237.526247]  ? do_vfs_ioctl+0xa4/0x640
[  237.526278]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
[  237.526283]  do_vfs_ioctl+0xa4/0x640
[  237.526288]  ? __do_sys_newfstat+0x3c/0x60
[  237.526292]  ksys_ioctl+0x70/0x80
[  237.526297]  __x64_sys_ioctl+0x16/0x20
[  237.526303]  do_syscall_64+0x5a/0x1c0
[  237.526310]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

That was due to btrfs_fs_device::aloc_tree being empty. Initially I
thought this wasn't possible and as a percaution have put the assert in
find_first_clear_extent_bit. Turns out this is indeed possible and could
happen when a file system with SINGLE data/metadata profile has a 2nd
device added. Until balance is run or a new chunk is allocated on this
device it will be completely empty.

In this case find_first_clear_extent_bit should return the full range
[0, -1ULL] and let the caller handle this i.e for trim the end will be
capped at the size of actual device.

Link: https://lore.kernel.org/linux-btrfs/izW2WNyvy1dEDweBICizKnd2KDwDiDyY2EYQr4YCwk7pkuIpthx-JRn65MPBde00ND6V0_Lh8mW0kZwzDiLDv25pUYWxkskWNJnVP0kgdMA=@protonmail.com/
Fixes: 45bfcfc168f8 ("btrfs: Implement find_first_clear_extent_bit")
CC: stable@vger.kernel.org # 5.2+
Signed-off-by: Nikolay Borisov &lt;nborisov@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Raviu reported that running his regular fs_trim segfaulted with the
following backtrace:

[  237.525947] assertion failed: prev, in ../fs/btrfs/extent_io.c:1595
[  237.525984] ------------[ cut here ]------------
[  237.525985] kernel BUG at ../fs/btrfs/ctree.h:3117!
[  237.525992] invalid opcode: 0000 [#1] SMP PTI
[  237.525998] CPU: 4 PID: 4423 Comm: fstrim Tainted: G     U     OE     5.4.14-8-vanilla #1
[  237.526001] Hardware name: ASUSTeK COMPUTER INC.
[  237.526044] RIP: 0010:assfail.constprop.58+0x18/0x1a [btrfs]
[  237.526079] Call Trace:
[  237.526120]  find_first_clear_extent_bit+0x13d/0x150 [btrfs]
[  237.526148]  btrfs_trim_fs+0x211/0x3f0 [btrfs]
[  237.526184]  btrfs_ioctl_fitrim+0x103/0x170 [btrfs]
[  237.526219]  btrfs_ioctl+0x129a/0x2ed0 [btrfs]
[  237.526227]  ? filemap_map_pages+0x190/0x3d0
[  237.526232]  ? do_filp_open+0xaf/0x110
[  237.526238]  ? _copy_to_user+0x22/0x30
[  237.526242]  ? cp_new_stat+0x150/0x180
[  237.526247]  ? do_vfs_ioctl+0xa4/0x640
[  237.526278]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
[  237.526283]  do_vfs_ioctl+0xa4/0x640
[  237.526288]  ? __do_sys_newfstat+0x3c/0x60
[  237.526292]  ksys_ioctl+0x70/0x80
[  237.526297]  __x64_sys_ioctl+0x16/0x20
[  237.526303]  do_syscall_64+0x5a/0x1c0
[  237.526310]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

That was due to btrfs_fs_device::aloc_tree being empty. Initially I
thought this wasn't possible and as a percaution have put the assert in
find_first_clear_extent_bit. Turns out this is indeed possible and could
happen when a file system with SINGLE data/metadata profile has a 2nd
device added. Until balance is run or a new chunk is allocated on this
device it will be completely empty.

In this case find_first_clear_extent_bit should return the full range
[0, -1ULL] and let the caller handle this i.e for trim the end will be
capped at the size of actual device.

Link: https://lore.kernel.org/linux-btrfs/izW2WNyvy1dEDweBICizKnd2KDwDiDyY2EYQr4YCwk7pkuIpthx-JRn65MPBde00ND6V0_Lh8mW0kZwzDiLDv25pUYWxkskWNJnVP0kgdMA=@protonmail.com/
Fixes: 45bfcfc168f8 ("btrfs: Implement find_first_clear_extent_bit")
CC: stable@vger.kernel.org # 5.2+
Signed-off-by: Nikolay Borisov &lt;nborisov@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: fix race between adding and putting tree mod seq elements and nodes</title>
<updated>2020-01-31T13:01:20+00:00</updated>
<author>
<name>Filipe Manana</name>
<email>fdmanana@suse.com</email>
</author>
<published>2020-01-22T12:23:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7227ff4de55d931bbdc156c8ef0ce4f100c78a5b'/>
<id>7227ff4de55d931bbdc156c8ef0ce4f100c78a5b</id>
<content type='text'>
There is a race between adding and removing elements to the tree mod log
list and rbtree that can lead to use-after-free problems.

Consider the following example that explains how/why the problems happens:

1) Task A has mod log element with sequence number 200. It currently is
   the only element in the mod log list;

2) Task A calls btrfs_put_tree_mod_seq() because it no longer needs to
   access the tree mod log. When it enters the function, it initializes
   'min_seq' to (u64)-1. Then it acquires the lock 'tree_mod_seq_lock'
   before checking if there are other elements in the mod seq list.
   Since the list it empty, 'min_seq' remains set to (u64)-1. Then it
   unlocks the lock 'tree_mod_seq_lock';

3) Before task A acquires the lock 'tree_mod_log_lock', task B adds
   itself to the mod seq list through btrfs_get_tree_mod_seq() and gets a
   sequence number of 201;

4) Some other task, name it task C, modifies a btree and because there
   elements in the mod seq list, it adds a tree mod elem to the tree
   mod log rbtree. That node added to the mod log rbtree is assigned
   a sequence number of 202;

5) Task B, which is doing fiemap and resolving indirect back references,
   calls btrfs get_old_root(), with 'time_seq' == 201, which in turn
   calls tree_mod_log_search() - the search returns the mod log node
   from the rbtree with sequence number 202, created by task C;

6) Task A now acquires the lock 'tree_mod_log_lock', starts iterating
   the mod log rbtree and finds the node with sequence number 202. Since
   202 is less than the previously computed 'min_seq', (u64)-1, it
   removes the node and frees it;

7) Task B still has a pointer to the node with sequence number 202, and
   it dereferences the pointer itself and through the call to
   __tree_mod_log_rewind(), resulting in a use-after-free problem.

This issue can be triggered sporadically with the test case generic/561
from fstests, and it happens more frequently with a higher number of
duperemove processes. When it happens to me, it either freezes the VM or
it produces a trace like the following before crashing:

  [ 1245.321140] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
  [ 1245.321200] CPU: 1 PID: 26997 Comm: pool Not tainted 5.5.0-rc6-btrfs-next-52 #1
  [ 1245.321235] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
  [ 1245.321287] RIP: 0010:rb_next+0x16/0x50
  [ 1245.321307] Code: ....
  [ 1245.321372] RSP: 0018:ffffa151c4d039b0 EFLAGS: 00010202
  [ 1245.321388] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8ae221363c80 RCX: 6b6b6b6b6b6b6b6b
  [ 1245.321409] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8ae221363c80
  [ 1245.321439] RBP: ffff8ae20fcc4688 R08: 0000000000000002 R09: 0000000000000000
  [ 1245.321475] R10: ffff8ae20b120910 R11: 00000000243f8bb1 R12: 0000000000000038
  [ 1245.321506] R13: ffff8ae221363c80 R14: 000000000000075f R15: ffff8ae223f762b8
  [ 1245.321539] FS:  00007fdee1ec7700(0000) GS:ffff8ae236c80000(0000) knlGS:0000000000000000
  [ 1245.321591] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 1245.321614] CR2: 00007fded4030c48 CR3: 000000021da16003 CR4: 00000000003606e0
  [ 1245.321642] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [ 1245.321668] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [ 1245.321706] Call Trace:
  [ 1245.321798]  __tree_mod_log_rewind+0xbf/0x280 [btrfs]
  [ 1245.321841]  btrfs_search_old_slot+0x105/0xd00 [btrfs]
  [ 1245.321877]  resolve_indirect_refs+0x1eb/0xc60 [btrfs]
  [ 1245.321912]  find_parent_nodes+0x3dc/0x11b0 [btrfs]
  [ 1245.321947]  btrfs_check_shared+0x115/0x1c0 [btrfs]
  [ 1245.321980]  ? extent_fiemap+0x59d/0x6d0 [btrfs]
  [ 1245.322029]  extent_fiemap+0x59d/0x6d0 [btrfs]
  [ 1245.322066]  do_vfs_ioctl+0x45a/0x750
  [ 1245.322081]  ksys_ioctl+0x70/0x80
  [ 1245.322092]  ? trace_hardirqs_off_thunk+0x1a/0x1c
  [ 1245.322113]  __x64_sys_ioctl+0x16/0x20
  [ 1245.322126]  do_syscall_64+0x5c/0x280
  [ 1245.322139]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
  [ 1245.322155] RIP: 0033:0x7fdee3942dd7
  [ 1245.322177] Code: ....
  [ 1245.322258] RSP: 002b:00007fdee1ec6c88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
  [ 1245.322294] RAX: ffffffffffffffda RBX: 00007fded40210d8 RCX: 00007fdee3942dd7
  [ 1245.322314] RDX: 00007fded40210d8 RSI: 00000000c020660b RDI: 0000000000000004
  [ 1245.322337] RBP: 0000562aa89e7510 R08: 0000000000000000 R09: 00007fdee1ec6d44
  [ 1245.322369] R10: 0000000000000073 R11: 0000000000000246 R12: 00007fdee1ec6d48
  [ 1245.322390] R13: 00007fdee1ec6d40 R14: 00007fded40210d0 R15: 00007fdee1ec6d50
  [ 1245.322423] Modules linked in: ....
  [ 1245.323443] ---[ end trace 01de1e9ec5dff3cd ]---

Fix this by ensuring that btrfs_put_tree_mod_seq() computes the minimum
sequence number and iterates the rbtree while holding the lock
'tree_mod_log_lock' in write mode. Also get rid of the 'tree_mod_seq_lock'
lock, since it is now redundant.

Fixes: bd989ba359f2ac ("Btrfs: add tree modification log functions")
Fixes: 097b8a7c9e48e2 ("Btrfs: join tree mod log code with the code holding back delayed refs")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Reviewed-by: Nikolay Borisov &lt;nborisov@suse.com&gt;
Signed-off-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There is a race between adding and removing elements to the tree mod log
list and rbtree that can lead to use-after-free problems.

Consider the following example that explains how/why the problems happens:

1) Task A has mod log element with sequence number 200. It currently is
   the only element in the mod log list;

2) Task A calls btrfs_put_tree_mod_seq() because it no longer needs to
   access the tree mod log. When it enters the function, it initializes
   'min_seq' to (u64)-1. Then it acquires the lock 'tree_mod_seq_lock'
   before checking if there are other elements in the mod seq list.
   Since the list it empty, 'min_seq' remains set to (u64)-1. Then it
   unlocks the lock 'tree_mod_seq_lock';

3) Before task A acquires the lock 'tree_mod_log_lock', task B adds
   itself to the mod seq list through btrfs_get_tree_mod_seq() and gets a
   sequence number of 201;

4) Some other task, name it task C, modifies a btree and because there
   elements in the mod seq list, it adds a tree mod elem to the tree
   mod log rbtree. That node added to the mod log rbtree is assigned
   a sequence number of 202;

5) Task B, which is doing fiemap and resolving indirect back references,
   calls btrfs get_old_root(), with 'time_seq' == 201, which in turn
   calls tree_mod_log_search() - the search returns the mod log node
   from the rbtree with sequence number 202, created by task C;

6) Task A now acquires the lock 'tree_mod_log_lock', starts iterating
   the mod log rbtree and finds the node with sequence number 202. Since
   202 is less than the previously computed 'min_seq', (u64)-1, it
   removes the node and frees it;

7) Task B still has a pointer to the node with sequence number 202, and
   it dereferences the pointer itself and through the call to
   __tree_mod_log_rewind(), resulting in a use-after-free problem.

This issue can be triggered sporadically with the test case generic/561
from fstests, and it happens more frequently with a higher number of
duperemove processes. When it happens to me, it either freezes the VM or
it produces a trace like the following before crashing:

  [ 1245.321140] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
  [ 1245.321200] CPU: 1 PID: 26997 Comm: pool Not tainted 5.5.0-rc6-btrfs-next-52 #1
  [ 1245.321235] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
  [ 1245.321287] RIP: 0010:rb_next+0x16/0x50
  [ 1245.321307] Code: ....
  [ 1245.321372] RSP: 0018:ffffa151c4d039b0 EFLAGS: 00010202
  [ 1245.321388] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8ae221363c80 RCX: 6b6b6b6b6b6b6b6b
  [ 1245.321409] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8ae221363c80
  [ 1245.321439] RBP: ffff8ae20fcc4688 R08: 0000000000000002 R09: 0000000000000000
  [ 1245.321475] R10: ffff8ae20b120910 R11: 00000000243f8bb1 R12: 0000000000000038
  [ 1245.321506] R13: ffff8ae221363c80 R14: 000000000000075f R15: ffff8ae223f762b8
  [ 1245.321539] FS:  00007fdee1ec7700(0000) GS:ffff8ae236c80000(0000) knlGS:0000000000000000
  [ 1245.321591] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 1245.321614] CR2: 00007fded4030c48 CR3: 000000021da16003 CR4: 00000000003606e0
  [ 1245.321642] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [ 1245.321668] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [ 1245.321706] Call Trace:
  [ 1245.321798]  __tree_mod_log_rewind+0xbf/0x280 [btrfs]
  [ 1245.321841]  btrfs_search_old_slot+0x105/0xd00 [btrfs]
  [ 1245.321877]  resolve_indirect_refs+0x1eb/0xc60 [btrfs]
  [ 1245.321912]  find_parent_nodes+0x3dc/0x11b0 [btrfs]
  [ 1245.321947]  btrfs_check_shared+0x115/0x1c0 [btrfs]
  [ 1245.321980]  ? extent_fiemap+0x59d/0x6d0 [btrfs]
  [ 1245.322029]  extent_fiemap+0x59d/0x6d0 [btrfs]
  [ 1245.322066]  do_vfs_ioctl+0x45a/0x750
  [ 1245.322081]  ksys_ioctl+0x70/0x80
  [ 1245.322092]  ? trace_hardirqs_off_thunk+0x1a/0x1c
  [ 1245.322113]  __x64_sys_ioctl+0x16/0x20
  [ 1245.322126]  do_syscall_64+0x5c/0x280
  [ 1245.322139]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
  [ 1245.322155] RIP: 0033:0x7fdee3942dd7
  [ 1245.322177] Code: ....
  [ 1245.322258] RSP: 002b:00007fdee1ec6c88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
  [ 1245.322294] RAX: ffffffffffffffda RBX: 00007fded40210d8 RCX: 00007fdee3942dd7
  [ 1245.322314] RDX: 00007fded40210d8 RSI: 00000000c020660b RDI: 0000000000000004
  [ 1245.322337] RBP: 0000562aa89e7510 R08: 0000000000000000 R09: 00007fdee1ec6d44
  [ 1245.322369] R10: 0000000000000073 R11: 0000000000000246 R12: 00007fdee1ec6d48
  [ 1245.322390] R13: 00007fdee1ec6d40 R14: 00007fded40210d0 R15: 00007fdee1ec6d50
  [ 1245.322423] Modules linked in: ....
  [ 1245.323443] ---[ end trace 01de1e9ec5dff3cd ]---

Fix this by ensuring that btrfs_put_tree_mod_seq() computes the minimum
sequence number and iterates the rbtree while holding the lock
'tree_mod_log_lock' in write mode. Also get rid of the 'tree_mod_seq_lock'
lock, since it is now redundant.

Fixes: bd989ba359f2ac ("Btrfs: add tree modification log functions")
Fixes: 097b8a7c9e48e2 ("Btrfs: join tree mod log code with the code holding back delayed refs")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Reviewed-by: Nikolay Borisov &lt;nborisov@suse.com&gt;
Signed-off-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: Add self-tests for btrfs_rmap_block</title>
<updated>2020-01-23T16:24:35+00:00</updated>
<author>
<name>Nikolay Borisov</name>
<email>nborisov@suse.com</email>
</author>
<published>2019-12-10T18:00:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=bf2e2eb060fa29dd892fa42dd0a03f130815acf0'/>
<id>bf2e2eb060fa29dd892fa42dd0a03f130815acf0</id>
<content type='text'>
Add RAID1 and single testcases to verify that data stripes are excluded
from super block locations and that the address mapping is valid.

Signed-off-by: Nikolay Borisov &lt;nborisov@suse.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
[ update changelog ]
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add RAID1 and single testcases to verify that data stripes are excluded
from super block locations and that the address mapping is valid.

Signed-off-by: Nikolay Borisov &lt;nborisov@suse.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
[ update changelog ]
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
