<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/btrfs, branch v4.15</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>Merge tag 'for-4.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux</title>
<updated>2018-01-25T17:03:10+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2018-01-25T17:03:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=525273fb2e7c84c98e3dffc5b446141bda99ff01'/>
<id>525273fb2e7c84c98e3dffc5b446141bda99ff01</id>
<content type='text'>
Pull btrfs fix from David Sterba:
 "It's been reported recently that readdir can list stale entries under
  some conditions. Fix it."

* tag 'for-4.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  Btrfs: fix stale entries in readdir
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull btrfs fix from David Sterba:
 "It's been reported recently that readdir can list stale entries under
  some conditions. Fix it."

* tag 'for-4.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  Btrfs: fix stale entries in readdir
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: fix stale entries in readdir</title>
<updated>2018-01-24T19:27:48+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>jbacik@fb.com</email>
</author>
<published>2018-01-23T20:17:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e4fd493c0541d36953f7b9d3bfced67a1321792f'/>
<id>e4fd493c0541d36953f7b9d3bfced67a1321792f</id>
<content type='text'>
In fixing the readdir+pagefault deadlock I accidentally introduced a
stale entry regression in readdir.  If we get close to full for the
temporary buffer, and then skip a few delayed deletions, and then try to
add another entry that won't fit, we will emit the entries we found and
retry.  Unfortunately we delete entries from our del_list as we find
them, assuming we won't need them.  However our pos will be with
whatever our last entry was, which could be before the delayed deletions
we skipped, so the next search will add the deleted entries back into
our readdir buffer.  So instead don't delete entries we find in our
del_list so we can make sure we always find our delayed deletions.  This
is a slight perf hit for readdir with lots of pending deletions, but
hopefully this isn't a common occurrence.  If it is we can revist this
and optimize it.

cc: stable@vger.kernel.org
Fixes: 23b5ec74943f ("btrfs: fix readdir deadlock with pagefault")
Signed-off-by: Josef Bacik &lt;jbacik@fb.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In fixing the readdir+pagefault deadlock I accidentally introduced a
stale entry regression in readdir.  If we get close to full for the
temporary buffer, and then skip a few delayed deletions, and then try to
add another entry that won't fit, we will emit the entries we found and
retry.  Unfortunately we delete entries from our del_list as we find
them, assuming we won't need them.  However our pos will be with
whatever our last entry was, which could be before the delayed deletions
we skipped, so the next search will add the deleted entries back into
our readdir buffer.  So instead don't delete entries we find in our
del_list so we can make sure we always find our delayed deletions.  This
is a slight perf hit for readdir with lots of pending deletions, but
hopefully this isn't a common occurrence.  If it is we can revist this
and optimize it.

cc: stable@vger.kernel.org
Fixes: 23b5ec74943f ("btrfs: fix readdir deadlock with pagefault")
Signed-off-by: Josef Bacik &lt;jbacik@fb.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'for-4.15-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux</title>
<updated>2018-01-05T21:02:46+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2018-01-05T21:02:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=89876f275e8d562912d9c238cd888b52065cf25c'/>
<id>89876f275e8d562912d9c238cd888b52065cf25c</id>
<content type='text'>
Pull btrfs fixes from David Sterba:
 "We have two more fixes for 4.15, both aimed for stable.

  The leak fix is obvious, the second patch fixes a bug revealed by the
  refcount API, when it behaves differently than previous atomic_t and
  reports refs going from 0 to 1 in one case"

* tag 'for-4.15-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix refcount_t usage when deleting btrfs_delayed_nodes
  btrfs: Fix flush bio leak
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull btrfs fixes from David Sterba:
 "We have two more fixes for 4.15, both aimed for stable.

  The leak fix is obvious, the second patch fixes a bug revealed by the
  refcount API, when it behaves differently than previous atomic_t and
  reports refs going from 0 to 1 in one case"

* tag 'for-4.15-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix refcount_t usage when deleting btrfs_delayed_nodes
  btrfs: Fix flush bio leak
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: fix refcount_t usage when deleting btrfs_delayed_nodes</title>
<updated>2018-01-02T17:00:14+00:00</updated>
<author>
<name>Chris Mason</name>
<email>clm@fb.com</email>
</author>
<published>2017-12-15T19:58:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ec35e48b286959991cdbb886f1bdeda4575c80b4'/>
<id>ec35e48b286959991cdbb886f1bdeda4575c80b4</id>
<content type='text'>
refcounts have a generic implementation and an asm optimized one.  The
generic version has extra debugging to make sure that once a refcount
goes to zero, refcount_inc won't increase it.

The btrfs delayed inode code wasn't expecting this, and we're tripping
over the warnings when the generic refcounts are used.  We ended up with
this race:

Process A                                         Process B
                                                  btrfs_get_delayed_node()
						  spin_lock(root-&gt;inode_lock)
						  radix_tree_lookup()
__btrfs_release_delayed_node()
refcount_dec_and_test(&amp;delayed_node-&gt;refs)
our refcount is now zero
						  refcount_add(2) &lt;---
						  warning here, refcount
                                                  unchanged

spin_lock(root-&gt;inode_lock)
radix_tree_delete()

With the generic refcounts, we actually warn again when process B above
tries to release his refcount because refcount_add() turned into a
no-op.

We saw this in production on older kernels without the asm optimized
refcounts.

The fix used here is to use refcount_inc_not_zero() to detect when the
object is in the middle of being freed and return NULL.  This is almost
always the right answer anyway, since we usually end up pitching the
delayed_node if it didn't have fresh data in it.

This also changes __btrfs_release_delayed_node() to remove the extra
check for zero refcounts before radix tree deletion.
btrfs_get_delayed_node() was the only path that was allowing refcounts
to go from zero to one.

Fixes: 6de5f18e7b0da ("btrfs: fix refcount_t usage when deleting btrfs_delayed_node")
CC: &lt;stable@vger.kernel.org&gt; # 4.12+
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
Reviewed-by: Liu Bo &lt;bo.li.liu@oracle.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
refcounts have a generic implementation and an asm optimized one.  The
generic version has extra debugging to make sure that once a refcount
goes to zero, refcount_inc won't increase it.

The btrfs delayed inode code wasn't expecting this, and we're tripping
over the warnings when the generic refcounts are used.  We ended up with
this race:

Process A                                         Process B
                                                  btrfs_get_delayed_node()
						  spin_lock(root-&gt;inode_lock)
						  radix_tree_lookup()
__btrfs_release_delayed_node()
refcount_dec_and_test(&amp;delayed_node-&gt;refs)
our refcount is now zero
						  refcount_add(2) &lt;---
						  warning here, refcount
                                                  unchanged

spin_lock(root-&gt;inode_lock)
radix_tree_delete()

With the generic refcounts, we actually warn again when process B above
tries to release his refcount because refcount_add() turned into a
no-op.

We saw this in production on older kernels without the asm optimized
refcounts.

The fix used here is to use refcount_inc_not_zero() to detect when the
object is in the middle of being freed and return NULL.  This is almost
always the right answer anyway, since we usually end up pitching the
delayed_node if it didn't have fresh data in it.

This also changes __btrfs_release_delayed_node() to remove the extra
check for zero refcounts before radix tree deletion.
btrfs_get_delayed_node() was the only path that was allowing refcounts
to go from zero to one.

Fixes: 6de5f18e7b0da ("btrfs: fix refcount_t usage when deleting btrfs_delayed_node")
CC: &lt;stable@vger.kernel.org&gt; # 4.12+
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
Reviewed-by: Liu Bo &lt;bo.li.liu@oracle.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: Fix flush bio leak</title>
<updated>2018-01-02T17:00:13+00:00</updated>
<author>
<name>Nikolay Borisov</name>
<email>nborisov@suse.com</email>
</author>
<published>2017-12-13T11:50:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=beed9263f4000c48a5c48912f26576f6fa091181'/>
<id>beed9263f4000c48a5c48912f26576f6fa091181</id>
<content type='text'>
Commit e0ae99941423 ("btrfs: preallocate device flush bio") reworked
the way the flush bio is allocated and used. Concretely it allocates
the bio in __alloc_device and then re-uses it multiple times with a
very simple endio routine that just calls complete() without consuming
a reference. Allocated bios by default come with a ref count of 1,
which is then consumed by the endio routine (or not, in which case they
should be bio_put by the caller). The way the impleementation works now
is that the flush bio has a refcount of 2 and we only ever bio_put it
once, leaving it to hang indefinitely. Fix this by removing the extra
bio_get in __alloc_device.

Fixes: e0ae99941423 ("btrfs: preallocate device flush bio")
Signed-off-by: Nikolay Borisov &lt;nborisov@suse.com&gt;
Reviewed-by: Liu Bo &lt;bo.li.liu@oracle.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit e0ae99941423 ("btrfs: preallocate device flush bio") reworked
the way the flush bio is allocated and used. Concretely it allocates
the bio in __alloc_device and then re-uses it multiple times with a
very simple endio routine that just calls complete() without consuming
a reference. Allocated bios by default come with a ref count of 1,
which is then consumed by the endio routine (or not, in which case they
should be bio_put by the caller). The way the impleementation works now
is that the flush bio has a refcount of 2 and we only ever bio_put it
once, leaving it to hang indefinitely. Fix this by removing the extra
bio_get in __alloc_device.

Fixes: e0ae99941423 ("btrfs: preallocate device flush bio")
Signed-off-by: Nikolay Borisov &lt;nborisov@suse.com&gt;
Reviewed-by: Liu Bo &lt;bo.li.liu@oracle.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'for-4.15-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux</title>
<updated>2017-12-10T16:30:04+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2017-12-10T16:30:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=51090c5d6de08cfc86b2d861775dedddd9a2c023'/>
<id>51090c5d6de08cfc86b2d861775dedddd9a2c023</id>
<content type='text'>
Pull btrfs fixes from David Sterba:
 "This contains a few fixes (error handling, quota leak, FUA vs
  nobarrier mount option).

  There's one one worth mentioning separately - an off-by-one fix that
  leads to overwriting first byte of an adjacent page with 0, out of
  bounds of the memory allocated by an ioctl. This is under a privileged
  part of the ioctl, can be triggerd in some subvolume layouts"

* tag 'for-4.15-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: Fix possible off-by-one in btrfs_search_path_in_tree
  Btrfs: disable FUA if mounted with nobarrier
  btrfs: fix missing error return in btrfs_drop_snapshot
  btrfs: handle errors while updating refcounts in update_ref_for_cow
  btrfs: Fix quota reservation leak on preallocated files
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull btrfs fixes from David Sterba:
 "This contains a few fixes (error handling, quota leak, FUA vs
  nobarrier mount option).

  There's one one worth mentioning separately - an off-by-one fix that
  leads to overwriting first byte of an adjacent page with 0, out of
  bounds of the memory allocated by an ioctl. This is under a privileged
  part of the ioctl, can be triggerd in some subvolume layouts"

* tag 'for-4.15-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: Fix possible off-by-one in btrfs_search_path_in_tree
  Btrfs: disable FUA if mounted with nobarrier
  btrfs: fix missing error return in btrfs_drop_snapshot
  btrfs: handle errors while updating refcounts in update_ref_for_cow
  btrfs: Fix quota reservation leak on preallocated files
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: Fix possible off-by-one in btrfs_search_path_in_tree</title>
<updated>2017-12-06T23:35:15+00:00</updated>
<author>
<name>Nikolay Borisov</name>
<email>nborisov@suse.com</email>
</author>
<published>2017-12-01T09:19:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c8bcbfbd239ed60a6562964b58034ac8a25f4c31'/>
<id>c8bcbfbd239ed60a6562964b58034ac8a25f4c31</id>
<content type='text'>
The name char array passed to btrfs_search_path_in_tree is of size
BTRFS_INO_LOOKUP_PATH_MAX (4080). So the actual accessible char indexes
are in the range of [0, 4079]. Currently the code uses the define but this
represents an off-by-one.

Implications:

Size of btrfs_ioctl_ino_lookup_args is 4096, so the new byte will be
written to extra space, not some padding that could be provided by the
allocator.

btrfs-progs store the arguments on stack, but kernel does own copy of
the ioctl buffer and the off-by-one overwrite does not affect userspace,
but the ending 0 might be lost.

Kernel ioctl buffer is allocated dynamically so we're overwriting
somebody else's memory, and the ioctl is privileged if args.objectid is
not 256. Which is in most cases, but resolving a subvolume stored in
another directory will trigger that path.

Before this patch the buffer was one byte larger, but then the -1 was
not added.

Fixes: ac8e9819d71f907 ("Btrfs: add search and inode lookup ioctls")
Signed-off-by: Nikolay Borisov &lt;nborisov@suse.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
[ added implications ]
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The name char array passed to btrfs_search_path_in_tree is of size
BTRFS_INO_LOOKUP_PATH_MAX (4080). So the actual accessible char indexes
are in the range of [0, 4079]. Currently the code uses the define but this
represents an off-by-one.

Implications:

Size of btrfs_ioctl_ino_lookup_args is 4096, so the new byte will be
written to extra space, not some padding that could be provided by the
allocator.

btrfs-progs store the arguments on stack, but kernel does own copy of
the ioctl buffer and the off-by-one overwrite does not affect userspace,
but the ending 0 might be lost.

Kernel ioctl buffer is allocated dynamically so we're overwriting
somebody else's memory, and the ioctl is privileged if args.objectid is
not 256. Which is in most cases, but resolving a subvolume stored in
another directory will trigger that path.

Before this patch the buffer was one byte larger, but then the -1 was
not added.

Fixes: ac8e9819d71f907 ("Btrfs: add search and inode lookup ioctls")
Signed-off-by: Nikolay Borisov &lt;nborisov@suse.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
[ added implications ]
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: disable FUA if mounted with nobarrier</title>
<updated>2017-12-06T23:34:45+00:00</updated>
<author>
<name>Omar Sandoval</name>
<email>osandov@fb.com</email>
</author>
<published>2017-12-06T06:54:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1b9e619c5bc8235cfba3dc4ced2fb0e3554a05d4'/>
<id>1b9e619c5bc8235cfba3dc4ced2fb0e3554a05d4</id>
<content type='text'>
I was seeing disk flushes still happening when I mounted a Btrfs
filesystem with nobarrier for testing. This is because we use FUA to
write out the first super block, and on devices without FUA support, the
block layer translates FUA to a flush. Even on devices supporting true
FUA, using FUA when we asked for no barriers is surprising.

Fixes: 387125fc722a8ed ("Btrfs: fix barrier flushes")
Signed-off-by: Omar Sandoval &lt;osandov@fb.com&gt;
Reviewed-by: Qu Wenruo &lt;wqu@suse.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
I was seeing disk flushes still happening when I mounted a Btrfs
filesystem with nobarrier for testing. This is because we use FUA to
write out the first super block, and on devices without FUA support, the
block layer translates FUA to a flush. Even on devices supporting true
FUA, using FUA when we asked for no barriers is surprising.

Fixes: 387125fc722a8ed ("Btrfs: fix barrier flushes")
Signed-off-by: Omar Sandoval &lt;osandov@fb.com&gt;
Reviewed-by: Qu Wenruo &lt;wqu@suse.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: fix missing error return in btrfs_drop_snapshot</title>
<updated>2017-12-06T23:30:29+00:00</updated>
<author>
<name>Jeff Mahoney</name>
<email>jeffm@suse.com</email>
</author>
<published>2017-12-04T18:11:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e19182c0fff451e3744c1107d98f072e7ca377a0'/>
<id>e19182c0fff451e3744c1107d98f072e7ca377a0</id>
<content type='text'>
If btrfs_del_root fails in btrfs_drop_snapshot, we'll pick up the
error but then return 0 anyway due to mixing err and ret.

Fixes: 79787eaab4612 ("btrfs: replace many BUG_ONs with proper error handling")
Cc: &lt;stable@vger.kernel.org&gt; # v3.4+
Signed-off-by: Jeff Mahoney &lt;jeffm@suse.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If btrfs_del_root fails in btrfs_drop_snapshot, we'll pick up the
error but then return 0 anyway due to mixing err and ret.

Fixes: 79787eaab4612 ("btrfs: replace many BUG_ONs with proper error handling")
Cc: &lt;stable@vger.kernel.org&gt; # v3.4+
Signed-off-by: Jeff Mahoney &lt;jeffm@suse.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: handle errors while updating refcounts in update_ref_for_cow</title>
<updated>2017-12-06T23:30:03+00:00</updated>
<author>
<name>Jeff Mahoney</name>
<email>jeffm@suse.com</email>
</author>
<published>2017-11-21T18:58:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=692826b2738101549f032a761a9191636e83be4e'/>
<id>692826b2738101549f032a761a9191636e83be4e</id>
<content type='text'>
Since commit fb235dc06fa (btrfs: qgroup: Move half of the qgroup
accounting time out of commit trans) the assumption that
btrfs_add_delayed_{data,tree}_ref can only return 0 or -ENOMEM has
been false.  The qgroup operations call into btrfs_search_slot
and friends and can now return the full spectrum of error codes.

Fortunately, the fix here is easy since update_ref_for_cow failing
is already handled so we just need to bail early with the error
code.

Fixes: fb235dc06fa (btrfs: qgroup: Move half of the qgroup accounting ...)
Cc: &lt;stable@vger.kernel.org&gt; # v4.11+
Signed-off-by: Jeff Mahoney &lt;jeffm@suse.com&gt;
Reviewed-by: Edmund Nadolski &lt;enadolski@suse.com&gt;
Reviewed-by: Qu Wenruo &lt;wqu@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since commit fb235dc06fa (btrfs: qgroup: Move half of the qgroup
accounting time out of commit trans) the assumption that
btrfs_add_delayed_{data,tree}_ref can only return 0 or -ENOMEM has
been false.  The qgroup operations call into btrfs_search_slot
and friends and can now return the full spectrum of error codes.

Fortunately, the fix here is easy since update_ref_for_cow failing
is already handled so we just need to bail early with the error
code.

Fixes: fb235dc06fa (btrfs: qgroup: Move half of the qgroup accounting ...)
Cc: &lt;stable@vger.kernel.org&gt; # v4.11+
Signed-off-by: Jeff Mahoney &lt;jeffm@suse.com&gt;
Reviewed-by: Edmund Nadolski &lt;enadolski@suse.com&gt;
Reviewed-by: Qu Wenruo &lt;wqu@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
