<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/btrfs/ordered-data.c, branch v4.16</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>Btrfs: rework outstanding_extents</title>
<updated>2017-11-01T19:45:35+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>josef@toxicpanda.com</email>
</author>
<published>2017-10-19T18:15:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8b62f87bad9cf06e536799bf8cb942ab95f6bfa4'/>
<id>8b62f87bad9cf06e536799bf8cb942ab95f6bfa4</id>
<content type='text'>
Right now we do a lot of weird hoops around outstanding_extents in order
to keep the extent count consistent.  This is because we logically
transfer the outstanding_extent count from the initial reservation
through the set_delalloc_bits.  This makes it pretty difficult to get a
handle on how and when we need to mess with outstanding_extents.

Fix this by revamping the rules of how we deal with outstanding_extents.
Now instead everybody that is holding on to a delalloc extent is
required to increase the outstanding extents count for itself.  This
means we'll have something like this

btrfs_delalloc_reserve_metadata	- outstanding_extents = 1
 btrfs_set_extent_delalloc	- outstanding_extents = 2
btrfs_release_delalloc_extents	- outstanding_extents = 1

for an initial file write.  Now take the append write where we extend an
existing delalloc range but still under the maximum extent size

btrfs_delalloc_reserve_metadata - outstanding_extents = 2
  btrfs_set_extent_delalloc
    btrfs_set_bit_hook		- outstanding_extents = 3
    btrfs_merge_extent_hook	- outstanding_extents = 2
btrfs_delalloc_release_extents	- outstanding_extnets = 1

In order to make the ordered extent transition we of course must now
make ordered extents carry their own outstanding_extent reservation, so
for cow_file_range we end up with

btrfs_add_ordered_extent	- outstanding_extents = 2
clear_extent_bit		- outstanding_extents = 1
btrfs_remove_ordered_extent	- outstanding_extents = 0

This makes all manipulations of outstanding_extents much more explicit.
Every successful call to btrfs_delalloc_reserve_metadata _must_ now be
combined with btrfs_release_delalloc_extents, even in the error case, as
that is the only function that actually modifies the
outstanding_extents counter.

The drawback to this is now we are much more likely to have transient
cases where outstanding_extents is much larger than it actually should
be.  This could happen before as we manipulated the delalloc bits, but
now it happens basically at every write.  This may put more pressure on
the ENOSPC flushing code, but I think making this code simpler is worth
the cost.  I have another change coming to mitigate this side-effect
somewhat.

I also added trace points for the counter manipulation.  These were used
by a bpf script I wrote to help track down leak issues.

Signed-off-by: Josef Bacik &lt;jbacik@fb.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Right now we do a lot of weird hoops around outstanding_extents in order
to keep the extent count consistent.  This is because we logically
transfer the outstanding_extent count from the initial reservation
through the set_delalloc_bits.  This makes it pretty difficult to get a
handle on how and when we need to mess with outstanding_extents.

Fix this by revamping the rules of how we deal with outstanding_extents.
Now instead everybody that is holding on to a delalloc extent is
required to increase the outstanding extents count for itself.  This
means we'll have something like this

btrfs_delalloc_reserve_metadata	- outstanding_extents = 1
 btrfs_set_extent_delalloc	- outstanding_extents = 2
btrfs_release_delalloc_extents	- outstanding_extents = 1

for an initial file write.  Now take the append write where we extend an
existing delalloc range but still under the maximum extent size

btrfs_delalloc_reserve_metadata - outstanding_extents = 2
  btrfs_set_extent_delalloc
    btrfs_set_bit_hook		- outstanding_extents = 3
    btrfs_merge_extent_hook	- outstanding_extents = 2
btrfs_delalloc_release_extents	- outstanding_extnets = 1

In order to make the ordered extent transition we of course must now
make ordered extents carry their own outstanding_extent reservation, so
for cow_file_range we end up with

btrfs_add_ordered_extent	- outstanding_extents = 2
clear_extent_bit		- outstanding_extents = 1
btrfs_remove_ordered_extent	- outstanding_extents = 0

This makes all manipulations of outstanding_extents much more explicit.
Every successful call to btrfs_delalloc_reserve_metadata _must_ now be
combined with btrfs_release_delalloc_extents, even in the error case, as
that is the only function that actually modifies the
outstanding_extents counter.

The drawback to this is now we are much more likely to have transient
cases where outstanding_extents is much larger than it actually should
be.  This could happen before as we manipulated the delalloc bits, but
now it happens basically at every write.  This may put more pressure on
the ENOSPC flushing code, but I think making this code simpler is worth
the cost.  I have another change coming to mitigate this side-effect
somewhat.

I also added trace points for the counter manipulation.  These were used
by a bpf script I wrote to help track down leak issues.

Signed-off-by: Josef Bacik &lt;jbacik@fb.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: fix integer overflow in calc_reclaim_items_nr</title>
<updated>2017-06-29T18:17:02+00:00</updated>
<author>
<name>Chris Mason</name>
<email>clm@fb.com</email>
</author>
<published>2017-06-23T16:48:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6374e57ad8091b9c2db2eecc536c7f0166ce099e'/>
<id>6374e57ad8091b9c2db2eecc536c7f0166ce099e</id>
<content type='text'>
Dave Jones hit a WARN_ON(nr &lt; 0) in btrfs_wait_ordered_roots() with
v4.12-rc6.  This was because commit 70e7af244 made it possible for
calc_reclaim_items_nr() to return a negative number.  It's not really a
bug in that commit, it just didn't go far enough down the stack to find
all the possible 64-&gt;32 bit overflows.

This switches calc_reclaim_items_nr() to return a u64 and changes everyone
that uses the results of that math to u64 as well.

Reported-by: Dave Jones &lt;davej@codemonkey.org.uk&gt;
Fixes: 70e7af2 ("Btrfs: fix delalloc accounting leak caused by u32 overflow")
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Dave Jones hit a WARN_ON(nr &lt; 0) in btrfs_wait_ordered_roots() with
v4.12-rc6.  This was because commit 70e7af244 made it possible for
calc_reclaim_items_nr() to return a negative number.  It's not really a
bug in that commit, it just didn't go far enough down the stack to find
all the possible 64-&gt;32 bit overflows.

This switches calc_reclaim_items_nr() to return a u64 and changes everyone
that uses the results of that math to u64 as well.

Reported-by: Dave Jones &lt;davej@codemonkey.org.uk&gt;
Fixes: 70e7af2 ("Btrfs: fix delalloc accounting leak caused by u32 overflow")
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: convert btrfs_ordered_extent.refs from atomic_t to refcount_t</title>
<updated>2017-04-18T12:07:23+00:00</updated>
<author>
<name>Elena Reshetova</name>
<email>elena.reshetova@intel.com</email>
</author>
<published>2017-03-03T08:55:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e76edab7f059bc1047c1865141e2709d70e74852'/>
<id>e76edab7f059bc1047c1865141e2709d70e74852</id>
<content type='text'>
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova &lt;elena.reshetova@intel.com&gt;
Signed-off-by: Hans Liljestrand &lt;ishkamiel@gmail.com&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: David Windsor &lt;dwindsor@gmail.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova &lt;elena.reshetova@intel.com&gt;
Signed-off-by: Hans Liljestrand &lt;ishkamiel@gmail.com&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: David Windsor &lt;dwindsor@gmail.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: convert btrfs_transaction.use_count from atomic_t to refcount_t</title>
<updated>2017-04-18T12:07:23+00:00</updated>
<author>
<name>Elena Reshetova</name>
<email>elena.reshetova@intel.com</email>
</author>
<published>2017-03-03T08:55:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9b64f57ddf8673d29fafb3405d4aa1e93f5a4cd7'/>
<id>9b64f57ddf8673d29fafb3405d4aa1e93f5a4cd7</id>
<content type='text'>
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova &lt;elena.reshetova@intel.com&gt;
Signed-off-by: Hans Liljestrand &lt;ishkamiel@gmail.com&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: David Windsor &lt;dwindsor@gmail.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova &lt;elena.reshetova@intel.com&gt;
Signed-off-by: Hans Liljestrand &lt;ishkamiel@gmail.com&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: David Windsor &lt;dwindsor@gmail.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: Make btrfs_lookup_ordered_range take btrfs_inode</title>
<updated>2017-02-28T10:30:08+00:00</updated>
<author>
<name>Nikolay Borisov</name>
<email>n.borisov.lkml@gmail.com</email>
</author>
<published>2017-02-20T11:50:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a776c6fa1feba7a84519170ebdb7f4a4155b89d6'/>
<id>a776c6fa1feba7a84519170ebdb7f4a4155b89d6</id>
<content type='text'>
Signed-off-by: Nikolay Borisov &lt;nborisov@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Nikolay Borisov &lt;nborisov@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: clean up btrfs_ordered_update_i_size</title>
<updated>2017-02-14T14:50:58+00:00</updated>
<author>
<name>Liu Bo</name>
<email>bo.li.liu@oracle.com</email>
</author>
<published>2016-12-13T20:51:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=62c821a8e23ab7bdbc14841fafa3c90e3b057de7'/>
<id>62c821a8e23ab7bdbc14841fafa3c90e3b057de7</id>
<content type='text'>
Since we have a good helper entry_end, use it for ordered extent.

Signed-off-by: Liu Bo &lt;bo.li.liu@oracle.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
[ whitespace reformatting ]
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since we have a good helper entry_end, use it for ordered extent.

Signed-off-by: Liu Bo &lt;bo.li.liu@oracle.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
[ whitespace reformatting ]
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: fix btrfs_ordered_update_i_size to update disk_i_size properly</title>
<updated>2017-02-14T14:50:57+00:00</updated>
<author>
<name>Liu Bo</name>
<email>bo.li.liu@oracle.com</email>
</author>
<published>2016-12-01T21:01:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=19fd2df5b1167eeb6ae8912cdfe9532e3b2c8bbe'/>
<id>19fd2df5b1167eeb6ae8912cdfe9532e3b2c8bbe</id>
<content type='text'>
btrfs_ordered_update_i_size can be called by truncate and endio, but
only endio takes ordered_extent which contains the completed IO.

while truncating down a file, if there are some in-flight IOs,
btrfs_ordered_update_i_size in endio will set disk_i_size to
@orig_offset that is zero.  If truncating-down fails somehow, we try to
recover in memory isize with this zero'd disk_i_size.

Fix it by only updating disk_i_size with @orig_offset when
btrfs_ordered_update_i_size is not called from endio while truncating
down and waiting for in-flight IOs completing their work before recover
in-memory size.

Besides fixing the above issue, add an assertion for last_size to double
check we truncate down to the desired size.

Signed-off-by: Liu Bo &lt;bo.li.liu@oracle.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
btrfs_ordered_update_i_size can be called by truncate and endio, but
only endio takes ordered_extent which contains the completed IO.

while truncating down a file, if there are some in-flight IOs,
btrfs_ordered_update_i_size in endio will set disk_i_size to
@orig_offset that is zero.  If truncating-down fails somehow, we try to
recover in memory isize with this zero'd disk_i_size.

Fix it by only updating disk_i_size with @orig_offset when
btrfs_ordered_update_i_size is not called from endio while truncating
down and waiting for in-flight IOs completing their work before recover
in-memory size.

Besides fixing the above issue, add an assertion for last_size to double
check we truncate down to the desired size.

Signed-off-by: Liu Bo &lt;bo.li.liu@oracle.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: Make btrfs_get_logged_extents take btrfs_inode</title>
<updated>2017-02-14T14:50:55+00:00</updated>
<author>
<name>Nikolay Borisov</name>
<email>n.borisov.lkml@gmail.com</email>
</author>
<published>2017-01-17T22:31:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=223466370cf6e05f863fbfeda54a3b6c282749c8'/>
<id>223466370cf6e05f863fbfeda54a3b6c282749c8</id>
<content type='text'>
Signed-off-by: Nikolay Borisov &lt;n.borisov.lkml@gmail.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Nikolay Borisov &lt;n.borisov.lkml@gmail.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: root-&gt;fs_info cleanup, add fs_info convenience variables</title>
<updated>2016-12-06T15:06:59+00:00</updated>
<author>
<name>Jeff Mahoney</name>
<email>jeffm@suse.com</email>
</author>
<published>2016-06-22T22:54:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0b246afa62b0cf5b09d078121f543135f28492ad'/>
<id>0b246afa62b0cf5b09d078121f543135f28492ad</id>
<content type='text'>
In routines where someptr-&gt;fs_info is referenced multiple times, we
introduce a convenience variable.  This makes the code considerably
more readable.

Signed-off-by: Jeff Mahoney &lt;jeffm@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In routines where someptr-&gt;fs_info is referenced multiple times, we
introduce a convenience variable.  This makes the code considerably
more readable.

Signed-off-by: Jeff Mahoney &lt;jeffm@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: pull node/sector/stripe sizes out of root and into fs_info</title>
<updated>2016-12-06T15:06:58+00:00</updated>
<author>
<name>Jeff Mahoney</name>
<email>jeffm@suse.com</email>
</author>
<published>2016-06-15T13:22:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=da17066c40472c2d6a1aab7bb0090c3d285531c9'/>
<id>da17066c40472c2d6a1aab7bb0090c3d285531c9</id>
<content type='text'>
We track the node sizes per-root, but they never vary from the values
in the superblock.  This patch messes with the 80-column style a bit,
but subsequent patches to factor out root-&gt;fs_info into a convenience
variable fix it up again.

Signed-off-by: Jeff Mahoney &lt;jeffm@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We track the node sizes per-root, but they never vary from the values
in the superblock.  This patch messes with the 80-column style a bit,
but subsequent patches to factor out root-&gt;fs_info into a convenience
variable fix it up again.

Signed-off-by: Jeff Mahoney &lt;jeffm@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
