<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/btrfs/ordered-data.c, branch v4.6</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>btrfs: Fix misspellings in comments.</title>
<updated>2016-03-14T14:05:02+00:00</updated>
<author>
<name>Adam Buchbinder</name>
<email>adam.buchbinder@gmail.com</email>
</author>
<published>2016-03-04T19:23:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=bb7ab3b92e46da06b580c6f83abe7894dc449cca'/>
<id>bb7ab3b92e46da06b580c6f83abe7894dc449cca</id>
<content type='text'>
Signed-off-by: Adam Buchbinder &lt;adam.buchbinder@gmail.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Adam Buchbinder &lt;adam.buchbinder@gmail.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: move btrfs_compression_type to compression.h</title>
<updated>2016-03-11T16:12:46+00:00</updated>
<author>
<name>Anand Jain</name>
<email>anand.jain@oracle.com</email>
</author>
<published>2016-03-10T09:26:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ebb8765b2ded869b75bf5154b048119eb52571f7'/>
<id>ebb8765b2ded869b75bf5154b048119eb52571f7</id>
<content type='text'>
So that it's better organized.

Signed-off-by: Anand Jain &lt;anand.jain@oracle.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
So that it's better organized.

Signed-off-by: Anand Jain &lt;anand.jain@oracle.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: drop null testing before destroy functions</title>
<updated>2016-02-18T10:46:03+00:00</updated>
<author>
<name>Kinglong Mee</name>
<email>kinglongmee@gmail.com</email>
</author>
<published>2016-01-29T13:36:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=5598e9005a4076d6700bbd89d0cdbe5b2922a846'/>
<id>5598e9005a4076d6700bbd89d0cdbe5b2922a846</id>
<content type='text'>
Cleanup.

kmem_cache_destroy already checks for a NULL argument,
so drop the redundant NULL test before calling it.

Signed-off-by: Kinglong Mee &lt;kinglongmee@gmail.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Cleanup.

kmem_cache_destroy already checks for a NULL argument,
so drop the redundant NULL test before calling it.

Signed-off-by: Kinglong Mee &lt;kinglongmee@gmail.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: change how we wait for pending ordered extents</title>
<updated>2015-10-22T01:51:40+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>jbacik@fb.com</email>
</author>
<published>2015-09-24T20:17:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=161c3549b45aeef05451b6822d8aaaf39c7bedce'/>
<id>161c3549b45aeef05451b6822d8aaaf39c7bedce</id>
<content type='text'>
We have a mechanism to make sure we don't lose updates for ordered extents that
were logged in the transaction that is currently running.  We add the ordered
extent to a transaction list and then the transaction waits on all the ordered
extents in that list.  However, on substantially large file systems this list
can be extremely large, and can give us soft lockups, since the ordered extents
don't remove themselves from the list when they do complete.

To fix this we simply add a counter to the transaction that is incremented any
time we have a logged extent that needs to be completed in the current
transaction.  Then when the ordered extent finally completes it decrements the
per-transaction counter and wakes up the transaction if it is the last one.
This will eliminate the softlockup.  Thanks,
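
As a rough illustration of the counter scheme described above, here is a
minimal user-space sketch in Python rather than kernel C.  The names
(Transaction, pending_ordered, add_pending, complete_one, wait_pending) are
hypothetical and are not taken from the patch; a condition variable stands in
for the kernel wait queue.

```python
import threading


class Transaction:
    """Toy stand-in for a transaction that waits on logged ordered extents."""

    def __init__(self):
        self._cond = threading.Condition()
        self.pending_ordered = 0  # hypothetical name for the per-transaction counter

    def add_pending(self):
        # Called when a logged extent must complete in this transaction.
        with self._cond:
            self.pending_ordered += 1

    def complete_one(self):
        # Called when an ordered extent finishes; the last one wakes the waiter.
        with self._cond:
            self.pending_ordered -= 1
            if self.pending_ordered == 0:
                self._cond.notify_all()

    def wait_pending(self, timeout=None):
        # The committer sleeps until the counter reaches zero; no list is walked.
        with self._cond:
            return self._cond.wait_for(lambda: self.pending_ordered == 0, timeout)


def demo():
    trans = Transaction()
    for _ in range(4):
        trans.add_pending()
    workers = [threading.Thread(target=trans.complete_one) for _ in range(4)]
    for w in workers:
        w.start()
    done = trans.wait_pending(timeout=5)
    for w in workers:
        w.join()
    return done, trans.pending_ordered
```

The waiter's cost no longer depends on how many extents were logged, which is
the property the counter is meant to provide.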

Signed-off-by: Josef Bacik &lt;jbacik@fb.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We have a mechanism to make sure we don't lose updates for ordered extents that
were logged in the transaction that is currently running.  We add the ordered
extent to a transaction list and then the transaction waits on all the ordered
extents in that list.  However, on substantially large file systems this list
can be extremely large, and can give us soft lockups, since the ordered extents
don't remove themselves from the list when they do complete.

To fix this we simply add a counter to the transaction that is incremented any
time we have a logged extent that needs to be completed in the current
transaction.  Then when the ordered extent finally completes it decrements the
per-transaction counter and wakes up the transaction if it is the last one.
This will eliminate the softlockup.  Thanks,

Signed-off-by: Josef Bacik &lt;jbacik@fb.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: add comments to barriers before waitqueue_active</title>
<updated>2015-10-10T16:40:04+00:00</updated>
<author>
<name>David Sterba</name>
<email>dsterba@suse.com</email>
</author>
<published>2015-02-16T18:36:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a83342aa0c8f0ca90057d3837ae8d198186e5153'/>
<id>a83342aa0c8f0ca90057d3837ae8d198186e5153</id>
<content type='text'>
Reduce number of undocumented barriers out there.

Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Reduce number of undocumented barriers out there.

Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: fix memory corruption on failure to submit bio for direct IO</title>
<updated>2015-07-02T00:17:18+00:00</updated>
<author>
<name>Filipe Manana</name>
<email>fdmanana@suse.com</email>
</author>
<published>2015-07-01T11:13:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=61de718fceb6bc028dafe4d06a1f87a9e0998303'/>
<id>61de718fceb6bc028dafe4d06a1f87a9e0998303</id>
<content type='text'>
If we fail to submit a bio for a direct IO request, we were grabbing the
corresponding ordered extent and decrementing its reference count twice,
once for our lookup reference and once for the ordered tree reference.
This was a problem because it caused the ordered extent to be freed
without removing it from the ordered tree and any lists it might be
attached to, leaving dangling pointers to the ordered extent around.
Example trace with CONFIG_DEBUG_PAGEALLOC=y:

[161779.858707] BUG: unable to handle kernel paging request at 0000000087654330
[161779.859983] IP: [&lt;ffffffff8124ca68&gt;] rb_prev+0x22/0x3b
[161779.860636] PGD 34d818067 PUD 0
[161779.860636] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
(...)
[161779.860636] Call Trace:
[161779.860636]  [&lt;ffffffffa06b36a6&gt;] __tree_search+0xd9/0xf9 [btrfs]
[161779.860636]  [&lt;ffffffffa06b3708&gt;] tree_search+0x42/0x63 [btrfs]
[161779.860636]  [&lt;ffffffffa06b4868&gt;] ? btrfs_lookup_ordered_range+0x2d/0xa5 [btrfs]
[161779.860636]  [&lt;ffffffffa06b4873&gt;] btrfs_lookup_ordered_range+0x38/0xa5 [btrfs]
[161779.860636]  [&lt;ffffffffa06aab8e&gt;] btrfs_get_blocks_direct+0x11b/0x615 [btrfs]
[161779.860636]  [&lt;ffffffff8119727f&gt;] do_blockdev_direct_IO+0x5ff/0xb43
[161779.860636]  [&lt;ffffffffa06aaa73&gt;] ? btrfs_page_exists_in_range+0x1ad/0x1ad [btrfs]
[161779.860636]  [&lt;ffffffffa06a2c9a&gt;] ? btrfs_get_extent_fiemap+0x1bc/0x1bc [btrfs]
[161779.860636]  [&lt;ffffffff811977f5&gt;] __blockdev_direct_IO+0x32/0x34
[161779.860636]  [&lt;ffffffffa06a2c9a&gt;] ? btrfs_get_extent_fiemap+0x1bc/0x1bc [btrfs]
[161779.860636]  [&lt;ffffffffa06a10ae&gt;] btrfs_direct_IO+0x198/0x21f [btrfs]
[161779.860636]  [&lt;ffffffffa06a2c9a&gt;] ? btrfs_get_extent_fiemap+0x1bc/0x1bc [btrfs]
[161779.860636]  [&lt;ffffffff81112ca1&gt;] generic_file_direct_write+0xb3/0x128
[161779.860636]  [&lt;ffffffffa06affaa&gt;] ? btrfs_file_write_iter+0x15f/0x3e0 [btrfs]
[161779.860636]  [&lt;ffffffffa06b004c&gt;] btrfs_file_write_iter+0x201/0x3e0 [btrfs]
(...)

We were also not freeing the btrfs_dio_private we allocated previously,
which kmemleak reported with the following trace in its sysfs file:

unreferenced object 0xffff8803f553bf80 (size 96):
  comm "xfs_io", pid 4501, jiffies 4295039588 (age 173.936s)
  hex dump (first 32 bytes):
    88 6c 9b f5 02 88 ff ff 00 00 00 00 00 00 00 00  .l..............
    00 00 00 00 00 00 00 00 00 00 c4 00 00 00 00 00  ................
  backtrace:
    [&lt;ffffffff81161ffe&gt;] create_object+0x172/0x29a
    [&lt;ffffffff8145870f&gt;] kmemleak_alloc+0x25/0x41
    [&lt;ffffffff81154e64&gt;] kmemleak_alloc_recursive.constprop.40+0x16/0x18
    [&lt;ffffffff811579ed&gt;] kmem_cache_alloc_trace+0xfb/0x148
    [&lt;ffffffffa03d8cff&gt;] btrfs_submit_direct+0x65/0x16a [btrfs]
    [&lt;ffffffff811968dc&gt;] dio_bio_submit+0x62/0x8f
    [&lt;ffffffff811975fe&gt;] do_blockdev_direct_IO+0x97e/0xb43
    [&lt;ffffffff811977f5&gt;] __blockdev_direct_IO+0x32/0x34
    [&lt;ffffffffa03d70ae&gt;] btrfs_direct_IO+0x198/0x21f [btrfs]
    [&lt;ffffffff81112ca1&gt;] generic_file_direct_write+0xb3/0x128
    [&lt;ffffffffa03e604d&gt;] btrfs_file_write_iter+0x201/0x3e0 [btrfs]
    [&lt;ffffffff8116586a&gt;] __vfs_write+0x7c/0xa5
    [&lt;ffffffff81165da9&gt;] vfs_write+0xa0/0xe4
    [&lt;ffffffff81166675&gt;] SyS_pwrite64+0x64/0x82
    [&lt;ffffffff81464fd7&gt;] system_call_fastpath+0x12/0x6f
    [&lt;ffffffffffffffff&gt;] 0xffffffffffffffff

For read requests we weren't doing any cleanup either (none of the work
done by btrfs_endio_direct_read()), so a failure submitting a bio for a
read request would leave a range in the inode's io_tree locked forever,
blocking any future operations (both reads and writes) against that range.

So fix this by making sure we do the same cleanup that we do for the case
where the bio submission succeeds.
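
The double put described at the top of this message can be modelled with a toy
reference count.  This is only a sketch in Python under stated assumptions:
OrderedExtent, OrderedTree, buggy_error_path and fixed_error_path are
hypothetical names, and the "fixed" path shows just one way to keep the
reference count and the tree consistent, not the code from the patch.

```python
class OrderedExtent:
    def __init__(self):
        self.refs = 1        # the reference owned by the ordered tree
        self.freed = False

    def get(self):
        self.refs += 1
        return self

    def put(self):
        self.refs -= 1
        if self.refs == 0:
            self.freed = True  # stands in for freeing the object


class OrderedTree:
    def __init__(self):
        self.entries = {}

    def insert(self, offset, extent):
        self.entries[offset] = extent

    def lookup(self, offset):
        # A lookup hands back an extra reference to the caller.
        return self.entries[offset].get()

    def remove(self, offset):
        # Unlink the entry first, then drop the tree's own reference.
        self.entries.pop(offset).put()


def buggy_error_path(tree, offset):
    oe = tree.lookup(offset)
    oe.put()  # drops the lookup reference
    oe.put()  # drops the tree's reference, but the entry stays linked


def fixed_error_path(tree, offset):
    oe = tree.lookup(offset)
    tree.remove(offset)  # unlink and drop the tree's reference together
    oe.put()             # drop the lookup reference
```

After buggy_error_path() the extent is marked freed while the tree still holds
an entry for it, which corresponds to the dangling pointer that the later
tree search trips over.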

Signed-off-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If we fail to submit a bio for a direct IO request, we were grabbing the
corresponding ordered extent and decrementing its reference count twice,
once for our lookup reference and once for the ordered tree reference.
This was a problem because it caused the ordered extent to be freed
without removing it from the ordered tree and any lists it might be
attached to, leaving dangling pointers to the ordered extent around.
Example trace with CONFIG_DEBUG_PAGEALLOC=y:

[161779.858707] BUG: unable to handle kernel paging request at 0000000087654330
[161779.859983] IP: [&lt;ffffffff8124ca68&gt;] rb_prev+0x22/0x3b
[161779.860636] PGD 34d818067 PUD 0
[161779.860636] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
(...)
[161779.860636] Call Trace:
[161779.860636]  [&lt;ffffffffa06b36a6&gt;] __tree_search+0xd9/0xf9 [btrfs]
[161779.860636]  [&lt;ffffffffa06b3708&gt;] tree_search+0x42/0x63 [btrfs]
[161779.860636]  [&lt;ffffffffa06b4868&gt;] ? btrfs_lookup_ordered_range+0x2d/0xa5 [btrfs]
[161779.860636]  [&lt;ffffffffa06b4873&gt;] btrfs_lookup_ordered_range+0x38/0xa5 [btrfs]
[161779.860636]  [&lt;ffffffffa06aab8e&gt;] btrfs_get_blocks_direct+0x11b/0x615 [btrfs]
[161779.860636]  [&lt;ffffffff8119727f&gt;] do_blockdev_direct_IO+0x5ff/0xb43
[161779.860636]  [&lt;ffffffffa06aaa73&gt;] ? btrfs_page_exists_in_range+0x1ad/0x1ad [btrfs]
[161779.860636]  [&lt;ffffffffa06a2c9a&gt;] ? btrfs_get_extent_fiemap+0x1bc/0x1bc [btrfs]
[161779.860636]  [&lt;ffffffff811977f5&gt;] __blockdev_direct_IO+0x32/0x34
[161779.860636]  [&lt;ffffffffa06a2c9a&gt;] ? btrfs_get_extent_fiemap+0x1bc/0x1bc [btrfs]
[161779.860636]  [&lt;ffffffffa06a10ae&gt;] btrfs_direct_IO+0x198/0x21f [btrfs]
[161779.860636]  [&lt;ffffffffa06a2c9a&gt;] ? btrfs_get_extent_fiemap+0x1bc/0x1bc [btrfs]
[161779.860636]  [&lt;ffffffff81112ca1&gt;] generic_file_direct_write+0xb3/0x128
[161779.860636]  [&lt;ffffffffa06affaa&gt;] ? btrfs_file_write_iter+0x15f/0x3e0 [btrfs]
[161779.860636]  [&lt;ffffffffa06b004c&gt;] btrfs_file_write_iter+0x201/0x3e0 [btrfs]
(...)

We were also not freeing the btrfs_dio_private we allocated previously,
which kmemleak reported with the following trace in its sysfs file:

unreferenced object 0xffff8803f553bf80 (size 96):
  comm "xfs_io", pid 4501, jiffies 4295039588 (age 173.936s)
  hex dump (first 32 bytes):
    88 6c 9b f5 02 88 ff ff 00 00 00 00 00 00 00 00  .l..............
    00 00 00 00 00 00 00 00 00 00 c4 00 00 00 00 00  ................
  backtrace:
    [&lt;ffffffff81161ffe&gt;] create_object+0x172/0x29a
    [&lt;ffffffff8145870f&gt;] kmemleak_alloc+0x25/0x41
    [&lt;ffffffff81154e64&gt;] kmemleak_alloc_recursive.constprop.40+0x16/0x18
    [&lt;ffffffff811579ed&gt;] kmem_cache_alloc_trace+0xfb/0x148
    [&lt;ffffffffa03d8cff&gt;] btrfs_submit_direct+0x65/0x16a [btrfs]
    [&lt;ffffffff811968dc&gt;] dio_bio_submit+0x62/0x8f
    [&lt;ffffffff811975fe&gt;] do_blockdev_direct_IO+0x97e/0xb43
    [&lt;ffffffff811977f5&gt;] __blockdev_direct_IO+0x32/0x34
    [&lt;ffffffffa03d70ae&gt;] btrfs_direct_IO+0x198/0x21f [btrfs]
    [&lt;ffffffff81112ca1&gt;] generic_file_direct_write+0xb3/0x128
    [&lt;ffffffffa03e604d&gt;] btrfs_file_write_iter+0x201/0x3e0 [btrfs]
    [&lt;ffffffff8116586a&gt;] __vfs_write+0x7c/0xa5
    [&lt;ffffffff81165da9&gt;] vfs_write+0xa0/0xe4
    [&lt;ffffffff81166675&gt;] SyS_pwrite64+0x64/0x82
    [&lt;ffffffff81464fd7&gt;] system_call_fastpath+0x12/0x6f
    [&lt;ffffffffffffffff&gt;] 0xffffffffffffffff

For read requests we weren't doing any cleanup either (none of the work
done by btrfs_endio_direct_read()), so a failure submitting a bio for a
read request would leave a range in the inode's io_tree locked forever,
blocking any future operations (both reads and writes) against that range.

So fix this by making sure we do the same cleanup that we do for the case
where the bio submission succeeds.

Signed-off-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: don't attach unnecessary extents to transaction on fsync</title>
<updated>2015-06-10T14:02:44+00:00</updated>
<author>
<name>Filipe Manana</name>
<email>fdmanana@suse.com</email>
</author>
<published>2015-04-17T16:08:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7558c8bc17481c1f856e009af8503ab40fec348a'/>
<id>7558c8bc17481c1f856e009af8503ab40fec348a</id>
<content type='text'>
We don't need to attach ordered extents that have completed to the current
transaction. Doing so only makes us hold memory for longer than necessary
and delays the iput of the inode until the transaction is committed (for
each created ordered extent we do an igrab and then schedule an asynchronous
iput when the ordered extent's reference count drops to 0), preventing the
inode from being evictable until the transaction commits.

Signed-off-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We don't need to attach ordered extents that have completed to the current
transaction. Doing so only makes us hold memory for longer than necessary
and delays the iput of the inode until the transaction is committed (for
each created ordered extent we do an igrab and then schedule an asynchronous
iput when the ordered extent's reference count drops to 0), preventing the
inode from being evictable until the transaction commits.

Signed-off-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: avoid syncing log in the fast fsync path when not necessary</title>
<updated>2015-06-10T14:02:43+00:00</updated>
<author>
<name>Filipe Manana</name>
<email>fdmanana@suse.com</email>
</author>
<published>2015-03-31T13:16:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b659ef027792219b590d67a2baf1643a93727d29'/>
<id>b659ef027792219b590d67a2baf1643a93727d29</id>
<content type='text'>
Commit 3a8b36f37806 ("Btrfs: fix data loss in the fast fsync path") added
a performance regression that causes an unnecessary sync of the log
trees (fs/subvol and root log trees) when 2 consecutive fsyncs are done
against a file, without any writes or metadata updates to the inode in
between them and if a transaction is committed before the second fsync is
called.

Huang Ying reported this to lkml (https://lkml.org/lkml/2015/3/18/99)
after a sysbench test that measured a 62% decrease in file IO
requests per second for that test's workload.

The test is:

  echo performance &gt; /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
  echo performance &gt; /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
  echo performance &gt; /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor
  echo performance &gt; /sys/devices/system/cpu/cpu3/cpufreq/scaling_governor
  mkfs -t btrfs /dev/sda2
  mount -t btrfs /dev/sda2 /fs/sda2
  cd /fs/sda2
  for ((i = 0; i &lt; 1024; i++)); do fallocate -l 67108864 testfile.$i; done
  sysbench --test=fileio --max-requests=0 --num-threads=4 --max-time=600 \
    --file-test-mode=rndwr --file-total-size=68719476736 --file-io-mode=sync \
    --file-num=1024 run

A test on a kvm guest running a debug kernel gave me the following results:

Without 3a8b36f378060d:             16.01 reqs/sec
With 3a8b36f378060d:                 3.39 reqs/sec
With 3a8b36f378060d and this patch: 16.04 reqs/sec

Reported-by: Huang Ying &lt;ying.huang@intel.com&gt;
Tested-by: Huang, Ying &lt;ying.huang@intel.com&gt;
Signed-off-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 3a8b36f37806 ("Btrfs: fix data loss in the fast fsync path") added
a performance regression that causes an unnecessary sync of the log
trees (fs/subvol and root log trees) when 2 consecutive fsyncs are done
against a file, without any writes or metadata updates to the inode in
between them and if a transaction is committed before the second fsync is
called.

Huang Ying reported this to lkml (https://lkml.org/lkml/2015/3/18/99)
after a sysbench test that measured a 62% decrease in file IO
requests per second for that test's workload.

The test is:

  echo performance &gt; /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
  echo performance &gt; /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
  echo performance &gt; /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor
  echo performance &gt; /sys/devices/system/cpu/cpu3/cpufreq/scaling_governor
  mkfs -t btrfs /dev/sda2
  mount -t btrfs /dev/sda2 /fs/sda2
  cd /fs/sda2
  for ((i = 0; i &lt; 1024; i++)); do fallocate -l 67108864 testfile.$i; done
  sysbench --test=fileio --max-requests=0 --num-threads=4 --max-time=600 \
    --file-test-mode=rndwr --file-total-size=68719476736 --file-io-mode=sync \
    --file-num=1024 run

A test on a kvm guest running a debug kernel gave me the following results:

Without 3a8b36f378060d:             16.01 reqs/sec
With 3a8b36f378060d:                 3.39 reqs/sec
With 3a8b36f378060d and this patch: 16.04 reqs/sec

Reported-by: Huang Ying &lt;ying.huang@intel.com&gt;
Tested-by: Huang, Ying &lt;ying.huang@intel.com&gt;
Signed-off-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: remove csum_bytes_left</title>
<updated>2015-06-03T11:03:06+00:00</updated>
<author>
<name>Liu Bo</name>
<email>bo.li.liu@oracle.com</email>
</author>
<published>2015-05-25T03:20:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0c304304feab8a576ed6ba6ec964255d00d2886e'/>
<id>0c304304feab8a576ed6ba6ec964255d00d2886e</id>
<content type='text'>
After commit 8407f553268a
("Btrfs: fix data corruption after fast fsync and writeback error"),
during wait_ordered_extents(), we wait for the ordered extent to set
BTRFS_ORDERED_IO_DONE or BTRFS_ORDERED_IOERR, at which point we've
already got checksum information, so we don't need to check
(csum_bytes_left == 0) in the whole logging path.

Signed-off-by: Liu Bo &lt;bo.li.liu@oracle.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
After commit 8407f553268a
("Btrfs: fix data corruption after fast fsync and writeback error"),
during wait_ordered_extents(), we wait for the ordered extent to set
BTRFS_ORDERED_IO_DONE or BTRFS_ORDERED_IOERR, at which point we've
already got checksum information, so we don't need to check
(csum_bytes_left == 0) in the whole logging path.

Signed-off-by: Liu Bo &lt;bo.li.liu@oracle.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: fix panic when starting bg cache writeout after IO error</title>
<updated>2015-05-11T14:59:10+00:00</updated>
<author>
<name>Filipe Manana</name>
<email>fdmanana@suse.com</email>
</author>
<published>2015-05-05T18:03:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=28aeeac1dd3080db5108b7b446be69f05c470a90'/>
<id>28aeeac1dd3080db5108b7b446be69f05c470a90</id>
<content type='text'>
When waiting for the writeback of block group cache we returned
immediately if there was an error during writeback without waiting
for the ordered extent to complete. This left a short time window
where if some other task attempts to start the writeout for the same
block group cache it can attempt to add a new ordered extent, starting
at the same offset (0) before the previous one is removed from the
ordered tree, causing an ordered tree panic (calls BUG()).

This normally doesn't happen in other write paths, such as buffered
writes or direct IO writes for regular files, since before marking
page ranges dirty we lock the ranges and wait for any ordered extents
within the range to complete first.

Fix this by making btrfs_wait_ordered_range() not return immediately
if it gets an error from the writeback, waiting for all ordered extents
to complete first.
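
The change in control flow can be sketched as follows.  This is an
illustrative Python model, not the kernel function: wait_ordered_range and its
two parameters are hypothetical, and errors are represented as negative
errno-style integers.

```python
def wait_ordered_range(writeback_error, ordered_extents):
    """Return the first error seen, but only after waiting for every extent.

    writeback_error: 0 or a negative errno-style int from the writeback step.
    ordered_extents: list of callables; each blocks until its extent is done
    and returns 0 or a negative errno-style int.
    """
    ret = writeback_error
    for wait_one in ordered_extents:
        err = wait_one()
        # Remember only the first error, but keep waiting on the rest so no
        # ordered extent is still in the tree when we return.
        if err and not ret:
            ret = err
    return ret
```

Returning early on writeback_error would skip the loop entirely, which is the
window in which a second writeout could insert an extent at the same offset.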

This issue happened often when running the fstest btrfs/088 and it's
easy to trigger it by running in a loop until the panic happens:

  for ((i = 1; i &lt;= 10000; i++)) do ./check btrfs/088 ; done

[17156.862573] BTRFS critical (device sdc): panic in ordered_data_tree_panic:70: Inconsistency in ordered tree at offset 0 (errno=-17 Object already exists)
[17156.864052] ------------[ cut here ]------------
[17156.864052] kernel BUG at fs/btrfs/ordered-data.c:70!
(...)
[17156.864052] Call Trace:
[17156.864052]  [&lt;ffffffffa03876e3&gt;] btrfs_add_ordered_extent+0x12/0x14 [btrfs]
[17156.864052]  [&lt;ffffffffa03787e2&gt;] run_delalloc_nocow+0x5bf/0x747 [btrfs]
[17156.864052]  [&lt;ffffffffa03789ff&gt;] run_delalloc_range+0x95/0x353 [btrfs]
[17156.864052]  [&lt;ffffffffa038b7fe&gt;] writepage_delalloc.isra.16+0xb9/0x13f [btrfs]
[17156.864052]  [&lt;ffffffffa038d75b&gt;] __extent_writepage+0x129/0x1f7 [btrfs]
[17156.864052]  [&lt;ffffffffa038da5a&gt;] extent_write_cache_pages.isra.15.constprop.28+0x231/0x2f4 [btrfs]
[17156.864052]  [&lt;ffffffff810ad2af&gt;] ? __module_text_address+0x12/0x59
[17156.864052]  [&lt;ffffffff8107d33d&gt;] ? trace_hardirqs_on+0xd/0xf
[17156.864052]  [&lt;ffffffffa038df76&gt;] extent_writepages+0x4b/0x5c [btrfs]
[17156.864052]  [&lt;ffffffff81144431&gt;] ? kmem_cache_free+0x9b/0xce
[17156.864052]  [&lt;ffffffffa0376a46&gt;] ? btrfs_submit_direct+0x3fc/0x3fc [btrfs]
[17156.864052]  [&lt;ffffffffa0389cd6&gt;] ? free_extent_state+0x8c/0xc1 [btrfs]
[17156.864052]  [&lt;ffffffffa0374871&gt;] btrfs_writepages+0x28/0x2a [btrfs]
[17156.864052]  [&lt;ffffffff8110c4c8&gt;] do_writepages+0x23/0x2c
[17156.864052]  [&lt;ffffffff81102f36&gt;] __filemap_fdatawrite_range+0x5a/0x61
[17156.864052]  [&lt;ffffffff81102f6e&gt;] filemap_fdatawrite_range+0x13/0x15
[17156.864052]  [&lt;ffffffffa0383ef7&gt;] btrfs_fdatawrite_range+0x21/0x48 [btrfs]
[17156.864052]  [&lt;ffffffffa03ab89e&gt;] __btrfs_write_out_cache.isra.14+0x2d9/0x3a7 [btrfs]
[17156.864052]  [&lt;ffffffffa03ac1ab&gt;] ? btrfs_write_out_cache+0x41/0xdc [btrfs]
[17156.864052]  [&lt;ffffffffa03ac1fd&gt;] btrfs_write_out_cache+0x93/0xdc [btrfs]
[17156.864052]  [&lt;ffffffffa0363847&gt;] ? btrfs_start_dirty_block_groups+0x13a/0x2b2 [btrfs]
[17156.864052]  [&lt;ffffffffa03638e6&gt;] btrfs_start_dirty_block_groups+0x1d9/0x2b2 [btrfs]
[17156.864052]  [&lt;ffffffff8107d33d&gt;] ? trace_hardirqs_on+0xd/0xf
[17156.864052]  [&lt;ffffffffa037209e&gt;] btrfs_commit_transaction+0x130/0x9c9 [btrfs]
[17156.864052]  [&lt;ffffffffa034c748&gt;] btrfs_sync_fs+0xe1/0x12d [btrfs]

Signed-off-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When waiting for the writeback of block group cache we returned
immediately if there was an error during writeback without waiting
for the ordered extent to complete. This left a short time window
where if some other task attempts to start the writeout for the same
block group cache it can attempt to add a new ordered extent, starting
at the same offset (0) before the previous one is removed from the
ordered tree, causing an ordered tree panic (calls BUG()).

This normally doesn't happen in other write paths, such as buffered
writes or direct IO writes for regular files, since before marking
page ranges dirty we lock the ranges and wait for any ordered extents
within the range to complete first.

Fix this by making btrfs_wait_ordered_range() not return immediately
if it gets an error from the writeback, waiting for all ordered extents
to complete first.

This issue happened often when running the fstest btrfs/088 and it's
easy to trigger it by running in a loop until the panic happens:

  for ((i = 1; i &lt;= 10000; i++)) do ./check btrfs/088 ; done

[17156.862573] BTRFS critical (device sdc): panic in ordered_data_tree_panic:70: Inconsistency in ordered tree at offset 0 (errno=-17 Object already exists)
[17156.864052] ------------[ cut here ]------------
[17156.864052] kernel BUG at fs/btrfs/ordered-data.c:70!
(...)
[17156.864052] Call Trace:
[17156.864052]  [&lt;ffffffffa03876e3&gt;] btrfs_add_ordered_extent+0x12/0x14 [btrfs]
[17156.864052]  [&lt;ffffffffa03787e2&gt;] run_delalloc_nocow+0x5bf/0x747 [btrfs]
[17156.864052]  [&lt;ffffffffa03789ff&gt;] run_delalloc_range+0x95/0x353 [btrfs]
[17156.864052]  [&lt;ffffffffa038b7fe&gt;] writepage_delalloc.isra.16+0xb9/0x13f [btrfs]
[17156.864052]  [&lt;ffffffffa038d75b&gt;] __extent_writepage+0x129/0x1f7 [btrfs]
[17156.864052]  [&lt;ffffffffa038da5a&gt;] extent_write_cache_pages.isra.15.constprop.28+0x231/0x2f4 [btrfs]
[17156.864052]  [&lt;ffffffff810ad2af&gt;] ? __module_text_address+0x12/0x59
[17156.864052]  [&lt;ffffffff8107d33d&gt;] ? trace_hardirqs_on+0xd/0xf
[17156.864052]  [&lt;ffffffffa038df76&gt;] extent_writepages+0x4b/0x5c [btrfs]
[17156.864052]  [&lt;ffffffff81144431&gt;] ? kmem_cache_free+0x9b/0xce
[17156.864052]  [&lt;ffffffffa0376a46&gt;] ? btrfs_submit_direct+0x3fc/0x3fc [btrfs]
[17156.864052]  [&lt;ffffffffa0389cd6&gt;] ? free_extent_state+0x8c/0xc1 [btrfs]
[17156.864052]  [&lt;ffffffffa0374871&gt;] btrfs_writepages+0x28/0x2a [btrfs]
[17156.864052]  [&lt;ffffffff8110c4c8&gt;] do_writepages+0x23/0x2c
[17156.864052]  [&lt;ffffffff81102f36&gt;] __filemap_fdatawrite_range+0x5a/0x61
[17156.864052]  [&lt;ffffffff81102f6e&gt;] filemap_fdatawrite_range+0x13/0x15
[17156.864052]  [&lt;ffffffffa0383ef7&gt;] btrfs_fdatawrite_range+0x21/0x48 [btrfs]
[17156.864052]  [&lt;ffffffffa03ab89e&gt;] __btrfs_write_out_cache.isra.14+0x2d9/0x3a7 [btrfs]
[17156.864052]  [&lt;ffffffffa03ac1ab&gt;] ? btrfs_write_out_cache+0x41/0xdc [btrfs]
[17156.864052]  [&lt;ffffffffa03ac1fd&gt;] btrfs_write_out_cache+0x93/0xdc [btrfs]
[17156.864052]  [&lt;ffffffffa0363847&gt;] ? btrfs_start_dirty_block_groups+0x13a/0x2b2 [btrfs]
[17156.864052]  [&lt;ffffffffa03638e6&gt;] btrfs_start_dirty_block_groups+0x1d9/0x2b2 [btrfs]
[17156.864052]  [&lt;ffffffff8107d33d&gt;] ? trace_hardirqs_on+0xd/0xf
[17156.864052]  [&lt;ffffffffa037209e&gt;] btrfs_commit_transaction+0x130/0x9c9 [btrfs]
[17156.864052]  [&lt;ffffffffa034c748&gt;] btrfs_sync_fs+0xe1/0x12d [btrfs]

Signed-off-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
