<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/btrfs/ordered-data.h, branch v3.13</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>Btrfs: don't wait for the completion of all the ordered extents</title>
<updated>2013-11-12T03:13:44+00:00</updated>
<author>
<name>Miao Xie</name>
<email>miaox@cn.fujitsu.com</email>
</author>
<published>2013-11-04T15:13:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b02441999efcc6152b87cd58e7970bb7843f76cf'/>
<id>b02441999efcc6152b87cd58e7970bb7843f76cf</id>
<content type='text'>
It is very likely that there are lots of ordered extents in the filesytem,
if we wait for the completion of all of them when we want to reclaim some
space for the metadata space reservation, we would be blocked for a long
time. The performance would drop down suddenly for a long time.

Signed-off-by: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Signed-off-by: Josef Bacik &lt;jbacik@fusionio.com&gt;
Signed-off-by: Chris Mason &lt;chris.mason@fusionio.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It is very likely that there are lots of ordered extents in the filesytem,
if we wait for the completion of all of them when we want to reclaim some
space for the metadata space reservation, we would be blocked for a long
time. The performance would drop down suddenly for a long time.

Signed-off-by: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Signed-off-by: Josef Bacik &lt;jbacik@fusionio.com&gt;
Signed-off-by: Chris Mason &lt;chris.mason@fusionio.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: return an error from btrfs_wait_ordered_range</title>
<updated>2013-11-12T03:07:35+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>jbacik@fusionio.com</email>
</author>
<published>2013-10-25T20:13:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0ef8b726075aa6931ddf1c16f5bae043eef184f9'/>
<id>0ef8b726075aa6931ddf1c16f5bae043eef184f9</id>
<content type='text'>
I noticed that if the free space cache has an error writing out it's data it
won't actually error out, it will just carry on.  This is because it doesn't
check the return value of btrfs_wait_ordered_range, which didn't actually return
anything.  So fix this in order to keep us from making free space cache look
valid when it really isnt.  Thanks,

Signed-off-by: Josef Bacik &lt;jbacik@fusionio.com&gt;
Signed-off-by: Chris Mason &lt;chris.mason@fusionio.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
I noticed that if the free space cache has an error writing out it's data it
won't actually error out, it will just carry on.  This is because it doesn't
check the return value of btrfs_wait_ordered_range, which didn't actually return
anything.  So fix this in order to keep us from making free space cache look
valid when it really isnt.  Thanks,

Signed-off-by: Josef Bacik &lt;jbacik@fusionio.com&gt;
Signed-off-by: Chris Mason &lt;chris.mason@fusionio.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: kill delay_iput arg to the wait_ordered functions</title>
<updated>2013-09-21T15:05:27+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>jbacik@fusionio.com</email>
</author>
<published>2013-09-17T14:55:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f0de181c9b48a397c5a2fbe63dcdd2a26a872695'/>
<id>f0de181c9b48a397c5a2fbe63dcdd2a26a872695</id>
<content type='text'>
This is a left over of how we used to wait for ordered extents, which was to
grab the inode and then run filemap flush on it.  However if we have an ordered
extent then we already are holding a ref on the inode, and we just use
btrfs_start_ordered_extent anyway, so there is no reason to have an extra ref on
the inode to start work on the ordered extent.  Thanks,

Signed-off-by: Josef Bacik &lt;jbacik@fusionio.com&gt;
Signed-off-by: Chris Mason &lt;chris.mason@fusionio.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This is a left over of how we used to wait for ordered extents, which was to
grab the inode and then run filemap flush on it.  However if we have an ordered
extent then we already are holding a ref on the inode, and we just use
btrfs_start_ordered_extent anyway, so there is no reason to have an extra ref on
the inode to start work on the ordered extent.  Thanks,

Signed-off-by: Josef Bacik &lt;jbacik@fusionio.com&gt;
Signed-off-by: Chris Mason &lt;chris.mason@fusionio.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: allow partial ordered extent completion</title>
<updated>2013-09-01T12:16:34+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>jbacik@fusionio.com</email>
</author>
<published>2013-08-29T17:57:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=77cef2ec5484564eca6bd12a2b4a1e88fd766fbc'/>
<id>77cef2ec5484564eca6bd12a2b4a1e88fd766fbc</id>
<content type='text'>
We currently have this problem where you can truncate pages that have not yet
been written for an ordered extent.  We do this because the truncate will be
coming behind to clean us up anyway so what's the harm right?  Well if truncate
fails for whatever reason we leave an orphan item around for the file to be
cleaned up later.  But if the user goes and truncates up the file and tries to
read from the area that had been discarded previously they will get a csum error
because we never actually wrote that data out.

This patch fixes this by allowing us to either discard the ordered extent
completely, by which I mean we just free up the space we had allocated and not
add the file extent, or adjust the length of the file extent we write.  We do
this by setting the length we truncated down to in the ordered extent, and then
we set the file extent length and ram bytes to this length.  The total disk
space stays unchanged since we may be compressed and we can't just chop off the
disk space, but at least this way the file extent only points to the valid data.
Then when the file extent is free'd the extent and csums will be freed normally.

This patch is needed for the next series which will give us more graceful
recovery of failed truncates.  Thanks,

Signed-off-by: Josef Bacik &lt;jbacik@fusionio.com&gt;
Signed-off-by: Chris Mason &lt;chris.mason@fusionio.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We currently have this problem where you can truncate pages that have not yet
been written for an ordered extent.  We do this because the truncate will be
coming behind to clean us up anyway so what's the harm right?  Well if truncate
fails for whatever reason we leave an orphan item around for the file to be
cleaned up later.  But if the user goes and truncates up the file and tries to
read from the area that had been discarded previously they will get a csum error
because we never actually wrote that data out.

This patch fixes this by allowing us to either discard the ordered extent
completely, by which I mean we just free up the space we had allocated and not
add the file extent, or adjust the length of the file extent we write.  We do
this by setting the length we truncated down to in the ordered extent, and then
we set the file extent length and ram bytes to this length.  The total disk
space stays unchanged since we may be compressed and we can't just chop off the
disk space, but at least this way the file extent only points to the valid data.
Then when the file extent is free'd the extent and csums will be freed normally.

This patch is needed for the next series which will give us more graceful
recovery of failed truncates.  Thanks,

Signed-off-by: Josef Bacik &lt;jbacik@fusionio.com&gt;
Signed-off-by: Chris Mason &lt;chris.mason@fusionio.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: remove btrfs_sector_sum structure</title>
<updated>2013-07-02T15:50:47+00:00</updated>
<author>
<name>Miao Xie</name>
<email>miaox@cn.fujitsu.com</email>
</author>
<published>2013-06-19T02:36:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f51a4a1826ff810eb9c00cadff8978b028c40756'/>
<id>f51a4a1826ff810eb9c00cadff8978b028c40756</id>
<content type='text'>
Using the structure btrfs_sector_sum to keep the checksum value is
unnecessary, because the extents that btrfs_sector_sum points to are
continuous, we can find out the expected checksums by btrfs_ordered_sum's
bytenr and the offset, so we can remove btrfs_sector_sum's bytenr. After
removing bytenr, there is only one member in the structure, so it makes
no sense to keep the structure, just remove it, and use a u32 array to
store the checksum value.

By this change, we don't use the while loop to get the checksums one by
one. Now, we can get several checksum value at one time, it improved the
performance by ~74% on my SSD (31MB/s -&gt; 54MB/s).

test command:
 # dd if=/dev/zero of=/mnt/btrfs/file0 bs=1M count=1024 oflag=sync

Signed-off-by: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Signed-off-by: Josef Bacik &lt;jbacik@fusionio.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Using the structure btrfs_sector_sum to keep the checksum value is
unnecessary, because the extents that btrfs_sector_sum points to are
continuous, we can find out the expected checksums by btrfs_ordered_sum's
bytenr and the offset, so we can remove btrfs_sector_sum's bytenr. After
removing bytenr, there is only one member in the structure, so it makes
no sense to keep the structure, just remove it, and use a u32 array to
store the checksum value.

By this change, we don't use the while loop to get the checksums one by
one. Now, we can get several checksum value at one time, it improved the
performance by ~74% on my SSD (31MB/s -&gt; 54MB/s).

test command:
 # dd if=/dev/zero of=/mnt/btrfs/file0 bs=1M count=1024 oflag=sync

Signed-off-by: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Signed-off-by: Josef Bacik &lt;jbacik@fusionio.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: introduce per-subvolume ordered extent list</title>
<updated>2013-06-14T15:29:41+00:00</updated>
<author>
<name>Miao Xie</name>
<email>miaox@cn.fujitsu.com</email>
</author>
<published>2013-05-15T07:48:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=199c2a9c3d1389db7f7a211e64f6809d352ce5f6'/>
<id>199c2a9c3d1389db7f7a211e64f6809d352ce5f6</id>
<content type='text'>
The reason we introduce per-subvolume ordered extent list is the same
as the per-subvolume delalloc inode list.

Signed-off-by: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Signed-off-by: Josef Bacik &lt;jbacik@fusionio.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The reason we introduce per-subvolume ordered extent list is the same
as the per-subvolume delalloc inode list.

Signed-off-by: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Signed-off-by: Josef Bacik &lt;jbacik@fusionio.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: improve the performance of the csums lookup</title>
<updated>2013-05-06T19:54:35+00:00</updated>
<author>
<name>Miao Xie</name>
<email>miaox@cn.fujitsu.com</email>
</author>
<published>2013-04-05T07:20:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e4100d987b2437596ebcf11809022b79507f3db1'/>
<id>e4100d987b2437596ebcf11809022b79507f3db1</id>
<content type='text'>
It is very likely that there are several blocks in bio, it is very
inefficient if we get their csums one by one. This patch improves
this problem by getting the csums in batch.

According to the result of the following test, the execute time of
__btrfs_lookup_bio_sums() is down by ~28%(300us -&gt; 217us).

 # dd if=&lt;mnt&gt;/file of=/dev/null bs=1M count=1024

Signed-off-by: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Signed-off-by: Josef Bacik &lt;jbacik@fusionio.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It is very likely that there are several blocks in bio, it is very
inefficient if we get their csums one by one. This patch improves
this problem by getting the csums in batch.

According to the result of the following test, the execute time of
__btrfs_lookup_bio_sums() is down by ~28%(300us -&gt; 217us).

 # dd if=&lt;mnt&gt;/file of=/dev/null bs=1M count=1024

Signed-off-by: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Signed-off-by: Josef Bacik &lt;jbacik@fusionio.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next into for-linus-3.9</title>
<updated>2013-02-20T19:05:45+00:00</updated>
<author>
<name>Chris Mason</name>
<email>chris.mason@fusionio.com</email>
</author>
<published>2013-02-20T19:05:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b2c6b3e0611c58fbeb6b9c0892b6249f7bdfaf6b'/>
<id>b2c6b3e0611c58fbeb6b9c0892b6249f7bdfaf6b</id>
<content type='text'>
Signed-off-by: Chris Mason &lt;chris.mason@fusionio.com&gt;

Conflicts:
	fs/btrfs/disk-io.c
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Chris Mason &lt;chris.mason@fusionio.com&gt;

Conflicts:
	fs/btrfs/disk-io.c
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: place ordered operations on a per transaction list</title>
<updated>2013-02-20T17:59:57+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>jbacik@fusionio.com</email>
</author>
<published>2013-02-13T16:09:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=569e0f358c0c37f6733702d4a5d2c412860f7169'/>
<id>569e0f358c0c37f6733702d4a5d2c412860f7169</id>
<content type='text'>
Miao made the ordered operations stuff run async, which introduced a
deadlock where we could get somebody (sync) racing in and committing the
transaction while a commit was already happening.  The new committer would
try and flush ordered operations which would hang waiting for the commit to
finish because it is done asynchronously and no longer inherits the callers
trans handle.  To fix this we need to make the ordered operations list a per
transaction list.  We can get new inodes added to the ordered operation list
by truncating them and then having another process writing to them, so this
makes it so that anybody trying to add an ordered operation _must_ start a
transaction in order to add itself to the list, which will keep new inodes
from getting added to the ordered operations list after we start committing.
This should fix the deadlock and also keeps us from doing a lot more work
than we need to during commit.  Thanks,

Signed-off-by: Josef Bacik &lt;jbacik@fusionio.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Miao made the ordered operations stuff run async, which introduced a
deadlock where we could get somebody (sync) racing in and committing the
transaction while a commit was already happening.  The new committer would
try and flush ordered operations which would hang waiting for the commit to
finish because it is done asynchronously and no longer inherits the callers
trans handle.  To fix this we need to make the ordered operations list a per
transaction list.  We can get new inodes added to the ordered operation list
by truncating them and then having another process writing to them, so this
makes it so that anybody trying to add an ordered operation _must_ start a
transaction in order to add itself to the list, which will keep new inodes
from getting added to the ordered operations list after we start committing.
This should fix the deadlock and also keeps us from doing a lot more work
than we need to during commit.  Thanks,

Signed-off-by: Josef Bacik &lt;jbacik@fusionio.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: wait on ordered extents at the last possible moment</title>
<updated>2013-02-20T14:37:04+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>jbacik@fusionio.com</email>
</author>
<published>2012-10-12T19:27:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2ab28f322f9896782da904f5942f3873432addc8'/>
<id>2ab28f322f9896782da904f5942f3873432addc8</id>
<content type='text'>
Since we don't actually copy the extent information from the source tree in
the fast case we don't need to wait for ordered io to be completed in order
to fsync, we just need to wait for the io to be completed.  So when we're
logging our file just attach all of the ordered extents to the log, and then
when the log syncs just wait for IO_DONE on the ordered extents and then
write the super.  Thanks,

Signed-off-by: Josef Bacik &lt;jbacik@fusionio.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since we don't actually copy the extent information from the source tree in
the fast case we don't need to wait for ordered io to be completed in order
to fsync, we just need to wait for the io to be completed.  So when we're
logging our file just attach all of the ordered extents to the log, and then
when the log syncs just wait for IO_DONE on the ordered extents and then
write the super.  Thanks,

Signed-off-by: Josef Bacik &lt;jbacik@fusionio.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
