<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/ext4/fsync.c, branch v2.6.38</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>ext4: flush the i_completed_io_list during ext4_truncate</title>
<updated>2011-01-10T17:47:05+00:00</updated>
<author>
<name>Jiaying Zhang</name>
<email>jiayingz@google.com</email>
</author>
<published>2011-01-10T17:47:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3889fd57ea3c58209354862523275774fca9db03'/>
<id>3889fd57ea3c58209354862523275774fca9db03</id>
<content type='text'>
Ted first found the bug when running 2.6.36 kernel with dioread_nolock
mount option that xfstests #13 complained about wrong file size during fsck.
However, the bug exists in the older kernels as well although it is
somehow harder to trigger.

The problem is that ext4_end_io_work() can happen after we have truncated an
inode to a smaller size. Then when ext4_end_io_work() calls 
ext4_convert_unwritten_extents(), we may reallocate some blocks that have 
been truncated, so the inode size becomes inconsistent with the allocated
blocks. 

The following patch flushes the i_completed_io_list during truncate to reduce 
the risk that some pending end_io requests are executed later and convert 
already truncated blocks to initialized. 

Note that although the fix helps reduce the problem a lot there may still 
be a race window between vmtruncate() and ext4_end_io_work(). The fundamental
problem is that if vmtruncate() is called without either i_mutex or i_alloc_sem
held, it can race with an ongoing write request so that the io_end request is
processed later when the corresponding blocks have been truncated.

Ted and I have discussed the problem offline and we saw a few ways to fix
the race completely:

a) We guarantee that i_mutex lock and i_alloc_sem write lock are both hold 
whenever vmtruncate() is called. The i_mutex lock prevents any new write
requests from entering writeback and the i_alloc_sem prevents the race
from ext4_page_mkwrite(). Currently we hold both locks if vmtruncate()
is called from do_truncate(), which is probably the most common case.
However, there are places where we may call vmtruncate() without holding
either i_mutex or i_alloc_sem. I would like to ask for other people's
opinions on what locks are expected to be held before calling vmtruncate().
There seems a disagreement among the callers of that function.

b) We change the ext4 write path so that we change the extent tree to contain 
the newly allocated blocks and update i_size both at the same time --- when 
the write of the data blocks is completed.

c) We add some additional locking to synchronize vmtruncate() and 
ext4_end_io_work(). This approach may have performance implications so we
need to be careful.

All of the above proposals may require more substantial changes, so
we may consider to take the following patch as a bandaid.

Signed-off-by: Jiaying Zhang &lt;jiayingz@google.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Ted first found the bug when running 2.6.36 kernel with dioread_nolock
mount option that xfstests #13 complained about wrong file size during fsck.
However, the bug exists in the older kernels as well although it is
somehow harder to trigger.

The problem is that ext4_end_io_work() can happen after we have truncated an
inode to a smaller size. Then when ext4_end_io_work() calls 
ext4_convert_unwritten_extents(), we may reallocate some blocks that have 
been truncated, so the inode size becomes inconsistent with the allocated
blocks. 

The following patch flushes the i_completed_io_list during truncate to reduce 
the risk that some pending end_io requests are executed later and convert 
already truncated blocks to initialized. 

Note that although the fix helps reduce the problem a lot there may still 
be a race window between vmtruncate() and ext4_end_io_work(). The fundamental
problem is that if vmtruncate() is called without either i_mutex or i_alloc_sem
held, it can race with an ongoing write request so that the io_end request is
processed later when the corresponding blocks have been truncated.

Ted and I have discussed the problem offline and we saw a few ways to fix
the race completely:

a) We guarantee that i_mutex lock and i_alloc_sem write lock are both hold 
whenever vmtruncate() is called. The i_mutex lock prevents any new write
requests from entering writeback and the i_alloc_sem prevents the race
from ext4_page_mkwrite(). Currently we hold both locks if vmtruncate()
is called from do_truncate(), which is probably the most common case.
However, there are places where we may call vmtruncate() without holding
either i_mutex or i_alloc_sem. I would like to ask for other people's
opinions on what locks are expected to be held before calling vmtruncate().
There seems a disagreement among the callers of that function.

b) We change the ext4 write path so that we change the extent tree to contain 
the newly allocated blocks and update i_size both at the same time --- when 
the write of the data blocks is completed.

c) We add some additional locking to synchronize vmtruncate() and 
ext4_end_io_work(). This approach may have performance implications so we
need to be careful.

All of the above proposals may require more substantial changes, so
we may consider to take the following patch as a bandaid.

Signed-off-by: Jiaying Zhang &lt;jiayingz@google.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'next' into upstream-merge</title>
<updated>2010-10-28T03:44:47+00:00</updated>
<author>
<name>Theodore Ts'o</name>
<email>tytso@mit.edu</email>
</author>
<published>2010-10-28T03:44:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a107e5a3a473a2ea62bd5af24e11b84adf1486ff'/>
<id>a107e5a3a473a2ea62bd5af24e11b84adf1486ff</id>
<content type='text'>
Conflicts:
	fs/ext4/inode.c
	fs/ext4/mballoc.c
	include/trace/events/ext4.h
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Conflicts:
	fs/ext4/inode.c
	fs/ext4/mballoc.c
	include/trace/events/ext4.h
</pre>
</div>
</content>
</entry>
<entry>
<title>ext4: move flush_completed_IO to fs/ext4/fsync.c and make it static</title>
<updated>2010-10-28T01:30:14+00:00</updated>
<author>
<name>Theodore Ts'o</name>
<email>tytso@mit.edu</email>
</author>
<published>2010-10-28T01:30:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4a873a472b3bbcfd425d7ae210afdec28c04e2e5'/>
<id>4a873a472b3bbcfd425d7ae210afdec28c04e2e5</id>
<content type='text'>
Fix a namespace leak by moving the function to the file where it is
used and making it static.

Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix a namespace leak by moving the function to the file where it is
used and making it static.

Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: remove BLKDEV_IFL_WAIT</title>
<updated>2010-09-16T18:52:58+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2010-09-16T18:51:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=dd3932eddf428571762596e17b65f5dc92ca361b'/>
<id>dd3932eddf428571762596e17b65f5dc92ca361b</id>
<content type='text'>
All the blkdev_issue_* helpers can only sanely be used for synchronous
caller.  To issue cache flushes or barriers asynchronously the caller needs
to set up a bio by itself with a completion callback to move the asynchronous
state machine ahead.  So drop the BLKDEV_IFL_WAIT flag that is always
specified when calling blkdev_issue_* and also remove the now unused flags
argument to blkdev_issue_flush and blkdev_issue_zeroout.  For
blkdev_issue_discard we need to keep it for the secure discard flag, which
gains a more descriptive name and loses the bitops vs flag confusion.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
All the blkdev_issue_* helpers can only sanely be used for synchronous
caller.  To issue cache flushes or barriers asynchronously the caller needs
to set up a bio by itself with a completion callback to move the asynchronous
state machine ahead.  So drop the BLKDEV_IFL_WAIT flag that is always
specified when calling blkdev_issue_* and also remove the now unused flags
argument to blkdev_issue_flush and blkdev_issue_zeroout.  For
blkdev_issue_discard we need to keep it for the secure discard flag, which
gains a more descriptive name and loses the bitops vs flag confusion.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rename the generic fsync implementations</title>
<updated>2010-05-28T02:06:06+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2010-05-26T15:53:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1b061d9247f71cd15edc4c4c4600191a903642c0'/>
<id>1b061d9247f71cd15edc4c4c4600191a903642c0</id>
<content type='text'>
We don't name our generic fsync implementations very well currently.
The no-op implementation for in-memory filesystems currently is called
simple_sync_file which doesn't make too much sense to start with,
the the generic one for simple filesystems is called simple_fsync
which can lead to some confusion.

This patch renames the generic file fsync method to generic_file_fsync
to match the other generic_file_* routines it is supposed to be used
with, and the no-op implementation to noop_fsync to make it obvious
what to expect.  In addition add some documentation for both methods.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We don't name our generic fsync implementations very well currently.
The no-op implementation for in-memory filesystems currently is called
simple_sync_file which doesn't make too much sense to start with,
the the generic one for simple filesystems is called simple_fsync
which can lead to some confusion.

This patch renames the generic file fsync method to generic_file_fsync
to match the other generic_file_* routines it is supposed to be used
with, and the no-op implementation to noop_fsync to make it obvious
what to expect.  In addition add some documentation for both methods.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drop unused dentry argument to -&gt;fsync</title>
<updated>2010-05-28T02:05:02+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2010-05-26T15:53:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7ea8085910ef3dd4f3cad6845aaa2b580d39b115'/>
<id>7ea8085910ef3dd4f3cad6845aaa2b580d39b115</id>
<content type='text'>
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4</title>
<updated>2010-05-27T17:26:37+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2010-05-27T17:26:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e4ce30f3779c2ddaa7dfaa4042209e5dbacbada5'/>
<id>e4ce30f3779c2ddaa7dfaa4042209e5dbacbada5</id>
<content type='text'>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (40 commits)
  ext4: Make fsync sync new parent directories in no-journal mode
  ext4: Drop whitespace at end of lines
  ext4: Fix compat EXT4_IOC_ADD_GROUP
  ext4: Conditionally define compat ioctl numbers
  tracing: Convert more ext4 events to DEFINE_EVENT
  ext4: Add new tracepoints to track mballoc's buddy bitmap loads
  ext4: Add a missing trace hook
  ext4: restart ext4_ext_remove_space() after transaction restart
  ext4: Clear the EXT4_EOFBLOCKS_FL flag only when warranted
  ext4: Avoid crashing on NULL ptr dereference on a filesystem error
  ext4: Use bitops to read/modify i_flags in struct ext4_inode_info
  ext4: Convert calls of ext4_error() to EXT4_ERROR_INODE()
  ext4: Convert callers of ext4_get_blocks() to use ext4_map_blocks()
  ext4: Add new abstraction ext4_map_blocks() underneath ext4_get_blocks()
  ext4: Use our own write_cache_pages()
  ext4: Show journal_checksum option
  ext4: Fix for ext4_mb_collect_stats()
  ext4: check for a good block group before loading buddy pages
  ext4: Prevent creation of files larger than RLIMIT_FSIZE using fallocate
  ext4: Remove extraneous newlines in ext4_msg() calls
  ...

Fixed up trivial conflict in fs/ext4/fsync.c
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (40 commits)
  ext4: Make fsync sync new parent directories in no-journal mode
  ext4: Drop whitespace at end of lines
  ext4: Fix compat EXT4_IOC_ADD_GROUP
  ext4: Conditionally define compat ioctl numbers
  tracing: Convert more ext4 events to DEFINE_EVENT
  ext4: Add new tracepoints to track mballoc's buddy bitmap loads
  ext4: Add a missing trace hook
  ext4: restart ext4_ext_remove_space() after transaction restart
  ext4: Clear the EXT4_EOFBLOCKS_FL flag only when warranted
  ext4: Avoid crashing on NULL ptr dereference on a filesystem error
  ext4: Use bitops to read/modify i_flags in struct ext4_inode_info
  ext4: Convert calls of ext4_error() to EXT4_ERROR_INODE()
  ext4: Convert callers of ext4_get_blocks() to use ext4_map_blocks()
  ext4: Add new abstraction ext4_map_blocks() underneath ext4_get_blocks()
  ext4: Use our own write_cache_pages()
  ext4: Show journal_checksum option
  ext4: Fix for ext4_mb_collect_stats()
  ext4: check for a good block group before loading buddy pages
  ext4: Prevent creation of files larger than RLIMIT_FSIZE using fallocate
  ext4: Remove extraneous newlines in ext4_msg() calls
  ...

Fixed up trivial conflict in fs/ext4/fsync.c
</pre>
</div>
</content>
</entry>
<entry>
<title>ext4: Make fsync sync new parent directories in no-journal mode</title>
<updated>2010-05-17T12:00:00+00:00</updated>
<author>
<name>Frank Mayhar</name>
<email>fmayhar@google.com</email>
</author>
<published>2010-05-17T12:00:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=14ece1028b3ed53ffec1b1213ffc6acaf79ad77c'/>
<id>14ece1028b3ed53ffec1b1213ffc6acaf79ad77c</id>
<content type='text'>
Add a new ext4 state to tell us when a file has been newly created; use
that state in ext4_sync_file in no-journal mode to tell us when we need
to sync the parent directory as well as the inode and data itself.  This
fixes a problem in which a panic or power failure may lose the entire
file even when using fsync, since the parent directory entry is lost.

Addresses-Google-Bug: #2480057

Signed-off-by: Frank Mayhar &lt;fmayhar@google.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add a new ext4 state to tell us when a file has been newly created; use
that state in ext4_sync_file in no-journal mode to tell us when we need
to sync the parent directory as well as the inode and data itself.  This
fixes a problem in which a panic or power failure may lose the entire
file even when using fsync, since the parent directory entry is lost.

Addresses-Google-Bug: #2480057

Signed-off-by: Frank Mayhar &lt;fmayhar@google.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ext4: Drop whitespace at end of lines</title>
<updated>2010-05-17T11:00:00+00:00</updated>
<author>
<name>Theodore Ts'o</name>
<email>tytso@mit.edu</email>
</author>
<published>2010-05-17T11:00:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=60e6679e28518ccd67169c4a539d8cc7490eb8a6'/>
<id>60e6679e28518ccd67169c4a539d8cc7490eb8a6</id>
<content type='text'>
This patch was generated using:

#!/usr/bin/perl -i
while (&lt;&gt;) {
    s/[ 	]+$//;
    print;
}

Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch was generated using:

#!/usr/bin/perl -i
while (&lt;&gt;) {
    s/[ 	]+$//;
    print;
}

Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ext4: check missed return value in ext4_sync_file()</title>
<updated>2010-05-10T04:00:00+00:00</updated>
<author>
<name>Dmitry Monakhov</name>
<email>dmonakhov@openvz.org</email>
</author>
<published>2010-05-10T04:00:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0671e704658b9f26f85e78d51176daa861f955c7'/>
<id>0671e704658b9f26f85e78d51176daa861f955c7</id>
<content type='text'>
Signed-off-by: Dmitry Monakhov &lt;dmonakhov@openvz.org&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Dmitry Monakhov &lt;dmonakhov@openvz.org&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
</pre>
</div>
</content>
</entry>
</feed>
