<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/fs/nilfs2, branch v3.19</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>nilfs2: fix deadlock of segment constructor over I_SYNC flag</title>
<updated>2015-02-05T21:35:29+00:00</updated>
<author>
<name>Ryusuke Konishi</name>
<email>konishi.ryusuke@lab.ntt.co.jp</email>
</author>
<published>2015-02-05T20:25:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=7ef3ff2fea8bf5e4a21cef47ad87710a3d0fdb52'/>
<id>7ef3ff2fea8bf5e4a21cef47ad87710a3d0fdb52</id>
<content type='text'>
Nilfs2 eventually hangs in a stress test with fsstress program.  This
issue was caused by the following deadlock over I_SYNC flag between
nilfs_segctor_thread() and writeback_sb_inodes():

  nilfs_segctor_thread()
    nilfs_segctor_thread_construct()
      nilfs_segctor_unlock()
        nilfs_dispose_list()
          iput()
            iput_final()
              evict()
                inode_wait_for_writeback()  * wait for I_SYNC flag

  writeback_sb_inodes()
     * set I_SYNC flag on inode-&gt;i_state
    __writeback_single_inode()
      do_writepages()
        nilfs_writepages()
          nilfs_construct_dsync_segment()
            nilfs_segctor_sync()
               * wait for completion of segment constructor
    inode_sync_complete()
       * clear I_SYNC flag after __writeback_single_inode() completed

writeback_sb_inodes() calls do_writepages() for dirty inodes after
setting I_SYNC flag on inode-&gt;i_state.  do_writepages() in turn calls
nilfs_writepages(), which can run segment constructor and wait for its
completion.  On the other hand, segment constructor calls iput(), which
can call evict() and wait for the I_SYNC flag on
inode_wait_for_writeback().

Since segment constructor doesn't know when I_SYNC will be set, it
cannot know whether iput() will block or not unless inode-&gt;i_nlink has a
non-zero count.  We can prevent evict() from being called in iput() by
implementing sop-&gt;drop_inode(), but it's not preferable to leave inodes
with i_nlink == 0 for long periods because it even defers file
truncation and inode deallocation.  So, this instead resolves the
deadlock by calling iput() asynchronously with a workqueue for inodes
with i_nlink == 0.

Signed-off-by: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Tested-by: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Nilfs2 eventually hangs in a stress test with fsstress program.  This
issue was caused by the following deadlock over I_SYNC flag between
nilfs_segctor_thread() and writeback_sb_inodes():

  nilfs_segctor_thread()
    nilfs_segctor_thread_construct()
      nilfs_segctor_unlock()
        nilfs_dispose_list()
          iput()
            iput_final()
              evict()
                inode_wait_for_writeback()  * wait for I_SYNC flag

  writeback_sb_inodes()
     * set I_SYNC flag on inode-&gt;i_state
    __writeback_single_inode()
      do_writepages()
        nilfs_writepages()
          nilfs_construct_dsync_segment()
            nilfs_segctor_sync()
               * wait for completion of segment constructor
    inode_sync_complete()
       * clear I_SYNC flag after __writeback_single_inode() completed

writeback_sb_inodes() calls do_writepages() for dirty inodes after
setting I_SYNC flag on inode-&gt;i_state.  do_writepages() in turn calls
nilfs_writepages(), which can run segment constructor and wait for its
completion.  On the other hand, segment constructor calls iput(), which
can call evict() and wait for the I_SYNC flag on
inode_wait_for_writeback().

Since segment constructor doesn't know when I_SYNC will be set, it
cannot know whether iput() will block or not unless inode-&gt;i_nlink has a
non-zero count.  We can prevent evict() from being called in iput() by
implementing sop-&gt;drop_inode(), but it's not preferable to leave inodes
with i_nlink == 0 for long periods because it even defers file
truncation and inode deallocation.  So, this instead resolves the
deadlock by calling iput() asynchronously with a workqueue for inodes
with i_nlink == 0.

Signed-off-by: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Tested-by: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nilfs2: fix the nilfs_iget() vs. nilfs_new_inode() races</title>
<updated>2014-12-11T01:41:16+00:00</updated>
<author>
<name>Ryusuke Konishi</name>
<email>konishi.ryusuke@lab.ntt.co.jp</email>
</author>
<published>2014-12-10T23:54:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=705304a863cc41585508c0f476f6d3ec28cf7e00'/>
<id>705304a863cc41585508c0f476f6d3ec28cf7e00</id>
<content type='text'>
Same story as in commit 41080b5a2401 ("nfsd race fixes: ext2") (similar
ext2 fix) except that nilfs2 needs to use insert_inode_locked4() instead
of insert_inode_locked() and a bug of a check for dead inodes needs to
be fixed.

If nilfs_iget() is called from nfsd after nilfs_new_inode() calls
insert_inode_locked4(), nilfs_iget() will wait for unlock_new_inode() at
the end of nilfs_mkdir()/nilfs_create()/etc to unlock the inode.

If nilfs_iget() is called before nilfs_new_inode() calls
insert_inode_locked4(), it will create an in-core inode and read its
data from the on-disk inode.  But, nilfs_iget() will find i_nlink equals
zero and fail at nilfs_read_inode_common(), which will lead it to call
iget_failed() and cleanly fail.

However, this sanity check doesn't work as expected for reused on-disk
inodes because they leave a non-zero value in i_mode field and it
hinders the test of i_nlink.  This patch also fixes the issue by
removing the test on i_mode that nilfs2 doesn't need.

Signed-off-by: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Same story as in commit 41080b5a2401 ("nfsd race fixes: ext2") (similar
ext2 fix) except that nilfs2 needs to use insert_inode_locked4() instead
of insert_inode_locked() and a bug of a check for dead inodes needs to
be fixed.

If nilfs_iget() is called from nfsd after nilfs_new_inode() calls
insert_inode_locked4(), nilfs_iget() will wait for unlock_new_inode() at
the end of nilfs_mkdir()/nilfs_create()/etc to unlock the inode.

If nilfs_iget() is called before nilfs_new_inode() calls
insert_inode_locked4(), it will create an in-core inode and read its
data from the on-disk inode.  But, nilfs_iget() will find i_nlink equals
zero and fail at nilfs_read_inode_common(), which will lead it to call
iget_failed() and cleanly fail.

However, this sanity check doesn't work as expected for reused on-disk
inodes because they leave a non-zero value in i_mode field and it
hinders the test of i_nlink.  This patch also fixes the issue by
removing the test on i_mode that nilfs2 doesn't need.

Signed-off-by: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nilfs2: deletion of an unnecessary check before the function call "iput"</title>
<updated>2014-12-11T01:41:16+00:00</updated>
<author>
<name>Markus Elfring</name>
<email>elfring@users.sourceforge.net</email>
</author>
<published>2014-12-10T23:54:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=72b9918ea4d7f2d8362a2defdb93e5fd25a86308'/>
<id>72b9918ea4d7f2d8362a2defdb93e5fd25a86308</id>
<content type='text'>
The iput() function tests whether its argument is NULL and then returns
immediately.  Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring &lt;elfring@users.sourceforge.net&gt;
Signed-off-by: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The iput() function tests whether its argument is NULL and then returns
immediately.  Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring &lt;elfring@users.sourceforge.net&gt;
Signed-off-by: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nilfs2: avoid duplicate segment construction for fsync()</title>
<updated>2014-12-11T01:41:16+00:00</updated>
<author>
<name>Andreas Rohner</name>
<email>andreas.rohner@gmx.net</email>
</author>
<published>2014-12-10T23:54:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=75dc857c46996e336cb2f9cf0ed1e5508fb241a7'/>
<id>75dc857c46996e336cb2f9cf0ed1e5508fb241a7</id>
<content type='text'>
This patch removes filemap_write_and_wait_range() from nilfs_sync_file(),
because it triggers a data segment construction by calling
nilfs_writepages() with WB_SYNC_ALL.  A data segment construction does not
remove the inode from the i_dirty list and it does not clear the
NILFS_I_DIRTY flag.  Therefore nilfs_inode_dirty() still returns true,
which leads to an unnecessary duplicate segment construction in
nilfs_sync_file().

A call to filemap_write_and_wait_range() is not needed, because NILFS2
does not rely on the generic writeback mechanisms.  Instead it implements
its own mechanism to collect all dirty pages and write them into segments.
 It is more efficient to initiate the segment construction directly in
nilfs_sync_file() without the detour over filemap_write_and_wait_range().

Additionally the lock of i_mutex is not needed, because all code blocks
that are protected by i_mutex are also protected by a NILFS transaction:

  Function                i_mutex     nilfs_transaction
  ------------------------------------------------------
  nilfs_ioctl_setflags:   yes         yes
  nilfs_fiemap:           yes         no
  nilfs_write_begin:      yes         yes
  nilfs_write_end:        yes         yes
  nilfs_lookup:           yes         no
  nilfs_create:           yes         yes
  nilfs_link:             yes         yes
  nilfs_mknod:            yes         yes
  nilfs_symlink:          yes         yes
  nilfs_mkdir:            yes         yes
  nilfs_unlink:           yes         yes
  nilfs_rmdir:            yes         yes
  nilfs_rename:           yes         yes
  nilfs_setattr:          yes         yes

For nilfs_lookup() i_mutex is held for the parent directory, to protect it
from modification.  The segment construction does not modify directory
inodes, so no lock is needed.

nilfs_fiemap() reads the block layout on the disk, by using
nilfs_bmap_lookup_contig(). This is already protected by bmap-&gt;b_sem.

Signed-off-by: Andreas Rohner &lt;andreas.rohner@gmx.net&gt;
Signed-off-by: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch removes filemap_write_and_wait_range() from nilfs_sync_file(),
because it triggers a data segment construction by calling
nilfs_writepages() with WB_SYNC_ALL.  A data segment construction does not
remove the inode from the i_dirty list and it does not clear the
NILFS_I_DIRTY flag.  Therefore nilfs_inode_dirty() still returns true,
which leads to an unnecessary duplicate segment construction in
nilfs_sync_file().

A call to filemap_write_and_wait_range() is not needed, because NILFS2
does not rely on the generic writeback mechanisms.  Instead it implements
its own mechanism to collect all dirty pages and write them into segments.
 It is more efficient to initiate the segment construction directly in
nilfs_sync_file() without the detour over filemap_write_and_wait_range().

Additionally the lock of i_mutex is not needed, because all code blocks
that are protected by i_mutex are also protected by a NILFS transaction:

  Function                i_mutex     nilfs_transaction
  ------------------------------------------------------
  nilfs_ioctl_setflags:   yes         yes
  nilfs_fiemap:           yes         no
  nilfs_write_begin:      yes         yes
  nilfs_write_end:        yes         yes
  nilfs_lookup:           yes         no
  nilfs_create:           yes         yes
  nilfs_link:             yes         yes
  nilfs_mknod:            yes         yes
  nilfs_symlink:          yes         yes
  nilfs_mkdir:            yes         yes
  nilfs_unlink:           yes         yes
  nilfs_rmdir:            yes         yes
  nilfs_rename:           yes         yes
  nilfs_setattr:          yes         yes

For nilfs_lookup() i_mutex is held for the parent directory, to protect it
from modification.  The segment construction does not modify directory
inodes, so no lock is needed.

nilfs_fiemap() reads the block layout on the disk, by using
nilfs_bmap_lookup_contig(). This is already protected by bmap-&gt;b_sem.

Signed-off-by: Andreas Rohner &lt;andreas.rohner@gmx.net&gt;
Signed-off-by: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nilfs2: improve the performance of fdatasync()</title>
<updated>2014-10-14T00:18:20+00:00</updated>
<author>
<name>Andreas Rohner</name>
<email>andreas.rohner@gmx.net</email>
</author>
<published>2014-10-13T22:53:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b9f6614072687f1ea9bf09a99789c976cbe89714'/>
<id>b9f6614072687f1ea9bf09a99789c976cbe89714</id>
<content type='text'>
Support for fdatasync() has been implemented in NILFS2 for a long time,
but whenever the corresponding inode is dirty the implementation falls
back to a full-flegded sync().  Since every write operation has to
update the modification time of the file, the inode will almost always
be dirty and fdatasync() will fall back to sync() most of the time.  But
this fallback is only necessary for a change of the file size and not
for a change of the various timestamps.

This patch adds a new flag NILFS_I_INODE_SYNC to differentiate between
those two situations.

 * If it is set the file size was changed and a full sync is necessary.
 * If it is not set then only the timestamps were updated and
   fdatasync() can go ahead.

There is already a similar flag I_DIRTY_DATASYNC on the VFS layer with
the exact same semantics.  Unfortunately it cannot be used directly,
because NILFS2 doesn't implement write_inode() and doesn't clear the VFS
flags when inodes are written out.  So the VFS writeback thread can
clear I_DIRTY_DATASYNC at any time without notifying NILFS2.  So
I_DIRTY_DATASYNC has to be mapped onto NILFS_I_INODE_SYNC in
nilfs_update_inode().

Signed-off-by: Andreas Rohner &lt;andreas.rohner@gmx.net&gt;
Signed-off-by: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Support for fdatasync() has been implemented in NILFS2 for a long time,
but whenever the corresponding inode is dirty the implementation falls
back to a full-flegded sync().  Since every write operation has to
update the modification time of the file, the inode will almost always
be dirty and fdatasync() will fall back to sync() most of the time.  But
this fallback is only necessary for a change of the file size and not
for a change of the various timestamps.

This patch adds a new flag NILFS_I_INODE_SYNC to differentiate between
those two situations.

 * If it is set the file size was changed and a full sync is necessary.
 * If it is not set then only the timestamps were updated and
   fdatasync() can go ahead.

There is already a similar flag I_DIRTY_DATASYNC on the VFS layer with
the exact same semantics.  Unfortunately it cannot be used directly,
because NILFS2 doesn't implement write_inode() and doesn't clear the VFS
flags when inodes are written out.  So the VFS writeback thread can
clear I_DIRTY_DATASYNC at any time without notifying NILFS2.  So
I_DIRTY_DATASYNC has to be mapped onto NILFS_I_INODE_SYNC in
nilfs_update_inode().

Signed-off-by: Andreas Rohner &lt;andreas.rohner@gmx.net&gt;
Signed-off-by: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs()</title>
<updated>2014-10-14T00:18:20+00:00</updated>
<author>
<name>Andreas Rohner</name>
<email>andreas.rohner@gmx.net</email>
</author>
<published>2014-10-13T22:53:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e2c7617ae36b27f97643bfa08aabe27e630c1a76'/>
<id>e2c7617ae36b27f97643bfa08aabe27e630c1a76</id>
<content type='text'>
Under normal circumstances nilfs_sync_fs() writes out the super block,
which causes a flush of the underlying block device.  But this depends
on the THE_NILFS_SB_DIRTY flag, which is only set if the pointer to the
last segment crosses a segment boundary.  So if only a small amount of
data is written before the call to nilfs_sync_fs(), no flush of the
block device occurs.

In the above case an additional call to blkdev_issue_flush() is needed.
To prevent unnecessary overhead, the new flag nilfs-&gt;ns_flushed_device
is introduced, which is cleared whenever new logs are written and set
whenever the block device is flushed.  For convenience the function
nilfs_flush_device() is added, which contains the above logic.

Signed-off-by: Andreas Rohner &lt;andreas.rohner@gmx.net&gt;
Signed-off-by: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Under normal circumstances nilfs_sync_fs() writes out the super block,
which causes a flush of the underlying block device.  But this depends
on the THE_NILFS_SB_DIRTY flag, which is only set if the pointer to the
last segment crosses a segment boundary.  So if only a small amount of
data is written before the call to nilfs_sync_fs(), no flush of the
block device occurs.

In the above case an additional call to blkdev_issue_flush() is needed.
To prevent unnecessary overhead, the new flag nilfs-&gt;ns_flushed_device
is introduced, which is cleared whenever new logs are written and set
whenever the block device is flushed.  For convenience the function
nilfs_flush_device() is added, which contains the above logic.

Signed-off-by: Andreas Rohner &lt;andreas.rohner@gmx.net&gt;
Signed-off-by: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nilfs2: fix data loss with mmap()</title>
<updated>2014-09-26T15:10:34+00:00</updated>
<author>
<name>Andreas Rohner</name>
<email>andreas.rohner@gmx.net</email>
</author>
<published>2014-09-25T23:05:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=56d7acc792c0d98f38f22058671ee715ff197023'/>
<id>56d7acc792c0d98f38f22058671ee715ff197023</id>
<content type='text'>
This bug leads to reproducible silent data loss, despite the use of
msync(), sync() and a clean unmount of the file system.  It is easily
reproducible with the following script:

  ----------------[BEGIN SCRIPT]--------------------
  mkfs.nilfs2 -f /dev/sdb
  mount /dev/sdb /mnt

  dd if=/dev/zero bs=1M count=30 of=/mnt/testfile

  umount /mnt
  mount /dev/sdb /mnt
  CHECKSUM_BEFORE="$(md5sum /mnt/testfile)"

  /root/mmaptest/mmaptest /mnt/testfile 30 10 5

  sync
  CHECKSUM_AFTER="$(md5sum /mnt/testfile)"
  umount /mnt
  mount /dev/sdb /mnt
  CHECKSUM_AFTER_REMOUNT="$(md5sum /mnt/testfile)"
  umount /mnt

  echo "BEFORE MMAP:\t$CHECKSUM_BEFORE"
  echo "AFTER MMAP:\t$CHECKSUM_AFTER"
  echo "AFTER REMOUNT:\t$CHECKSUM_AFTER_REMOUNT"
  ----------------[END SCRIPT]--------------------

The mmaptest tool looks something like this (very simplified, with
error checking removed):

  ----------------[BEGIN mmaptest]--------------------
  data = mmap(NULL, file_size - file_offset, PROT_READ | PROT_WRITE,
              MAP_SHARED, fd, file_offset);

  for (i = 0; i &lt; write_count; ++i) {
        memcpy(data + i * 4096, buf, sizeof(buf));
        msync(data, file_size - file_offset, MS_SYNC))
  }
  ----------------[END mmaptest]--------------------

The output of the script looks something like this:

  BEFORE MMAP:    281ed1d5ae50e8419f9b978aab16de83  /mnt/testfile
  AFTER MMAP:     6604a1c31f10780331a6850371b3a313  /mnt/testfile
  AFTER REMOUNT:  281ed1d5ae50e8419f9b978aab16de83  /mnt/testfile

So it is clear, that the changes done using mmap() do not survive a
remount.  This can be reproduced a 100% of the time.  The problem was
introduced in commit 136e8770cd5d ("nilfs2: fix issue of
nilfs_set_page_dirty() for page at EOF boundary").

If the page was read with mpage_readpage() or mpage_readpages() for
example, then it has no buffers attached to it.  In that case
page_has_buffers(page) in nilfs_set_page_dirty() will be false.
Therefore nilfs_set_file_dirty() is never called and the pages are never
collected and never written to disk.

This patch fixes the problem by also calling nilfs_set_file_dirty() if the
page has no buffers attached to it.

[akpm@linux-foundation.org: s/PAGE_SHIFT/PAGE_CACHE_SHIFT/]
Signed-off-by: Andreas Rohner &lt;andreas.rohner@gmx.net&gt;
Tested-by: Andreas Rohner &lt;andreas.rohner@gmx.net&gt;
Signed-off-by: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This bug leads to reproducible silent data loss, despite the use of
msync(), sync() and a clean unmount of the file system.  It is easily
reproducible with the following script:

  ----------------[BEGIN SCRIPT]--------------------
  mkfs.nilfs2 -f /dev/sdb
  mount /dev/sdb /mnt

  dd if=/dev/zero bs=1M count=30 of=/mnt/testfile

  umount /mnt
  mount /dev/sdb /mnt
  CHECKSUM_BEFORE="$(md5sum /mnt/testfile)"

  /root/mmaptest/mmaptest /mnt/testfile 30 10 5

  sync
  CHECKSUM_AFTER="$(md5sum /mnt/testfile)"
  umount /mnt
  mount /dev/sdb /mnt
  CHECKSUM_AFTER_REMOUNT="$(md5sum /mnt/testfile)"
  umount /mnt

  echo "BEFORE MMAP:\t$CHECKSUM_BEFORE"
  echo "AFTER MMAP:\t$CHECKSUM_AFTER"
  echo "AFTER REMOUNT:\t$CHECKSUM_AFTER_REMOUNT"
  ----------------[END SCRIPT]--------------------

The mmaptest tool looks something like this (very simplified, with
error checking removed):

  ----------------[BEGIN mmaptest]--------------------
  data = mmap(NULL, file_size - file_offset, PROT_READ | PROT_WRITE,
              MAP_SHARED, fd, file_offset);

  for (i = 0; i &lt; write_count; ++i) {
        memcpy(data + i * 4096, buf, sizeof(buf));
        msync(data, file_size - file_offset, MS_SYNC))
  }
  ----------------[END mmaptest]--------------------

The output of the script looks something like this:

  BEFORE MMAP:    281ed1d5ae50e8419f9b978aab16de83  /mnt/testfile
  AFTER MMAP:     6604a1c31f10780331a6850371b3a313  /mnt/testfile
  AFTER REMOUNT:  281ed1d5ae50e8419f9b978aab16de83  /mnt/testfile

So it is clear, that the changes done using mmap() do not survive a
remount.  This can be reproduced a 100% of the time.  The problem was
introduced in commit 136e8770cd5d ("nilfs2: fix issue of
nilfs_set_page_dirty() for page at EOF boundary").

If the page was read with mpage_readpage() or mpage_readpages() for
example, then it has no buffers attached to it.  In that case
page_has_buffers(page) in nilfs_set_page_dirty() will be false.
Therefore nilfs_set_file_dirty() is never called and the pages are never
collected and never written to disk.

This patch fixes the problem by also calling nilfs_set_file_dirty() if the
page has no buffers attached to it.

[akpm@linux-foundation.org: s/PAGE_SHIFT/PAGE_CACHE_SHIFT/]
Signed-off-by: Andreas Rohner &lt;andreas.rohner@gmx.net&gt;
Tested-by: Andreas Rohner &lt;andreas.rohner@gmx.net&gt;
Signed-off-by: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs</title>
<updated>2014-08-11T18:44:11+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2014-08-11T18:44:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f6f993328b2abcab86a3c99d7bd9f2066ab03d36'/>
<id>f6f993328b2abcab86a3c99d7bd9f2066ab03d36</id>
<content type='text'>
Pull vfs updates from Al Viro:
 "Stuff in here:

   - acct.c fixes and general rework of mnt_pin mechanism.  That allows
     to go for delayed-mntput stuff, which will permit mntput() on deep
     stack without worrying about stack overflows - fs shutdown will
     happen on shallow stack.  IOW, we can do Eric's umount-on-rmdir
     series without introducing tons of stack overflows on new mntput()
     call chains it introduces.
   - Bruce's d_splice_alias() patches
   - more Miklos' rename() stuff.
   - a couple of regression fixes (stable fodder, in the end of branch)
     and a fix for API idiocy in iov_iter.c.

  There definitely will be another pile, maybe even two.  I'd like to
  get Eric's series in this time, but even if we miss it, it'll go right
  in the beginning of for-next in the next cycle - the tricky part of
  prereqs is in this pile"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (40 commits)
  fix copy_tree() regression
  __generic_file_write_iter(): fix handling of sync error after DIO
  switch iov_iter_get_pages() to passing maximal number of pages
  fs: mark __d_obtain_alias static
  dcache: d_splice_alias should detect loops
  exportfs: update Exporting documentation
  dcache: d_find_alias needn't recheck IS_ROOT &amp;&amp; DCACHE_DISCONNECTED
  dcache: remove unused d_find_alias parameter
  dcache: d_obtain_alias callers don't all want DISCONNECTED
  dcache: d_splice_alias should ignore DCACHE_DISCONNECTED
  dcache: d_splice_alias mustn't create directory aliases
  dcache: close d_move race in d_splice_alias
  dcache: move d_splice_alias
  namei: trivial fix to vfs_rename_dir comment
  VFS: allow -&gt;d_manage() to declare -EISDIR in rcu_walk mode.
  cifs: support RENAME_NOREPLACE
  hostfs: support rename flags
  shmem: support RENAME_EXCHANGE
  shmem: support RENAME_NOREPLACE
  btrfs: add RENAME_NOREPLACE
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull vfs updates from Al Viro:
 "Stuff in here:

   - acct.c fixes and general rework of mnt_pin mechanism.  That allows
     to go for delayed-mntput stuff, which will permit mntput() on deep
     stack without worrying about stack overflows - fs shutdown will
     happen on shallow stack.  IOW, we can do Eric's umount-on-rmdir
     series without introducing tons of stack overflows on new mntput()
     call chains it introduces.
   - Bruce's d_splice_alias() patches
   - more Miklos' rename() stuff.
   - a couple of regression fixes (stable fodder, in the end of branch)
     and a fix for API idiocy in iov_iter.c.

  There definitely will be another pile, maybe even two.  I'd like to
  get Eric's series in this time, but even if we miss it, it'll go right
  in the beginning of for-next in the next cycle - the tricky part of
  prereqs is in this pile"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (40 commits)
  fix copy_tree() regression
  __generic_file_write_iter(): fix handling of sync error after DIO
  switch iov_iter_get_pages() to passing maximal number of pages
  fs: mark __d_obtain_alias static
  dcache: d_splice_alias should detect loops
  exportfs: update Exporting documentation
  dcache: d_find_alias needn't recheck IS_ROOT &amp;&amp; DCACHE_DISCONNECTED
  dcache: remove unused d_find_alias parameter
  dcache: d_obtain_alias callers don't all want DISCONNECTED
  dcache: d_splice_alias should ignore DCACHE_DISCONNECTED
  dcache: d_splice_alias mustn't create directory aliases
  dcache: close d_move race in d_splice_alias
  dcache: move d_splice_alias
  namei: trivial fix to vfs_rename_dir comment
  VFS: allow -&gt;d_manage() to declare -EISDIR in rcu_walk mode.
  cifs: support RENAME_NOREPLACE
  hostfs: support rename flags
  shmem: support RENAME_EXCHANGE
  shmem: support RENAME_NOREPLACE
  btrfs: add RENAME_NOREPLACE
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>nilfs2: integrate sysfs support into driver</title>
<updated>2014-08-08T22:57:21+00:00</updated>
<author>
<name>Vyacheslav Dubeyko</name>
<email>Vyacheslav.Dubeyko@hgst.com</email>
</author>
<published>2014-08-08T21:20:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=dd70edbde2627f47df118d899de6bbb55abcfdbf'/>
<id>dd70edbde2627f47df118d899de6bbb55abcfdbf</id>
<content type='text'>
This patch integrates creation of sysfs groups and
attributes into NILFS file system driver.

It was found the issue with nilfs_sysfs_{create/delete}_snapshot_group
functions by Michael L Semon &lt;mlsemon35@gmail.com&gt; in the first
version of the patch:

  BUG: sleeping function called from invalid context at kernel/locking/mutex.c:579
  in_atomic(): 1, irqs_disabled(): 0, pid: 32676, name: umount.nilfs2
  2 locks held by umount.nilfs2/32676:
   #0:  (&amp;type-&gt;s_umount_key#21){++++..}, at: [&lt;790c18e2&gt;] deactivate_super+0x37/0x58
   #1:  (&amp;(&amp;nilfs-&gt;ns_cptree_lock)-&gt;rlock){+.+...}, at: [&lt;791bf659&gt;] nilfs_put_root+0x23/0x5a
  Preemption disabled at:[&lt;791bf659&gt;] nilfs_put_root+0x23/0x5a

  CPU: 0 PID: 32676 Comm: umount.nilfs2 Not tainted 3.14.0+ #2
  Hardware name: Dell Computer Corporation Dimension 2350/07W080, BIOS A01 12/17/2002
  Call Trace:
    dump_stack+0x4b/0x75
    __might_sleep+0x111/0x16f
    mutex_lock_nested+0x1e/0x3ad
    kernfs_remove+0x12/0x26
    sysfs_remove_dir+0x3d/0x62
    kobject_del+0x13/0x38
    nilfs_sysfs_delete_snapshot_group+0xb/0xd
    nilfs_put_root+0x2a/0x5a
    nilfs_detach_log_writer+0x1ab/0x2c1
    nilfs_put_super+0x13/0x68
    generic_shutdown_super+0x60/0xd1
    kill_block_super+0x1d/0x60
    deactivate_locked_super+0x22/0x3f
    deactivate_super+0x3e/0x58
    mntput_no_expire+0xe2/0x141
    SyS_oldumount+0x70/0xa5
    syscall_call+0x7/0xb

The reason of the issue was placement of
nilfs_sysfs_{create/delete}_snapshot_group() call under
nilfs-&gt;ns_cptree_lock protection.  But this protection is unnecessary and
wrong solution.  The second version of the patch fixes this issue.

[fengguang.wu@intel.com: nilfs_sysfs_create_mounted_snapshots_group can be static]
Reported-by: Michael L. Semon &lt;mlsemon35@gmail.com&gt;
Signed-off-by: Vyacheslav Dubeyko &lt;Vyacheslav.Dubeyko@hgst.com&gt;
Cc: Vyacheslav Dubeyko &lt;slava@dubeyko.com&gt;
Cc: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Tested-by: Michael L. Semon &lt;mlsemon35@gmail.com&gt;
Signed-off-by: Fengguang Wu &lt;fengguang.wu@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch integrates creation of sysfs groups and
attributes into NILFS file system driver.

It was found the issue with nilfs_sysfs_{create/delete}_snapshot_group
functions by Michael L Semon &lt;mlsemon35@gmail.com&gt; in the first
version of the patch:

  BUG: sleeping function called from invalid context at kernel/locking/mutex.c:579
  in_atomic(): 1, irqs_disabled(): 0, pid: 32676, name: umount.nilfs2
  2 locks held by umount.nilfs2/32676:
   #0:  (&amp;type-&gt;s_umount_key#21){++++..}, at: [&lt;790c18e2&gt;] deactivate_super+0x37/0x58
   #1:  (&amp;(&amp;nilfs-&gt;ns_cptree_lock)-&gt;rlock){+.+...}, at: [&lt;791bf659&gt;] nilfs_put_root+0x23/0x5a
  Preemption disabled at:[&lt;791bf659&gt;] nilfs_put_root+0x23/0x5a

  CPU: 0 PID: 32676 Comm: umount.nilfs2 Not tainted 3.14.0+ #2
  Hardware name: Dell Computer Corporation Dimension 2350/07W080, BIOS A01 12/17/2002
  Call Trace:
    dump_stack+0x4b/0x75
    __might_sleep+0x111/0x16f
    mutex_lock_nested+0x1e/0x3ad
    kernfs_remove+0x12/0x26
    sysfs_remove_dir+0x3d/0x62
    kobject_del+0x13/0x38
    nilfs_sysfs_delete_snapshot_group+0xb/0xd
    nilfs_put_root+0x2a/0x5a
    nilfs_detach_log_writer+0x1ab/0x2c1
    nilfs_put_super+0x13/0x68
    generic_shutdown_super+0x60/0xd1
    kill_block_super+0x1d/0x60
    deactivate_locked_super+0x22/0x3f
    deactivate_super+0x3e/0x58
    mntput_no_expire+0xe2/0x141
    SyS_oldumount+0x70/0xa5
    syscall_call+0x7/0xb

The reason of the issue was placement of
nilfs_sysfs_{create/delete}_snapshot_group() call under
nilfs-&gt;ns_cptree_lock protection.  But this protection is unnecessary and
wrong solution.  The second version of the patch fixes this issue.

[fengguang.wu@intel.com: nilfs_sysfs_create_mounted_snapshots_group can be static]
Reported-by: Michael L. Semon &lt;mlsemon35@gmail.com&gt;
Signed-off-by: Vyacheslav Dubeyko &lt;Vyacheslav.Dubeyko@hgst.com&gt;
Cc: Vyacheslav Dubeyko &lt;slava@dubeyko.com&gt;
Cc: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Tested-by: Michael L. Semon &lt;mlsemon35@gmail.com&gt;
Signed-off-by: Fengguang Wu &lt;fengguang.wu@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nilfs2: add /sys/fs/nilfs2/&lt;device&gt;/mounted_snapshots/&lt;snapshot&gt; group</title>
<updated>2014-08-08T22:57:21+00:00</updated>
<author>
<name>Vyacheslav Dubeyko</name>
<email>Vyacheslav.Dubeyko@hgst.com</email>
</author>
<published>2014-08-08T21:20:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a5a7332a291b55beb0863b119816d12ffc04dfb0'/>
<id>a5a7332a291b55beb0863b119816d12ffc04dfb0</id>
<content type='text'>
This patch adds creation of &lt;snapshot&gt; group for every mounted
snapshot in /sys/fs/nilfs2/&lt;device&gt;/mounted_snapshots group.

The group contains details about mounted snapshot:
(1) inodes_count - show number of inodes for snapshot.
(2) blocks_count - show number of blocks for snapshot.

Signed-off-by: Vyacheslav Dubeyko &lt;Vyacheslav.Dubeyko@hgst.com&gt;
Cc: Vyacheslav Dubeyko &lt;slava@dubeyko.com&gt;
Cc: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Cc: Michael L. Semon &lt;mlsemon35@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch adds creation of &lt;snapshot&gt; group for every mounted
snapshot in /sys/fs/nilfs2/&lt;device&gt;/mounted_snapshots group.

The group contains details about mounted snapshot:
(1) inodes_count - show number of inodes for snapshot.
(2) blocks_count - show number of blocks for snapshot.

Signed-off-by: Vyacheslav Dubeyko &lt;Vyacheslav.Dubeyko@hgst.com&gt;
Cc: Vyacheslav Dubeyko &lt;slava@dubeyko.com&gt;
Cc: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Cc: Michael L. Semon &lt;mlsemon35@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
