linux-stable.git/fs, branch v3.2.71

Btrfs: fix file corruption after cloning inline extents

2015-08-12T14:33:21+00:00

commit ed958762644b404654a6f5d23e869f496fe127c6 upstream.

Using the clone ioctl (or the extent_same ioctl, which calls the same extent
cloning function) we end up allowing an inline extent from the source file
to be copied into a non-zero offset of the destination file. This is
unexpected, and the btrfs code is not prepared to deal with it - all inline
extents must be at a file offset equal to 0.

For example, the following excerpt of a test case for fstests triggers
a crash/BUG_ON() on a write operation after an inline extent is cloned
into a non-zero offset:

  _scratch_mkfs >>$seqres.full 2>&1
  _scratch_mount

  # Create our test files. File foo has the same 2K of data at offset 4K
  # as file bar has at its offset 0.
  $XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 4K" \
      -c "pwrite -S 0xbb 4k 2K" \
      -c "pwrite -S 0xcc 8K 4K" \
      $SCRATCH_MNT/foo | _filter_xfs_io

  # File bar consists of a single inline extent (2K size).
  $XFS_IO_PROG -f -s -c "pwrite -S 0xbb 0 2K" \
     $SCRATCH_MNT/bar | _filter_xfs_io

  # Now call the clone ioctl to clone the extent of file bar into file
  # foo at its offset 4K. This made file foo have an inline extent at
  # offset 4K, something which the btrfs code can not deal with in future
  # IO operations because all inline extents are supposed to start at an
  # offset of 0, resulting in all sorts of chaos.
  # So here we validate that clone ioctl returns an EOPNOTSUPP, which is
  # what it returns for other cases dealing with inlined extents.
  $CLONER_PROG -s 0 -d $((4 * 1024)) -l $((2 * 1024)) \
      $SCRATCH_MNT/bar $SCRATCH_MNT/foo

  # Because of the inline extent at offset 4K, the following write made
  # the kernel crash with a BUG_ON().
  $XFS_IO_PROG -c "pwrite -S 0xdd 6K 2K" $SCRATCH_MNT/foo | _filter_xfs_io

  status=0
  exit

The stack trace of the BUG_ON() triggered by the last write is:

  [152154.035903] ------------[ cut here ]------------
  [152154.036424] kernel BUG at mm/page-writeback.c:2286!
  [152154.036424] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
  [152154.036424] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc acpi_cpu$
  [152154.036424] CPU: 2 PID: 17873 Comm: xfs_io Tainted: G        W       4.1.0-rc6-btrfs-next-11+ #2
  [152154.036424] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
  [152154.036424] task: ffff880429f70990 ti: ffff880429efc000 task.ti: ffff880429efc000
  [152154.036424] RIP: 0010:[]  [] clear_page_dirty_for_io+0x1e/0x90
  [152154.036424] RSP: 0018:ffff880429effc68  EFLAGS: 00010246
  [152154.036424] RAX: 0200000000000806 RBX: ffffea0006a6d8f0 RCX: 0000000000000001
  [152154.036424] RDX: 0000000000000000 RSI: ffffffff81155d1b RDI: ffffea0006a6d8f0
  [152154.036424] RBP: ffff880429effc78 R08: ffff8801ce389fe0 R09: 0000000000000001
  [152154.036424] R10: 0000000000002000 R11: ffffffffffffffff R12: ffff8800200dce68
  [152154.036424] R13: 0000000000000000 R14: ffff8800200dcc88 R15: ffff8803d5736d80
  [152154.036424] FS:  00007fbf119f6700(0000) GS:ffff88043d280000(0000) knlGS:0000000000000000
  [152154.036424] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [152154.036424] CR2: 0000000001bdc000 CR3: 00000003aa555000 CR4: 00000000000006e0
  [152154.036424] Stack:
  [152154.036424]  ffff8803d5736d80 0000000000000001 ffff880429effcd8 ffffffffa04e97c1
  [152154.036424]  ffff880429effd68 ffff880429effd60 0000000000000001 ffff8800200dc9c8
  [152154.036424]  0000000000000001 ffff8800200dcc88 0000000000000000 0000000000001000
  [152154.036424] Call Trace:
  [152154.036424]  [] lock_and_cleanup_extent_if_need+0x147/0x18d [btrfs]
  [152154.036424]  [] __btrfs_buffered_write+0x245/0x4c8 [btrfs]
  [152154.036424]  [] ? btrfs_file_write_iter+0x150/0x3e0 [btrfs]
  [152154.036424]  [] ? btrfs_file_write_iter+0x15f/0x3e0 [btrfs]
  [152154.036424]  [] btrfs_file_write_iter+0x2cc/0x3e0 [btrfs]
  [152154.036424]  [] __vfs_write+0x7c/0xa5
  [152154.036424]  [] vfs_write+0xa0/0xe4
  [152154.036424]  [] SyS_pwrite64+0x64/0x82
  [152154.036424]  [] system_call_fastpath+0x12/0x6f
  [152154.036424] Code: 48 89 c7 e8 0f ff ff ff 5b 41 5c 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb e8 ae ef 00 00 49 89 c4 48 8b 03 a8 01 75 02 <0f> 0b 4d 85 e4 74 59 49 8b 3c 2$
  [152154.036424] RIP  [] clear_page_dirty_for_io+0x1e/0x90
  [152154.036424]  RSP 
  [152154.242621] ---[ end trace e3d3376b23a57041 ]---

Fix this by returning EOPNOTSUPP on any attempt to copy an inline
extent into a non-zero offset, just as is already done for other
scenarios that would require copying or splitting inline extents;
those checks were introduced by the following commits:

   00fdf13a2e9f ("Btrfs: fix a crash of clone with inline extents's split")
   3f9e3df8da3c ("btrfs: replace error code from btrfs_drop_extents")

Signed-off-by: Filipe Manana 
[bwh: Backported to 3.2: test new_key.offset as last_dest_end isn't defined
 in this function]
Signed-off-by: Ben Hutchings

9p: don't leave a half-initialized inode sitting around

2015-08-12T14:33:21+00:00

commit 0a73d0a204a4a04a1e110539c5a524ae51f91d6d upstream.

Signed-off-by: Al Viro 
Signed-off-by: Ben Hutchings

ext4: replace open coded nofail allocation in ext4_free_blocks()

2015-08-12T14:33:19+00:00

commit 7444a072c387a93ebee7066e8aee776954ab0e41 upstream.

ext4_free_blocks() loops around the allocation request, mimicking
__GFP_NOFAIL behavior without any allocation fallback strategy. Let's
remove the open-coded loop and replace it with __GFP_NOFAIL. Without the
flag, the allocator has no way to learn about the never-fail requirement
and cannot help in any way.
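
A user-space model of the change, assuming a toy allocator: the hypothetical
model_alloc() fails a configurable number of times and keeps retrying
internally only when the caller passes MODEL_GFP_NOFAIL (an illustrative flag,
not the kernel's __GFP_NOFAIL value). The point is where the loop lives: in the
open-coded version the allocator never learns that the request must not fail.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical flag bit modelling __GFP_NOFAIL; not the kernel's value. */
#define MODEL_GFP_NOFAIL 0x1u

/* Simulated memory pressure: the next model_fail_budget attempts fail. */
static int model_fail_budget;
static int model_attempts;

static void *model_alloc(size_t size, unsigned flags)
{
	for (;;) {
		model_attempts++;
		if (model_fail_budget > 0) {
			model_fail_budget--;
			if (!(flags & MODEL_GFP_NOFAIL))
				return NULL;	/* ordinary request: report failure */
			continue;		/* NOFAIL: the allocator keeps trying */
		}
		return malloc(size);
	}
}

/* Before: the caller mimics NOFAIL, hiding the requirement. */
static void *model_alloc_open_coded(size_t size)
{
	void *p;

	do {
		p = model_alloc(size, 0);
	} while (!p);
	return p;
}

/* After: the never-fail requirement is expressed through the flag. */
static void *model_alloc_nofail(size_t size)
{
	return model_alloc(size, MODEL_GFP_NOFAIL);
}
```

Both helpers end up with a valid pointer here; the difference is that only the
flagged version lets the allocator see the requirement and, in a real kernel,
apply whatever reclaim or reserve strategy it has.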

Signed-off-by: Michal Hocko 
Signed-off-by: Theodore Ts'o 
[bwh: Backported to 3.2:
 - Adjust context
 - s/ext4_free_data_cachep/ext4_free_ext_cachep/]
Signed-off-by: Ben Hutchings

ext4: avoid deadlocks in the writeback path by using sb_getblk_gfp

2015-08-12T14:33:19+00:00

commit c45653c341f5c8a0ce19c8f0ad4678640849cb86 upstream.

Switch ext4 to using sb_getblk_gfp with GFP_NOFS added to fix possible
deadlocks in the page writeback path.
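
Why GFP_NOFS helps can be sketched with a simplified user-space model. The
flag values and the model_reclaim()/model_getblk_in_writeback() helpers are
hypothetical; the assumption is that, under memory pressure, reclaim may call
back into the filesystem only when the allocation flags include the FS bit,
and that such a callback would block on a lock the writeback path already
holds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits loosely modelled on gfp flags; values are illustrative. */
#define MODEL_GFP_IO     0x1u
#define MODEL_GFP_FS     0x2u
#define MODEL_GFP_KERNEL (MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS   (MODEL_GFP_IO)	/* like GFP_NOFS: FS bit cleared */

/* Set while the "filesystem" holds a lock in its writeback path. */
static bool model_fs_lock_held;

/*
 * Model of direct reclaim: it only re-enters the filesystem when the FS
 * bit is set. Returns false when that re-entry would deadlock.
 */
static bool model_reclaim(unsigned gfp)
{
	if (gfp & MODEL_GFP_FS) {
		if (model_fs_lock_held)
			return false;	/* would block on our own lock forever */
		/* ... write back dirty filesystem pages ... */
	}
	return true;
}

/* Model of a buffer allocation made from inside the writeback path. */
static bool model_getblk_in_writeback(unsigned gfp)
{
	bool ok;

	model_fs_lock_held = true;
	ok = model_reclaim(gfp);
	model_fs_lock_held = false;
	return ok;
}
```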

Signed-off-by: Nikolay Borisov 
Signed-off-by: Theodore Ts'o 
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings

fs/buffer.c: support buffer cache allocations with gfp modifiers

2015-08-12T14:33:18+00:00

commit 3b5e6454aaf6b4439b19400d8365e2ec2d24e411 upstream.

Buffer cache pages are allocated from the movable area because a buffer
is normally referenced for a while and then released soon.  But some
filesystems hold buffer cache pages for a long time, which can disturb
page migration.

New APIs are introduced to allocate buffer cache pages with a
caller-specified flag.  The *_gfp APIs are for callers that want to set
the page allocation flags for the page cache allocation, and the
*_unmovable APIs are for callers that want to allocate the page cache
from the non-movable area.
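
The layering described above can be sketched as one core helper that takes the
allocation flags, with thin wrappers for the common cases. This is a user-space
model with hypothetical names (model_getblk_gfp(), model_getblk(),
model_getblk_unmovable()) and illustrative flag values; real buffer-cache
lookups also need the block device and block size, which are omitted here.

```c
#include <assert.h>

/* Hypothetical flag value; the kernel's __GFP_MOVABLE differs. */
#define MODEL_GFP_MOVABLE 0x8u

struct model_bh {
	unsigned long block;
	unsigned gfp_used;	/* flags the page allocation was made with */
};

/* Core helper every variant funnels into. */
static struct model_bh model_getblk_gfp(unsigned long block, unsigned gfp)
{
	struct model_bh bh = { block, gfp };
	return bh;
}

/* Default: movable, since most buffers are short-lived. */
static struct model_bh model_getblk(unsigned long block)
{
	return model_getblk_gfp(block, MODEL_GFP_MOVABLE);
}

/* For callers that pin buffers for a long time and would block migration. */
static struct model_bh model_getblk_unmovable(unsigned long block)
{
	return model_getblk_gfp(block, 0);
}
```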

Signed-off-by: Gioh Kim 
Signed-off-by: Theodore Ts'o 
Reviewed-by: Jan Kara 
[bwh: Prerequisite for "bufferhead: Add _gfp version for sb_getblk()".
 Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings

fuse: initialize fc->release before calling it

2015-08-12T14:33:18+00:00

commit 0ad0b3255a08020eaf50e34ef0d6df5bdf5e09ed upstream.

fc->release is called from fuse_conn_put() which was used in the error
cleanup before fc->release was initialized.

[Jeremiah Mahler : assign fc->release after calling
fuse_conn_init(fc) instead of before.]
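
The ordering constraint can be modelled in user space as follows. struct
model_conn and the model_* helpers are hypothetical reductions of struct
fuse_conn, fuse_conn_init() and fuse_conn_put(); the assumption built into the
model is that init resets the callback, which is why release has to be assigned
after init but before any error path can drop the last reference.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily reduced stand-in for struct fuse_conn. */
struct model_conn {
	int count;
	void (*release)(struct model_conn *);
};

static bool model_released;

static void model_conn_free(struct model_conn *c)
{
	model_released = true;
	free(c);
}

/* In this model init clears every field, so an earlier ->release is lost. */
static void model_conn_init(struct model_conn *c)
{
	memset(c, 0, sizeof(*c));
	c->count = 1;
}

/* Drop a reference; the last put calls ->release. */
static void model_conn_put(struct model_conn *c)
{
	if (--c->count == 0)
		c->release(c);	/* a NULL callback here would crash */
}

/* Setup that fails partway through and takes the error cleanup path. */
static int model_setup_fails(void)
{
	struct model_conn *c = malloc(sizeof(*c));

	if (!c)
		return -ENOMEM;
	model_conn_init(c);
	c->release = model_conn_free;	/* after init, before any error path */

	/* ... a later setup step fails here ... */
	model_conn_put(c);
	return -EINVAL;
}
```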

Signed-off-by: Miklos Szeredi 
Fixes: a325f9b92273 ("fuse: update fuse_conn_init() and separate out fuse_conn_kill()")
Signed-off-by: Ben Hutchings

Btrfs: fix race between caching kthread and returning inode to inode cache

2015-08-12T14:33:18+00:00

commit ae9d8f17118551bedd797406a6768b87c2146234 upstream.

While the inode cache caching kthread is calling btrfs_unpin_free_ino(),
we could have a concurrent call to btrfs_return_ino() that adds a new
entry to the root's free space cache of pinned inodes. This concurrent
call does not acquire fs_info->commit_root_sem before adding a new
entry if the caching state is BTRFS_CACHE_FINISHED. That is a problem
because the caching kthread calls btrfs_unpin_free_ino() after setting
the caching state to BTRFS_CACHE_FINISHED, and therefore races with the
task calling btrfs_return_ino(): while that task is adding a new entry,
the caching kthread is navigating the cache's rbtree, removing and
freeing nodes without acquiring the spinlock that protects the rbtree.

This race resulted in memory corruption due to a double free of struct
btrfs_free_space objects, because both tasks can end up freeing the
same objects. Note that adding a new entry can result in merging it with
other entries in the cache, in which case those entries are freed.
This is particularly important as btrfs_free_space structures are also
used for the block group free space caches.

This memory corruption can be detected by a debugging kernel, which
reports it with the following trace:

[132408.501148] slab error in verify_redzone_free(): cache `btrfs_free_space': double free detected
[132408.505075] CPU: 15 PID: 12248 Comm: btrfs-ino-cache Tainted: G        W       4.1.0-rc5-btrfs-next-10+ #1
[132408.505075] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
[132408.505075]  ffff880023e7d320 ffff880163d73cd8 ffffffff8145eec7 ffffffff81095dce
[132408.505075]  ffff880009735d40 ffff880163d73ce8 ffffffff81154e1e ffff880163d73d68
[132408.505075]  ffffffff81155733 ffffffffa054a95a ffff8801b6099f00 ffffffffa0505b5f
[132408.505075] Call Trace:
[132408.505075]  [] dump_stack+0x4f/0x7b
[132408.505075]  [] ? console_unlock+0x356/0x3a2
[132408.505075]  [] __slab_error.isra.28+0x25/0x36
[132408.505075]  [] __cache_free+0xe2/0x4b6
[132408.505075]  [] ? __btrfs_add_free_space+0x2f0/0x343 [btrfs]
[132408.505075]  [] ? btrfs_unpin_free_ino+0x8e/0x99 [btrfs]
[132408.505075]  [] ? time_hardirqs_off+0x15/0x28
[132408.505075]  [] ? trace_hardirqs_off+0xd/0xf
[132408.505075]  [] ? kfree+0xb6/0x14e
[132408.505075]  [] kfree+0xe5/0x14e
[132408.505075]  [] btrfs_unpin_free_ino+0x8e/0x99 [btrfs]
[132408.505075]  [] caching_kthread+0x29e/0x2d9 [btrfs]
[132408.505075]  [] ? btrfs_unpin_free_ino+0x99/0x99 [btrfs]
[132408.505075]  [] kthread+0xef/0xf7
[132408.505075]  [] ? time_hardirqs_on+0x15/0x28
[132408.505075]  [] ? __kthread_parkme+0xad/0xad
[132408.505075]  [] ret_from_fork+0x42/0x70
[132408.505075]  [] ? __kthread_parkme+0xad/0xad
[132408.505075] ffff880023e7d320: redzone 1:0x9f911029d74e35b, redzone 2:0x9f911029d74e35b.
[132409.501654] slab: double free detected in cache 'btrfs_free_space', objp ffff880023e7d320
[132409.503355] ------------[ cut here ]------------
[132409.504241] kernel BUG at mm/slab.c:2571!

Therefore fix this by having btrfs_unpin_free_ino() acquire the lock
that protects the rbtree while doing the searches and removing entries.
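
A minimal single-threaded sketch of the locking rule, assuming a C11 compiler:
a linked list stands in for the rbtree, an atomic_flag-based spinlock stands in
for the kernel spinlock, and every name is hypothetical. Both the adding side
and the unpinning side touch the shared structure only with the lock held, so
an entry unlinked by one task can no longer be found, and freed a second time,
by the other. The sketch does not exercise real concurrency.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Minimal spinlock model on top of C11 atomic_flag. */
struct model_spinlock { atomic_flag f; };

static void model_spin_lock(struct model_spinlock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->f, memory_order_acquire))
		;	/* busy-wait */
}

static void model_spin_unlock(struct model_spinlock *l)
{
	atomic_flag_clear_explicit(&l->f, memory_order_release);
}

/* A singly linked list stands in for the rbtree of pinned entries. */
struct model_entry { unsigned long ino; struct model_entry *next; };

struct model_ino_cache {
	struct model_spinlock lock;	/* protects 'pinned' */
	struct model_entry *pinned;
	unsigned long unpinned;		/* only touched by the unpinning task */
};

/* Adding side (cf. btrfs_return_ino()): always under the lock. */
static void model_return_ino(struct model_ino_cache *c, unsigned long ino)
{
	struct model_entry *e = malloc(sizeof(*e));

	if (!e)
		return;
	e->ino = ino;
	model_spin_lock(&c->lock);
	e->next = c->pinned;
	c->pinned = e;
	model_spin_unlock(&c->lock);
}

/* Unpinning side: each entry is found and unlinked with the lock held;
 * this sketch frees it after dropping the lock, when no one else can
 * reach it any more. */
static void model_unpin_free_ino(struct model_ino_cache *c)
{
	for (;;) {
		struct model_entry *e;

		model_spin_lock(&c->lock);
		e = c->pinned;
		if (e)
			c->pinned = e->next;
		model_spin_unlock(&c->lock);
		if (!e)
			break;
		c->unpinned++;
		free(e);
	}
}
```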

Fixes: 1c70d8fb4dfa ("Btrfs: fix inode caching vs tree log")
Signed-off-by: Filipe Manana 
Signed-off-by: Chris Mason 
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings

Btrfs: use kmem_cache_free when freeing entry in inode cache

2015-08-12T14:33:18+00:00

commit c3f4a1685bb87e59c886ee68f7967eae07d4dffa upstream.

The free space entries are allocated using kmem_cache_zalloc(),
through __btrfs_add_free_space(), therefore we should use
kmem_cache_free() and not kfree() to avoid any confusion and
any potential problem. The kfree() definition in mm/slab.c has
the following comment:

  /*
   * (...)
   *
   * Don't free memory not originally allocated by kmalloc()
   * or you will run into trouble.
   */

So it is better to be safe and use kmem_cache_free().
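
To illustrate why an object should be freed back to the allocator that produced
it, here is a toy fixed-size object cache in user space. struct model_obj_cache,
model_cache_zalloc() and model_cache_free() are hypothetical stand-ins for a slab
cache, kmem_cache_zalloc() and kmem_cache_free(); the assertions in
model_cache_free() play the role of debugging checks that catch foreign pointers
and double frees.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_SLOTS     4
#define MODEL_SLOT_SIZE 64

/* Hypothetical fixed-size object cache standing in for a slab cache. */
struct model_obj_cache {
	size_t obj_size;	/* must not exceed MODEL_SLOT_SIZE */
	unsigned char mem[MODEL_SLOTS][MODEL_SLOT_SIZE];
	bool used[MODEL_SLOTS];
};

/* Hand out a zeroed slot, in the spirit of kmem_cache_zalloc(). */
static void *model_cache_zalloc(struct model_obj_cache *c)
{
	assert(c->obj_size <= MODEL_SLOT_SIZE);
	for (int i = 0; i < MODEL_SLOTS; i++) {
		if (!c->used[i]) {
			c->used[i] = true;
			memset(c->mem[i], 0, c->obj_size);
			return c->mem[i];
		}
	}
	return NULL;
}

/* Return a slot, in the spirit of kmem_cache_free(): the pointer must
 * have come from this cache and must still be allocated. */
static void model_cache_free(struct model_obj_cache *c, void *p)
{
	for (int i = 0; i < MODEL_SLOTS; i++) {
		if (p == (void *)c->mem[i]) {
			assert(c->used[i]);	/* catches a double free */
			c->used[i] = false;
			return;
		}
	}
	assert(!"object does not belong to this cache");
}
```

Handing such a pointer to plain free() instead would pass memory the C library
never allocated to its allocator, which is the same class of mismatch the
mm/slab.c comment warns about.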

Signed-off-by: Filipe Manana 
Reviewed-by: David Sterba 
Signed-off-by: Chris Mason 
Signed-off-by: Ben Hutchings

ext4: don't retry file block mapping on bigalloc fs with non-extent file

2015-08-12T14:33:16+00:00

commit 292db1bc6c105d86111e858859456bcb11f90f91 upstream.

ext4 isn't willing to map clusters to a non-extent file.  Don't signal
this with an out of space error, since the FS will retry the
allocation (which didn't fail) forever.  Instead, return EUCLEAN so
that the operation will fail immediately all the way back to userspace.

(The fix is either to run e2fsck -E bmap2extent, or to chattr +e the file.)
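
The livelock and its fix can be modelled with a caller that retries on
-ENOSPC, loosely in the spirit of ext4's retry-after-ENOSPC logic.
model_map_blocks() and model_write_with_retry() are hypothetical, and the retry
cap exists only so the sketch terminates; the model assumes a Linux <errno.h>,
which defines EUCLEAN.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical reduced model of block mapping on a bigalloc filesystem. */
static int model_map_blocks(bool bigalloc, bool extent_file)
{
	if (bigalloc && !extent_file)
		return -EUCLEAN;	/* -ENOSPC here would invite endless retries */
	return 0;
}

/* Caller that retries on -ENOSPC; capped so the sketch always ends. */
static int model_write_with_retry(bool bigalloc, bool extent_file, int *tries)
{
	int ret;

	*tries = 0;
	do {
		ret = model_map_blocks(bigalloc, extent_file);
		(*tries)++;
	} while (ret == -ENOSPC && *tries < 100);
	return ret;
}
```

Because -EUCLEAN is not treated as "maybe space will free up", the caller gives
up after one attempt and the error propagates to userspace.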

Signed-off-by: Darrick J. Wong 
Signed-off-by: Theodore Ts'o 
Signed-off-by: Ben Hutchings

ext4: call sync_blockdev() before invalidate_bdev() in put_super()

2015-08-12T14:33:16+00:00

commit 89d96a6f8e6491f24fc8f99fd6ae66820e85c6c1 upstream.

Normally all of the buffers will have been forced out to disk before
we call invalidate_bdev(), but in some cases, such as when a file
system operation was aborted due to an ext4_error(), there may still be
some dirty buffers in the buffer cache for the device.  So try to force
them out to disk before calling invalidate_bdev().

This fixes a warning triggered by generic/081:

WARNING: CPU: 1 PID: 3473 at /usr/projects/linux/ext4/fs/block_dev.c:56 __blkdev_put+0xb5/0x16f()
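
The ordering fix can be pictured with a toy buffer cache. The model_* names are
hypothetical, and the assumption built into the model is that invalidation can
only discard clean buffers, so any dirty buffer still present is left behind;
syncing first leaves nothing dirty to strand.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NBUF 4

/* Hypothetical model of a device's buffer cache. */
struct model_bdev {
	bool present[MODEL_NBUF];
	bool dirty[MODEL_NBUF];
	int on_disk_writes;
};

/* In the spirit of sync_blockdev(): write every dirty buffer out. */
static void model_sync(struct model_bdev *b)
{
	for (int i = 0; i < MODEL_NBUF; i++) {
		if (b->present[i] && b->dirty[i]) {
			b->dirty[i] = false;
			b->on_disk_writes++;
		}
	}
}

/* In the spirit of invalidate_bdev(): drop clean buffers and report
 * how many dirty ones had to be left behind. */
static int model_invalidate(struct model_bdev *b)
{
	int left = 0;

	for (int i = 0; i < MODEL_NBUF; i++) {
		if (!b->present[i])
			continue;
		if (b->dirty[i])
			left++;		/* data not on disk yet, cannot drop */
		else
			b->present[i] = false;
	}
	return left;
}

/* put_super() ordering after the fix: sync, then invalidate. */
static int model_put_super(struct model_bdev *b)
{
	model_sync(b);
	return model_invalidate(b);
}
```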

Signed-off-by: Theodore Ts'o 
Signed-off-by: Ben Hutchings