linux-stable.git/fs/ext4, branch v3.18.26

ext4, jbd2: ensure entering into panic after recording an error in superblock

2016-01-21T16:23:28+00:00

[ Upstream commit 4327ba52afd03fc4b5afa0ee1d774c9c5b0e85c5 ]

If an ext4 filesystem uses JBD2 journaling and an error occurs, the
journal is aborted first, the error number is recorded in the JBD2
superblock and, finally, the system panics if the "errors=panic"
mount option is set.  In rare cases, however, this sequence is
reordered as shown in the figure below, and the system panics (which
means a system reset in a mobile environment) before the error has
been recorded in the journal superblock. In this case, e2fsck cannot
recognize that a filesystem failure occurred in the previous run, and
the corruption is not fixed.

Task A                        Task B
ext4_handle_error()
-> jbd2_journal_abort()
  -> __journal_abort_soft()
    -> __jbd2_journal_abort_hard()
    | -> journal->j_flags |= JBD2_ABORT;
    |
    |                         __ext4_abort()
    |                         -> jbd2_journal_abort()
    |                         | -> __journal_abort_soft()
    |                         |   -> if (journal->j_flags & JBD2_ABORT)
    |                         |           return;
    |                         -> panic()
    |
    -> jbd2_journal_update_sb_errno()
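
The interleaving in the figure can be replayed with a small user-space
model. This is only a sketch: the Journal class, the flag value and the
function names are illustrative stand-ins for the kernel paths named in
the figure, not kernel code, and it models the bug only, not the fix.

```python
JBD2_ABORT = 1 << 1  # illustrative flag bit, standing in for the kernel flag


class Journal:
    def __init__(self):
        self.flags = 0
        self.sb_errno = 0  # error number recorded in the journal superblock


def task_a(journal, err):
    """Task A: set the abort flag, then record the error."""
    journal.flags |= JBD2_ABORT
    yield  # window in which Task B may run
    journal.sb_errno = err


def task_b_abort(journal, err):
    """Task B: returns early if the journal is already marked aborted."""
    if journal.flags & JBD2_ABORT:
        return
    journal.flags |= JBD2_ABORT
    journal.sb_errno = err


def errno_at_panic(b_runs_in_window, err=5):
    """Return the error number that is recorded when Task B panics."""
    journal = Journal()
    a = task_a(journal, err)
    next(a)  # Task A has set JBD2_ABORT but not yet recorded the error
    if not b_runs_in_window:
        next(a, None)  # Task A finishes first
    task_b_abort(journal, err)
    return journal.sb_errno  # state at the moment of panic()


print(errno_at_panic(b_runs_in_window=True))   # nothing recorded
print(errno_at_panic(b_runs_in_window=False))  # error recorded
```

When Task B runs inside the window, the recorded error number is still
zero at panic time, which is the case in which e2fsck sees no failure.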

Tested-by: Hobin Woo 
Signed-off-by: Daeho Jeong 
Signed-off-by: Theodore Ts'o 
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin

ext4: fix loss of delalloc extent info in ext4_zero_range()

2015-11-15T17:51:54+00:00

[ Upstream commit 94426f4b9648154dc5a6760b59e6953e640ab3b1 ]

In ext4_zero_range(), removing a file's entire block range from the
extent status tree removes all records of that file's delalloc extents.
The delalloc accounting code uses this information, and its loss can
then lead to accounting errors and kernel warnings at writeback time and
subsequent file system damage.  This is most noticeable on bigalloc
file systems where code in ext4_ext_map_blocks() handles cases where
delalloc extents share clusters with a newly allocated extent.

Because we're not deleting a block range and are correctly updating the
status of its associated extent, there is no need to remove anything
from the extent status tree.

When this patch is combined with an unrelated bug fix for
ext4_zero_range(), kernel warnings and e2fsck errors reported during
xfstests runs on bigalloc filesystems are greatly reduced without
introducing regressions on other xfstests-bld test scenarios.

Signed-off-by: Eric Whitney 
Signed-off-by: Theodore Ts'o 
Signed-off-by: Sasha Levin

ext4: allocate entire range in zero range

2015-11-15T17:51:54+00:00

[ Upstream commit 0f2af21aae11972fa924374ddcf52e88347cf5a8 ]

Currently there is a bug in the zero range code which causes zero range
calls to allocate only the block-aligned portion of the range, while
ignoring the rest in some cases.

In some cases, namely if the end of the range is past i_size, we do
attempt to preallocate the last nonaligned block. However, this might
cause the kernel to BUG() on some carefully designed zero range requests
on setups where page size > block size.

Fix this problem by first preallocating the entire range, including
the nonaligned edges, and converting the written extents to unwritten
in the next step. This approach also has the advantage of keeping the
range as linearly contiguous as possible.
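
To make "block-aligned portion" and "nonaligned edges" concrete, here is
a small arithmetic sketch. The helper names are hypothetical and the
rounding is a plain user-space model of splitting a byte range, not the
kernel implementation.

```python
def aligned_subrange(offset, length, block_size):
    """Largest block-aligned byte range inside [offset, offset + length)."""
    end = offset + length
    start_aligned = -(-offset // block_size) * block_size  # round up
    end_aligned = end // block_size * block_size           # round down
    return start_aligned, end_aligned


def unaligned_edges(offset, length, block_size):
    """Byte ranges of the request that are not covered by full blocks."""
    end = offset + length
    start_aligned, end_aligned = aligned_subrange(offset, length, block_size)
    if start_aligned >= end_aligned:
        return [(offset, end)]  # the request contains no full block
    edges = []
    if offset < start_aligned:
        edges.append((offset, start_aligned))
    if end_aligned < end:
        edges.append((end_aligned, end))
    return edges


# Zero 10000 bytes starting at byte 1000 on a 4k-block filesystem.
print(aligned_subrange(1000, 10000, 4096))
print(unaligned_edges(1000, 10000, 4096))
```

In this example only bytes 4096..8192 form a full block; the two edge
ranges are the parts that allocating just the aligned portion would miss.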

Signed-off-by: Lukas Czerner 
Signed-off-by: Theodore Ts'o 
Signed-off-by: Sasha Levin

ext4: don't manipulate recovery flag when freezing no-journal fs

2015-10-27T13:33:09+00:00

[ Upstream commit c642dc9e1aaed953597e7092d7df329e6234096e ]

At some point along this sequence of changes:

f6e63f9 ext4: fold ext4_nojournal_sops into ext4_sops
bb04457 ext4: support freezing ext2 (nojournal) file systems
9ca9238 ext4: Use separate super_operations structure for no_journal filesystems

ext4 started setting needs_recovery on filesystems without journals
when they are unfrozen.  This makes no sense, and in fact confuses
blkid to the point where it doesn't recognize the filesystem at all.

(freeze ext2; unfreeze ext2; run blkid; see no output; run dumpe2fs,
see needs_recovery set on fs w/ no journal).

To fix this, don't manipulate the INCOMPAT_RECOVER feature on
filesystems without journals.

Reported-by: Stu Mark 
Reviewed-by: Jan Kara 
Signed-off-by: Eric Sandeen 
Signed-off-by: Theodore Ts'o 
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin

ext4: replace open coded nofail allocation in ext4_free_blocks()

2015-08-04T18:15:19+00:00

[ Upstream commit 7444a072c387a93ebee7066e8aee776954ab0e41 ]

ext4_free_blocks() loops around the allocation request and mimics
__GFP_NOFAIL behavior without any allocation fallback strategy. Let's
remove the open-coded loop and replace it with __GFP_NOFAIL. Without the
flag, the allocator has no way to learn of the never-fail requirement
and cannot help in any way.

Signed-off-by: Michal Hocko 
Signed-off-by: Theodore Ts'o 
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin

ext4: correctly migrate a file with a hole at the beginning

2015-08-04T18:14:59+00:00

[ Upstream commit 8974fec7d72e3e02752fe0f27b4c3719c78d9a15 ]

Currently ext4_ind_migrate() doesn't correctly handle a file which
contains a hole at the beginning of the file.  This causes the migration
to be done incorrectly, and a subsequent delayed allocation write to
the "hole" then reclaims the same data blocks again, resulting in fs
corruption.

  # assuming 4k block size ext4, with delalloc enabled
  # skip the first block and write to the second block
  xfs_io -fc "pwrite 4k 4k" -c "fsync" /mnt/ext4/testfile

  # converting to indirect-mapped file, which would move the data blocks
  # to the beginning of the file, but extent status cache still marks
  # that region as a hole
  chattr -e /mnt/ext4/testfile

  # a delayed allocation write to the "hole" reclaims the same data block
  # again, resulting in i_blocks corruption
  xfs_io -c "pwrite 0 4k" /mnt/ext4/testfile
  umount /mnt/ext4
  e2fsck -nf /dev/sda6
  ...
  Inode 53, i_blocks is 16, should be 8.  Fix? no
  ...
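
The e2fsck numbers are consistent with one 4k data block being accounted
twice. A quick check, assuming i_blocks is counted in 512-byte units (the
helper name is hypothetical):

```python
SECTOR_SIZE = 512  # assumption: i_blocks is counted in 512-byte units


def i_blocks_for(data_blocks, block_size=4096):
    """i_blocks value for a given number of allocated data blocks."""
    return data_blocks * block_size // SECTOR_SIZE


expected = i_blocks_for(1)  # the single 4k block the file really owns
reported = i_blocks_for(2)  # the same block accounted a second time
print(expected, reported)
```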

Signed-off-by: Eryu Guan 
Signed-off-by: Theodore Ts'o 
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin

ext4: be more strict when migrating to non-extent based file

2015-08-04T18:14:41+00:00

[ Upstream commit d6f123a9297496ad0b6335fe881504c4b5b2a5e5 ]

Currently the checks in ext4_ind_migrate() before doing the real
conversion are insufficient:

a) delayed allocated extents could bypass the check on eh->eh_entries
   and eh->eh_depth

This can be demonstrated by this script

  xfs_io -fc "pwrite 0 4k" -c "pwrite 8k 4k" /mnt/ext4/testfile
  chattr -e /mnt/ext4/testfile

where testfile has two extents but is still converted to the non-extent
based file format.

b) only extent length is checked but not the offset, which would result
   in data loss (delalloc) or fs corruption (nodelalloc), because a
   non-extent based file supports at most (12 + 2^10 + 2^20 + 2^30)
   blocks

This can be demonstrated by

  xfs_io -fc "pwrite 5T 4k" /mnt/ext4/testfile
  chattr -e /mnt/ext4/testfile
  sync

If delalloc is enabled, dmesg prints
  EXT4-fs warning (device dm-4): ext4_block_to_path:105: block 1342177280 > max in inode 53
  EXT4-fs (dm-4): Delayed block allocation failed for inode 53 at logical offset 1342177280 with max blocks 1 with error 5
  EXT4-fs (dm-4): This should not happen!! Data will be lost

If delalloc is disabled, e2fsck -nf shows corruption
  Inode 53, i_size is 5497558142976, should be 4096.  Fix? no

Fix the two issues by

a) forcing all delayed allocation blocks to be allocated before checking
   eh->eh_depth and eh->eh_entries
b) requiring the last logical block of the extent to be within the
   direct map
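
The numbers in the second reproducer can be checked by hand. The sketch
below assumes a 4k block size and 4-byte block pointers, which gives the
(12 + 2^10 + 2^20 + 2^30) limit quoted above; the function name is
hypothetical.

```python
BLOCK_SIZE = 4096


def max_indirect_mapped_blocks(block_size=BLOCK_SIZE):
    """Blocks addressable through 12 direct pointers plus single,
    double and triple indirect blocks (assuming 4-byte pointers)."""
    ptrs = block_size // 4
    return 12 + ptrs + ptrs ** 2 + ptrs ** 3


offset = 5 * 2 ** 40                   # "pwrite 5T 4k"
logical_block = offset // BLOCK_SIZE   # block number seen in dmesg
i_size = offset + BLOCK_SIZE           # file size after the 4k write

print(max_indirect_mapped_blocks())
print(logical_block, logical_block > max_indirect_mapped_blocks())
print(i_size)
```

The logical block of the write lies beyond what an indirect-mapped file
can address, which matches the "block 1342177280 > max" warning, and the
computed file size matches the i_size that e2fsck reports.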

Signed-off-by: Eryu Guan 
Signed-off-by: Theodore Ts'o 
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin

ext4: fix reservation release on invalidatepage for delalloc fs

2015-08-04T18:14:23+00:00

[ Upstream commit 9705acd63b125dee8b15c705216d7186daea4625 ]

On a delalloc-enabled file system, during an invalidatepage operation,
ext4_da_page_release_reservation() is supposed to clear the delayed
buffer and remove the extent covering the delayed buffer from the extent
status tree.

However, there is currently a bug where on systems with page size >
block size we always remove extents from the start of the page,
regardless of where the actual delayed buffers are positioned in the
page. This leads to errors like this:

EXT4-fs warning (device loop0): ext4_da_release_space:1225:
ext4_da_release_space: ino 13, to_free 1 with only 0 reserved data
blocks

This can, however, cause data loss at writeback time if the file system
is in an ENOSPC condition, because we're releasing the reservation for
someone else's delayed buffer.

Fix this by only removing extents that correspond to the part of the
page we want to invalidate.
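
The difference between removing from the start of the page and removing
only the invalidated part can be shown with a toy model. This is a
user-space sketch with hypothetical helper names; a "slot" is the
position of a block-sized buffer within the page.

```python
def delayed_slots_in_range(delayed, offset, length, block_size):
    """Delayed buffer slots lying fully inside the invalidated byte range."""
    stop = offset + length
    return [s for s in sorted(delayed)
            if s * block_size >= offset and (s + 1) * block_size <= stop]


def removed_from_page_start(page_index, blocks_per_page, slots):
    """Modelled bug: remove len(slots) blocks starting at the page's first block."""
    first = page_index * blocks_per_page
    return list(range(first, first + len(slots)))


def removed_exact(page_index, blocks_per_page, slots):
    """Modelled fix: remove only the blocks backing the invalidated buffers."""
    first = page_index * blocks_per_page
    return [first + s for s in slots]


# 4k page, 1k blocks, page index 2 (logical blocks 8..11).
# Slots 0 and 3 are delayed; the second half of the page is invalidated.
slots = delayed_slots_in_range({0, 3}, offset=2048, length=2048, block_size=1024)
print(slots)
print(removed_from_page_start(2, 4, slots))
print(removed_exact(2, 4, slots))
```

Here the buffer being invalidated is logical block 11, but removing from
the start of the page drops block 8, a delayed buffer that is still in
use.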

This problem is reproducible with the following fio recipe (however, I
was only able to reproduce it with fio-2.1 or older):

[global]
bs=8k
iodepth=1024
iodepth_batch=60
randrepeat=1
size=1m
directory=/mnt/test
numjobs=20
[job1]
ioengine=sync
bs=1k
direct=1
rw=randread
filename=file1:file2
[job2]
ioengine=libaio
rw=randwrite
direct=1
filename=file1:file2
[job3]
bs=1k
ioengine=posixaio
rw=randwrite
direct=1
filename=file1:file2
[job5]
bs=1k
ioengine=sync
rw=randread
filename=file1:file2
[job7]
ioengine=libaio
rw=randwrite
filename=file1:file2
[job8]
ioengine=posixaio
rw=randwrite
filename=file1:file2
[job10]
ioengine=mmap
rw=randwrite
bs=1k
filename=file1:file2
[job11]
ioengine=mmap
rw=randwrite
direct=1
filename=file1:file2

Signed-off-by: Lukas Czerner 
Signed-off-by: Theodore Ts'o 
Reviewed-by: Jan Kara 
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin

ext4: don't retry file block mapping on bigalloc fs with non-extent file

2015-07-05T14:12:47+00:00

[ Upstream commit 292db1bc6c105d86111e858859456bcb11f90f91 ]

ext4 isn't willing to map clusters to a non-extent file.  Don't signal
this with an out of space error, since the FS will retry the
allocation (which didn't fail) forever.  Instead, return EUCLEAN so
that the operation will fail immediately all the way back to userspace.

(The fix is either to run e2fsck -E bmap2extent, or to chattr +e the file.)
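
Why the choice of error code matters can be sketched with a toy retry
loop. Everything here is hypothetical: map_blocks() stands in for a
mapping call that always refuses, and the attempt cap exists only so
that the example terminates.

```python
import errno

# 117 is the Linux value, used if this platform's errno module lacks EUCLEAN.
EUCLEAN = getattr(errno, "EUCLEAN", 117)


def map_blocks(signal_errno):
    """Hypothetical mapping call that always refuses with the given error."""
    return -signal_errno


def write_with_retry(signal_errno, free_blocks=100, max_attempts=1000):
    """Retry while the error looks like a transient lack of space."""
    attempts = 0
    while attempts < max_attempts:
        attempts += 1
        err = map_blocks(signal_errno)
        if err == -errno.ENOSPC and free_blocks > 0:
            continue  # free space exists, so the caller tries again
        return err, attempts
    return None, attempts  # stopped only because of the artificial cap


print(write_with_retry(errno.ENOSPC))  # never gets an answer
print(write_with_retry(EUCLEAN))       # fails on the first attempt
```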

Signed-off-by: Darrick J. Wong 
Signed-off-by: Theodore Ts'o 
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin

ext4: call sync_blockdev() before invalidate_bdev() in put_super()

2015-07-04T03:02:34+00:00

[ Upstream commit 89d96a6f8e6491f24fc8f99fd6ae66820e85c6c1 ]

Normally all of the buffers will have been forced out to disk before
we call invalidate_bdev(), but there will be some cases, where a file
system operation was aborted due to an ext4_error(), where there may
still be some dirty buffers in the buffer cache for the device.  So
try to force them out to disk before calling invalidate_bdev().

This fixes a warning triggered by generic/081:

WARNING: CPU: 1 PID: 3473 at /usr/projects/linux/ext4/fs/block_dev.c:56 __blkdev_put+0xb5/0x16f()

Signed-off-by: Theodore Ts'o 
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin