linux-stable.git/fs/ext4/inode.c, branch linux-3.14.y

ext4: don't call ext4_should_journal_data() on the journal inode

2016-08-16T07:29:03+00:00

commit 6a7fd522a7c94cdef0a3b08acf8e6702056e635c upstream.

If ext4_fill_super() fails early, it's possible for ext4_evict_inode()
to call ext4_should_journal_data() before superblock options and flags
are fully set up.  In that case, the iput() on the journal inode can
end up causing a BUG().

Work around this problem by reordering the tests so we only call
ext4_should_journal_data() after we know it's not the journal inode.
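
The reordering relies on C's short-circuit evaluation of &&. A minimal
userspace sketch of the idea follows; struct toy_sb, struct toy_inode,
toy_should_journal_data() and needs_journalled_evict() are hypothetical
stand-ins, not the kernel's own structures or the actual patch:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the kernel's structures. */
struct toy_sb {
    bool options_ready;        /* false if fill_super failed early */
    unsigned long journal_ino; /* inode number of the journal */
};

struct toy_inode {
    unsigned long ino;
    struct toy_sb *sb;
};

static int journal_data_calls; /* counts evaluations of the predicate */

/* Models a predicate that is only safe once mount options are set up. */
static bool toy_should_journal_data(const struct toy_inode *inode)
{
    journal_data_calls++;
    return inode->sb->options_ready;
}

/*
 * The cheap identity test comes first: && short-circuits, so the
 * predicate is never evaluated for the journal inode.
 */
static bool needs_journalled_evict(const struct toy_inode *inode)
{
    return inode->ino != inode->sb->journal_ino &&
           toy_should_journal_data(inode);
}
```

With the operands in the opposite order, the predicate would run for
every inode, including the journal inode during a failed mount.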

Fixes: 2d859db3e4 ("ext4: fix data corruption in inodes with journalled data")
Fixes: 2b405bfa84 ("ext4: fix data=journal fast mount/umount hang")
Cc: Jan Kara 
Signed-off-by: Vegard Nossum 
Signed-off-by: Theodore Ts'o 
Reviewed-by: Jan Kara 
Signed-off-by: Greg Kroah-Hartman

ext4: fix deadlock during page writeback

2016-08-16T07:29:03+00:00

commit 646caa9c8e196880b41cd3e3d33a2ebc752bdb85 upstream.

Commit 06bd3c36a733 (ext4: fix data exposure after a crash) uncovered a
deadlock in ext4_writepages() which was previously much harder to hit.
After this commit xfstest generic/130 reproduces the deadlock on small
filesystems.

The problem happens when ext4_do_update_inode() sets the LARGE_FILE feature
and marks the current inode handle as synchronous. That subsequently causes
ext4_journal_stop(), called from ext4_writepages(), to block waiting for
transaction commit while still holding page locks, a reference to io_end,
and some prepared bio in the mpd structure, each of which can possibly block
transaction commit from completing and thus result in deadlock.

Fix the problem by releasing page locks, io_end reference, and
submitting prepared bio before calling ext4_journal_stop().

[ Changed to defer the call to ext4_journal_stop() only if the handle
  is synchronous.  --tytso ]

Reported-and-tested-by: Eryu Guan 
Signed-off-by: Theodore Ts'o 
Signed-off-by: Jan Kara 
Signed-off-by: Greg Kroah-Hartman

ext4: fix NULL pointer dereference in ext4_mark_inode_dirty()

2016-05-04T21:51:25+00:00

commit 5e1021f2b6dff1a86a468a1424d59faae2bc63c1 upstream.

ext4_reserve_inode_write() in ext4_mark_inode_dirty() could fail on
error (e.g. EIO) and iloc.bh can be NULL in this case. But the error is
ignored in the following "if" condition and ext4_expand_extra_isize()
might be called with NULL iloc.bh set, which triggers NULL pointer
dereference.
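
The pattern of checking the error before touching the buffer pointer can
be sketched in userspace as follows. All names here (struct toy_iloc,
toy_reserve_inode_write(), toy_mark_inode_dirty()) are hypothetical
stand-ins, and the single byte store merely stands in for the work that
would dereference iloc.bh:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the inode location; bh models iloc.bh. */
struct toy_iloc {
    char *bh;
};

/* On failure, returns a negative errno and leaves bh NULL. */
static int toy_reserve_inode_write(struct toy_iloc *iloc, char *buf, int fail)
{
    if (fail) {
        iloc->bh = NULL;
        return -EIO;
    }
    iloc->bh = buf;
    return 0;
}

static int toy_mark_inode_dirty(struct toy_iloc *iloc, char *buf, int fail)
{
    int err = toy_reserve_inode_write(iloc, buf, fail);

    if (err)
        return err;   /* bail out before anything touches iloc->bh */

    iloc->bh[0] = 1;  /* stands in for work that dereferences bh */
    return 0;
}
```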

This was uncovered by commit 8b4953e13f4c ("ext4: reserve code points for
the project quota feature"), which enlarges the ext4_inode size, and can be
reproduced by running the following script on a new kernel with an old mke2fs:

  #!/bin/bash
  mnt=/mnt/ext4
  devname=ext4-error
  dev=/dev/mapper/$devname
  fsimg=/home/fs.img

  trap cleanup 0 1 2 3 9 15

  cleanup()
  {
          umount $mnt >/dev/null 2>&1
          dmsetup remove $devname
          losetup -d $backend_dev
          rm -f $fsimg
          exit 0
  }

  rm -f $fsimg
  fallocate -l 1g $fsimg
  backend_dev=`losetup -f --show $fsimg`
  devsize=`blockdev --getsz $backend_dev`

  good_tab="0 $devsize linear $backend_dev 0"
  error_tab="0 $devsize error $backend_dev 0"

  dmsetup create $devname --table "$good_tab"

  mkfs -t ext4 $dev
  mount -t ext4 -o errors=continue,strictatime $dev $mnt

  dmsetup load $devname --table "$error_tab" && dmsetup resume $devname
  echo 3 > /proc/sys/vm/drop_caches
  ls -l $mnt
  exit 0

[ Patch changed to simplify the function a tiny bit. -- Ted ]

Signed-off-by: Eryu Guan 
Signed-off-by: Theodore Ts'o 
Cc: Jan Kara 
Signed-off-by: Greg Kroah-Hartman

ext4: fix reservation release on invalidatepage for delalloc fs

2015-08-03T16:29:53+00:00

commit 9705acd63b125dee8b15c705216d7186daea4625 upstream.

On a delalloc-enabled file system, during the invalidatepage operation
in ext4_da_page_release_reservation(), we want to clear the delayed
buffer and remove the extent covering the delayed buffer from the extent
status tree.

However, there is currently a bug where on systems with page size >
block size we always remove extents from the start of the page,
regardless of where the actual delayed buffers are positioned in the page.
This leads to errors like this:

EXT4-fs warning (device loop0): ext4_da_release_space:1225:
ext4_da_release_space: ino 13, to_free 1 with only 0 reserved data
blocks

This can, however, cause data loss at writeback time if the file system is
in an ENOSPC condition, because we're releasing the reservation for someone
else's delayed buffer.

Fix this by only removing extents that correspond to the part of the
page we want to invalidate.
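
As a rough illustration of the arithmetic involved, the sketch below maps
an invalidated byte range inside a page to the logical blocks that lie
entirely within it. The function and struct names are hypothetical, and
the sketch ignores whether each buffer is actually delayed, which the real
code also has to consider:

```c
#include <stdint.h>

/* Hypothetical result type: first logical block and number of blocks. */
struct blk_range {
    uint64_t first;
    uint64_t count;
};

/*
 * Logical blocks fully covered by invalidating [offset, offset + length)
 * within the page at page_index. Partially covered blocks are skipped.
 */
static struct blk_range toy_invalidated_blocks(uint64_t page_index,
                                               unsigned page_size,
                                               unsigned block_size,
                                               unsigned offset,
                                               unsigned length)
{
    unsigned blocks_per_page = page_size / block_size;
    /* round the start up and the end down to block boundaries */
    unsigned start = (offset + block_size - 1) / block_size;
    unsigned end = (offset + length) / block_size;
    struct blk_range r;

    r.first = page_index * blocks_per_page + start;
    r.count = end > start ? end - start : 0;
    return r;
}
```

For example, with 4096-byte pages and 1024-byte blocks, invalidating the
second half of page 3 (offset 2048, length 2048) covers blocks 14 and 15,
not blocks 12 and 13 at the start of the page.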

This problem is reproducible with the following fio recipe (however, I was
only able to reproduce it with fio-2.1 or older):

[global]
bs=8k
iodepth=1024
iodepth_batch=60
randrepeat=1
size=1m
directory=/mnt/test
numjobs=20
[job1]
ioengine=sync
bs=1k
direct=1
rw=randread
filename=file1:file2
[job2]
ioengine=libaio
rw=randwrite
direct=1
filename=file1:file2
[job3]
bs=1k
ioengine=posixaio
rw=randwrite
direct=1
filename=file1:file2
[job5]
bs=1k
ioengine=sync
rw=randread
filename=file1:file2
[job7]
ioengine=libaio
rw=randwrite
filename=file1:file2
[job8]
ioengine=posixaio
rw=randwrite
filename=file1:file2
[job10]
ioengine=mmap
rw=randwrite
bs=1k
filename=file1:file2
[job11]
ioengine=mmap
rw=randwrite
direct=1
filename=file1:file2

Signed-off-by: Lukas Czerner 
Signed-off-by: Theodore Ts'o 
Reviewed-by: Jan Kara 
Signed-off-by: Greg Kroah-Hartman

ext4: fix race between truncate and __ext4_journalled_writepage()

2015-08-03T16:29:52+00:00

commit bdf96838aea6a265f2ae6cbcfb12a778c84a0b8e upstream.

The commit cf108bca465d: "ext4: Invert the locking order of page_lock
and transaction start" caused __ext4_journalled_writepage() to drop
the page lock before the page was written back, as part of changing
the locking order to jbd2_journal_start -> page_lock.  However, this
introduced a potential race if there was a truncate racing with the
data=journalled writeback mode.

Fix this by grabbing the page lock after starting the journal handle,
and then checking to see if page had gotten truncated out from under
us.

This fixes a number of different warnings or BUG_ON's when running
xfstests generic/086 in data=journalled mode, including:

jbd2_journal_dirty_metadata: vdc-8: bad jh for block 115643: transaction (ee3fe7c0, 164), jh->b_transaction (  (null), 0), jh->b_next_transaction (  (null), 0), jlist 0

	      	      	  - and -

kernel BUG at /usr/projects/linux/ext4/fs/jbd2/transaction.c:2200!
    ...
Call Trace:
 [] ? __ext4_journalled_invalidatepage+0x117/0x117
 [] __ext4_journalled_invalidatepage+0x10f/0x117
 [] ? __ext4_journalled_invalidatepage+0x117/0x117
 [] ? lock_buffer+0x36/0x36
 [] ext4_journalled_invalidatepage+0xd/0x22
 [] do_invalidatepage+0x22/0x26
 [] truncate_inode_page+0x5b/0x85
 [] truncate_inode_pages_range+0x156/0x38c
 [] truncate_inode_pages+0x11/0x15
 [] truncate_pagecache+0x55/0x71
 [] ext4_setattr+0x4a9/0x560
 [] ? current_kernel_time+0x10/0x44
 [] notify_change+0x1c7/0x2be
 [] do_truncate+0x65/0x85
 [] ? file_ra_state_init+0x12/0x29

	      	      	  - and -

WARNING: CPU: 1 PID: 1331 at /usr/projects/linux/ext4/fs/jbd2/transaction.c:1396
irty_metadata+0x14a/0x1ae()
    ...
Call Trace:
 [] ? console_unlock+0x3a1/0x3ce
 [] dump_stack+0x48/0x60
 [] warn_slowpath_common+0x89/0xa0
 [] ? jbd2_journal_dirty_metadata+0x14a/0x1ae
 [] warn_slowpath_null+0x14/0x18
 [] jbd2_journal_dirty_metadata+0x14a/0x1ae
 [] __ext4_handle_dirty_metadata+0xd4/0x19d
 [] write_end_fn+0x40/0x53
 [] ext4_walk_page_buffers+0x4e/0x6a
 [] ext4_writepage+0x354/0x3b8
 [] ? mpage_release_unused_pages+0xd4/0xd4
 [] ? wait_on_buffer+0x2c/0x2c
 [] ? ext4_writepage+0x3b8/0x3b8
 [] __writepage+0x10/0x2e
 [] write_cache_pages+0x22d/0x32c
 [] ? ext4_writepage+0x3b8/0x3b8
 [] ext4_writepages+0x102/0x607
 [] ? sched_clock_local+0x10/0x10e
 [] ? __lock_is_held+0x2e/0x44
 [] ? lock_is_held+0x43/0x51
 [] do_writepages+0x1c/0x29
 [] __writeback_single_inode+0xc3/0x545
 [] writeback_sb_inodes+0x21f/0x36d
    ...

Signed-off-by: Theodore Ts'o 
Signed-off-by: Greg Kroah-Hartman

ext4: fix data corruption caused by unwritten and delayed extents

2015-05-13T12:16:57+00:00

commit d2dc317d564a46dfc683978a2e5a4f91434e9711 upstream.

Currently it is possible to lose a whole file system block worth of data
when we hit a specific interaction between unwritten and delayed extents
in the extent status tree.

The problem is that when we insert a delayed extent into the extent status
tree, the only way to get rid of it is to write out the delayed buffer.
However, there is a limitation in the extent status tree implementation:
when inserting an unwritten extent, should there be even a single
delayed block, the whole unwritten extent is marked as delayed.

At this point, there is no way to get rid of the delayed extents,
because there are no delayed buffers to write out. So when we write
into said unwritten extent we will convert it to written, but it still
remains delayed.

When we try to write into that block later, ext4_da_map_blocks() will set
the buffer new and delayed and map it to an invalid block, which causes
the rest of the block to be zeroed, losing already written data.

For now we can fix this by simply not allowing the delayed status to be
set on a written extent in the extent status tree. Also add WARN_ON() to
make sure that we notice if this happens in the future.
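
The rule can be sketched as a small status computation. The flag names
and toy_es_status() below are hypothetical and only model the decision
"never add delayed to a written extent"; they are not the kernel's
definitions:

```c
#include <stdbool.h>

/* Hypothetical status bits for illustration only. */
#define TOY_ES_WRITTEN   (1u << 0)
#define TOY_ES_UNWRITTEN (1u << 1)
#define TOY_ES_DELAYED   (1u << 2)

/*
 * Compute the status to store for an extent. The delayed bit is only
 * added when the extent is not written, even if the range still
 * contains a delayed block.
 */
static unsigned toy_es_status(unsigned status, bool range_has_delayed_block)
{
    if (!(status & TOY_ES_WRITTEN) && range_has_delayed_block)
        status |= TOY_ES_DELAYED;
    return status;
}
```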

This problem can be easily reproduced by running the following xfs_io commands:

xfs_io -f -c "pwrite -S 0xaa 4096 2048" \
          -c "falloc 0 131072" \
          -c "pwrite -S 0xbb 65536 2048" \
          -c "fsync" /mnt/test/fff

echo 3 > /proc/sys/vm/drop_caches
xfs_io -c "pwrite -S 0xdd 67584 2048" /mnt/test/fff

In theory this can also be reproduced at random by running fsx,
but that is not very reliable, though on machines with a bigger page size
(like ppc) it can be seen more often (especially with xfstest generic/127).

Signed-off-by: Lukas Czerner 
Signed-off-by: Theodore Ts'o 
Signed-off-by: Greg Kroah-Hartman

ext4: Replace open coded mdata csum feature to helper function

2014-11-14T16:59:58+00:00

commit 9aa5d32ba269bec0e7eaba2697a986a7b0bc8528 upstream.

Besides improving code readability, this replacement also protects
against errors caused by direct EXT4_SB(sb)->s_es manipulation, which
may result in an attempt to use uninitialized csum machinery.
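
One way such a helper can give this protection is sketched below, under
the assumption (not spelled out in this message) that it keys off whether
the checksum machinery was initialized at mount time rather than off the
on-disk feature bit. struct toy_sbi and toy_has_metadata_csum() are
hypothetical names:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-superblock state for illustration only. */
struct toy_sbi {
    bool disk_flag_metadata_csum; /* on-disk feature bit, may change */
    const void *chksum_driver;    /* set up once at mount time, or NULL */
};

/*
 * Assumption: report checksums as enabled only if the driver exists,
 * so a feature bit flipped on disk after mount cannot steer callers
 * into uninitialized checksum code.
 */
static bool toy_has_metadata_csum(const struct toy_sbi *sbi)
{
    return sbi->chksum_driver != NULL;
}
```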

#Testcase_BEGIN
IMG=/dev/ram0
MNT=/mnt
mkfs.ext4 $IMG
mount $IMG $MNT
#Enable feature directly on disk, on mounted fs
tune2fs -O metadata_csum  $IMG
# Provoke metadata update, likely to result in an OOPS
touch $MNT/test
umount $MNT
#Testcase_END

# Replacement script
@@
expression E;
@@
- EXT4_HAS_RO_COMPAT_FEATURE(E, EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)
+ ext4_has_metadata_csum(E)

https://bugzilla.kernel.org/show_bug.cgi?id=82201

Signed-off-by: Dmitry Monakhov 
Signed-off-by: Theodore Ts'o 
Signed-off-by: Greg Kroah-Hartman

ext4: fix reservation overflow in ext4_da_write_begin

2014-11-14T16:59:58+00:00

commit 0ff8947fc5f700172b37cbca811a38eb9cb81e08 upstream.

Delalloc write journal reservations only reserve 1 credit,
to update the inode if necessary.  However, it may happen
once in a filesystem's lifetime that a file will cross
the 2G threshold, and require the LARGE_FILE feature to
be set in the superblock as well, if it was not set already.

This overruns the transaction reservation, and can be
demonstrated simply on any ext4 filesystem without the LARGE_FILE
feature already set:

dd if=/dev/zero of=testfile bs=1 seek=2147483646 count=1 \
	conv=notrunc
sync
dd if=/dev/zero of=testfile bs=1 seek=2147483647 count=1 \
	conv=notrunc

leads to:

EXT4-fs: ext4_do_update_inode:4296: aborting transaction: error 28 in __ext4_handle_dirty_super
EXT4-fs error (device loop0) in ext4_do_update_inode:4301: error 28
EXT4-fs error (device loop0) in ext4_reserve_inode_write:4757: Readonly filesystem
EXT4-fs error (device loop0) in ext4_dirty_inode:4876: error 28
EXT4-fs error (device loop0) in ext4_da_write_end:2685: error 28

Adjust the number of credits based on whether the flag is
already set, and whether the current write may extend past the
LARGE_FILE limit.
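
The credit calculation described above can be modelled in userspace
roughly as follows. toy_da_write_credits() and TOY_LARGE_FILE_LIMIT are
hypothetical names, and the limit of 0x7fffffff is taken from the 2G
threshold mentioned earlier; this is a sketch, not the patch itself:

```c
#include <stdbool.h>
#include <stdint.h>

/* Largest end offset that does not require LARGE_FILE (2G - 1). */
#define TOY_LARGE_FILE_LIMIT 0x7fffffffULL

/*
 * Journal credits for a delalloc write of len bytes at pos:
 * 1 for the inode, plus 1 for the superblock if this write may be
 * the one that has to set LARGE_FILE.
 */
static int toy_da_write_credits(bool large_file_set, uint64_t pos,
                                unsigned len)
{
    if (large_file_set)
        return 1;

    if (pos + len <= TOY_LARGE_FILE_LIMIT)
        return 1;

    /* may need to update the superblock as well */
    return 2;
}
```

Applied to the dd commands above, the first write (ending at offset
2147483647) needs 1 credit and the second (ending at 2147483648) needs 2.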

Signed-off-by: Eric Sandeen 
Signed-off-by: Theodore Ts'o 
Reviewed-by: Andreas Dilger 
Signed-off-by: Greg Kroah-Hartman

ext4: add ext4_iget_normal() which is to be used for dir tree lookups

2014-11-14T16:59:58+00:00

commit f4bb2981024fc91b23b4d09a8817c415396dbabb upstream.

If there is a corrupted file system which has directory entries that
point at reserved, metadata inodes, prohibit them from being used by
treating them the same way we treat Boot Loader inodes --- that is,
mark them to be bad inodes.  This prohibits them from being opened,
deleted, or modified via chmod, chown, utimes, etc.

In particular, this prevents a corrupted file system which has a
directory entry which points at the journal inode from being deleted
and its blocks released, after which point Much Hilarity Ensues.
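
The check behind this can be sketched as a predicate on the inode number.
The sketch assumes that the root directory (inode 2) must remain reachable
through lookups while every other inode below the first non-reserved inode
number is rejected; toy_ino_is_normal() and TOY_ROOT_INO are hypothetical
names:

```c
#include <stdbool.h>

#define TOY_ROOT_INO 2UL /* root directory inode number in ext4 */

/*
 * True if a directory entry may legitimately point at ino.
 * first_ino is the first non-reserved inode number of the file system.
 */
static bool toy_ino_is_normal(unsigned long ino, unsigned long first_ino)
{
    return ino >= first_ino || ino == TOY_ROOT_INO;
}
```

With first_ino set to 11, a directory entry pointing at inode 8 (the
journal) is rejected, while entries pointing at inode 2 or inode 12 are
accepted.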

Reported-by: Sami Liedes 
Signed-off-by: Theodore Ts'o 
Signed-off-by: Greg Kroah-Hartman

ext4: fix mmap data corruption when blocksize < pagesize

2014-11-14T16:59:58+00:00

commit d6320cbfc92910a3e5f10c42d98c231c98db4f60 upstream.

Use truncate_isize_extended() when a hole is being created in a file so that
->page_mkwrite() will get called for the partial tail page if it is
mmapped (see the first patch in the series for details).

Signed-off-by: Jan Kara 
Signed-off-by: Theodore Ts'o 
Signed-off-by: Greg Kroah-Hartman