linux-stable.git/fs/ext4, branch v3.2.96

ext4: fix fencepost in s_first_meta_bg validation

2017-11-26T13:51:10+00:00

commit 2ba3e6e8afc9b6188b471f27cf2b5e3cf34e7af2 upstream.

It is OK for s_first_meta_bg to be equal to the number of block group
descriptor blocks.  (It rarely happens, but it shouldn't cause any
problems.)
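
The boundary case can be illustrated with a small user-space sketch.
This is not the kernel code: the helper name first_meta_bg_ok() and
its parameters are hypothetical, and db_count stands for the number of
block group descriptor blocks mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical user-space model of the mount-time check.
 * db_count: number of block group descriptor blocks.
 * Returns true when first_meta_bg is acceptable.
 */
static bool first_meta_bg_ok(uint32_t first_meta_bg, uint32_t db_count)
{
	/* Equality is allowed; only values beyond db_count are rejected. */
	return first_meta_bg <= db_count;
}

/* Quick self-check of the values on either side of the boundary. */
static void check_boundaries(void)
{
	assert(first_meta_bg_ok(3, 4));   /* below the limit */
	assert(first_meta_bg_ok(4, 4));   /* equal: the fencepost case */
	assert(!first_meta_bg_ok(5, 4));  /* beyond the limit */
}
```

A check written as "reject if first_meta_bg >= db_count" would refuse
the middle case, which is the fencepost this commit is about.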

https://bugzilla.kernel.org/show_bug.cgi?id=194567

Fixes: 3a4b77cd47bb837b8557595ec7425f281f2ca1fe
Signed-off-by: Theodore Ts'o 
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings

ext4: validate s_first_meta_bg at mount time

2017-11-26T13:51:10+00:00

commit 3a4b77cd47bb837b8557595ec7425f281f2ca1fe upstream.

Ralf Spenneberg reported that he hit a kernel crash when mounting a
modified ext4 image. It turns out that the kernel crashed when
calculating fs overhead (ext4_calculate_overhead()), because the image
has a very large s_first_meta_bg (debug code shows it's 842150400),
and ext4 overruns the memory in count_overhead() when setting bits in
the bitmap buffer, which is PAGE_SIZE bytes.

ext4_calculate_overhead():
  buf = get_zeroed_page(GFP_NOFS);  <=== PAGE_SIZE buffer
  blks = count_overhead(sb, i, buf);

count_overhead():
  for (j = ext4_bg_num_gdb(sb, grp); j > 0; j--) { <=== j = 842150400
          ext4_set_bit(EXT4_B2C(sbi, s++), buf);   <=== buffer overrun
          count++;
  }

I can reproduce this easily with the following script:

  #!/bin/bash
  rm -f fs.img
  mkdir -p /mnt/ext4
  fallocate -l 16M fs.img
  mke2fs -t ext4 -O bigalloc,meta_bg,^resize_inode -F fs.img
  debugfs -w -R "ssv first_meta_bg 842150400" fs.img
  mount -o loop fs.img /mnt/ext4

Fix it by validating s_first_meta_bg first at mount time, and
refusing to mount if its value exceeds the largest possible meta_bg
number.
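
To give a sense of the scale of the overrun, here is a rough user-space
sketch. It assumes a 4096-byte page and ignores the block-to-cluster
conversion done by EXT4_B2C(); all names prefixed with SK_ or sk_ are
hypothetical and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_PAGE_SIZE 4096u                /* assumed page size in bytes */
#define SK_PAGE_BITS (SK_PAGE_SIZE * 8u)  /* bits available in the bitmap */

/* Descriptor blocks needed for 'groups' block groups, rounding up. */
static uint32_t sk_desc_block_count(uint32_t groups, uint32_t descs_per_block)
{
	assert(descs_per_block > 0);
	return groups / descs_per_block + (groups % descs_per_block != 0);
}

/* Would setting 'nbits' consecutive bits from 'start' stay in one page? */
static bool sk_fits_in_page_bitmap(uint64_t start, uint64_t nbits)
{
	return start <= SK_PAGE_BITS && nbits <= SK_PAGE_BITS - start;
}
```

With these assumptions the bitmap holds 32768 bits, so a loop count of
842150400 runs far past the end of the buffer, while any count bounded
by a realistic number of descriptor blocks for a small image stays
inside it.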

Reported-by: Ralf Spenneberg 
Signed-off-by: Eryu Guan 
Signed-off-by: Theodore Ts'o 
Reviewed-by: Andreas Dilger 
[bwh: Backported to 3.2: open-code ext4_has_feature_meta_bg()]
Signed-off-by: Ben Hutchings

ext4: Don't clear SGID when inheriting ACLs

2017-10-12T14:27:18+00:00

commit a3bb2d5587521eea6dab2d05326abb0afb460abd upstream.

When a new directory 'DIR1' is created in a directory 'DIR0' with the
SGID bit set, DIR1 is expected to have the SGID bit set (and owning
group equal to the owning group of 'DIR0'). However, when 'DIR0' also
has some default ACLs that 'DIR1' inherits, setting these ACLs will
result in the SGID bit on 'DIR1' being cleared if the user is not a
member of the owning group.

Fix the problem by moving posix_acl_update_mode() out of
__ext4_set_acl() into ext4_set_acl(). That way the function will not be
called when inheriting ACLs which is what we want as it prevents SGID
bit clearing and the mode has been properly set by posix_acl_create()
anyway.
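
The difference between the two paths can be sketched in user space as
follows. This models only the SGID-clearing part of
posix_acl_update_mode(); the struct and the sk_* functions are
hypothetical stand-ins, not the kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_ISGID 02000u  /* set-group-ID bit, same value as S_ISGID */

/* Hypothetical stand-in for the caller's credentials. */
struct sk_caller {
	bool in_owning_group;
	bool has_fsetid;  /* models holding CAP_FSETID */
};

/* Models the SGID-clearing part of posix_acl_update_mode(). */
static unsigned sk_update_mode(unsigned mode, const struct sk_caller *c)
{
	assert(c != NULL);
	if (!c->in_owning_group && !c->has_fsetid)
		mode &= ~SK_ISGID;
	return mode;
}

/* Explicit ACL change: the mode update is applied. */
static unsigned sk_set_acl(unsigned mode, const struct sk_caller *c)
{
	return sk_update_mode(mode, c);
}

/* Inheritance after the fix: the mode chosen at creation is kept. */
static unsigned sk_inherit_acl(unsigned mode, const struct sk_caller *c)
{
	(void)c;  /* credentials play no role on this path */
	return mode;
}
```

For a caller outside the owning group, sk_set_acl() drops the SGID bit
from a mode such as 02775, while sk_inherit_acl() returns it unchanged.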

Fixes: 073931017b49d9458aa351605b43a7e34598caef
Signed-off-by: Theodore Ts'o 
Signed-off-by: Jan Kara 
Reviewed-by: Andreas Gruenbacher 
[bwh: Backported to 3.2: the __ext4_set_acl() function didn't exist,
 so added it]
Signed-off-by: Ben Hutchings

ext4: preserve i_mode if __ext4_set_acl() fails

2017-10-12T14:27:18+00:00

commit 397e434176bb62bc6068d2210af1d876c6212a7e upstream.

When changing a file's acl mask, __ext4_set_acl() will first set the group
bits of i_mode to the value of the mask, and only then set the actual
extended attribute representing the new acl.

If the second part fails (due to lack of space, for example) and the file
had no acl attribute to begin with, the system will from now on assume
that the mask permission bits are actual group permission bits, potentially
granting access to the wrong users.

Prevent this by only changing the inode mode after the acl has been set.
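
The ordering can be sketched in user space like this. The struct and
the sk_* helpers are hypothetical; the failure is injected through a
parameter purely for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, much reduced inode. */
struct sk_inode {
	unsigned mode;
	int has_acl_xattr;
};

/* Hypothetical xattr store that can be told to fail, e.g. with -ENOSPC. */
static int sk_xattr_set(struct sk_inode *inode, int fail_with)
{
	assert(inode != NULL);
	if (fail_with)
		return fail_with;
	inode->has_acl_xattr = 1;
	return 0;
}

/* Store the ACL first; commit the new mode only if that succeeded. */
static int sk_set_acl(struct sk_inode *inode, unsigned new_mode, int fail_with)
{
	int err = sk_xattr_set(inode, fail_with);

	if (err)
		return err;  /* mode is left untouched on failure */
	inode->mode = new_mode;
	return 0;
}
```

If the attribute update fails, the inode keeps its old mode, so the
mask bits are never mistaken for group permission bits.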

Signed-off-by: Ernesto A. Fernández 
Signed-off-by: Theodore Ts'o 
Reviewed-by: Jan Kara 
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings

ext4: fix fdatasync(2) after extent manipulation operations

2017-09-15T17:30:48+00:00

commit 67a7d5f561f469ad2fa5154d2888258ab8e6df7c upstream.

Currently, extent manipulation operations such as hole punch, range
zeroing, or extent shifting do not record the fact that file data has
changed and thus fdatasync(2) has work to do. As a result, if we crash
e.g. after a punch hole and fdatasync, the user can still possibly see
the punched out data after journal replay. Test generic/392 fails due
to these problems.

Fix the problem by properly marking that file data has changed in these
operations.
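
A rough user-space model of the bookkeeping involved is shown below.
It is only a sketch: the struct and the sk_* helpers are hypothetical,
transaction IDs are compared without handling wraparound, and the
assumption is that each inode remembers the last transaction that
fdatasync(2) must see committed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-inode record of the last transactions touching it. */
struct sk_inode_sync {
	uint32_t sync_tid;      /* any change, data or metadata */
	uint32_t datasync_tid;  /* changes that fdatasync must persist */
};

/* Record that transaction 'tid' modified the inode. */
static void sk_note_change(struct sk_inode_sync *s, uint32_t tid, bool datasync)
{
	assert(s != NULL);
	s->sync_tid = tid;
	if (datasync)
		s->datasync_tid = tid;
}

/* fdatasync has work only if datasync_tid is not yet committed. */
static bool sk_fdatasync_has_work(const struct sk_inode_sync *s,
				  uint32_t last_committed_tid)
{
	assert(s != NULL);
	return s->datasync_tid > last_committed_tid;
}
```

In this model, a hole punch recorded with datasync set to false leaves
fdatasync with nothing to do, which corresponds to the bug described
above; recording it with datasync set to true corresponds to the fix.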

Fixes: a4bb6b64e39abc0e41ca077725f2a72c868e7622
Signed-off-by: Jan Kara 
Signed-off-by: Theodore Ts'o 
[bwh: Backported to 3.2: Only the punch-hole operation is supported, and
 it's in extents.c.]
Signed-off-by: Ben Hutchings

ext4: fix data corruption for mmap writes

2017-09-15T17:30:48+00:00

commit a056bdaae7a181f7dcc876cfab2f94538e508709 upstream.

mpage_submit_page() can race with another process growing i_size and
writing data via mmap to the written-back page. As mpage_submit_page()
samples i_size too early, it may happen that ext4_bio_write_page()
zeroes out too large tail of the page and thus corrupts user data.

Fix the problem by sampling i_size only after the page has been
write-protected in page tables by clear_page_dirty_for_io() call.
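
Why a stale i_size matters can be seen from a small user-space sketch.
It assumes a 4096-byte page; the sk_* helpers are hypothetical, and
sk_skip_page() only mirrors the "read it twice and skip the page if it
changed" approach described in the backport note below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PAGE_SIZE 4096u  /* assumed page size in bytes */

/*
 * Number of bytes at the start of the page that lie below i_size.
 * Everything after that offset is the tail that writeback zeroes.
 */
static size_t sk_valid_bytes(uint64_t i_size, uint64_t page_index)
{
	uint64_t start;

	assert(page_index <= UINT64_MAX / SK_PAGE_SIZE);
	start = page_index * SK_PAGE_SIZE;
	if (i_size <= start)
		return 0;
	if (i_size - start >= SK_PAGE_SIZE)
		return SK_PAGE_SIZE;
	return (size_t)(i_size - start);
}

/* Skip the page when i_size changed between the two samples. */
static bool sk_skip_page(uint64_t size_before, uint64_t size_after)
{
	return size_before != size_after;
}
```

If i_size was sampled as 100 but has since grown to 3000, zeroing from
offset 100 would wipe bytes 100 to 2999 that another process may
already have written through mmap.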

Reported-by: Michael Zimmer 
Fixes: cb20d5188366f04d96d2e07b1240cc92170ade40
Signed-off-by: Jan Kara 
Signed-off-by: Theodore Ts'o 
[bwh: Backported to 3.2: The writeback path is very different here and
 it needs to read i_size long before calling clear_page_dirty_for_io().
 So read it twice and skip the page if it changed.]
Signed-off-by: Ben Hutchings

ext4: keep existing extra fields when inode expands

2017-09-15T17:30:46+00:00

commit 887a9730614727c4fff7cb756711b190593fc1df upstream.

ext4_expand_extra_isize() should clear only space between old and new
size.
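
As a minimal user-space sketch of that rule: the function name and
parameters are hypothetical, and SK_GOOD_OLD_INODE_SIZE mirrors the
128-byte value of EXT4_GOOD_OLD_INODE_SIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SK_GOOD_OLD_INODE_SIZE 128u  /* same value as EXT4_GOOD_OLD_INODE_SIZE */

/*
 * Grow the extra area from old_extra to new_extra bytes, zeroing only
 * the newly exposed range and leaving existing extra fields untouched.
 */
static void sk_expand_extra(unsigned char *raw_inode, size_t inode_size,
			    size_t old_extra, size_t new_extra)
{
	assert(raw_inode != NULL);
	assert(old_extra <= new_extra);
	assert(SK_GOOD_OLD_INODE_SIZE + new_extra <= inode_size);
	memset(raw_inode + SK_GOOD_OLD_INODE_SIZE + old_extra, 0,
	       new_extra - old_extra);
}
```

Zeroing the whole range from the start of the extra area instead would
wipe the fields that were already stored in the first old_extra bytes.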

Fixes: 6dd4ee7cab7e # v2.6.23
Signed-off-by: Konstantin Khlebnikov 
Signed-off-by: Theodore Ts'o 
Signed-off-by: Ben Hutchings

ext4: preserve the needs_recovery flag when the journal is aborted

2017-06-05T20:13:47+00:00

commit 97abd7d4b5d9c48ec15c425485f054e1c15e591b upstream.

If the journal is aborted, the needs_recovery feature flag should not
be removed.  Otherwise, the journal might not get replayed and
this could lead to more data getting lost.

Signed-off-by: Theodore Ts'o 
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings

ext4: fix data corruption in data=journal mode

2017-06-05T20:13:45+00:00

commit 3b136499e906460919f0d21a49db1aaccf0ae963 upstream.

ext4_journalled_write_end() did not properly handle all the cases when
generic_perform_write() did not copy all the data into the target page
and could mark buffers with uninitialized contents as uptodate and dirty,
leading to possible data corruption (which would be quickly fixed by
generic_perform_write() retrying the write, but still). Fix the problem
by carefully handling the case when the page that is written to is not
uptodate.

Reported-by: Al Viro 
Signed-off-by: Jan Kara 
Signed-off-by: Theodore Ts'o 
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings

ext4: use private version of page_zero_new_buffers() for data=journal mode

2017-06-05T20:13:45+00:00

commit b90197b655185a11640cce3a0a0bc5d8291b8ad2 upstream.

If there is an error while copying data from userspace into the page
cache during a write(2) system call, in data=journal mode, in
ext4_journalled_write_end() we were using page_zero_new_buffers() from
fs/buffer.c.  Unfortunately, this sets the buffer dirty flag, which is
no good if journalling is enabled.  This is a long-standing bug that
goes back for years and years in ext3, but a combination of (a)
data=journal not being very common, (b) in many cases it only results
in a warning message, and (c) only very rarely causes the kernel to
hang, means that we only really noticed this as a problem when commit
998ef75ddb caused this failure to happen frequently enough to cause
generic/208 to fail when run in data=journal mode.

The fix is to have our own version of this function that doesn't call
mark_buffer_dirty(), since we will end up calling
ext4_handle_dirty_metadata() on the buffer head(s) in question very
shortly afterwards in ext4_journalled_write_end().

Thanks to Dave Hansen and Linus Torvalds for helping to identify the
root cause of the problem.

Signed-off-by: Theodore Ts'o 
Reviewed-by: Jan Kara 
[bwh: Backported to 3.2: adjust context, indentation]
Signed-off-by: Ben Hutchings