linux.git/fs/ext3, branch v2.6.23

ext34: ensure do_split leaves enough free space in both blocks

2007-09-19T18:24:18+00:00

The do_split() function for htree dir blocks is intended to split a leaf
block to make room for a new entry.  It sorts the entries in the original
block by hash value, then moves the last half of the entries to the new
block - without accounting for how much space this actually moves.  (IOW,
it moves half of the entry *count* not half of the entry *space*).  If by
chance we have both large & small entries, and we move only the smallest
entries, and we have a large new entry to insert, we may not have created
enough space for it.

The patch below stores each record size when calculating the dx_map, and
then walks the hash-sorted dx_map, calculating how many entries must be
moved to more evenly split the existing entries between the old block and
the new block, guaranteeing enough space for the new entry.
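For illustration, the size-based split described above can be sketched in
user-space C.  This is a minimal sketch, not the patch itself: struct
map_entry and entries_to_move() are hypothetical stand-ins, and the map is
assumed to be already sorted by hash.

```c
#include <stddef.h>

/* Hypothetical stand-in for one dx_map entry: hash, offset, record size. */
struct map_entry {
    unsigned int hash;
    unsigned short offs;
    unsigned short size;
};

/*
 * Walk the hash-sorted map from the highest hash downwards, adding up
 * record sizes, and stop once more than half of the next record would
 * land beyond half a block.  Returns how many entries, counted from the
 * end of the map, should move to the new block.
 */
static size_t entries_to_move(const struct map_entry *map, size_t count,
                              unsigned int blocksize)
{
    unsigned int size = 0;
    size_t move = 0;
    size_t i;

    for (i = count; i-- > 0; ) {
        if (size + map[i].size / 2 > blocksize / 2)
            break;
        size += map[i].size;
        move++;
    }
    return move;
}
```

With two 400-byte records followed by two 16-byte records in a 1024-byte
block, a count-based split would move only the two small records (32
bytes), while this walk also moves one large record.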

The dx_map "offs" member is reduced to u16, and the new "size" member is
also u16, so the overall map size does not change.  This matters because
the map is temporarily stored at the end of the new block, and if it grew
too large it could be overwritten.
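The size argument can be checked with a small layout sketch.  The struct
names below are hypothetical, and the "old" layout assumes the offset was
previously a 32-bit field, which the text implies but does not state.

```c
#include <stdint.h>

/* Assumed layout before the change: 32-bit hash plus 32-bit offset. */
struct dx_map_entry_old {
    uint32_t hash;
    uint32_t offs;
};

/* Layout after the change: offset narrowed to 16 bits, size added. */
struct dx_map_entry_new {
    uint32_t hash;
    uint16_t offs;
    uint16_t size;
};
```

On common ABIs both structures occupy 8 bytes, so a map of N entries takes
the same room at the end of the new block as before.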

Also add a few comments to the functions involved.

This fixes the testcase reported by hooanon05@yahoo.co.jp on the
linux-ext4 list, "ext3 dir_index causes an error"

Thanks to Andreas Dilger for discussing the problem & solution with me.

Signed-off-by: Eric Sandeen 
Signed-off-by: Andreas Dilger 
Tested-by: Junjiro Okajima 
Cc: Theodore Ts'o 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

dir_index: error out instead of BUG on corrupt dx dirs

2007-09-19T18:24:18+00:00

Convert the asserts (BUGs) that dx_probe hits on bad on-disk data into
recoverable errors with helpful warnings.  Duane Griffin helped catch the
other asserts.
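The general pattern, validating on-disk fields and returning an error
rather than asserting, might look like the sketch below.  The structure,
the function name, the warning text, and the use of -EIO are all
assumptions made for illustration; they are not taken from dx_probe.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical subset of an on-disk index header. */
struct dx_counts {
    unsigned short limit;   /* maximum number of index entries */
    unsigned short count;   /* number of entries currently in use */
};

/*
 * Return 0 when the counts are plausible, or a negative errno after
 * printing a warning.  Corrupt data becomes a recoverable error here
 * instead of a crash.
 */
static int check_dx_counts(const struct dx_counts *h,
                           unsigned short expected_limit)
{
    if (h->limit != expected_limit) {
        fprintf(stderr, "warning: dx limit %u != expected %u\n",
                (unsigned)h->limit, (unsigned)expected_limit);
        return -EIO;    /* error value chosen for illustration */
    }
    if (h->count == 0 || h->count > h->limit) {
        fprintf(stderr, "warning: dx count %u out of range (limit %u)\n",
                (unsigned)h->count, (unsigned)h->limit);
        return -EIO;
    }
    return 0;
}
```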

Signed-off-by: Eric Sandeen 
Acked-by: Duane Griffin 
Acked-by: Theodore Ts'o 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

quota: fix infinite loop

2007-09-12T00:21:19+00:00

If we fail to start a transaction when releasing a dquot, we have to call
dquot_release() anyway to mark the dquot structure as inactive.  Otherwise
we end up in an infinite loop inside dqput().
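A toy user-space model of the control flow, under the assumption that the
caller keeps retrying while the dquot is still active.  Every name below
is hypothetical, the error value is arbitrary, and max_tries exists only
to keep the model bounded.

```c
#include <errno.h>
#include <stdbool.h>

struct dquot_model {
    bool active;
};

static void model_dquot_release(struct dquot_model *dq)
{
    dq->active = false;
}

/* Release path: the failure branch still marks the dquot inactive. */
static int model_release_dquot(struct dquot_model *dq, bool journal_start_ok)
{
    if (!journal_start_ok) {
        /* Release anyway, otherwise the caller below never finishes. */
        model_dquot_release(dq);
        return -EIO;    /* error value chosen for illustration */
    }
    /* ... journalled release work would go here ... */
    model_dquot_release(dq);
    return 0;
}

/* dqput-like caller: loops while the dquot is still active. */
static int model_dqput(struct dquot_model *dq, bool journal_start_ok,
                       int max_tries)
{
    int tries = 0;

    while (dq->active && tries < max_tries) {
        model_release_dquot(dq, journal_start_ok);
        tries++;
    }
    return tries;
}
```

Without the release in the failure branch, the loop in model_dqput() would
run until max_tries; with it, one pass is enough.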

Signed-off-by: Jan Kara 
Cc: xb 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

fix inode_table test in ext234_check_descriptors

2007-07-26T18:35:17+00:00

ext[234]_check_descriptors sanity checks block group descriptor geometry at
mount time, testing whether the block bitmap, inode bitmap, and inode table
reside wholly within the blockgroup.  However, the inode table test is off
by one so that if the last block in the inode table resides on the last
block of the block group, the test incorrectly fails.  This is because it
tests the last block as (start + length) rather than (start + length - 1).

This can be seen by trying to mount a filesystem made such as:

 mkfs.ext2 -F -b 1024 -m 0 -g 256 -N 3744 fsfile 1024

which yields:

 EXT2-fs error (device loop0): ext2_check_descriptors: Inode table for group 0 not in group (block 101)!
 EXT2-fs: group descriptors corrupted!
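The off-by-one can be sketched as two small predicates.  This is an
illustrative model only: the function names are hypothetical, and the
numbers in the note after the code are made up rather than derived from
the mkfs command above.

```c
#include <stdbool.h>

typedef unsigned long blk_t;

/* Buggy form: treats (start + length) as the table's last block. */
static bool itable_in_group_buggy(blk_t start, blk_t length,
                                  blk_t first_block, blk_t last_block)
{
    return start >= first_block && start + length <= last_block;
}

/* Fixed form: the last block of the table is (start + length - 1). */
static bool itable_in_group_fixed(blk_t start, blk_t length,
                                  blk_t first_block, blk_t last_block)
{
    return start >= first_block && start + length - 1 <= last_block;
}
```

For a group spanning blocks 1 to 256 and a 117-block inode table starting
at block 140, the table ends exactly on block 256.  The buggy check
rejects this layout; the fixed check accepts it.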

There is a similar bug in e2fsprogs, patch already sent for that.

(I wonder if inside(), outside(), and/or in_range() should someday be
used in this and other tests throughout the ext filesystems...)

Signed-off-by: Eric Sandeen 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: Remove slab destructors from kmem_cache_create().

2007-07-20T01:11:58+00:00

Slab destructors were no longer supported after Christoph's
c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
BUGs for both slab and slub, and slob never supported them
either.

This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).

Signed-off-by: Paul Mundt

readahead: split ondemand readahead interface into two functions

2007-07-19T17:04:44+00:00

Split ondemand readahead interface into two functions.  I think this makes it
a little clearer for non-readahead experts (like Rusty).

Internally they both call ondemand_readahead(), but the page argument is
changed to an obvious boolean flag.

Signed-off-by: Rusty Russell 
Signed-off-by: Fengguang Wu 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

readahead: convert ext3/ext4 invocations

2007-07-19T17:04:44+00:00

Convert ext3/ext4 dir reads to use on-demand readahead.

Readahead for dirs operates _not_ on file level, but on blockdev level.  This
makes a difference when the data blocks are not contiguous.  The read routine
is also somewhat opaque: there's no handy info about the status of the current
page.  So a simplified call scheme is employed: call into readahead whenever
the current page falls outside the readahead window.
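The call scheme can be sketched as a window test.  This is a user-space
model under the assumption that the window is tracked as a start index
plus a size; the struct and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model of a readahead window: [start, start + size). */
struct ra_window {
    unsigned long start;
    unsigned int size;
};

/* True when the page index lies inside the current window. */
static bool window_has_index(const struct ra_window *ra, unsigned long index)
{
    return index >= ra->start && index < ra->start + ra->size;
}

/* The simplified scheme: ask for readahead only when outside the window. */
static bool needs_readahead(const struct ra_window *ra, unsigned long index)
{
    return !window_has_index(ra, index);
}
```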

Signed-off-by: Fengguang Wu 
Cc: Steven Pratt 
Cc: Ram Pai 
Cc: Rusty Russell 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

Introduce is_owner_or_cap() to wrap CAP_FOWNER use with fsuid check

2007-07-17T19:00:03+00:00

Introduce is_owner_or_cap() macro in fs.h, and convert over relevant
users to it. This is done because we want to avoid bugs in the future
where we check for only effective fsuid of the current task against a
file's owning uid, without simultaneously checking for CAP_FOWNER as
well, thus violating its semantics.
[ XFS uses special macros and structures, and in general looked ...
untouchable, so we leave it alone -- but it has been looked over. ]
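As a rough user-space model of the combined check, assuming the macro
simply ORs the ownership test with the capability test.  The structs and
the function name are hypothetical; the kernel version works on the
current task and a struct inode.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the task and inode fields involved. */
struct task_model {
    unsigned int fsuid;
    bool has_cap_fowner;
};

struct inode_model {
    unsigned int i_uid;
};

/* Owner check and CAP_FOWNER check are always evaluated together. */
static bool is_owner_or_cap_model(const struct task_model *task,
                                  const struct inode_model *inode)
{
    return task->fsuid == inode->i_uid || task->has_cap_fowner;
}
```

Wrapping both tests in one helper means a caller cannot compare fsuid
against i_uid and forget the capability.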

The (current->fsuid != inode->i_uid) check in generic_permission() and
exec_permission_lite() is left alone, because those operations are
covered by CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH. Similarly operations
falling under the purview of CAP_CHOWN and CAP_LEASE are also left alone.

Signed-off-by: Satyam Sharma 
Cc: Al Viro 
Acked-by: Serge E. Hallyn 
Signed-off-by: Linus Torvalds

knfsd: exportfs: add exportfs.h header

2007-07-17T17:23:06+00:00

Currently the export_operations structure and the helpers related to it are
in fs.h.  fs.h is already far too large and very few places need the export
bits, so split them off into a separate header.

[akpm@linux-foundation.org: fix cifs build]
Signed-off-by: Christoph Hellwig 
Signed-off-by: Neil Brown 
Cc: Steven French 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

ext3: statfs speed up

2007-07-16T16:05:52+00:00

This is a patch that speeds up statfs.  It is very simple - the "overhead"
calculation, which takes a huge amount of time for large filesystems, never
changes unless the size of the filesystem itself changes.  That means we can
store it in memory and only recalculate if the filesystem has been resized
(almost never).
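The caching idea can be sketched as follows.  This is a minimal model, not
the patch: the struct, its field names, and the placeholder overhead
formula are assumptions, and the recomputes counter exists only to show
when the expensive path runs.

```c
/* Hypothetical in-memory superblock info for the model. */
struct sb_model {
    unsigned long blocks_count;   /* current filesystem size in blocks */
    unsigned long blocks_last;    /* size when overhead was last computed */
    unsigned long overhead_last;  /* cached overhead */
    unsigned int recomputes;      /* how often the slow path ran */
};

/* Placeholder for the expensive per-group overhead walk. */
static unsigned long compute_overhead(struct sb_model *sb)
{
    sb->recomputes++;
    return sb->blocks_count / 32;   /* made-up formula */
}

/* Recompute only when the filesystem size has changed since last time. */
static unsigned long cached_overhead(struct sb_model *sb)
{
    if (sb->blocks_count != sb->blocks_last) {
        sb->overhead_last = compute_overhead(sb);
        sb->blocks_last = sb->blocks_count;
    }
    return sb->overhead_last;
}
```

Repeated calls on an unchanged filesystem take the cheap path; a resize
changes blocks_count and triggers one recomputation.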

It also fixes a minor problem: we never update the on-disk superblock free
blocks/inodes counts until the filesystem is unmounted.  While not fatal, we
may as well update them on disk when we have the information, and it makes
things like debugfs and dumpe2fs report slightly more accurate info.

Signed-off-by: Badari Pulavarty 
Signed-off-by: Andreas Dilger 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds