linux-stable.git/fs, branch v3.2.24

eCryptfs: Properly check for O_RDONLY flag before doing privileged open

2012-07-25T03:11:44+00:00

commit 9fe79d7600497ed8a95c3981cbe5b73ab98222f0 upstream.

If the first attempt at opening the lower file read/write fails,
eCryptfs will retry using a privileged kthread. However, the privileged
retry should not happen if the lower file's inode is read-only because a
read/write open will still be unsuccessful.

The check for determining if the open should be retried was intended to
be based on the access mode of the lower file's open flags being
O_RDONLY, but the check was incorrectly performed. This would cause the
open to be retried by the privileged kthread, resulting in a second
failed open of the lower file. This patch corrects the check to
determine if the open request should be handled by the privileged
kthread.
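
As an illustration of why such a check is easy to get wrong, here is a
minimal userspace sketch. It is not the eCryptfs code: the helper names
are hypothetical, and it only assumes the usual Linux convention that
O_RDONLY is 0, so the access mode has to be extracted with O_ACCMODE
before comparing.

```c
#include <assert.h>   /* assert() for callers that want to check the helpers */
#include <fcntl.h>
#include <stdbool.h>

/*
 * Broken pattern: O_RDONLY is 0 on Linux, so the bitwise AND is always 0
 * and this never reports a read-only open.
 */
static bool sketch_is_rdonly_buggy(int flags)
{
	return (flags & O_RDONLY) != 0;
}

/* Mask out the access mode bits first, then compare against O_RDONLY. */
static bool sketch_is_rdonly(int flags)
{
	return (flags & O_ACCMODE) == O_RDONLY;
}

/*
 * Hypothetical decision helper: a privileged retry only makes sense
 * when the failed open was not already a read-only one.
 */
static bool sketch_should_retry_privileged(int flags)
{
	return !sketch_is_rdonly(flags);
}
```

With these helpers, sketch_is_rdonly_buggy(O_RDONLY) is false while
sketch_is_rdonly(O_RDONLY) is true, which is the difference the patch
description is about.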

Signed-off-by: Tyler Hicks 
Reported-by: Dan Carpenter 
Acked-by: Dan Carpenter 
Signed-off-by: Ben Hutchings

eCryptfs: Fix lockdep warning in miscdev operations

2012-07-25T03:11:44+00:00

commit 60d65f1f07a7d81d3eb3b91fc13fca80f2fdbb12 upstream.

Don't grab the daemon mutex while holding the message context mutex.
Addresses this lockdep warning:

 ecryptfsd/2141 is trying to acquire lock:
  (&ecryptfs_msg_ctx_arr[i].mux){+.+.+.}, at: [] ecryptfs_miscdev_read+0x143/0x470 [ecryptfs]

 but task is already holding lock:
  (&(*daemon)->mux){+.+...}, at: [] ecryptfs_miscdev_read+0x21c/0x470 [ecryptfs]

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (&(*daemon)->mux){+.+...}:
        [] lock_acquire+0x9d/0x220
        [] __mutex_lock_common+0x5a/0x4b0
        [] mutex_lock_nested+0x44/0x50
        [] ecryptfs_send_miscdev+0x97/0x120 [ecryptfs]
        [] ecryptfs_send_message+0x134/0x1e0 [ecryptfs]
        [] ecryptfs_generate_key_packet_set+0x2fe/0xa80 [ecryptfs]
        [] ecryptfs_write_metadata+0x108/0x250 [ecryptfs]
        [] ecryptfs_create+0x130/0x250 [ecryptfs]
        [] vfs_create+0xb4/0x120
        [] do_last+0x8c5/0xa10
        [] path_openat+0xd9/0x460
        [] do_filp_open+0x42/0xa0
        [] do_sys_open+0xf8/0x1d0
        [] sys_open+0x21/0x30
        [] system_call_fastpath+0x16/0x1b

 -> #0 (&ecryptfs_msg_ctx_arr[i].mux){+.+.+.}:
        [] __lock_acquire+0x1bf8/0x1c50
        [] lock_acquire+0x9d/0x220
        [] __mutex_lock_common+0x5a/0x4b0
        [] mutex_lock_nested+0x44/0x50
        [] ecryptfs_miscdev_read+0x143/0x470 [ecryptfs]
        [] vfs_read+0xb3/0x180
        [] sys_read+0x4d/0x90
        [] system_call_fastpath+0x16/0x1b
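
The trace shows two paths taking the same pair of mutexes in opposite
order. A rough userspace sketch of one way to avoid that follows. It
uses pthread mutexes and hypothetical structure and function names, and
is not the actual eCryptfs change: the send path drops the message
context lock before taking the daemon lock, so only the read path ever
nests the two.

```c
#include <assert.h>   /* assert() for callers that want to check the helpers */
#include <pthread.h>

/* Hypothetical stand-ins for the daemon and message context objects. */
struct sketch_daemon {
	pthread_mutex_t mux;
	int queued;              /* number of messages waiting for the daemon */
};

struct sketch_msg_ctx {
	pthread_mutex_t mux;
	int state;               /* 0 = free, 1 = pending */
};

/* Send path: never holds both locks at the same time. */
static void sketch_send(struct sketch_msg_ctx *ctx, struct sketch_daemon *d)
{
	pthread_mutex_lock(&ctx->mux);
	ctx->state = 1;
	pthread_mutex_unlock(&ctx->mux);   /* drop before touching the daemon */

	pthread_mutex_lock(&d->mux);
	d->queued++;
	pthread_mutex_unlock(&d->mux);
}

/* Read path: the only place that nests, always daemon first, then ctx. */
static int sketch_read(struct sketch_daemon *d, struct sketch_msg_ctx *ctx)
{
	int state;

	pthread_mutex_lock(&d->mux);
	if (d->queued == 0) {
		pthread_mutex_unlock(&d->mux);
		return -1;               /* nothing to read */
	}
	d->queued--;
	pthread_mutex_lock(&ctx->mux);
	state = ctx->state;
	ctx->state = 0;
	pthread_mutex_unlock(&ctx->mux);
	pthread_mutex_unlock(&d->mux);
	return state;
}
```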

Signed-off-by: Tyler Hicks 
Signed-off-by: Ben Hutchings

eCryptfs: Gracefully refuse miscdev file ops on inherited/passed files

2012-07-25T03:11:43+00:00

commit 8dc6780587c99286c0d3de747a2946a76989414a upstream.

File operations on /dev/ecryptfs would BUG() when the operations were
performed by processes other than the process that originally opened the
file. This could happen with open files inherited after fork() or file
descriptors passed through IPC mechanisms. Rather than calling BUG(), an
error code can be safely returned in most situations.

In ecryptfs_miscdev_release(), eCryptfs still needs to handle the
release even if the last file reference is being held by a process that
didn't originally open the file. ecryptfs_find_daemon_by_euid() will not
be successful, so a pointer to the daemon is stored in the file's
private_data. The private_data pointer is initialized when the miscdev
file is opened and only used when the file is released.

https://launchpad.net/bugs/994247

Signed-off-by: Tyler Hicks 
Reported-by: Sasha Levin 
Tested-by: Sasha Levin 
Signed-off-by: Ben Hutchings

pnfs-obj: Fix __r4w_get_page when offset is beyond i_size

2012-07-25T03:11:31+00:00

commit c999ff68029ebd0f56ccae75444f640f6d5a27d2 upstream.

It is very common for the end of the file to be unaligned on
stripe size. But since we know it's beyond the file's end, the
XOR should be performed with all zeros.

Old code used to just read zeros out of the OSD devices, which is a great
waste. But what scares me more about this situation is that we now have
pages attached to the file's mapping that are beyond i_size. I don't
like the kind of bugs this calls for.

Fix both problems by returning a global zero_page if the offset is
beyond i_size.
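
A minimal userspace sketch of the idea, under stated assumptions: the
struct, the page size and the function names are hypothetical and only
model the behaviour described above. Offsets at or beyond i_size get a
shared all-zero page, and XOR-ing that page into the parity leaves the
parity unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size */

/* One shared, read-only page of zeros (static storage is zero-filled). */
static const uint8_t sketch_zero_page[SKETCH_PAGE_SIZE];

/* Hypothetical file model: a size and an array of cached pages. */
struct sketch_file {
	uint64_t i_size;
	uint8_t (*pages)[SKETCH_PAGE_SIZE];
	size_t nr_pages;
};

/* Return the page backing 'offset', or the zero page past end of file. */
static const uint8_t *sketch_r4w_get_page(const struct sketch_file *f,
					  uint64_t offset)
{
	size_t index;

	if (offset >= f->i_size)
		return sketch_zero_page;   /* no read, no page in the mapping */

	index = (size_t)(offset / SKETCH_PAGE_SIZE);
	assert(index < f->nr_pages);
	return f->pages[index];
}

/* XOR one source page into the parity page. */
static void sketch_xor_into(uint8_t *parity, const uint8_t *src)
{
	size_t i;

	for (i = 0; i < SKETCH_PAGE_SIZE; i++)
		parity[i] ^= src[i];
}
```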

TODO:
	Change the API to ->__r4w_get_page() so a NULL can be
	returned without being considered as error, since XOR API
	treats NULL entries as zero_pages.

[Bug since 3.2. Should apply the same way to all Kernels since]
Signed-off-by: Boaz Harrosh 
[bwh: Backported to 3.2: adjust for lack of wdata->header]
Signed-off-by: Ben Hutchings

pnfs-obj: don't leak objio_state if ore_write/read fails

2012-07-25T03:11:30+00:00

commit 9909d45a8557455ca5f8ee7af0f253debc851f1a upstream.

[Bug since 3.2 Kernel]
Signed-off-by: Boaz Harrosh 
Signed-off-by: Ben Hutchings

ore: Remove support of partial IO request (NFS crash)

2012-07-25T03:11:30+00:00

commit 62b62ad873f2accad9222a4d7ffbe1e93f6714c1 upstream.

Due to OOM situations, the ore might fail to allocate all the resources
needed for IO of the full request. If some progress was possible
it would proceed with a partial/short request, for the sake of
forward progress.

Since this crashes NFS-core and exofs is just fine without it, just
remove this contraption and fail.

TODO:
	Support real forward progress with some reserved allocations
	of resources, such as mem pools and/or bio_sets

[Bug since 3.2 Kernel]
CC: Benny Halevy 
Signed-off-by: Boaz Harrosh 
Signed-off-by: Ben Hutchings

ore: Fix NFS crash by supporting any unaligned RAID IO

2012-07-25T03:11:29+00:00

commit 9ff19309a9623f2963ac5a136782ea4d8b5d67fb upstream.

In RAID_5/6 we used to not permit an IO whose end
byte is not stripe_size aligned and which spans more than one stripe,
i.e. the caller must check whether the number of bytes actually
transferred after submission is shorter, and would need to resubmit
a new IO with the remainder.

Exofs supports this, and NFS was supposed to support this
as well with its short write mechanism. But late testing has
exposed a CRASH when this is used with non-RPC layout-drivers.

The change at NFS is deep and risky; in its place the fix
at ORE to lift the limitation is actually clean and simple.
So here it is below.

The principle here is that in the case of unaligned IO on
both ends, beginning and end, we will send two read requests:
one like the old code, before the calculation of the first stripe,
and one at a new site, before the calculation of the last stripe.
If either boundary is aligned or the complete IO is within a single
stripe, we do a single read like before.

The code is kept clean and simple by splitting the old _read_4_write
into 3 even parts:
1. _read_4_write_first_stripe
2. _read_4_write_last_stripe
3. _read_4_write_execute

And calling 1+3 at the same place as before, 2+3 before the last
stripe, and in the case of all in a single stripe then 1+2+3
is performed additively.
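
The case split described above can be sketched as follows. This is a
simplified userspace model with hypothetical names, not the ORE code:
it only decides which stripe ends need a read and how many read
submissions result, assuming an IO is described by a byte offset, a
length and a stripe size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result of planning the reads needed before a write. */
struct sketch_r4w_plan {
	bool read_first;     /* step 1: head of the first stripe is unaligned */
	bool read_last;      /* step 2: tail of the last stripe is unaligned */
	unsigned executes;   /* step 3: number of read submissions */
};

static struct sketch_r4w_plan sketch_plan_r4w(uint64_t offset, uint64_t length,
					      uint64_t stripe_size)
{
	struct sketch_r4w_plan plan = { false, false, 0 };
	uint64_t end;

	assert(stripe_size > 0 && length > 0);
	end = offset + length;

	plan.read_first = (offset % stripe_size) != 0;
	plan.read_last = (end % stripe_size) != 0;

	if (!plan.read_first && !plan.read_last)
		return plan;             /* fully aligned: nothing to read */

	if (offset / stripe_size == (end - 1) / stripe_size)
		plan.executes = 1;       /* single stripe: 1+2+3 in one go */
	else
		plan.executes = (plan.read_first ? 1u : 0u) +
				(plan.read_last ? 1u : 0u);

	return plan;
}
```

For example, with a 64K stripe, an IO at offset 100 spanning two full
stripe lengths is unaligned at both ends and yields two submissions,
while a 1000-byte IO at offset 100 stays within one stripe and yields
one.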

Why did I not think of it before? Well, I had a stroke of
genius because I have stared at this code for 2 years and did
not find this simple solution until today. Not that I did not try.

This solution is much better for NFS than the previous supposed
solution because the short write was dealt with out-of-band after
IO_done, which would cause a seeky IO pattern, whereas here
we execute in order. In both solutions we do 2 separate reads, only
here we do it within a single IO request. (And actually combine two
writes into a single submission.)

NFS/exofs code need not change since the ORE API communicates the new
shorter length on return; what will happen is that this case will not
occur anymore.

hurray!!

[Stable: this is an NFS bug since the 3.2 Kernel; should apply cleanly]
Signed-off-by: Boaz Harrosh 
Signed-off-by: Ben Hutchings

UBIFS: fix a bug in empty space fix-up

2012-07-25T03:11:29+00:00

commit c6727932cfdb13501108b16c38463c09d5ec7a74 upstream.

UBIFS has a feature called "empty space fix-up" which is a quirk to work around
limitations of dumb flasher programs, namely those flashers that are unable
to skip NAND pages full of 0xFFs while flashing, which leaves the empty space at
the end of half-filled eraseblocks unusable for UBIFS. This feature is
relatively new (introduced in v3.0).

The fix-up routine (fixup_free_space()) is executed only once at the very first
mount if the superblock has the 'space_fixup' flag set (can be done with -F
option of mkfs.ubifs). It basically reads all the UBIFS data and metadata and
writes it back to the same LEB. The routine assumes the image is pristine and
does not have anything in the journal.

There was a bug in 'fixup_free_space()' where it fixed up the log incorrectly.
All but one LEB of the log of a pristine file-system are empty. And one
contains just a commit start node. And 'fixup_free_space()' just unmapped this
LEB, which resulted in wiping the commit start node. As a result, some users
were unable to mount the file-system next time with the following symptom:

UBIFS error (pid 1): replay_log_leb: first log node at LEB 3:0 is not CS node
UBIFS error (pid 1): replay_log_leb: log error detected while replaying the log at LEB 3:0

The root cause of this bug was that 'fixup_free_space()' wrongly assumed
that the beginning of empty space in the log head (c->lhead_offs) was known
on mount. However, that is not the case - it was always 0. UBIFS does not store
it in the master node and finds it out by scanning the log on every mount.

The fix is simple - just pass commit start node size instead of 0 to
'fixup_leb()'.
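
To make the effect of the length argument concrete, here is a toy
userspace model. Everything in it is an assumption for illustration:
the LEB size, the 32-byte commit start node size and all sketch_ names
are hypothetical, and a length of 0 is modelled as "unmap the whole
LEB", which is the behaviour the description above attributes to the
buggy call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_LEB_SIZE   256u   /* toy eraseblock size (assumption) */
#define SKETCH_CS_NODE_SZ 32u    /* assumed commit start node size */

struct sketch_leb {
	uint8_t data[SKETCH_LEB_SIZE];
};

/* Erased flash reads back as 0xFF. */
static void sketch_leb_unmap(struct sketch_leb *leb)
{
	memset(leb->data, 0xFF, SKETCH_LEB_SIZE);
}

/*
 * Keep the first 'len' bytes and leave the rest erased.
 * A length of 0 means there is nothing to keep, so the LEB is unmapped.
 */
static void sketch_fixup_leb(struct sketch_leb *leb, size_t len)
{
	uint8_t keep[SKETCH_LEB_SIZE];

	assert(len <= SKETCH_LEB_SIZE);
	if (len == 0) {
		sketch_leb_unmap(leb);
		return;
	}
	memcpy(keep, leb->data, len);   /* read the used part */
	sketch_leb_unmap(leb);
	memcpy(leb->data, keep, len);   /* write it back */
}
```

In this model, calling sketch_fixup_leb() with 0 on a LEB that holds
only a commit start node wipes the node, while passing
SKETCH_CS_NODE_SZ preserves it and still leaves the tail erased.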

Signed-off-by: Artem Bityutskiy
Reported-by: Iwo Mergler
Tested-by: Iwo Mergler
Reported-by: James Nute
Signed-off-by: Ben Hutchings

cifs: always update the inode cache with the results from a FIND_*

2012-07-25T03:11:26+00:00

commit cd60042cc1392e79410dc8de9e9c1abb38a29e57 upstream.

When we get back a FIND_FIRST/NEXT result, we have some info about the
dentry that we use to instantiate a new inode. We were ignoring and
discarding that info when we had an existing dentry in the cache.

Fix this by updating the inode in place when we find an existing dentry
and the uniqueid is the same.

Reported-and-Tested-by: Andrew Bartlett 
Reported-by: Bill Robertson 
Reported-by: Dion Edwards 
Signed-off-by: Jeff Layton 
Signed-off-by: Steve French 
Signed-off-by: Ben Hutchings

cifs: on CONFIG_HIGHMEM machines, limit the rsize/wsize to the kmap space

2012-07-25T03:11:26+00:00

commit 3ae629d98bd5ed77585a878566f04f310adbc591 upstream.

We currently rely on being able to kmap all of the pages in an async
read or write request. If you're on a machine that has CONFIG_HIGHMEM
set then that kmap space is limited, sometimes to as low as 512 slots.

With 512 slots, we can only support up to a 2M r/wsize, and that's
assuming that we can get our greedy little hands on all of them. There
are other users however, so it's possible we'll end up stuck when using
a size that large.

Since we can't handle a rsize or wsize larger than that currently, cap
those options at the number of kmap slots we have. We could consider
capping it even lower, but we currently default to a max of 1M. Might as
well allow those luddites on 32 bit arches enough rope to hang
themselves.
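
The arithmetic behind the cap can be sketched as below. The function
names are hypothetical and the 4K page size is an assumption (it is
what makes 512 slots come out at 2M); this is not the cifs code, only
the slots-times-page-size calculation and the clamp.

```c
#include <assert.h>
#include <stddef.h>

/* Largest request that can be fully kmapped: one slot per page. */
static size_t sketch_kmap_size_limit(size_t kmap_slots, size_t page_size)
{
	assert(page_size > 0);
	return kmap_slots * page_size;
}

/* Clamp a requested rsize/wsize to the kmap limit. */
static size_t sketch_cap_iosize(size_t requested, size_t kmap_slots,
				size_t page_size)
{
	size_t limit = sketch_kmap_size_limit(kmap_slots, page_size);

	return requested < limit ? requested : limit;
}
```

With 512 slots and 4096-byte pages the limit is 2097152 bytes (2M), so
a requested 4M wsize is cut down to 2M while the 1M default passes
through unchanged.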

A more robust fix would be to teach the send and receive routines how
to contend with an array of pages so we don't need to marshal up a kvec
array at all. That's a fairly significant overhaul though, so we'll need
this limit in place until that's ready.

Reported-by: Jian Li 
Signed-off-by: Jeff Layton 
Signed-off-by: Steve French 
Signed-off-by: Ben Hutchings