linux-stable.git/fs/ceph, branch v3.4.112

fs: create and use seq_show_option for escaping

2016-04-27T10:55:18+00:00

commit a068acf2ee77693e0bf39d6e07139ba704f461c3 upstream.

Many file systems that implement the show_options hook fail to correctly
escape their output, which could lead to unescaped characters (e.g.
newlines) leaking into /proc/mounts and /proc/[pid]/mountinfo files.  This
could lead to confusion, spoofed entries (resulting in things like
systemd issuing false d-bus "mount" notifications), and who knows what
else.  This looks like it would only be the root user stepping on
themselves, but it's possible weird things could happen in containers or
in other situations with delegated mount privileges.

Here's an example using overlay with setuid fusermount trusting the
contents of /proc/mounts (via the /etc/mtab symlink).  Imagine the use
of "sudo" is something more sneaky:

  $ BASE="ovl"
  $ MNT="$BASE/mnt"
  $ LOW="$BASE/lower"
  $ UP="$BASE/upper"
  $ WORK="$BASE/work/ 0 0
  none /proc fuse.pwn user_id=1000"
  $ mkdir -p "$LOW" "$UP" "$WORK"
  $ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt
  $ cat /proc/mounts
  none /root/ovl/mnt overlay rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0
  none /proc fuse.pwn user_id=1000 0 0
  $ fusermount -u /proc
  $ cat /proc/mounts
  cat: /proc/mounts: No such file or directory

This fixes the problem by adding new seq_show_option and
seq_show_option_n helpers, and updating the vulnerable show_options
handlers to use them as needed.  Some, like SELinux, need to be open
coded due to unusual existing escape mechanisms.
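
To illustrate the idea behind the helpers, here is a minimal userspace
sketch, not the kernel implementation.  The names escape_into() and
show_option() are hypothetical, and the sets of escaped characters are
assumptions chosen to cover the separators that matter in a
/proc/mounts line (comma, space, tab, newline, backslash, plus '=' in
option names).  Each such byte is written as a backslash followed by
three octal digits:

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for an escaping helper: copy src into dst and
 * write every byte listed in esc as "\ooo" (three octal digits).
 * Returns the length written, or -1 if dst is too small.
 */
static int escape_into(char *dst, size_t dstlen, const char *src,
		       const char *esc)
{
	size_t used = 0;

	for (; *src; src++) {
		unsigned char c = (unsigned char)*src;

		if (strchr(esc, c)) {
			if (dstlen - used < 5)	/* "\ooo" plus NUL */
				return -1;
			snprintf(dst + used, 5, "\\%03o", (unsigned int)c);
			used += 4;
		} else {
			if (dstlen - used < 2)	/* one byte plus NUL */
				return -1;
			dst[used++] = (char)c;
		}
	}
	if (used >= dstlen)
		return -1;
	dst[used] = '\0';
	return (int)used;
}

/*
 * Hypothetical analogue of a show-option helper: format ",name" or
 * ",name=value" into dst with both parts escaped.
 * Returns the length written, or -1 on overflow.
 */
static int show_option(char *dst, size_t dstlen, const char *name,
		       const char *value)
{
	char n[256], v[256];
	int len;

	if (escape_into(n, sizeof(n), name, ",= \t\n\\") < 0)
		return -1;
	if (!value) {
		len = snprintf(dst, dstlen, ",%s", n);
	} else {
		if (escape_into(v, sizeof(v), value, ", \t\n\\") < 0)
			return -1;
		len = snprintf(dst, dstlen, ",%s=%s", n, v);
	}
	return (len < 0 || (size_t)len >= dstlen) ? -1 : len;
}
```

With this sketch, a workdir value containing " 0 0" and a newline stays
on a single line, so it can no longer masquerade as a second mount
entry.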

[akpm@linux-foundation.org: add lost chunk, per Kees]
[keescook@chromium.org: seq_show_option should be using const parameters]
Signed-off-by: Kees Cook 
Acked-by: Serge Hallyn 
Acked-by: Jan Kara 
Acked-by: Paul Moore 
Cc: J. R. Okajima 
Signed-off-by: Kees Cook 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
[lizf: Backported to 3.4:
 - adjust context
 - one more place in ceph needs to be changed
 - drop changes to overlayfs
 - drop showing vers in cifs]
Signed-off-by: Zefan Li

move d_rcu from overlapping d_child to overlapping d_alias

2015-04-14T09:33:58+00:00

commit 946e51f2bf37f1656916eb75bd0742ba33983c28 upstream.

Signed-off-by: Al Viro 
[bwh: Backported to 3.2:
 - Apply name changes in all the different places we use d_alias and d_child
 - Move the WARN_ON() in __d_free() to d_free() as we don't have dentry_free()]
Signed-off-by: Ben Hutchings 
[lizf: Backported to 3.4:
 - adjust context
 - need one more name change in debugfs]

introduce SIZE_MAX

2014-07-31T19:54:53+00:00

commit a3860c1c5dd1137db23d7786d284939c5761d517 upstream.

ULONG_MAX is often used to check for integer overflow when calculating
allocation size.  While ULONG_MAX happens to work on most systems, there
is no guarantee that `size_t' must be the same size as `long'.

This patch introduces SIZE_MAX, the maximum value of `size_t', to improve
portability and readability for allocation size validation.
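
As a hedged illustration of the kind of check this enables, the
userspace sketch below validates an array allocation against SIZE_MAX
(from <stdint.h>) rather than ULONG_MAX.  The helper name alloc_array()
is hypothetical and not part of the patch:

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: allocate n elements of the given size, refusing
 * requests whose total byte count would not fit in a size_t.
 */
static void *alloc_array(size_t n, size_t size)
{
	/* n * size overflows exactly when n > SIZE_MAX / size. */
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;
	return malloc(n * size);
}
```

Because the bound is expressed in terms of size_t itself, the check
stays correct even on a platform where size_t and long differ in width.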

Signed-off-by: Xi Wang 
Acked-by: Alex Elder 
Cc: David Airlie 
Cc: Pekka Enberg 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Cc: Qiang Huang 
Signed-off-by: Greg Kroah-Hartman

ceph: Avoid data inconsistency due to d-cache aliasing in readpage()

2014-01-08T17:42:11+00:00

commit 56f91aad69444d650237295f68c195b74d888d95 upstream.

If the length of data to be read in readpage() is exactly
PAGE_CACHE_SIZE, the original code does not flush the d-cache
for data consistency after finishing the read. This patch fixes
this.

Signed-off-by: Li Wang 
Signed-off-by: Sage Weil 
Signed-off-by: Greg Kroah-Hartman

ceph: wake up 'safe' waiters when unregistering request

2014-01-08T17:42:10+00:00

commit fc55d2c9448b34218ca58733a6f51fbede09575b upstream.

We also need to wake up 'safe' waiters if an error occurs or the request
is aborted. Otherwise sync(2)/fsync(2) may hang forever.

Signed-off-by: Yan, Zheng 
Signed-off-by: Sage Weil 
Signed-off-by: Greg Kroah-Hartman

ceph: cleanup aborted requests when re-sending requests.

2014-01-08T17:42:10+00:00

commit eb1b8af33c2e42a9a57fc0a7588f4a7b255d2e79 upstream.

Aborted requests usually get cleared when the reply is received.
If the MDS crashes, no reply will be received. So we need to clean up
aborted requests when re-sending requests.

Signed-off-by: Yan, Zheng 
Reviewed-by: Greg Farnum 
Signed-off-by: Sage Weil 
Signed-off-by: Greg Kroah-Hartman

ceph: fix statvfs fr_size

2013-06-20T18:58:47+00:00

commit 92a49fb0f79f3300e6e50ddf56238e70678e4202 upstream.

Different versions of glibc are broken in different ways, but the short of
it is that for the time being, frsize should == bsize, and be used as the
multiple for the blocks, free, and available fields.  This mirrors what is
done for NFS.  The previous reporting of the page size for frsize meant
that newer glibc and df would report a very small value for the fs size.
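
To make the arithmetic concrete, here is a small userspace sketch of
how a df-like tool might turn statvfs(3) fields into byte counts.  The
helper names and the sample numbers in the note below are illustrative
assumptions, not values taken from the ceph code:

```c
#include <sys/statvfs.h>

/*
 * Hypothetical helpers: POSIX defines f_blocks and f_bavail in units
 * of f_frsize, so that is the multiplier a consumer applies.
 */
static unsigned long long fs_total_bytes(const struct statvfs *st)
{
	return (unsigned long long)st->f_blocks * st->f_frsize;
}

static unsigned long long fs_avail_bytes(const struct statvfs *st)
{
	return (unsigned long long)st->f_bavail * st->f_frsize;
}
```

For example, if a file system counts 100 blocks in 1 MiB units but
reports an frsize of 4096, a consumer following this rule computes
409600 bytes instead of 100 MiB, which is the kind of undersized
result described above.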

Fixes http://tracker.ceph.com/issues/3793.

Signed-off-by: Sage Weil 
Reviewed-by: Greg Farnum 
Signed-off-by: Greg Kroah-Hartman

libceph: wrap auth ops in wrapper functions

2013-06-20T18:58:47+00:00

commit 27859f9773e4a0b2042435b13400ee2c891a61f4 upstream.

Use wrapper functions that check whether the auth op exists so that callers
do not need a bunch of conditional checks.  Simplifies the external
interface.
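
The pattern can be sketched in plain C as follows.  This is only an
illustration: the struct layout, the op names, and the choice to return
0 when an op is absent are assumptions, not the actual libceph
definitions:

```c
#include <stddef.h>

struct auth_client;

/* Hypothetical ops table; any of these ops may be left NULL. */
struct auth_ops {
	int  (*verify_reply)(struct auth_client *ac, size_t len);
	void (*invalidate)(struct auth_client *ac);
};

struct auth_client {
	const struct auth_ops *ops;
	int invalidated;
};

/* Wrapper: callers never test the function pointer themselves. */
static int auth_verify_reply(struct auth_client *ac, size_t len)
{
	if (ac->ops && ac->ops->verify_reply)
		return ac->ops->verify_reply(ac, len);
	return 0;	/* assumed default when the op is not implemented */
}

static void auth_invalidate(struct auth_client *ac)
{
	if (ac->ops && ac->ops->invalidate)
		ac->ops->invalidate(ac);
}

/* Example backend that implements only one of the two ops. */
static void demo_invalidate(struct auth_client *ac)
{
	ac->invalidated = 1;
}

static const struct auth_ops demo_ops = {
	.verify_reply = NULL,
	.invalidate   = demo_invalidate,
};
```

A caller then writes auth_invalidate(ac) unconditionally, and the
existence check lives in exactly one place.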

Signed-off-by: Sage Weil 
Reviewed-by: Alex Elder 
Signed-off-by: Greg Kroah-Hartman

libceph: add update_authorizer auth method

2013-06-20T18:58:46+00:00

commit 0bed9b5c523d577378b6f83eab5835fe30c27208 upstream.

Currently the messenger calls out to a get_authorizer con op, which will
create a new authorizer if it doesn't yet have one.  In the meantime, when
we rotate our service keys, the authorizer doesn't get updated.  Eventually
it will be rejected by the server on a new connection attempt and get
invalidated, and we will then rebuild a new authorizer, but this is not
ideal.

Instead, if we do have an authorizer, call a new update_authorizer op that
will verify that the current authorizer is using the latest secret.  If it
is not, we will build a new one that does.  This avoids the transient
failure.
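
A minimal sketch of that check, assuming an authorizer records the id
of the secret it was built with.  The struct, field, and function names
here are hypothetical and greatly simplified compared to real
authorizer state:

```c
#include <stdint.h>

/* Hypothetical authorizer: remembers which secret it was built from. */
struct authorizer {
	uint64_t secret_id;	/* id of the service secret in use */
	unsigned int builds;	/* how many times it has been (re)built */
};

static void build_authorizer(struct authorizer *au, uint64_t secret_id)
{
	au->secret_id = secret_id;
	au->builds++;
}

/*
 * Rebuild the authorizer only if it was built from an older secret
 * than the latest one.  Returns 1 if it was rebuilt, 0 if it was
 * already current.
 */
static int update_authorizer(struct authorizer *au, uint64_t latest_id)
{
	if (au->secret_id >= latest_id)
		return 0;
	build_authorizer(au, latest_id);
	return 1;
}
```

Calling such a check before each connection attempt means a key
rotation is picked up proactively, rather than after the server has
rejected a stale authorizer.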

This fixes one step in the sorry sequence of events for bug

	http://tracker.ceph.com/issues/4282

Signed-off-by: Sage Weil 
Reviewed-by: Alex Elder 
Signed-off-by: Greg Kroah-Hartman

ceph: ceph_pagelist_append might sleep while atomic

2013-06-20T18:58:43+00:00

commit 39be95e9c8c0b5668c9f8806ffe29bf9f4bc0f40 upstream.

Ceph's encode_caps_cb() worked hard to not call __page_cache_alloc()
while holding a lock, but it's spoiled because ceph_pagelist_addpage()
always calls kmap(), which might sleep.  Here's the result:

[13439.295457] ceph: mds0 reconnect start
[13439.300572] BUG: sleeping function called from invalid context at include/linux/highmem.h:58
[13439.309243] in_atomic(): 1, irqs_disabled(): 0, pid: 12059, name: kworker/1:1
    . . .
[13439.376225] Call Trace:
[13439.378757]  [] __might_sleep+0xfc/0x110
[13439.384353]  [] ceph_pagelist_append+0x120/0x1b0 [libceph]
[13439.391491]  [] ceph_encode_locks+0x89/0x190 [ceph]
[13439.398035]  [] ? _raw_spin_lock+0x49/0x50
[13439.403775]  [] ? lock_flocks+0x15/0x20
[13439.409277]  [] encode_caps_cb+0x41f/0x4a0 [ceph]
[13439.415622]  [] ? igrab+0x28/0x70
[13439.420610]  [] ? iterate_session_caps+0xe8/0x250 [ceph]
[13439.427584]  [] iterate_session_caps+0x115/0x250 [ceph]
[13439.434499]  [] ? set_request_path_attr+0x2d0/0x2d0 [ceph]
[13439.441646]  [] send_mds_reconnect+0x238/0x450 [ceph]
[13439.448363]  [] ? ceph_mdsmap_decode+0x5e2/0x770 [ceph]
[13439.455250]  [] check_new_map+0x352/0x500 [ceph]
[13439.461534]  [] ceph_mdsc_handle_map+0x1bd/0x260 [ceph]
[13439.468432]  [] ? mutex_unlock+0xe/0x10
[13439.473934]  [] extra_mon_dispatch+0x22/0x30 [ceph]
[13439.480464]  [] dispatch+0xbc/0x110 [libceph]
[13439.486492]  [] process_message+0x1ad/0x1d0 [libceph]
[13439.493190]  [] ? read_partial_message+0x3e8/0x520 [libceph]
    . . .
[13439.587132] ceph: mds0 reconnect success
[13490.720032] ceph: mds0 caps stale
[13501.235257] ceph: mds0 recovery completed
[13501.300419] ceph: mds0 caps renewed

Fix it up by encoding locks into a buffer first, and when the number
of encoded locks is stable, copy that into a ceph_pagelist.

[elder@inktank.com: abbreviated the stack info a bit.]

Signed-off-by: Jim Schutt 
Reviewed-by: Alex Elder 
Signed-off-by: Greg Kroah-Hartman