<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/fs/ceph, branch v3.4.112</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>fs: create and use seq_show_option for escaping</title>
<updated>2016-04-27T10:55:18+00:00</updated>
<author>
<name>Kees Cook</name>
<email>keescook@chromium.org</email>
</author>
<published>2015-09-04T22:44:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b0cce01be5f58ed399fdfc8e1b0fbcd827a35aef'/>
<id>b0cce01be5f58ed399fdfc8e1b0fbcd827a35aef</id>
<content type='text'>
commit a068acf2ee77693e0bf39d6e07139ba704f461c3 upstream.

Many file systems that implement the show_options hook fail to correctly
escape their output which could lead to unescaped characters (e.g.  new
lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files.  This
could lead to confusion, spoofed entries (resulting in things like
systemd issuing false d-bus "mount" notifications), and who knows what
else.  This looks like it would only be the root user stepping on
themselves, but it's possible weird things could happen in containers or
in other situations with delegated mount privileges.

Here's an example using overlay with setuid fusermount trusting the
contents of /proc/mounts (via the /etc/mtab symlink).  Imagine the use
of "sudo" is something more sneaky:

  $ BASE="ovl"
  $ MNT="$BASE/mnt"
  $ LOW="$BASE/lower"
  $ UP="$BASE/upper"
  $ WORK="$BASE/work/ 0 0
  none /proc fuse.pwn user_id=1000"
  $ mkdir -p "$LOW" "$UP" "$WORK"
  $ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt
  $ cat /proc/mounts
  none /root/ovl/mnt overlay rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0
  none /proc fuse.pwn user_id=1000 0 0
  $ fusermount -u /proc
  $ cat /proc/mounts
  cat: /proc/mounts: No such file or directory

This fixes the problem by adding new seq_show_option and
seq_show_option_n helpers, and updating the vulnerable show_option
handlers to use them as needed.  Some, like SELinux, need to be open
coded due to unusual existing escape mechanisms.

[akpm@linux-foundation.org: add lost chunk, per Kees]
[keescook@chromium.org: seq_show_option should be using const parameters]
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Acked-by: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Acked-by: Jan Kara &lt;jack@suse.com&gt;
Acked-by: Paul Moore &lt;paul@paul-moore.com&gt;
Cc: J. R. Okajima &lt;hooanon05g@gmail.com&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
[lizf: Backported to 3.4:
 - adjust context
 - one more place in ceph needs to be changed
 - drop changes to overlayfs
 - drop showing vers in cifs]
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit a068acf2ee77693e0bf39d6e07139ba704f461c3 upstream.

Many file systems that implement the show_options hook fail to correctly
escape their output which could lead to unescaped characters (e.g.  new
lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files.  This
could lead to confusion, spoofed entries (resulting in things like
systemd issuing false d-bus "mount" notifications), and who knows what
else.  This looks like it would only be the root user stepping on
themselves, but it's possible weird things could happen in containers or
in other situations with delegated mount privileges.

Here's an example using overlay with setuid fusermount trusting the
contents of /proc/mounts (via the /etc/mtab symlink).  Imagine the use
of "sudo" is something more sneaky:

  $ BASE="ovl"
  $ MNT="$BASE/mnt"
  $ LOW="$BASE/lower"
  $ UP="$BASE/upper"
  $ WORK="$BASE/work/ 0 0
  none /proc fuse.pwn user_id=1000"
  $ mkdir -p "$LOW" "$UP" "$WORK"
  $ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt
  $ cat /proc/mounts
  none /root/ovl/mnt overlay rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0
  none /proc fuse.pwn user_id=1000 0 0
  $ fusermount -u /proc
  $ cat /proc/mounts
  cat: /proc/mounts: No such file or directory

This fixes the problem by adding new seq_show_option and
seq_show_option_n helpers, and updating the vulnerable show_option
handlers to use them as needed.  Some, like SELinux, need to be open
coded due to unusual existing escape mechanisms.

[akpm@linux-foundation.org: add lost chunk, per Kees]
[keescook@chromium.org: seq_show_option should be using const parameters]
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Acked-by: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Acked-by: Jan Kara &lt;jack@suse.com&gt;
Acked-by: Paul Moore &lt;paul@paul-moore.com&gt;
Cc: J. R. Okajima &lt;hooanon05g@gmail.com&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
[lizf: Backported to 3.4:
 - adjust context
 - one more place in ceph needs to be changed
 - drop changes to overlayfs
 - drop showing vers in cifs]
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>move d_rcu from overlapping d_child to overlapping d_alias</title>
<updated>2015-04-14T09:33:58+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2014-10-26T23:19:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=6fd17def6d964c81205230e02b4208c653106d51'/>
<id>6fd17def6d964c81205230e02b4208c653106d51</id>
<content type='text'>
commit 946e51f2bf37f1656916eb75bd0742ba33983c28 upstream.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
[bwh: Backported to 3.2:
 - Apply name changes in all the different places we use d_alias and d_child
 - Move the WARN_ON() in __d_free() to d_free() as we don't have dentry_free()]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
[lizf: Backported to 3.4:
 - adjust context
 - need one more name change in debugfs]
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 946e51f2bf37f1656916eb75bd0742ba33983c28 upstream.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
[bwh: Backported to 3.2:
 - Apply name changes in all the different places we use d_alias and d_child
 - Move the WARN_ON() in __d_free() to d_free() as we don't have dentry_free()]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
[lizf: Backported to 3.4:
 - adjust context
 - need one more name change in debugfs]
</pre>
</div>
</content>
</entry>
<entry>
<title>introduce SIZE_MAX</title>
<updated>2014-07-31T19:54:53+00:00</updated>
<author>
<name>Xi Wang</name>
<email>xi.wang@gmail.com</email>
</author>
<published>2012-05-31T23:26:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b06b5c6204bdc7c571feedb6b16188ba62feb9a6'/>
<id>b06b5c6204bdc7c571feedb6b16188ba62feb9a6</id>
<content type='text'>
commit a3860c1c5dd1137db23d7786d284939c5761d517 upstream.

ULONG_MAX is often used to check for integer overflow when calculating
allocation size.  While ULONG_MAX happens to work on most systems, there
is no guarantee that `size_t' must be the same size as `long'.

This patch introduces SIZE_MAX, the maximum value of `size_t', to improve
portability and readability for allocation size validation.

Signed-off-by: Xi Wang &lt;xi.wang@gmail.com&gt;
Acked-by: Alex Elder &lt;elder@dreamhost.com&gt;
Cc: David Airlie &lt;airlied@linux.ie&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Qiang Huang &lt;h.huangqiang@huawei.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit a3860c1c5dd1137db23d7786d284939c5761d517 upstream.

ULONG_MAX is often used to check for integer overflow when calculating
allocation size.  While ULONG_MAX happens to work on most systems, there
is no guarantee that `size_t' must be the same size as `long'.

This patch introduces SIZE_MAX, the maximum value of `size_t', to improve
portability and readability for allocation size validation.

Signed-off-by: Xi Wang &lt;xi.wang@gmail.com&gt;
Acked-by: Alex Elder &lt;elder@dreamhost.com&gt;
Cc: David Airlie &lt;airlied@linux.ie&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Qiang Huang &lt;h.huangqiang@huawei.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: Avoid data inconsistency due to d-cache aliasing in readpage()</title>
<updated>2014-01-08T17:42:11+00:00</updated>
<author>
<name>Li Wang</name>
<email>liwang@ubuntukylin.com</email>
</author>
<published>2013-11-13T07:22:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e0050ac1e4b0a9340e9fc3d9d918799613b72f9a'/>
<id>e0050ac1e4b0a9340e9fc3d9d918799613b72f9a</id>
<content type='text'>
commit 56f91aad69444d650237295f68c195b74d888d95 upstream.

If the length of data to be read in readpage() is exactly
PAGE_CACHE_SIZE, the original code does not flush d-cache
for data consistency after finishing reading. This patches fixes
this.

Signed-off-by: Li Wang &lt;liwang@ubuntukylin.com&gt;
Signed-off-by: Sage Weil &lt;sage@inktank.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 56f91aad69444d650237295f68c195b74d888d95 upstream.

If the length of data to be read in readpage() is exactly
PAGE_CACHE_SIZE, the original code does not flush d-cache
for data consistency after finishing reading. This patches fixes
this.

Signed-off-by: Li Wang &lt;liwang@ubuntukylin.com&gt;
Signed-off-by: Sage Weil &lt;sage@inktank.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: wake up 'safe' waiters when unregistering request</title>
<updated>2014-01-08T17:42:10+00:00</updated>
<author>
<name>Yan, Zheng</name>
<email>zheng.z.yan@intel.com</email>
</author>
<published>2013-10-31T01:10:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=bb4e90afb0b16996ff921762ab4212db0f40535f'/>
<id>bb4e90afb0b16996ff921762ab4212db0f40535f</id>
<content type='text'>
commit fc55d2c9448b34218ca58733a6f51fbede09575b upstream.

We also need to wake up 'safe' waiters if error occurs or request
aborted. Otherwise sync(2)/fsync(2) may hang forever.

Signed-off-by: Yan, Zheng &lt;zheng.z.yan@intel.com&gt;
Signed-off-by: Sage Weil &lt;sage@inktank.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit fc55d2c9448b34218ca58733a6f51fbede09575b upstream.

We also need to wake up 'safe' waiters if error occurs or request
aborted. Otherwise sync(2)/fsync(2) may hang forever.

Signed-off-by: Yan, Zheng &lt;zheng.z.yan@intel.com&gt;
Signed-off-by: Sage Weil &lt;sage@inktank.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: cleanup aborted requests when re-sending requests.</title>
<updated>2014-01-08T17:42:10+00:00</updated>
<author>
<name>Yan, Zheng</name>
<email>zheng.z.yan@intel.com</email>
</author>
<published>2013-09-26T06:25:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=1461d4c794c2f98ebdd9d60f47a08c09dc4ce453'/>
<id>1461d4c794c2f98ebdd9d60f47a08c09dc4ce453</id>
<content type='text'>
commit eb1b8af33c2e42a9a57fc0a7588f4a7b255d2e79 upstream.

Aborted requests usually get cleared when the reply is received.
If MDS crashes, no reply will be received. So we need to cleanup
aborted requests when re-sending requests.

Signed-off-by: Yan, Zheng &lt;zheng.z.yan@intel.com&gt;
Reviewed-by: Greg Farnum &lt;greg@inktank.com&gt;
Signed-off-by: Sage Weil &lt;sage@inktank.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit eb1b8af33c2e42a9a57fc0a7588f4a7b255d2e79 upstream.

Aborted requests usually get cleared when the reply is received.
If MDS crashes, no reply will be received. So we need to cleanup
aborted requests when re-sending requests.

Signed-off-by: Yan, Zheng &lt;zheng.z.yan@intel.com&gt;
Reviewed-by: Greg Farnum &lt;greg@inktank.com&gt;
Signed-off-by: Sage Weil &lt;sage@inktank.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: fix statvfs fr_size</title>
<updated>2013-06-20T18:58:47+00:00</updated>
<author>
<name>Sage Weil</name>
<email>sage@inktank.com</email>
</author>
<published>2013-02-22T23:31:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=999899ad5860fb34e679ca2a853074e54fda9d93'/>
<id>999899ad5860fb34e679ca2a853074e54fda9d93</id>
<content type='text'>
commit 92a49fb0f79f3300e6e50ddf56238e70678e4202 upstream.

Different versions of glibc are broken in different ways, but the short of
it is that for the time being, frsize should == bsize, and be used as the
multiple for the blocks, free, and available fields.  This mirrors what is
done for NFS.  The previous reporting of the page size for frsize meant
that newer glibc and df would report a very small value for the fs size.

Fixes http://tracker.ceph.com/issues/3793.

Signed-off-by: Sage Weil &lt;sage@inktank.com&gt;
Reviewed-by: Greg Farnum &lt;greg@inktank.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 92a49fb0f79f3300e6e50ddf56238e70678e4202 upstream.

Different versions of glibc are broken in different ways, but the short of
it is that for the time being, frsize should == bsize, and be used as the
multiple for the blocks, free, and available fields.  This mirrors what is
done for NFS.  The previous reporting of the page size for frsize meant
that newer glibc and df would report a very small value for the fs size.

Fixes http://tracker.ceph.com/issues/3793.

Signed-off-by: Sage Weil &lt;sage@inktank.com&gt;
Reviewed-by: Greg Farnum &lt;greg@inktank.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>libceph: wrap auth ops in wrapper functions</title>
<updated>2013-06-20T18:58:47+00:00</updated>
<author>
<name>Sage Weil</name>
<email>sage@inktank.com</email>
</author>
<published>2013-03-25T17:26:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=aa80dd9dbe86743ae6e52c836f6ab1472c469927'/>
<id>aa80dd9dbe86743ae6e52c836f6ab1472c469927</id>
<content type='text'>
commit 27859f9773e4a0b2042435b13400ee2c891a61f4 upstream.

Use wrapper functions that check whether the auth op exists so that callers
do not need a bunch of conditional checks.  Simplifies the external
interface.

Signed-off-by: Sage Weil &lt;sage@inktank.com&gt;
Reviewed-by: Alex Elder &lt;elder@inktank.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 27859f9773e4a0b2042435b13400ee2c891a61f4 upstream.

Use wrapper functions that check whether the auth op exists so that callers
do not need a bunch of conditional checks.  Simplifies the external
interface.

Signed-off-by: Sage Weil &lt;sage@inktank.com&gt;
Reviewed-by: Alex Elder &lt;elder@inktank.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>libceph: add update_authorizer auth method</title>
<updated>2013-06-20T18:58:46+00:00</updated>
<author>
<name>Sage Weil</name>
<email>sage@inktank.com</email>
</author>
<published>2013-03-25T17:26:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=29c65a277a64645af853e8c9a9b3dda0ddc421e0'/>
<id>29c65a277a64645af853e8c9a9b3dda0ddc421e0</id>
<content type='text'>
commit 0bed9b5c523d577378b6f83eab5835fe30c27208 upstream.

Currently the messenger calls out to a get_authorizer con op, which will
create a new authorizer if it doesn't yet have one.  In the meantime, when
we rotate our service keys, the authorizer doesn't get updated.  Eventually
it will be rejected by the server on a new connection attempt and get
invalidated, and we will then rebuild a new authorizer, but this is not
ideal.

Instead, if we do have an authorizer, call a new update_authorizer op that
will verify that the current authorizer is using the latest secret.  If it
is not, we will build a new one that does.  This avoids the transient
failure.

This fixes one of the sorry sequence of events for bug

	http://tracker.ceph.com/issues/4282

Signed-off-by: Sage Weil &lt;sage@inktank.com&gt;
Reviewed-by: Alex Elder &lt;elder@inktank.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 0bed9b5c523d577378b6f83eab5835fe30c27208 upstream.

Currently the messenger calls out to a get_authorizer con op, which will
create a new authorizer if it doesn't yet have one.  In the meantime, when
we rotate our service keys, the authorizer doesn't get updated.  Eventually
it will be rejected by the server on a new connection attempt and get
invalidated, and we will then rebuild a new authorizer, but this is not
ideal.

Instead, if we do have an authorizer, call a new update_authorizer op that
will verify that the current authorizer is using the latest secret.  If it
is not, we will build a new one that does.  This avoids the transient
failure.

This fixes one of the sorry sequence of events for bug

	http://tracker.ceph.com/issues/4282

Signed-off-by: Sage Weil &lt;sage@inktank.com&gt;
Reviewed-by: Alex Elder &lt;elder@inktank.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: ceph_pagelist_append might sleep while atomic</title>
<updated>2013-06-20T18:58:43+00:00</updated>
<author>
<name>Jim Schutt</name>
<email>jaschut@sandia.gov</email>
</author>
<published>2013-05-15T18:03:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=c7ed0848c61d75c58f1c5b13c308f7fcaf94ed89'/>
<id>c7ed0848c61d75c58f1c5b13c308f7fcaf94ed89</id>
<content type='text'>
commit 39be95e9c8c0b5668c9f8806ffe29bf9f4bc0f40 upstream.

Ceph's encode_caps_cb() worked hard to not call __page_cache_alloc()
while holding a lock, but it's spoiled because ceph_pagelist_addpage()
always calls kmap(), which might sleep.  Here's the result:

[13439.295457] ceph: mds0 reconnect start
[13439.300572] BUG: sleeping function called from invalid context at include/linux/highmem.h:58
[13439.309243] in_atomic(): 1, irqs_disabled(): 0, pid: 12059, name: kworker/1:1
    . . .
[13439.376225] Call Trace:
[13439.378757]  [&lt;ffffffff81076f4c&gt;] __might_sleep+0xfc/0x110
[13439.384353]  [&lt;ffffffffa03f4ce0&gt;] ceph_pagelist_append+0x120/0x1b0 [libceph]
[13439.391491]  [&lt;ffffffffa0448fe9&gt;] ceph_encode_locks+0x89/0x190 [ceph]
[13439.398035]  [&lt;ffffffff814ee849&gt;] ? _raw_spin_lock+0x49/0x50
[13439.403775]  [&lt;ffffffff811cadf5&gt;] ? lock_flocks+0x15/0x20
[13439.409277]  [&lt;ffffffffa045e2af&gt;] encode_caps_cb+0x41f/0x4a0 [ceph]
[13439.415622]  [&lt;ffffffff81196748&gt;] ? igrab+0x28/0x70
[13439.420610]  [&lt;ffffffffa045e9f8&gt;] ? iterate_session_caps+0xe8/0x250 [ceph]
[13439.427584]  [&lt;ffffffffa045ea25&gt;] iterate_session_caps+0x115/0x250 [ceph]
[13439.434499]  [&lt;ffffffffa045de90&gt;] ? set_request_path_attr+0x2d0/0x2d0 [ceph]
[13439.441646]  [&lt;ffffffffa0462888&gt;] send_mds_reconnect+0x238/0x450 [ceph]
[13439.448363]  [&lt;ffffffffa0464542&gt;] ? ceph_mdsmap_decode+0x5e2/0x770 [ceph]
[13439.455250]  [&lt;ffffffffa0462e42&gt;] check_new_map+0x352/0x500 [ceph]
[13439.461534]  [&lt;ffffffffa04631ad&gt;] ceph_mdsc_handle_map+0x1bd/0x260 [ceph]
[13439.468432]  [&lt;ffffffff814ebc7e&gt;] ? mutex_unlock+0xe/0x10
[13439.473934]  [&lt;ffffffffa043c612&gt;] extra_mon_dispatch+0x22/0x30 [ceph]
[13439.480464]  [&lt;ffffffffa03f6c2c&gt;] dispatch+0xbc/0x110 [libceph]
[13439.486492]  [&lt;ffffffffa03eec3d&gt;] process_message+0x1ad/0x1d0 [libceph]
[13439.493190]  [&lt;ffffffffa03f1498&gt;] ? read_partial_message+0x3e8/0x520 [libceph]
    . . .
[13439.587132] ceph: mds0 reconnect success
[13490.720032] ceph: mds0 caps stale
[13501.235257] ceph: mds0 recovery completed
[13501.300419] ceph: mds0 caps renewed

Fix it up by encoding locks into a buffer first, and when the number
of encoded locks is stable, copy that into a ceph_pagelist.

[elder@inktank.com: abbreviated the stack info a bit.]

Signed-off-by: Jim Schutt &lt;jaschut@sandia.gov&gt;
Reviewed-by: Alex Elder &lt;elder@inktank.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 39be95e9c8c0b5668c9f8806ffe29bf9f4bc0f40 upstream.

Ceph's encode_caps_cb() worked hard to not call __page_cache_alloc()
while holding a lock, but it's spoiled because ceph_pagelist_addpage()
always calls kmap(), which might sleep.  Here's the result:

[13439.295457] ceph: mds0 reconnect start
[13439.300572] BUG: sleeping function called from invalid context at include/linux/highmem.h:58
[13439.309243] in_atomic(): 1, irqs_disabled(): 0, pid: 12059, name: kworker/1:1
    . . .
[13439.376225] Call Trace:
[13439.378757]  [&lt;ffffffff81076f4c&gt;] __might_sleep+0xfc/0x110
[13439.384353]  [&lt;ffffffffa03f4ce0&gt;] ceph_pagelist_append+0x120/0x1b0 [libceph]
[13439.391491]  [&lt;ffffffffa0448fe9&gt;] ceph_encode_locks+0x89/0x190 [ceph]
[13439.398035]  [&lt;ffffffff814ee849&gt;] ? _raw_spin_lock+0x49/0x50
[13439.403775]  [&lt;ffffffff811cadf5&gt;] ? lock_flocks+0x15/0x20
[13439.409277]  [&lt;ffffffffa045e2af&gt;] encode_caps_cb+0x41f/0x4a0 [ceph]
[13439.415622]  [&lt;ffffffff81196748&gt;] ? igrab+0x28/0x70
[13439.420610]  [&lt;ffffffffa045e9f8&gt;] ? iterate_session_caps+0xe8/0x250 [ceph]
[13439.427584]  [&lt;ffffffffa045ea25&gt;] iterate_session_caps+0x115/0x250 [ceph]
[13439.434499]  [&lt;ffffffffa045de90&gt;] ? set_request_path_attr+0x2d0/0x2d0 [ceph]
[13439.441646]  [&lt;ffffffffa0462888&gt;] send_mds_reconnect+0x238/0x450 [ceph]
[13439.448363]  [&lt;ffffffffa0464542&gt;] ? ceph_mdsmap_decode+0x5e2/0x770 [ceph]
[13439.455250]  [&lt;ffffffffa0462e42&gt;] check_new_map+0x352/0x500 [ceph]
[13439.461534]  [&lt;ffffffffa04631ad&gt;] ceph_mdsc_handle_map+0x1bd/0x260 [ceph]
[13439.468432]  [&lt;ffffffff814ebc7e&gt;] ? mutex_unlock+0xe/0x10
[13439.473934]  [&lt;ffffffffa043c612&gt;] extra_mon_dispatch+0x22/0x30 [ceph]
[13439.480464]  [&lt;ffffffffa03f6c2c&gt;] dispatch+0xbc/0x110 [libceph]
[13439.486492]  [&lt;ffffffffa03eec3d&gt;] process_message+0x1ad/0x1d0 [libceph]
[13439.493190]  [&lt;ffffffffa03f1498&gt;] ? read_partial_message+0x3e8/0x520 [libceph]
    . . .
[13439.587132] ceph: mds0 reconnect success
[13490.720032] ceph: mds0 caps stale
[13501.235257] ceph: mds0 recovery completed
[13501.300419] ceph: mds0 caps renewed

Fix it up by encoding locks into a buffer first, and when the number
of encoded locks is stable, copy that into a ceph_pagelist.

[elder@inktank.com: abbreviated the stack info a bit.]

Signed-off-by: Jim Schutt &lt;jaschut@sandia.gov&gt;
Reviewed-by: Alex Elder &lt;elder@inktank.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
</feed>
