<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/net/ceph/osd_client.c, branch linux-3.2.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>libceph: fix double __remove_osd() problem</title>
<updated>2015-05-09T22:16:19+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2015-02-17T16:37:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=45ee68db2cb25a673859140fa315d9f7e13cd461'/>
<id>45ee68db2cb25a673859140fa315d9f7e13cd461</id>
<content type='text'>
commit 7eb71e0351fbb1b242ae70abb7bb17107fe2f792 upstream.

It turns out it's possible to get __remove_osd() called twice on the
same OSD.  That doesn't sit well with rb_erase() - depending on the
shape of the tree we can get a NULL dereference, a soft lockup or
a random crash at some point in the future as we end up touching freed
memory.  One scenario that I was able to reproduce is as follows:

            &lt;osd3 is idle, on the osd lru list&gt;
&lt;con reset - osd3&gt;
con_fault_finish()
  osd_reset()
                              &lt;osdmap - osd3 down&gt;
                              ceph_osdc_handle_map()
                                &lt;takes map_sem&gt;
                                kick_requests()
                                  &lt;takes request_mutex&gt;
                                  reset_changed_osds()
                                    __reset_osd()
                                      __remove_osd()
                                  &lt;releases request_mutex&gt;
                                &lt;releases map_sem&gt;
    &lt;takes map_sem&gt;
    &lt;takes request_mutex&gt;
    __kick_osd_requests()
      __reset_osd()
        __remove_osd() &lt;-- !!!

A case can be made that osd refcounting is imperfect and reworking it
would be a proper resolution, but for now Sage and I decided to fix
this by adding a safe guard around __remove_osd().

Fixes: http://tracker.ceph.com/issues/8087

Cc: Sage Weil &lt;sage@redhat.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Sage Weil &lt;sage@redhat.com&gt;
Reviewed-by: Alex Elder &lt;elder@linaro.org&gt;
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 7eb71e0351fbb1b242ae70abb7bb17107fe2f792 upstream.

It turns out it's possible to get __remove_osd() called twice on the
same OSD.  That doesn't sit well with rb_erase() - depending on the
shape of the tree we can get a NULL dereference, a soft lockup or
a random crash at some point in the future as we end up touching freed
memory.  One scenario that I was able to reproduce is as follows:

            &lt;osd3 is idle, on the osd lru list&gt;
&lt;con reset - osd3&gt;
con_fault_finish()
  osd_reset()
                              &lt;osdmap - osd3 down&gt;
                              ceph_osdc_handle_map()
                                &lt;takes map_sem&gt;
                                kick_requests()
                                  &lt;takes request_mutex&gt;
                                  reset_changed_osds()
                                    __reset_osd()
                                      __remove_osd()
                                  &lt;releases request_mutex&gt;
                                &lt;releases map_sem&gt;
    &lt;takes map_sem&gt;
    &lt;takes request_mutex&gt;
    __kick_osd_requests()
      __reset_osd()
        __remove_osd() &lt;-- !!!

A case can be made that osd refcounting is imperfect and reworking it
would be a proper resolution, but for now Sage and I decided to fix
this by adding a safe guard around __remove_osd().

Fixes: http://tracker.ceph.com/issues/8087

Cc: Sage Weil &lt;sage@redhat.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Sage Weil &lt;sage@redhat.com&gt;
Reviewed-by: Alex Elder &lt;elder@linaro.org&gt;
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>libceph: unregister request in __map_request failed and nofail == false</title>
<updated>2013-10-26T20:05:58+00:00</updated>
<author>
<name>majianpeng</name>
<email>majianpeng@gmail.com</email>
</author>
<published>2013-07-16T07:45:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=be47dfad8e39e8b849fd0487292b3998284ddbc2'/>
<id>be47dfad8e39e8b849fd0487292b3998284ddbc2</id>
<content type='text'>
commit 73d9f7eef3d98c3920e144797cc1894c6b005a1e upstream.

For nofail == false request, if __map_request failed, the caller does
cleanup work, like releasing the relative pages.  It doesn't make any sense
to retry this request.

Signed-off-by: Jianpeng Ma &lt;majianpeng@gmail.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
[bwh: Backported to 3.2: adjust indentation]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 73d9f7eef3d98c3920e144797cc1894c6b005a1e upstream.

For nofail == false request, if __map_request failed, the caller does
cleanup work, like releasing the relative pages.  It doesn't make any sense
to retry this request.

Signed-off-by: Jianpeng Ma &lt;majianpeng@gmail.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
[bwh: Backported to 3.2: adjust indentation]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>libceph: Allocate larger oid buffer in request msgs</title>
<updated>2011-11-11T17:50:19+00:00</updated>
<author>
<name>Stratos Psomadakis</name>
<email>psomas@grnet.gr</email>
</author>
<published>2011-11-10T13:45:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=224736d9113ab4a7cf3f05c05377492bd99b4b02'/>
<id>224736d9113ab4a7cf3f05c05377492bd99b4b02</id>
<content type='text'>
ceph_osd_request struct allocates a 40-byte buffer for object names.
RBD image names can be up to 96 chars long (100 with the .rbd suffix),
which results in the object name for the image being truncated, and a
subsequent map failure.

Increase the oid buffer in request messages, in order to avoid the
truncation.

Signed-off-by: Stratos Psomadakis &lt;psomas@grnet.gr&gt;
Signed-off-by: Sage Weil &lt;sage@newdream.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
ceph_osd_request struct allocates a 40-byte buffer for object names.
RBD image names can be up to 96 chars long (100 with the .rbd suffix),
which results in the object name for the image being truncated, and a
subsequent map failure.

Increase the oid buffer in request messages, in order to avoid the
truncation.

Signed-off-by: Stratos Psomadakis &lt;psomas@grnet.gr&gt;
Signed-off-by: Sage Weil &lt;sage@newdream.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>libceph: force resend of osd requests if we skip an osdmap</title>
<updated>2011-10-25T23:10:17+00:00</updated>
<author>
<name>Sage Weil</name>
<email>sage@newdream.net</email>
</author>
<published>2011-10-14T20:33:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=38d6453ca39ecce442628a93e1bb46d70d547512'/>
<id>38d6453ca39ecce442628a93e1bb46d70d547512</id>
<content type='text'>
If we skip over one or more map epochs, we need to resend all osd requests
because it is possible they remapped to other servers and then back.

Signed-off-by: Sage Weil &lt;sage@newdream.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If we skip over one or more map epochs, we need to resend all osd requests
because it is possible they remapped to other servers and then back.

Signed-off-by: Sage Weil &lt;sage@newdream.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>libceph: don't complain on msgpool alloc failures</title>
<updated>2011-10-25T23:10:15+00:00</updated>
<author>
<name>Sage Weil</name>
<email>sage@newdream.net</email>
</author>
<published>2011-08-09T22:03:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b61c27636fffbaf1980e675282777b9467254a40'/>
<id>b61c27636fffbaf1980e675282777b9467254a40</id>
<content type='text'>
The pool allocation failures are masked by the pool; there is no need to
spam the console about them.  (That's the whole point of having the pool
in the first place.)

Mark msg allocations whose failure is safely handled as such.

Signed-off-by: Sage Weil &lt;sage@newdream.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The pool allocation failures are masked by the pool; there is no need to
spam the console about them.  (That's the whole point of having the pool
in the first place.)

Mark msg allocations whose failure is safely handled as such.

Signed-off-by: Sage Weil &lt;sage@newdream.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>libceph: fix linger request requeuing</title>
<updated>2011-09-16T18:13:17+00:00</updated>
<author>
<name>Sage Weil</name>
<email>sage@newdream.net</email>
</author>
<published>2011-09-16T18:13:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=935b639a049053d0ccbcf7422f2f9cd221642f58'/>
<id>935b639a049053d0ccbcf7422f2f9cd221642f58</id>
<content type='text'>
The r_req_lru_item list node moves between several lists, and that cycle
is not directly related (and does not begin) with __register_request().
Initialize it in the request constructor, not __register_request(). This
fixes later badness (below) when OSDs restart underneath an rbd mount.

Crashes we've seen due to this include:

[  213.974288] kernel BUG at net/ceph/messenger.c:2193!

and

[  144.035274] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
[  144.035278] IP: [&lt;ffffffffa036c053&gt;] con_work+0x1463/0x2ce0 [libceph]

Signed-off-by: Sage Weil &lt;sage@newdream.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The r_req_lru_item list node moves between several lists, and that cycle
is not directly related (and does not begin) with __register_request().
Initialize it in the request constructor, not __register_request(). This
fixes later badness (below) when OSDs restart underneath an rbd mount.

Crashes we've seen due to this include:

[  213.974288] kernel BUG at net/ceph/messenger.c:2193!

and

[  144.035274] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
[  144.035278] IP: [&lt;ffffffffa036c053&gt;] con_work+0x1463/0x2ce0 [libceph]

Signed-off-by: Sage Weil &lt;sage@newdream.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>libceph: fix leak of osd structs during shutdown</title>
<updated>2011-08-31T22:22:46+00:00</updated>
<author>
<name>Sage Weil</name>
<email>sage@newdream.net</email>
</author>
<published>2011-08-31T21:45:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=aca420bc51f48b0701963ba3a6234442a0cabebd'/>
<id>aca420bc51f48b0701963ba3a6234442a0cabebd</id>
<content type='text'>
We want to remove all OSDs, not just those on the idle LRU.

Signed-off-by: Sage Weil &lt;sage@newdream.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We want to remove all OSDs, not just those on the idle LRU.

Signed-off-by: Sage Weil &lt;sage@newdream.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>libceph: don't time out osd requests that haven't been received</title>
<updated>2011-07-26T18:27:24+00:00</updated>
<author>
<name>Sage Weil</name>
<email>sage@newdream.net</email>
</author>
<published>2011-07-26T18:27:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=4cf9d544631c92809cb94ea680c71df56e9437aa'/>
<id>4cf9d544631c92809cb94ea680c71df56e9437aa</id>
<content type='text'>
Keep track of when an outgoing message is ACKed (i.e., the server fully
received it and, presumably, queued it for processing).  Time out OSD
requests only if it's been too long since they've been received.

This prevents timeouts and connection thrashing when the OSDs are simply
busy and are throttling the requests they read off the network.

Reviewed-by: Yehuda Sadeh &lt;yehuda@hq.newdream.net&gt;
Signed-off-by: Sage Weil &lt;sage@newdream.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Keep track of when an outgoing message is ACKed (i.e., the server fully
received it and, presumably, queued it for processing).  Time out OSD
requests only if it's been too long since they've been received.

This prevents timeouts and connection thrashing when the OSDs are simply
busy and are throttling the requests they read off the network.

Reviewed-by: Yehuda Sadeh &lt;yehuda@hq.newdream.net&gt;
Signed-off-by: Sage Weil &lt;sage@newdream.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>libceph: fix page calculation for non-page-aligned io</title>
<updated>2011-06-13T23:26:17+00:00</updated>
<author>
<name>Sage Weil</name>
<email>sage@newdream.net</email>
</author>
<published>2011-06-13T23:20:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=9bb0ce2b0b734f3325ea5cd6b351856eeac94f78'/>
<id>9bb0ce2b0b734f3325ea5cd6b351856eeac94f78</id>
<content type='text'>
Set the page count correctly for non-page-aligned IO.  We were already
doing this correctly for alignment, but not the page count.  Fixes
DIRECT_IO writes from unaligned pages.

Signed-off-by: Sage Weil &lt;sage@newdream.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Set the page count correctly for non-page-aligned IO.  We were already
doing this correctly for alignment, but not the page count.  Fixes
DIRECT_IO writes from unaligned pages.

Signed-off-by: Sage Weil &lt;sage@newdream.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: fix sync vs canceled write</title>
<updated>2011-06-08T04:34:13+00:00</updated>
<author>
<name>Sage Weil</name>
<email>sage@newdream.net</email>
</author>
<published>2011-06-03T16:37:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=2584547230ae49b8de91ab3bb5e0a81898905b45'/>
<id>2584547230ae49b8de91ab3bb5e0a81898905b45</id>
<content type='text'>
If we cancel a write, trigger the safe completions to prevent a sync from
blocking indefinitely in ceph_osdc_sync().

Signed-off-by: Sage Weil &lt;sage@newdream.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If we cancel a write, trigger the safe completions to prevent a sync from
blocking indefinitely in ceph_osdc_sync().

Signed-off-by: Sage Weil &lt;sage@newdream.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
