<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/net/ceph/osdmap.c, branch linux-3.14.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>libceph: apply new_state before new_up_client on incrementals</title>
<updated>2016-08-10T08:21:04+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2016-07-19T01:50:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=6b96b2d473701b45df3fea8dd9796b6ec39e6d54'/>
<id>6b96b2d473701b45df3fea8dd9796b6ec39e6d54</id>
<content type='text'>
commit 930c532869774ebf8af9efe9484c597f896a7d46 upstream.

Currently, osd_weight and osd_state fields are updated in the encoding
order.  This is wrong, because an incremental map may look like e.g.

    new_up_client: { osd=6, addr=... } # set osd_state and addr
    new_state: { osd=6, xorstate=EXISTS } # clear osd_state

Suppose osd6's current osd_state is EXISTS (i.e. osd6 is down).  After
applying new_up_client, osd_state is changed to EXISTS | UP.  Carrying
on with the new_state update, we flip EXISTS and leave osd6 in a weird
"!EXISTS but UP" state.  A non-existent OSD is considered down by the
mapping code

2087    for (i = 0; i &lt; pg-&gt;pg_temp.len; i++) {
2088            if (ceph_osd_is_down(osdmap, pg-&gt;pg_temp.osds[i])) {
2089                    if (ceph_can_shift_osds(pi))
2090                            continue;
2091
2092                    temp-&gt;osds[temp-&gt;size++] = CRUSH_ITEM_NONE;

and so requests get directed to the second OSD in the set instead of
the first, resulting in OSD-side errors like:

[WRN] : client.4239 192.168.122.21:0/2444980242 misdirected client.4239.1:2827 pg 2.5df899f2 to osd.4 not [1,4,6] in e680/680

and hung rbds on the client:

[  493.566367] rbd: rbd0: write 400000 at 11cc00000 (0)
[  493.566805] rbd: rbd0:   result -6 xferred 400000
[  493.567011] blk_update_request: I/O error, dev rbd0, sector 9330688

The fix is to decouple application from the decoding and:
- apply new_weight first
- apply new_state before new_up_client
- twiddle osd_state flags if marking in
- clear out some of the state if osd is destroyed

Fixes: http://tracker.ceph.com/issues/14901

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Josh Durgin &lt;jdurgin@redhat.com&gt;
[idryomov@gmail.com: backport to 3.10-3.14: strip primary-affinity]
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 930c532869774ebf8af9efe9484c597f896a7d46 upstream.

Currently, osd_weight and osd_state fields are updated in the encoding
order.  This is wrong, because an incremental map may look like e.g.

    new_up_client: { osd=6, addr=... } # set osd_state and addr
    new_state: { osd=6, xorstate=EXISTS } # clear osd_state

Suppose osd6's current osd_state is EXISTS (i.e. osd6 is down).  After
applying new_up_client, osd_state is changed to EXISTS | UP.  Carrying
on with the new_state update, we flip EXISTS and leave osd6 in a weird
"!EXISTS but UP" state.  A non-existent OSD is considered down by the
mapping code

2087    for (i = 0; i &lt; pg-&gt;pg_temp.len; i++) {
2088            if (ceph_osd_is_down(osdmap, pg-&gt;pg_temp.osds[i])) {
2089                    if (ceph_can_shift_osds(pi))
2090                            continue;
2091
2092                    temp-&gt;osds[temp-&gt;size++] = CRUSH_ITEM_NONE;

and so requests get directed to the second OSD in the set instead of
the first, resulting in OSD-side errors like:

[WRN] : client.4239 192.168.122.21:0/2444980242 misdirected client.4239.1:2827 pg 2.5df899f2 to osd.4 not [1,4,6] in e680/680

and hung rbds on the client:

[  493.566367] rbd: rbd0: write 400000 at 11cc00000 (0)
[  493.566805] rbd: rbd0:   result -6 xferred 400000
[  493.567011] blk_update_request: I/O error, dev rbd0, sector 9330688

The fix is to decouple application from the decoding and:
- apply new_weight first
- apply new_state before new_up_client
- twiddle osd_state flags if marking in
- clear out some of the state if osd is destroyed

Fixes: http://tracker.ceph.com/issues/14901

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Josh Durgin &lt;jdurgin@redhat.com&gt;
[idryomov@gmail.com: backport to 3.10-3.14: strip primary-affinity]
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>crush: fix a bug in tree bucket decode</title>
<updated>2015-08-03T16:29:57+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2015-06-29T16:30:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=c9876d7419edb206d58db51440d616148b4d06c2'/>
<id>c9876d7419edb206d58db51440d616148b4d06c2</id>
<content type='text'>
commit 82cd003a77173c91b9acad8033fb7931dac8d751 upstream.

struct crush_bucket_tree::num_nodes is u8, so ceph_decode_8_safe()
should be used.  -Wconversion catches this, but I guess it went
unnoticed in all the noise it spews.  The actual problem (at least for
common crushmaps) isn't the u32 -&gt; u8 truncation though - it's the
advancement by 4 bytes instead of 1 in the crushmap buffer.

Fixes: http://tracker.ceph.com/issues/2759

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Josh Durgin &lt;jdurgin@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 82cd003a77173c91b9acad8033fb7931dac8d751 upstream.

struct crush_bucket_tree::num_nodes is u8, so ceph_decode_8_safe()
should be used.  -Wconversion catches this, but I guess it went
unnoticed in all the noise it spews.  The actual problem (at least for
common crushmaps) isn't the u32 -&gt; u8 truncation though - it's the
advancement by 4 bytes instead of 1 in the crushmap buffer.

Fixes: http://tracker.ceph.com/issues/2759

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Josh Durgin &lt;jdurgin@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>libceph: follow {read,write}_tier fields on osd request submission</title>
<updated>2014-01-27T21:57:45+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>ilya.dryomov@inktank.com</email>
</author>
<published>2014-01-27T15:40:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=17a13e4028e6ad7ded079cf32370c47bd0e0fc07'/>
<id>17a13e4028e6ad7ded079cf32370c47bd0e0fc07</id>
<content type='text'>
Overwrite ceph_osd_request::r_oloc.pool with read_tier for read ops and
write_tier for write and read+write ops (aka basic tiering support).
{read,write}_tier are part of pg_pool_t since v9.  This commit bumps
our pg_pool_t decode compat version from v7 to v9, all new fields
except for {read,write}_tier are ignored.

Signed-off-by: Ilya Dryomov &lt;ilya.dryomov@inktank.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Overwrite ceph_osd_request::r_oloc.pool with read_tier for read ops and
write_tier for write and read+write ops (aka basic tiering support).
{read,write}_tier are part of pg_pool_t since v9.  This commit bumps
our pg_pool_t decode compat version from v7 to v9, all new fields
except for {read,write}_tier are ignored.

Signed-off-by: Ilya Dryomov &lt;ilya.dryomov@inktank.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>libceph: add ceph_pg_pool_by_id()</title>
<updated>2014-01-27T21:57:40+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>ilya.dryomov@inktank.com</email>
</author>
<published>2014-01-27T15:40:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=ce7f6a2790464047199f54b66420243d433142bd'/>
<id>ce7f6a2790464047199f54b66420243d433142bd</id>
<content type='text'>
"Lookup pool info by ID" function is hidden in osdmap.c.  Expose it to
the rest of libceph.

Signed-off-by: Ilya Dryomov &lt;ilya.dryomov@inktank.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
"Lookup pool info by ID" function is hidden in osdmap.c.  Expose it to
the rest of libceph.

Signed-off-by: Ilya Dryomov &lt;ilya.dryomov@inktank.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>libceph: replace ceph_calc_ceph_pg() with ceph_oloc_oid_to_pg()</title>
<updated>2014-01-27T21:57:32+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>ilya.dryomov@inktank.com</email>
</author>
<published>2014-01-27T15:40:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=7c13cb64352230deac24d3cb058387a6c0676f83'/>
<id>7c13cb64352230deac24d3cb058387a6c0676f83</id>
<content type='text'>
Switch ceph_calc_ceph_pg() to new oloc and oid abstractions and rename
it to ceph_oloc_oid_to_pg() to make its purpose more clear.

Signed-off-by: Ilya Dryomov &lt;ilya.dryomov@inktank.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Switch ceph_calc_ceph_pg() to new oloc and oid abstractions and rename
it to ceph_oloc_oid_to_pg() to make its purpose more clear.

Signed-off-by: Ilya Dryomov &lt;ilya.dryomov@inktank.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>crush: eliminate CRUSH_MAX_SET result size limitation</title>
<updated>2013-12-31T18:32:14+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>ilya.dryomov@inktank.com</email>
</author>
<published>2013-12-24T19:19:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e8ef19c4ad161768e1d8309d5ae18481c098eb81'/>
<id>e8ef19c4ad161768e1d8309d5ae18481c098eb81</id>
<content type='text'>
This is only present to size the temporary scratch arrays that we put on
the stack.  Let the caller allocate them as they wish and remove the
limitation.

Reflects ceph.git commit 1cfe140bf2dab99517589a82a916f4c75b9492d1.

Signed-off-by: Ilya Dryomov &lt;ilya.dryomov@inktank.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This is only present to size the temporary scratch arrays that we put on
the stack.  Let the caller allocate them as they wish and remove the
limitation.

Reflects ceph.git commit 1cfe140bf2dab99517589a82a916f4c75b9492d1.

Signed-off-by: Ilya Dryomov &lt;ilya.dryomov@inktank.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>crush: pass weight vector size to map function</title>
<updated>2013-12-31T18:32:10+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>ilya.dryomov@inktank.com</email>
</author>
<published>2013-12-24T19:19:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b3b33b0e43323af4fb697f4378218d3c268d02cd'/>
<id>b3b33b0e43323af4fb697f4378218d3c268d02cd</id>
<content type='text'>
Pass the size of the weight vector into crush_do_rule() to ensure that we
don't access values past the end.  This can happen if the caller misbehaves
and passes a weight vector that is smaller than max_devices.

Currently the monitor tries to prevent that from happening, but this will
gracefully tolerate previous bad osdmaps that got into this state.  It's
also a bit more defensive.

Reflects ceph.git commit 5922e2c2b8335b5e46c9504349c3a55b7434c01a.

Signed-off-by: Ilya Dryomov &lt;ilya.dryomov@inktank.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pass the size of the weight vector into crush_do_rule() to ensure that we
don't access values past the end.  This can happen if the caller misbehaves
and passes a weight vector that is smaller than max_devices.

Currently the monitor tries to prevent that from happening, but this will
gracefully tolerate previous bad osdmaps that got into this state.  It's
also a bit more defensive.

Reflects ceph.git commit 5922e2c2b8335b5e46c9504349c3a55b7434c01a.

Signed-off-by: Ilya Dryomov &lt;ilya.dryomov@inktank.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>libceph: use pg_num_mask instead of pgp_num_mask for pg.seed calc</title>
<updated>2013-09-04T05:08:10+00:00</updated>
<author>
<name>Sage Weil</name>
<email>sage@inktank.com</email>
</author>
<published>2013-08-29T00:17:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=9542cf0bf9b1a3adcc2ef271edbcbdba03abf345'/>
<id>9542cf0bf9b1a3adcc2ef271edbcbdba03abf345</id>
<content type='text'>
Fix a typo that used the wrong bitmask for the pg.seed calculation.  This
is normally unnoticed because in most cases pg_num == pgp_num.  It is, however,
a bug that is easily corrected.

CC: stable@vger.kernel.org
Signed-off-by: Sage Weil &lt;sage@inktank.com&gt;
Reviewed-by: Alex Elder &lt;alex.elder@linary.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix a typo that used the wrong bitmask for the pg.seed calculation.  This
is normally unnoticed because in most cases pg_num == pgp_num.  It is, however,
a bug that is easily corrected.

CC: stable@vger.kernel.org
Signed-off-by: Sage Weil &lt;sage@inktank.com&gt;
Reviewed-by: Alex Elder &lt;alex.elder@linary.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>libceph: define ceph_decode_pgid() only once</title>
<updated>2013-05-02T04:17:52+00:00</updated>
<author>
<name>Alex Elder</name>
<email>elder@inktank.com</email>
</author>
<published>2013-04-01T23:58:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=ef4859d6479d19bcc65c3156cf3b7dd747355c29'/>
<id>ef4859d6479d19bcc65c3156cf3b7dd747355c29</id>
<content type='text'>
There are two basically identical definitions of __decode_pgid()
in libceph, one in "net/ceph/osdmap.c" and the other in
"net/ceph/osd_client.c".  Get rid of both, and instead define
a single inline version in "include/linux/ceph/osdmap.h".

Signed-off-by: Alex Elder &lt;elder@inktank.com&gt;
Reviewed-by: Josh Durgin &lt;josh.durgin@inktank.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There are two basically identical definitions of __decode_pgid()
in libceph, one in "net/ceph/osdmap.c" and the other in
"net/ceph/osd_client.c".  Get rid of both, and instead define
a single inline version in "include/linux/ceph/osdmap.h".

Signed-off-by: Alex Elder &lt;elder@inktank.com&gt;
Reviewed-by: Josh Durgin &lt;josh.durgin@inktank.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>libceph: rename ceph_calc_object_layout()</title>
<updated>2013-05-02T04:16:17+00:00</updated>
<author>
<name>Alex Elder</name>
<email>elder@inktank.com</email>
</author>
<published>2013-03-02T00:00:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=41766f87f54cc8bef023b4b0550f48753959345a'/>
<id>41766f87f54cc8bef023b4b0550f48753959345a</id>
<content type='text'>
The purpose of ceph_calc_object_layout() is to fill in the pool
number and seed for a ceph_pg structure provided, based on a given
osd map and target object id.

Currently that function takes a file layout parameter, but the only
thing used out of that is its pool number.

Change the function so it takes a pool number rather than the full
file layout structure.  Only update the ceph_pg if the pool is found
in the osd map.  Get rid of few useless lines of code from the
function while there.

Since the function now very clearly just fills in the ceph_pg
structure it's provided, rename it ceph_calc_ceph_pg().

Signed-off-by: Alex Elder &lt;elder@inktank.com&gt;
Reviewed-by: Josh Durgin &lt;josh.durgin@inktank.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The purpose of ceph_calc_object_layout() is to fill in the pool
number and seed for a ceph_pg structure provided, based on a given
osd map and target object id.

Currently that function takes a file layout parameter, but the only
thing used out of that is its pool number.

Change the function so it takes a pool number rather than the full
file layout structure.  Only update the ceph_pg if the pool is found
in the osd map.  Get rid of few useless lines of code from the
function while there.

Since the function now very clearly just fills in the ceph_pg
structure it's provided, rename it ceph_calc_ceph_pg().

Signed-off-by: Alex Elder &lt;elder@inktank.com&gt;
Reviewed-by: Josh Durgin &lt;josh.durgin@inktank.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
