<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/drivers/block/rbd.c, branch v5.8</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>libceph: move away from global osd_req_flags</title>
<updated>2020-06-16T14:01:53+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2020-06-04T09:12:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=22d2cfdffa5bff3566e16cb7320e13ceb814674b'/>
<id>22d2cfdffa5bff3566e16cb7320e13ceb814674b</id>
<content type='text'>
osd_req_flags is overly general and doesn't suit its only user
(read_from_replica option) well:

- applying osd_req_flags in account_request() affects all OSD
  requests, including linger (i.e. watch and notify).  However,
  linger requests should always go to the primary even though
  some of them are reads (e.g. notify has side effects but it
  is a read because it doesn't result in mutation on the OSDs).

- calls to class methods that are reads are allowed to go to
  the replica, but most such calls issued for "rbd map" and/or
  exclusive lock transitions are requested to be resent to the
  primary via EAGAIN, doubling the latency.

Get rid of global osd_req_flags and set read_from_replica flag
only on specific OSD requests instead.

Fixes: 8ad44d5e0d1e ("libceph: read_from_replica option")
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
osd_req_flags is overly general and doesn't suit its only user
(read_from_replica option) well:

- applying osd_req_flags in account_request() affects all OSD
  requests, including linger (i.e. watch and notify).  However,
  linger requests should always go to the primary even though
  some of them are reads (e.g. notify has side effects but it
  is a read because it doesn't result in mutation on the OSDs).

- calls to class methods that are reads are allowed to go to
  the replica, but most such calls issued for "rbd map" and/or
  exclusive lock transitions are requested to be resent to the
  primary via EAGAIN, doubling the latency.

Get rid of global osd_req_flags and set read_from_replica flag
only on specific OSD requests instead.

Fixes: 8ad44d5e0d1e ("libceph: read_from_replica option")
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rbd: compression_hint option</title>
<updated>2020-06-01T21:32:35+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2020-05-29T18:51:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=dc1dad8e1a612650b1e786e992cb0c6e101e226a'/>
<id>dc1dad8e1a612650b1e786e992cb0c6e101e226a</id>
<content type='text'>
Allow hinting to bluestore if the data should/should not be compressed.
The default is to not hint (compression_hint=none).

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Jason Dillaman &lt;dillaman@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Allow hinting to bluestore if the data should/should not be compressed.
The default is to not hint (compression_hint=none).

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Jason Dillaman &lt;dillaman@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>libceph: support for alloc hint flags</title>
<updated>2020-06-01T21:32:35+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2020-05-29T18:31:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d3798acc094c8ff2406e9acc7a9b2c09da994616'/>
<id>d3798acc094c8ff2406e9acc7a9b2c09da994616</id>
<content type='text'>
Allow indicating future I/O pattern via flags.  This is supported since
Kraken (and bluestore persists flags together with expected_object_size
and expected_write_size).

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Jason Dillaman &lt;dillaman@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Allow indicating future I/O pattern via flags.  This is supported since
Kraken (and bluestore persists flags together with expected_object_size
and expected_write_size).

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Jason Dillaman &lt;dillaman@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rbd: don't mess with a page vector in rbd_notify_op_lock()</title>
<updated>2020-04-13T06:55:49+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2020-03-17T14:18:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=8ae0299a4b72f2f9ad2b755da91c6a2beabaee62'/>
<id>8ae0299a4b72f2f9ad2b755da91c6a2beabaee62</id>
<content type='text'>
rbd_notify_op_lock() isn't interested in a notify reply.  Instead of
accepting that page vector just to free it, have watch-notify code take
care of it.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Jason Dillaman &lt;dillaman@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
rbd_notify_op_lock() isn't interested in a notify reply.  Instead of
accepting that page vector just to free it, have watch-notify code take
care of it.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Jason Dillaman &lt;dillaman@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rbd: don't test rbd_dev-&gt;opts in rbd_dev_image_release()</title>
<updated>2020-04-13T06:55:49+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2020-03-16T16:16:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b8776051529230f76e464d5ffc5d1cf8465576bf'/>
<id>b8776051529230f76e464d5ffc5d1cf8465576bf</id>
<content type='text'>
rbd_dev-&gt;opts is used to distinguish between the image that is being
mapped and a parent.  However, because we no longer establish watch for
read-only mappings, this test is imprecise and results in unnecessary
rbd_unregister_watch() calls.

Make it consistent with need_watch in rbd_dev_image_probe().

Fixes: b9ef2b8858a0 ("rbd: don't establish watch for read-only mappings")
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Jason Dillaman &lt;dillaman@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
rbd_dev-&gt;opts is used to distinguish between the image that is being
mapped and a parent.  However, because we no longer establish watch for
read-only mappings, this test is imprecise and results in unnecessary
rbd_unregister_watch() calls.

Make it consistent with need_watch in rbd_dev_image_probe().

Fixes: b9ef2b8858a0 ("rbd: don't establish watch for read-only mappings")
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Jason Dillaman &lt;dillaman@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rbd: call rbd_dev_unprobe() after unwatching and flushing notifies</title>
<updated>2020-04-13T06:55:49+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2020-03-16T14:52:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=952c48b0ed18919bff7528501e9a3fff8a24f8cd'/>
<id>952c48b0ed18919bff7528501e9a3fff8a24f8cd</id>
<content type='text'>
rbd_dev_unprobe() is supposed to undo most of rbd_dev_image_probe(),
including rbd_dev_header_info(), which means that rbd_dev_header_info()
isn't supposed to be called after rbd_dev_unprobe().

However, rbd_dev_image_release() calls rbd_dev_unprobe() before
rbd_unregister_watch().  This is racy because a header update notify
can sneak in:

  "rbd unmap" thread                   ceph-watch-notify worker

  rbd_dev_image_release()
    rbd_dev_unprobe()
      free and zero out header
                                       rbd_watch_cb()
                                         rbd_dev_refresh()
                                           rbd_dev_header_info()
                                             read in header

The same goes for "rbd map" because rbd_dev_image_probe() calls
rbd_dev_unprobe() on errors.  In both cases this results in a memory
leak.

Fixes: fd22aef8b47c ("rbd: move rbd_unregister_watch() call into rbd_dev_image_release()")
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Jason Dillaman &lt;dillaman@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
rbd_dev_unprobe() is supposed to undo most of rbd_dev_image_probe(),
including rbd_dev_header_info(), which means that rbd_dev_header_info()
isn't supposed to be called after rbd_dev_unprobe().

However, rbd_dev_image_release() calls rbd_dev_unprobe() before
rbd_unregister_watch().  This is racy because a header update notify
can sneak in:

  "rbd unmap" thread                   ceph-watch-notify worker

  rbd_dev_image_release()
    rbd_dev_unprobe()
      free and zero out header
                                       rbd_watch_cb()
                                         rbd_dev_refresh()
                                           rbd_dev_header_info()
                                             read in header

The same goes for "rbd map" because rbd_dev_image_probe() calls
rbd_dev_unprobe() on errors.  In both cases this results in a memory
leak.

Fixes: fd22aef8b47c ("rbd: move rbd_unregister_watch() call into rbd_dev_image_release()")
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Jason Dillaman &lt;dillaman@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rbd: avoid a deadlock on header_rwsem when flushing notifies</title>
<updated>2020-04-13T06:55:49+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2020-03-13T10:20:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=0e4e1de5b63fa423b13593337a27fd2d2b0bcf77'/>
<id>0e4e1de5b63fa423b13593337a27fd2d2b0bcf77</id>
<content type='text'>
rbd_unregister_watch() flushes notifies and therefore cannot be called
under header_rwsem because a header update notify takes header_rwsem to
synchronize with "rbd map".  If mapping an image fails after the watch
is established and a header update notify sneaks in, we deadlock when
erroring out from rbd_dev_image_probe().

Move watch registration and unregistration out of the critical section.
The only reason they were put there was to make header_rwsem management
slightly more obvious.

Fixes: 811c66887746 ("rbd: fix rbd map vs notify races")
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Jason Dillaman &lt;dillaman@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
rbd_unregister_watch() flushes notifies and therefore cannot be called
under header_rwsem because a header update notify takes header_rwsem to
synchronize with "rbd map".  If mapping an image fails after the watch
is established and a header update notify sneaks in, we deadlock when
erroring out from rbd_dev_image_probe().

Move watch registration and unregistration out of the critical section.
The only reason they were put there was to make header_rwsem management
slightly more obvious.

Fixes: 811c66887746 ("rbd: fix rbd map vs notify races")
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Jason Dillaman &lt;dillaman@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rbd: enable multiple blk-mq queues</title>
<updated>2020-03-30T10:42:40+00:00</updated>
<author>
<name>Hannes Reinecke</name>
<email>hare@suse.de</email>
</author>
<published>2020-01-31T10:37:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f9b6b98d24f7cec5b8269217f9d4fdec1ca43218'/>
<id>f9b6b98d24f7cec5b8269217f9d4fdec1ca43218</id>
<content type='text'>
Allocate one queue per CPU and get a performance boost from
higher parallelism.

Signed-off-by: Hannes Reinecke &lt;hare@suse.de&gt;
Reviewed-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Allocate one queue per CPU and get a performance boost from
higher parallelism.

Signed-off-by: Hannes Reinecke &lt;hare@suse.de&gt;
Reviewed-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rbd: embed image request in blk-mq pdu</title>
<updated>2020-03-30T10:42:40+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2020-02-12T14:23:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=59e542c869895fb37005b60058a342187bb63c61'/>
<id>59e542c869895fb37005b60058a342187bb63c61</id>
<content type='text'>
Avoid making allocations for !IMG_REQ_CHILD image requests.  Only
IMG_REQ_CHILD image requests need to be freed now.

Move the initial request checks to rbd_queue_rq().  Unfortunately we
can't fill the image request and kick the state machine directly from
rbd_queue_rq() because -&gt;queue_rq() isn't allowed to block.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Avoid making allocations for !IMG_REQ_CHILD image requests.  Only
IMG_REQ_CHILD image requests need to be freed now.

Move the initial request checks to rbd_queue_rq().  Unfortunately we
can't fill the image request and kick the state machine directly from
rbd_queue_rq() because -&gt;queue_rq() isn't allowed to block.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rbd: acquire header_rwsem just once in rbd_queue_workfn()</title>
<updated>2020-03-30T10:42:40+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2020-02-12T14:08:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a52cc685753568e5bcbe762586c2bfbe7175255e'/>
<id>a52cc685753568e5bcbe762586c2bfbe7175255e</id>
<content type='text'>
Currently header_rwsem is acquired twice: once in rbd_dev_parent_get()
when the image request is being created and then in rbd_queue_workfn()
to capture mapping_size and snapc.  Introduce rbd_img_capture_header()
and move image request allocation so that header_rwsem can be acquired
just once.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently header_rwsem is acquired twice: once in rbd_dev_parent_get()
when the image request is being created and then in rbd_queue_workfn()
to capture mapping_size and snapc.  Introduce rbd_img_capture_header()
and move image request allocation so that header_rwsem can be acquired
just once.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
