<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/drivers/block/rbd.c, branch v5.3</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>rbd: restore zeroing past the overlap when reading from parent</title>
<updated>2019-08-28T10:34:11+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2019-08-27T14:45:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d435c9a7b85be1e820668d2f3718c2d9f24d5548'/>
<id>d435c9a7b85be1e820668d2f3718c2d9f24d5548</id>
<content type='text'>
The parent image is read only up to the overlap point, the rest of
the buffer should be zeroed.  This snuck in because as it turns out
the overlap test case has not been triggering this code path for
a while now.

Fixes: a9b67e69949d ("rbd: replace obj_req-&gt;tried_parent with obj_req-&gt;read_state")
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Jason Dillaman &lt;dillaman@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The parent image is read only up to the overlap point, the rest of
the buffer should be zeroed.  This snuck in because as it turns out
the overlap test case has not been triggering this code path for
a while now.

Fixes: a9b67e69949d ("rbd: replace obj_req-&gt;tried_parent with obj_req-&gt;read_state")
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Jason Dillaman &lt;dillaman@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'ceph-for-5.3-rc1' of git://github.com/ceph/ceph-client</title>
<updated>2019-07-18T18:05:25+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2019-07-18T18:05:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d9b9c893048e9d308a833619f0866f1f52778cf5'/>
<id>d9b9c893048e9d308a833619f0866f1f52778cf5</id>
<content type='text'>
Pull ceph updates from Ilya Dryomov:
 "Lots of exciting things this time!

   - support for rbd object-map and fast-diff features (myself). This
     will speed up reads, discards and things like snap diffs on sparse
     images.

   - ceph.snap.btime vxattr to expose snapshot creation time (David
     Disseldorp). This will be used to integrate with "Restore Previous
     Versions" feature added in Windows 7 for folks who reexport ceph
     through SMB.

   - security xattrs for ceph (Zheng Yan). Only selinux is supported for
     now due to the limitations of -&gt;dentry_init_security().

   - support for MSG_ADDR2, FS_BTIME and FS_CHANGE_ATTR features (Jeff
     Layton). This is actually a single feature bit which was missing
     because of the filesystem pieces. With this in, the kernel client
     will finally be reported as "luminous" by "ceph features" -- it is
     still being reported as "jewel" even though all required Luminous
     features were implemented in 4.13.

   - stop NULL-terminating ceph vxattrs (Jeff Layton). The convention
     with xattrs is to not terminate and this was causing
     inconsistencies with ceph-fuse.

   - change filesystem time granularity from 1 us to 1 ns, again fixing
     an inconsistency with ceph-fuse (Luis Henriques).

  On top of this there are some additional dentry name handling and cap
  flushing fixes from Zheng. Finally, Jeff is formally taking over for
  Zheng as the filesystem maintainer"

* tag 'ceph-for-5.3-rc1' of git://github.com/ceph/ceph-client: (71 commits)
  ceph: fix end offset in truncate_inode_pages_range call
  ceph: use generic_delete_inode() for -&gt;drop_inode
  ceph: use ceph_evict_inode to cleanup inode's resource
  ceph: initialize superblock s_time_gran to 1
  MAINTAINERS: take over for Zheng as CephFS kernel client maintainer
  rbd: setallochint only if object doesn't exist
  rbd: support for object-map and fast-diff
  rbd: call rbd_dev_mapping_set() from rbd_dev_image_probe()
  libceph: export osd_req_op_data() macro
  libceph: change ceph_osdc_call() to take page vector for response
  libceph: bump CEPH_MSG_MAX_DATA_LEN (again)
  rbd: new exclusive lock wait/wake code
  rbd: quiescing lock should wait for image requests
  rbd: lock should be quiesced on reacquire
  rbd: introduce copyup state machine
  rbd: rename rbd_obj_setup_*() to rbd_obj_init_*()
  rbd: move OSD request allocation into object request state machines
  rbd: factor out __rbd_osd_setup_discard_ops()
  rbd: factor out rbd_osd_setup_copyup()
  rbd: introduce obj_req-&gt;osd_reqs list
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull ceph updates from Ilya Dryomov:
 "Lots of exciting things this time!

   - support for rbd object-map and fast-diff features (myself). This
     will speed up reads, discards and things like snap diffs on sparse
     images.

   - ceph.snap.btime vxattr to expose snapshot creation time (David
     Disseldorp). This will be used to integrate with "Restore Previous
     Versions" feature added in Windows 7 for folks who reexport ceph
     through SMB.

   - security xattrs for ceph (Zheng Yan). Only selinux is supported for
     now due to the limitations of -&gt;dentry_init_security().

   - support for MSG_ADDR2, FS_BTIME and FS_CHANGE_ATTR features (Jeff
     Layton). This is actually a single feature bit which was missing
     because of the filesystem pieces. With this in, the kernel client
     will finally be reported as "luminous" by "ceph features" -- it is
     still being reported as "jewel" even though all required Luminous
     features were implemented in 4.13.

   - stop NULL-terminating ceph vxattrs (Jeff Layton). The convention
     with xattrs is to not terminate and this was causing
     inconsistencies with ceph-fuse.

   - change filesystem time granularity from 1 us to 1 ns, again fixing
     an inconsistency with ceph-fuse (Luis Henriques).

  On top of this there are some additional dentry name handling and cap
  flushing fixes from Zheng. Finally, Jeff is formally taking over for
  Zheng as the filesystem maintainer"

* tag 'ceph-for-5.3-rc1' of git://github.com/ceph/ceph-client: (71 commits)
  ceph: fix end offset in truncate_inode_pages_range call
  ceph: use generic_delete_inode() for -&gt;drop_inode
  ceph: use ceph_evict_inode to cleanup inode's resource
  ceph: initialize superblock s_time_gran to 1
  MAINTAINERS: take over for Zheng as CephFS kernel client maintainer
  rbd: setallochint only if object doesn't exist
  rbd: support for object-map and fast-diff
  rbd: call rbd_dev_mapping_set() from rbd_dev_image_probe()
  libceph: export osd_req_op_data() macro
  libceph: change ceph_osdc_call() to take page vector for response
  libceph: bump CEPH_MSG_MAX_DATA_LEN (again)
  rbd: new exclusive lock wait/wake code
  rbd: quiescing lock should wait for image requests
  rbd: lock should be quiesced on reacquire
  rbd: introduce copyup state machine
  rbd: rename rbd_obj_setup_*() to rbd_obj_init_*()
  rbd: move OSD request allocation into object request state machines
  rbd: factor out __rbd_osd_setup_discard_ops()
  rbd: factor out rbd_osd_setup_copyup()
  rbd: introduce obj_req-&gt;osd_reqs list
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>rbd: setallochint only if object doesn't exist</title>
<updated>2019-07-08T12:01:45+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2019-06-19T13:45:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=8b5bec5c83e8e319d5ecb4efcbf04bdef85409e6'/>
<id>8b5bec5c83e8e319d5ecb4efcbf04bdef85409e6</id>
<content type='text'>
setallochint is really only useful on object creation.  Continue
hinting unconditionally if object map cannot be used.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Dongsheng Yang &lt;dongsheng.yang@easystack.cn&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
setallochint is really only useful on object creation.  Continue
hinting unconditionally if object map cannot be used.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Dongsheng Yang &lt;dongsheng.yang@easystack.cn&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rbd: support for object-map and fast-diff</title>
<updated>2019-07-08T12:01:45+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2019-06-05T17:25:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=22e8bd51bb0469d1a524130a057f894ff632376a'/>
<id>22e8bd51bb0469d1a524130a057f894ff632376a</id>
<content type='text'>
Speed up reads, discards and zeroouts through RBD_OBJ_FLAG_MAY_EXIST
and RBD_OBJ_FLAG_NOOP_FOR_NONEXISTENT based on object map.

Invalid object maps are not trusted, but still updated.  Note that we
never iterate, resize or invalidate object maps.  If object-map feature
is enabled but object map fails to load, we just fail the requester
(either "rbd map" or I/O, by way of post-acquire action).

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Speed up reads, discards and zeroouts through RBD_OBJ_FLAG_MAY_EXIST
and RBD_OBJ_FLAG_NOOP_FOR_NONEXISTENT based on object map.

Invalid object maps are not trusted, but still updated.  Note that we
never iterate, resize or invalidate object maps.  If object-map feature
is enabled but object map fails to load, we just fail the requester
(either "rbd map" or I/O, by way of post-acquire action).

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rbd: call rbd_dev_mapping_set() from rbd_dev_image_probe()</title>
<updated>2019-07-08T12:01:45+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2019-06-17T13:29:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=da5ef6be3467eb2d293790dea69b5b562490715a'/>
<id>da5ef6be3467eb2d293790dea69b5b562490715a</id>
<content type='text'>
Snapshot object map will be loaded in rbd_dev_image_probe(), so we need
to know snapshot's size (as opposed to HEAD's size) sooner.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Dongsheng Yang &lt;dongsheng.yang@easystack.cn&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Snapshot object map will be loaded in rbd_dev_image_probe(), so we need
to know snapshot's size (as opposed to HEAD's size) sooner.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Dongsheng Yang &lt;dongsheng.yang@easystack.cn&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>libceph: change ceph_osdc_call() to take page vector for response</title>
<updated>2019-07-08T12:01:45+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2019-06-14T16:16:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=68ada915eea10f36760ffe414810390a104df093'/>
<id>68ada915eea10f36760ffe414810390a104df093</id>
<content type='text'>
This will be used for loading object map.  rbd_obj_read_sync() isn't
suitable because object map must be accessed through class methods.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Dongsheng Yang &lt;dongsheng.yang@easystack.cn&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This will be used for loading object map.  rbd_obj_read_sync() isn't
suitable because object map must be accessed through class methods.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Dongsheng Yang &lt;dongsheng.yang@easystack.cn&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rbd: new exclusive lock wait/wake code</title>
<updated>2019-07-08T12:01:45+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2019-06-06T15:14:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=637cd060537d0c40bcf4f164b8a2c6e9b747e1ad'/>
<id>637cd060537d0c40bcf4f164b8a2c6e9b747e1ad</id>
<content type='text'>
rbd_wait_state_locked() is built around rbd_dev-&gt;lock_waitq and blocks
rbd worker threads while waiting for the lock, potentially impacting
other rbd devices.  There is no good way to pass an error code into
image request state machines when acquisition fails, hence the use of
RBD_DEV_FLAG_BLACKLISTED for everything and various other issues.

Introduce rbd_dev-&gt;acquiring_list and move acquisition into image
request state machine.  Use rbd_img_schedule() for kicking and passing
error codes.  No blocking occurs while waiting for the lock, but
rbd_dev-&gt;lock_rwsem is still held across lock, unlock and set_cookie
calls.

Always acquire the lock on "rbd map" to avoid associating the latency
of acquiring the lock with the first I/O request.

A slight regression is that lock_timeout is now respected only if lock
acquisition is triggered by "rbd map" and not by I/O.  This is somewhat
compensated by the fact that we no longer block if the peer refuses to
release lock -- I/O is failed with EROFS right away.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Dongsheng Yang &lt;dongsheng.yang@easystack.cn&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
rbd_wait_state_locked() is built around rbd_dev-&gt;lock_waitq and blocks
rbd worker threads while waiting for the lock, potentially impacting
other rbd devices.  There is no good way to pass an error code into
image request state machines when acquisition fails, hence the use of
RBD_DEV_FLAG_BLACKLISTED for everything and various other issues.

Introduce rbd_dev-&gt;acquiring_list and move acquisition into image
request state machine.  Use rbd_img_schedule() for kicking and passing
error codes.  No blocking occurs while waiting for the lock, but
rbd_dev-&gt;lock_rwsem is still held across lock, unlock and set_cookie
calls.

Always acquire the lock on "rbd map" to avoid associating the latency
of acquiring the lock with the first I/O request.

A slight regression is that lock_timeout is now respected only if lock
acquisition is triggered by "rbd map" and not by I/O.  This is somewhat
compensated by the fact that we no longer block if the peer refuses to
release lock -- I/O is failed with EROFS right away.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Dongsheng Yang &lt;dongsheng.yang@easystack.cn&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rbd: quiescing lock should wait for image requests</title>
<updated>2019-07-08T12:01:45+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2019-05-30T14:07:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e1fddc8fdd22ed5a55fc7e7a81437c4663c7ba8c'/>
<id>e1fddc8fdd22ed5a55fc7e7a81437c4663c7ba8c</id>
<content type='text'>
Syncing OSD requests doesn't really work.  A single image request may
be comprised of multiple object requests, each of which can go through
a series of OSD requests (original, copyups, etc).  On top of that, the
OSD cliest may be shared with other rbd devices.

What we want is to ensure that all in-flight image requests complete.
Introduce rbd_dev-&gt;running_list and block in RBD_LOCK_STATE_RELEASING
until that happens.  New OSD requests may be started during this time.

Note that __rbd_img_handle_request() acquires rbd_dev-&gt;lock_rwsem only
if need_exclusive_lock() returns true.  This avoids a deadlock similar
to the one outlined in the previous commit between unlock and I/O that
doesn't require lock, such as a read with object-map feature disabled.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Dongsheng Yang &lt;dongsheng.yang@easystack.cn&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Syncing OSD requests doesn't really work.  A single image request may
be comprised of multiple object requests, each of which can go through
a series of OSD requests (original, copyups, etc).  On top of that, the
OSD cliest may be shared with other rbd devices.

What we want is to ensure that all in-flight image requests complete.
Introduce rbd_dev-&gt;running_list and block in RBD_LOCK_STATE_RELEASING
until that happens.  New OSD requests may be started during this time.

Note that __rbd_img_handle_request() acquires rbd_dev-&gt;lock_rwsem only
if need_exclusive_lock() returns true.  This avoids a deadlock similar
to the one outlined in the previous commit between unlock and I/O that
doesn't require lock, such as a read with object-map feature disabled.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Dongsheng Yang &lt;dongsheng.yang@easystack.cn&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rbd: lock should be quiesced on reacquire</title>
<updated>2019-07-08T12:01:45+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2019-05-30T09:15:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a2b1da09793d003410b57f96eaf7e83e43b7a50a'/>
<id>a2b1da09793d003410b57f96eaf7e83e43b7a50a</id>
<content type='text'>
Quiesce exclusive lock at the top of rbd_reacquire_lock() instead
of only when ceph_cls_set_cookie() fails.  This avoids a deadlock on
rbd_dev-&gt;lock_rwsem.

If rbd_dev-&gt;lock_rwsem is needed for I/O completion, set_cookie can
hang ceph-msgr worker thread if set_cookie reply ends up behind an I/O
reply, because, like lock and unlock requests, set_cookie is sent and
waited upon with rbd_dev-&gt;lock_rwsem held for write.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Dongsheng Yang &lt;dongsheng.yang@easystack.cn&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Quiesce exclusive lock at the top of rbd_reacquire_lock() instead
of only when ceph_cls_set_cookie() fails.  This avoids a deadlock on
rbd_dev-&gt;lock_rwsem.

If rbd_dev-&gt;lock_rwsem is needed for I/O completion, set_cookie can
hang ceph-msgr worker thread if set_cookie reply ends up behind an I/O
reply, because, like lock and unlock requests, set_cookie is sent and
waited upon with rbd_dev-&gt;lock_rwsem held for write.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Dongsheng Yang &lt;dongsheng.yang@easystack.cn&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rbd: introduce copyup state machine</title>
<updated>2019-07-08T12:01:45+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2019-06-13T15:44:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=793333a303c90174c317e3fa12e898bbc02daee4'/>
<id>793333a303c90174c317e3fa12e898bbc02daee4</id>
<content type='text'>
Both write and copyup paths will get more complex with object map.
Factor copyup code out into a separate state machine.

While at it, take advantage of obj_req-&gt;osd_reqs list and issue empty
and current snapc OSD requests together, one after another.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Dongsheng Yang &lt;dongsheng.yang@easystack.cn&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Both write and copyup paths will get more complex with object map.
Factor copyup code out into a separate state machine.

While at it, take advantage of obj_req-&gt;osd_reqs list and issue empty
and current snapc OSD requests together, one after another.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Dongsheng Yang &lt;dongsheng.yang@easystack.cn&gt;
</pre>
</div>
</content>
</entry>
</feed>
