<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/net/ceph/osd_client.c, branch for-next</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>ceph: allocate sparse_ext map only for sparse reads</title>
<updated>2024-12-16T22:25:44+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2024-12-07T16:33:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=18d44c5d062b97b97bb0162d9742440518958dc1'/>
<id>18d44c5d062b97b97bb0162d9742440518958dc1</id>
<content type='text'>
If mounted with sparseread option, ceph_direct_read_write() ends up
making an unnecessarily allocation for O_DIRECT writes.

Fixes: 03bc06c7b0bd ("ceph: add new mount option to enable sparse reads")
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Alex Markuze &lt;amarkuze@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If mounted with sparseread option, ceph_direct_read_write() ends up
making an unnecessarily allocation for O_DIRECT writes.

Fixes: 03bc06c7b0bd ("ceph: add new mount option to enable sparse reads")
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Alex Markuze &lt;amarkuze@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>libceph: Remove unused ceph_osdc_watch_check</title>
<updated>2024-11-18T16:34:35+00:00</updated>
<author>
<name>Dr. David Alan Gilbert</name>
<email>linux@treblig.org</email>
</author>
<published>2024-10-06T01:19:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=32844fd72b879d02f1f6b4394025349d31a09fd3'/>
<id>32844fd72b879d02f1f6b4394025349d31a09fd3</id>
<content type='text'>
ceph_osdc_watch_check() has been unused since it was added in commit
b07d3c4bd727 ("libceph: support for checking on status of watch")

Remove it.

Signed-off-by: Dr. David Alan Gilbert &lt;linux@treblig.org&gt;
Reviewed-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
ceph_osdc_watch_check() has been unused since it was added in commit
b07d3c4bd727 ("libceph: support for checking on status of watch")

Remove it.

Signed-off-by: Dr. David Alan Gilbert &lt;linux@treblig.org&gt;
Reviewed-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>libceph: just wait for more data to be available on the socket</title>
<updated>2024-02-07T13:43:29+00:00</updated>
<author>
<name>Xiubo Li</name>
<email>xiubli@redhat.com</email>
</author>
<published>2023-12-14T08:01:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8e46a2d068c92a905d01cbb018b00d66991585ab'/>
<id>8e46a2d068c92a905d01cbb018b00d66991585ab</id>
<content type='text'>
A short read may occur while reading the message footer from the
socket.  Later, when the socket is ready for another read, the
messenger invokes all read_partial_*() handlers, including
read_partial_sparse_msg_data().  The expectation is that
read_partial_sparse_msg_data() would bail, allowing the messenger to
invoke read_partial() for the footer and pick up where it left off.

However read_partial_sparse_msg_data() violates that and ends up
calling into the state machine in the OSD client.  The sparse-read
state machine assumes that it's a new op and interprets some piece of
the footer as the sparse-read header and returns bogus extents/data
length, etc.

To determine whether read_partial_sparse_msg_data() should bail, let's
reuse cursor-&gt;total_resid.  Because once it reaches to zero that means
all the extents and data have been successfully received in last read,
else it could break out when partially reading any of the extents and
data.  And then osd_sparse_read() could continue where it left off.

[ idryomov: changelog ]

Link: https://tracker.ceph.com/issues/63586
Fixes: d396f89db39a ("libceph: add sparse read support to msgr1")
Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A short read may occur while reading the message footer from the
socket.  Later, when the socket is ready for another read, the
messenger invokes all read_partial_*() handlers, including
read_partial_sparse_msg_data().  The expectation is that
read_partial_sparse_msg_data() would bail, allowing the messenger to
invoke read_partial() for the footer and pick up where it left off.

However read_partial_sparse_msg_data() violates that and ends up
calling into the state machine in the OSD client.  The sparse-read
state machine assumes that it's a new op and interprets some piece of
the footer as the sparse-read header and returns bogus extents/data
length, etc.

To determine whether read_partial_sparse_msg_data() should bail, let's
reuse cursor-&gt;total_resid.  Because once it reaches to zero that means
all the extents and data have been successfully received in last read,
else it could break out when partially reading any of the extents and
data.  And then osd_sparse_read() could continue where it left off.

[ idryomov: changelog ]

Link: https://tracker.ceph.com/issues/63586
Fixes: d396f89db39a ("libceph: add sparse read support to msgr1")
Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>libceph: fail sparse-read if the data length doesn't match</title>
<updated>2024-02-07T13:43:29+00:00</updated>
<author>
<name>Xiubo Li</name>
<email>xiubli@redhat.com</email>
</author>
<published>2023-10-13T05:55:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=cd7d469c25704d414d71bf3644f163fb74e7996b'/>
<id>cd7d469c25704d414d71bf3644f163fb74e7996b</id>
<content type='text'>
Once this happens that means there have bugs.

Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Once this happens that means there have bugs.

Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>libceph: remove MAX_EXTENTS check for sparse reads</title>
<updated>2024-01-15T14:40:50+00:00</updated>
<author>
<name>Xiubo Li</name>
<email>xiubli@redhat.com</email>
</author>
<published>2023-11-07T01:37:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b79e4a0aa902322756ced7361a2c637d462c3c1c'/>
<id>b79e4a0aa902322756ced7361a2c637d462c3c1c</id>
<content type='text'>
There is no any limit for the extent array size and it's possible
that when reading with a large size contents the total number of
extents will exceed 4096. Then the messager will fail by reseting
the connection and keeps resending the inflight IOs infinitely.

[ idryomov: adjust error message ]

Link: https://tracker.ceph.com/issues/62081
Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There is no any limit for the extent array size and it's possible
that when reading with a large size contents the total number of
extents will exceed 4096. Then the messager will fail by reseting
the connection and keeps resending the inflight IOs infinitely.

[ idryomov: adjust error message ]

Link: https://tracker.ceph.com/issues/62081
Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>libceph: allow ceph_osdc_new_request to accept a multi-op read</title>
<updated>2023-08-24T09:24:35+00:00</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@kernel.org</email>
</author>
<published>2022-08-25T13:31:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4e8c4c235578b4d44bd6676df3a01dce98d0f7dd'/>
<id>4e8c4c235578b4d44bd6676df3a01dce98d0f7dd</id>
<content type='text'>
Currently we have some special-casing for multi-op writes, but in the
case of a read, we can't really handle it. All of the current multi-op
callers call it with CEPH_OSD_FLAG_WRITE set.

Have ceph_osdc_new_request check for CEPH_OSD_FLAG_READ and if it's set,
allocate multiple reply ops instead of multiple request ops. If neither
flag is set, return -EINVAL.

Signed-off-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Reviewed-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-and-tested-by: Luís Henriques &lt;lhenriques@suse.de&gt;
Reviewed-by: Milind Changire &lt;mchangir@redhat.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently we have some special-casing for multi-op writes, but in the
case of a read, we can't really handle it. All of the current multi-op
callers call it with CEPH_OSD_FLAG_WRITE set.

Have ceph_osdc_new_request check for CEPH_OSD_FLAG_READ and if it's set,
allocate multiple reply ops instead of multiple request ops. If neither
flag is set, return -EINVAL.

Signed-off-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Reviewed-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-and-tested-by: Luís Henriques &lt;lhenriques@suse.de&gt;
Reviewed-by: Milind Changire &lt;mchangir@redhat.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>libceph: add CEPH_OSD_OP_ASSERT_VER support</title>
<updated>2023-08-24T09:24:35+00:00</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@kernel.org</email>
</author>
<published>2022-08-25T13:31:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=69dd3b3930f96b624228000921f417fb0919a6ab'/>
<id>69dd3b3930f96b624228000921f417fb0919a6ab</id>
<content type='text'>
...and record the user_version in the reply in a new field in
ceph_osd_request, so we can populate the assert_ver appropriately.
Shuffle the fields a bit too so that the new field fits in an
existing hole on x86_64.

Signed-off-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Reviewed-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-and-tested-by: Luís Henriques &lt;lhenriques@suse.de&gt;
Reviewed-by: Milind Changire &lt;mchangir@redhat.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
...and record the user_version in the reply in a new field in
ceph_osd_request, so we can populate the assert_ver appropriately.
Shuffle the fields a bit too so that the new field fits in an
existing hole on x86_64.

Signed-off-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Reviewed-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-and-tested-by: Luís Henriques &lt;lhenriques@suse.de&gt;
Reviewed-by: Milind Changire &lt;mchangir@redhat.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>libceph: add new iov_iter-based ceph_msg_data_type and ceph_osd_data_type</title>
<updated>2023-08-22T07:01:48+00:00</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@kernel.org</email>
</author>
<published>2022-07-01T10:30:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=dee0c5f834605ce9b384ee8b9c7032ffd8db4eca'/>
<id>dee0c5f834605ce9b384ee8b9c7032ffd8db4eca</id>
<content type='text'>
Add an iov_iter to the unions in ceph_msg_data and ceph_msg_data_cursor.
Instead of requiring a list of pages or bvecs, we can just use an
iov_iter directly, and avoid extra allocations.

We assume that the pages represented by the iter are pinned such that
they shouldn't incur page faults, which is the case for the iov_iters
created by netfs.

While working on this, Al Viro informed me that he was going to change
iov_iter_get_pages to auto-advance the iterator as that pattern is more
or less required for ITER_PIPE anyway. We emulate that here for now by
advancing in the _next op and tracking that amount in the "lastlen"
field.

In the event that _next is called twice without an intervening
_advance, we revert the iov_iter by the remaining lastlen before
calling iov_iter_get_pages.

Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Signed-off-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Reviewed-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-and-tested-by: Luís Henriques &lt;lhenriques@suse.de&gt;
Reviewed-by: Milind Changire &lt;mchangir@redhat.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add an iov_iter to the unions in ceph_msg_data and ceph_msg_data_cursor.
Instead of requiring a list of pages or bvecs, we can just use an
iov_iter directly, and avoid extra allocations.

We assume that the pages represented by the iter are pinned such that
they shouldn't incur page faults, which is the case for the iov_iters
created by netfs.

While working on this, Al Viro informed me that he was going to change
iov_iter_get_pages to auto-advance the iterator as that pattern is more
or less required for ITER_PIPE anyway. We emulate that here for now by
advancing in the _next op and tracking that amount in the "lastlen"
field.

In the event that _next is called twice without an intervening
_advance, we revert the iov_iter by the remaining lastlen before
calling iov_iter_get_pages.

Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Signed-off-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Reviewed-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-and-tested-by: Luís Henriques &lt;lhenriques@suse.de&gt;
Reviewed-by: Milind Changire &lt;mchangir@redhat.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>libceph: add sparse read support to OSD client</title>
<updated>2023-08-22T07:01:47+00:00</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@kernel.org</email>
</author>
<published>2022-02-11T16:38:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f628d799972799023d32c2542bb2639eb8c4f84e'/>
<id>f628d799972799023d32c2542bb2639eb8c4f84e</id>
<content type='text'>
Have get_reply check for the presence of sparse read ops in the
request and set the sparse_read boolean in the msg. That will queue the
messenger layer to use the sparse read codepath instead of the normal
data receive.

Add a new sparse_read operation for the OSD client, driven by its own
state machine. The messenger will repeatedly call the sparse_read
operation, and it will pass back the necessary info to set up to read
the next extent of data, while zero-filling the sparse regions.

The state machine will stop at the end of the last extent, and will
attach the extent map buffer to the ceph_osd_req_op so that the caller
can use it.

Signed-off-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Reviewed-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-and-tested-by: Luís Henriques &lt;lhenriques@suse.de&gt;
Reviewed-by: Milind Changire &lt;mchangir@redhat.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Have get_reply check for the presence of sparse read ops in the
request and set the sparse_read boolean in the msg. That will queue the
messenger layer to use the sparse read codepath instead of the normal
data receive.

Add a new sparse_read operation for the OSD client, driven by its own
state machine. The messenger will repeatedly call the sparse_read
operation, and it will pass back the necessary info to set up to read
the next extent of data, while zero-filling the sparse regions.

The state machine will stop at the end of the last extent, and will
attach the extent map buffer to the ceph_osd_req_op so that the caller
can use it.

Signed-off-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Reviewed-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-and-tested-by: Luís Henriques &lt;lhenriques@suse.de&gt;
Reviewed-by: Milind Changire &lt;mchangir@redhat.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>libceph: define struct ceph_sparse_extent and add some helpers</title>
<updated>2023-08-22T07:01:47+00:00</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@kernel.org</email>
</author>
<published>2022-03-16T19:23:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a679e50f728648f7b2f3b349e082448abd388038'/>
<id>a679e50f728648f7b2f3b349e082448abd388038</id>
<content type='text'>
When the OSD sends back a sparse read reply, it contains an array of
these structures. Define the structure and add a couple of helpers for
dealing with them.

Also add a place in struct ceph_osd_req_op to store the extent buffer,
and code to free it if it's populated when the req is torn down.

Signed-off-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Reviewed-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-and-tested-by: Luís Henriques &lt;lhenriques@suse.de&gt;
Reviewed-by: Milind Changire &lt;mchangir@redhat.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When the OSD sends back a sparse read reply, it contains an array of
these structures. Define the structure and add a couple of helpers for
dealing with them.

Also add a place in struct ceph_osd_req_op to store the extent buffer,
and code to free it if it's populated when the req is torn down.

Signed-off-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Reviewed-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-and-tested-by: Luís Henriques &lt;lhenriques@suse.de&gt;
Reviewed-by: Milind Changire &lt;mchangir@redhat.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
