<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/fs/ceph, branch v6.5.2</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>vfs: get rid of old '-&gt;iterate' directory operation</title>
<updated>2023-08-06T13:08:35+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-08-05T19:25:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=3e3271549670783be20e233a2b78a87a0b04c715'/>
<id>3e3271549670783be20e233a2b78a87a0b04c715</id>
<content type='text'>
All users now just use '-&gt;iterate_shared()', which only takes the
directory inode lock for reading.

Filesystems that never got convered to shared mode now instead use a
wrapper that drops the lock, re-takes it in write mode, calls the old
function, and then downgrades the lock back to read mode.

This way the VFS layer and other callers no longer need to care about
filesystems that never got converted to the modern era.

The filesystems that use the new wrapper are ceph, coda, exfat, jfs,
ntfs, ocfs2, overlayfs, and vboxsf.

Honestly, several of them look like they really could just iterate their
directories in shared mode and skip the wrapper entirely, but the point
of this change is to not change semantics or fix filesystems that
haven't been fixed in the last 7+ years, but to finally get rid of the
dual iterators.

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
All users now just use '-&gt;iterate_shared()', which only takes the
directory inode lock for reading.

Filesystems that never got convered to shared mode now instead use a
wrapper that drops the lock, re-takes it in write mode, calls the old
function, and then downgrades the lock back to read mode.

This way the VFS layer and other callers no longer need to care about
filesystems that never got converted to the modern era.

The filesystems that use the new wrapper are ceph, coda, exfat, jfs,
ntfs, ocfs2, overlayfs, and vboxsf.

Honestly, several of them look like they really could just iterate their
directories in shared mode and skip the wrapper entirely, but the point
of this change is to not change semantics or fix filesystems that
haven't been fixed in the last 7+ years, but to finally get rid of the
dual iterators.

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: defer stopping mdsc delayed_work</title>
<updated>2023-08-01T22:13:02+00:00</updated>
<author>
<name>Xiubo Li</name>
<email>xiubli@redhat.com</email>
</author>
<published>2023-07-25T04:03:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e7e607bd00481745550389a29ecabe33e13d67cf'/>
<id>e7e607bd00481745550389a29ecabe33e13d67cf</id>
<content type='text'>
Flushing the dirty buffer may take a long time if the cluster is
overloaded or if there is network issue. So we should ping the
MDSs periodically to keep alive, else the MDS will blocklist
the kclient.

Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/61843
Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-by: Milind Changire &lt;mchangir@redhat.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Flushing the dirty buffer may take a long time if the cluster is
overloaded or if there is network issue. So we should ping the
MDSs periodically to keep alive, else the MDS will blocklist
the kclient.

Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/61843
Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-by: Milind Changire &lt;mchangir@redhat.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: never send metrics if disable_send_metrics is set</title>
<updated>2023-07-24T11:15:39+00:00</updated>
<author>
<name>Xiubo Li</name>
<email>xiubli@redhat.com</email>
</author>
<published>2023-07-20T03:33:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=50164507f6b7b7ed85d8c3ac0266849fbd908db7'/>
<id>50164507f6b7b7ed85d8c3ac0266849fbd908db7</id>
<content type='text'>
Even the 'disable_send_metrics' is true so when the session is
being opened it will always trigger to send the metric for the
first time.

Cc: stable@vger.kernel.org
Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-by: Venky Shankar &lt;vshankar@redhat.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Even the 'disable_send_metrics' is true so when the session is
being opened it will always trigger to send the metric for the
first time.

Cc: stable@vger.kernel.org
Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-by: Venky Shankar &lt;vshankar@redhat.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'ceph-for-6.5-rc1' of https://github.com/ceph/ceph-client</title>
<updated>2023-07-07T22:07:20+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-07-07T22:07:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=3290badd1bb8c9ea91db5c0b2e1a635178119856'/>
<id>3290badd1bb8c9ea91db5c0b2e1a635178119856</id>
<content type='text'>
Pull ceph updates from Ilya Dryomov:
 "A bunch of CephFS fixups from Xiubo, mostly around dropping caps,
  along with a fix for a regression in the readahead handling code which
  sneaked in with the switch to netfs helpers"

* tag 'ceph-for-6.5-rc1' of https://github.com/ceph/ceph-client:
  ceph: don't let check_caps skip sending responses for revoke msgs
  ceph: issue a cap release immediately if no cap exists
  ceph: trigger to flush the buffer when making snapshot
  ceph: fix blindly expanding the readahead windows
  ceph: add a dedicated private data for netfs rreq
  ceph: voluntarily drop Xx caps for requests those touch parent mtime
  ceph: try to dump the msgs when decoding fails
  ceph: only send metrics when the MDS rank is ready
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull ceph updates from Ilya Dryomov:
 "A bunch of CephFS fixups from Xiubo, mostly around dropping caps,
  along with a fix for a regression in the readahead handling code which
  sneaked in with the switch to netfs helpers"

* tag 'ceph-for-6.5-rc1' of https://github.com/ceph/ceph-client:
  ceph: don't let check_caps skip sending responses for revoke msgs
  ceph: issue a cap release immediately if no cap exists
  ceph: trigger to flush the buffer when making snapshot
  ceph: fix blindly expanding the readahead windows
  ceph: add a dedicated private data for netfs rreq
  ceph: voluntarily drop Xx caps for requests those touch parent mtime
  ceph: try to dump the msgs when decoding fails
  ceph: only send metrics when the MDS rank is ready
</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: don't let check_caps skip sending responses for revoke msgs</title>
<updated>2023-06-30T10:08:55+00:00</updated>
<author>
<name>Xiubo Li</name>
<email>xiubli@redhat.com</email>
</author>
<published>2023-06-27T23:57:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=257e6172ab36ebbe295a6c9ee9a9dd0fe54c1dc2'/>
<id>257e6172ab36ebbe295a6c9ee9a9dd0fe54c1dc2</id>
<content type='text'>
If a client sends out a cap update dropping caps with the prior 'seq'
just before an incoming cap revoke request, then the client may drop
the revoke because it believes it's already released the requested
capabilities.

This causes the MDS to wait indefinitely for the client to respond
to the revoke. It's therefore always a good idea to ack the cap
revoke request with the bumped up 'seq'.

Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/61782
Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-by: Milind Changire &lt;mchangir@redhat.com&gt;
Reviewed-by: Patrick Donnelly &lt;pdonnell@redhat.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If a client sends out a cap update dropping caps with the prior 'seq'
just before an incoming cap revoke request, then the client may drop
the revoke because it believes it's already released the requested
capabilities.

This causes the MDS to wait indefinitely for the client to respond
to the revoke. It's therefore always a good idea to ack the cap
revoke request with the bumped up 'seq'.

Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/61782
Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-by: Milind Changire &lt;mchangir@redhat.com&gt;
Reviewed-by: Patrick Donnelly &lt;pdonnell@redhat.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: issue a cap release immediately if no cap exists</title>
<updated>2023-06-30T10:08:55+00:00</updated>
<author>
<name>Xiubo Li</name>
<email>xiubli@redhat.com</email>
</author>
<published>2023-06-13T04:49:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=ce72d4e0f179340cece90d5b826eb63bbf9fefc0'/>
<id>ce72d4e0f179340cece90d5b826eb63bbf9fefc0</id>
<content type='text'>
In case:

           mds                             client
                                - Releases cap and put Inode
  - Increase cap-&gt;seq and sends
    revokes req to the client
  - Receives release req and    - Receives &amp; drops the revoke req
    skip removing the cap and
    then eval the CInode and
    issue or revoke caps again.
                                - Receives &amp; drops the caps update
                                  or revoke req
  - Health warning for client
    isn't responding to
    mclientcaps(revoke)

All the IMPORT/REVOKE/GRANT cap ops will increase the session seq
in MDS side and then the client need to issue a cap release to
unblock MDS to remove the corresponding cap to unblock possible
waiters.

Link: https://tracker.ceph.com/issues/61332
Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-by: Milind Changire &lt;mchangir@redhat.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In case:

           mds                             client
                                - Releases cap and put Inode
  - Increase cap-&gt;seq and sends
    revokes req to the client
  - Receives release req and    - Receives &amp; drops the revoke req
    skip removing the cap and
    then eval the CInode and
    issue or revoke caps again.
                                - Receives &amp; drops the caps update
                                  or revoke req
  - Health warning for client
    isn't responding to
    mclientcaps(revoke)

All the IMPORT/REVOKE/GRANT cap ops will increase the session seq
in MDS side and then the client need to issue a cap release to
unblock MDS to remove the corresponding cap to unblock possible
waiters.

Link: https://tracker.ceph.com/issues/61332
Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-by: Milind Changire &lt;mchangir@redhat.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: trigger to flush the buffer when making snapshot</title>
<updated>2023-06-30T10:08:55+00:00</updated>
<author>
<name>Xiubo Li</name>
<email>xiubli@redhat.com</email>
</author>
<published>2023-05-11T05:19:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=2d12ad950b0c2a89d82f5d258309ad23aa70fc38'/>
<id>2d12ad950b0c2a89d82f5d258309ad23aa70fc38</id>
<content type='text'>
The 'i_wr_ref' is used to track the 'Fb' caps, while whenever the 'Fb'
caps is took the kclient will always take the 'Fw' caps at the same
time. That means it will always be a false check in __ceph_finish_cap_snap().

When writing to buffer the kclient will take both 'Fb|Fw' caps and then
write the contents to the buffer pages by increasing the 'i_wrbuffer_ref'
and then just release both 'Fb|Fw'. This is different with the user
space libcephfs, which will keep the 'Fb' being took and use 'i_wr_ref'
instead of 'i_wrbuffer_ref' to track this until the buffer is flushed
to Rados.

We need to defer flushing the capsnap until the corresponding buffer
pages are all flushed to Rados, and at the same time just trigger to
flush the buffer pages immediately.

Link: https://tracker.ceph.com/issues/48640
Link: https://tracker.ceph.com/issues/59343
Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-by: Milind Changire &lt;mchangir@redhat.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The 'i_wr_ref' is used to track the 'Fb' caps, while whenever the 'Fb'
caps is took the kclient will always take the 'Fw' caps at the same
time. That means it will always be a false check in __ceph_finish_cap_snap().

When writing to buffer the kclient will take both 'Fb|Fw' caps and then
write the contents to the buffer pages by increasing the 'i_wrbuffer_ref'
and then just release both 'Fb|Fw'. This is different with the user
space libcephfs, which will keep the 'Fb' being took and use 'i_wr_ref'
instead of 'i_wrbuffer_ref' to track this until the buffer is flushed
to Rados.

We need to defer flushing the capsnap until the corresponding buffer
pages are all flushed to Rados, and at the same time just trigger to
flush the buffer pages immediately.

Link: https://tracker.ceph.com/issues/48640
Link: https://tracker.ceph.com/issues/59343
Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-by: Milind Changire &lt;mchangir@redhat.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: fix blindly expanding the readahead windows</title>
<updated>2023-06-30T10:08:55+00:00</updated>
<author>
<name>Xiubo Li</name>
<email>xiubli@redhat.com</email>
</author>
<published>2023-05-04T11:00:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=dc94bb8f271c079f69583d0f12a489aaf5202751'/>
<id>dc94bb8f271c079f69583d0f12a489aaf5202751</id>
<content type='text'>
Blindly expanding the readahead windows will cause unneccessary
pagecache thrashing and also will introduce the network workload.
We should disable expanding the windows if the readahead is disabled
and also shouldn't expand the windows too much.

Expanding forward firstly instead of expanding backward for possible
sequential reads.

Bound `rreq-&gt;len` to the actual file size to restore the previous page
cache usage.

The posix_fadvise may change the maximum size of a file readahead.

Cc: stable@vger.kernel.org
Fixes: 49870056005c ("ceph: convert ceph_readpages to ceph_readahead")
Link: https://lore.kernel.org/ceph-devel/20230504082510.247-1-sehuww@mail.scut.edu.cn
Link: https://www.spinics.net/lists/ceph-users/msg76183.html
Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-and-tested-by: Hu Weiwen &lt;sehuww@mail.scut.edu.cn&gt;
Reviewed-by: Milind Changire &lt;mchangir@redhat.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Blindly expanding the readahead windows will cause unneccessary
pagecache thrashing and also will introduce the network workload.
We should disable expanding the windows if the readahead is disabled
and also shouldn't expand the windows too much.

Expanding forward firstly instead of expanding backward for possible
sequential reads.

Bound `rreq-&gt;len` to the actual file size to restore the previous page
cache usage.

The posix_fadvise may change the maximum size of a file readahead.

Cc: stable@vger.kernel.org
Fixes: 49870056005c ("ceph: convert ceph_readpages to ceph_readahead")
Link: https://lore.kernel.org/ceph-devel/20230504082510.247-1-sehuww@mail.scut.edu.cn
Link: https://www.spinics.net/lists/ceph-users/msg76183.html
Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-and-tested-by: Hu Weiwen &lt;sehuww@mail.scut.edu.cn&gt;
Reviewed-by: Milind Changire &lt;mchangir@redhat.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: add a dedicated private data for netfs rreq</title>
<updated>2023-06-30T10:08:55+00:00</updated>
<author>
<name>Xiubo Li</name>
<email>xiubli@redhat.com</email>
</author>
<published>2023-05-10T11:55:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=23ee27dce30e7d3091d6c3143b79f48dab6f9a3e'/>
<id>23ee27dce30e7d3091d6c3143b79f48dab6f9a3e</id>
<content type='text'>
We need to save the 'f_ra.ra_pages' to expand the readahead window
later.

Cc: stable@vger.kernel.org
Fixes: 49870056005c ("ceph: convert ceph_readpages to ceph_readahead")
Link: https://lore.kernel.org/ceph-devel/20230504082510.247-1-sehuww@mail.scut.edu.cn
Link: https://www.spinics.net/lists/ceph-users/msg76183.html
Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-and-tested-by: Hu Weiwen &lt;sehuww@mail.scut.edu.cn&gt;
Reviewed-by: Milind Changire &lt;mchangir@redhat.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We need to save the 'f_ra.ra_pages' to expand the readahead window
later.

Cc: stable@vger.kernel.org
Fixes: 49870056005c ("ceph: convert ceph_readpages to ceph_readahead")
Link: https://lore.kernel.org/ceph-devel/20230504082510.247-1-sehuww@mail.scut.edu.cn
Link: https://www.spinics.net/lists/ceph-users/msg76183.html
Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-and-tested-by: Hu Weiwen &lt;sehuww@mail.scut.edu.cn&gt;
Reviewed-by: Milind Changire &lt;mchangir@redhat.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: voluntarily drop Xx caps for requests those touch parent mtime</title>
<updated>2023-06-30T10:08:55+00:00</updated>
<author>
<name>Xiubo Li</name>
<email>xiubli@redhat.com</email>
</author>
<published>2023-06-05T06:58:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d9d00f71ab5a2b5a47b228f678a8817e8687387f'/>
<id>d9d00f71ab5a2b5a47b228f678a8817e8687387f</id>
<content type='text'>
For write requests the parent's mtime will be updated correspondingly.
And if the 'Xx' caps is issued and when releasing other caps together
with the write requests the MDS Locker will try to eval the xattr lock,
which need to change the locker state excl --&gt; sync and need to do Xx
caps revocation.

Just voluntarily dropping CEPH_CAP_XATTR_EXCL caps to avoid a cap
revoke message, which could cause the mtime will be overwrote by stale
one.

[ idryomov: break unnecessarily long lines ]

Link: https://tracker.ceph.com/issues/61584
Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-by: Milind Changire &lt;mchangir@redhat.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
For write requests the parent's mtime will be updated correspondingly.
And if the 'Xx' caps is issued and when releasing other caps together
with the write requests the MDS Locker will try to eval the xattr lock,
which need to change the locker state excl --&gt; sync and need to do Xx
caps revocation.

Just voluntarily dropping CEPH_CAP_XATTR_EXCL caps to avoid a cap
revoke message, which could cause the mtime will be overwrote by stale
one.

[ idryomov: break unnecessarily long lines ]

Link: https://tracker.ceph.com/issues/61584
Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-by: Milind Changire &lt;mchangir@redhat.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
