<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/ceph/addr.c, branch v5.19</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>netfs: do not unlock and put the folio twice</title>
<updated>2022-07-14T08:10:12+00:00</updated>
<author>
<name>Xiubo Li</name>
<email>xiubli@redhat.com</email>
</author>
<published>2022-07-11T04:11:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=fac47b43c760ea90e64b895dba60df0327be7775'/>
<id>fac47b43c760ea90e64b895dba60df0327be7775</id>
<content type='text'>
check_write_begin() will unlock and put the folio when return
non-zero.  So we should avoid unlocking and putting it twice in
netfs layer.

Change the way -&gt;check_write_begin() works in the following two ways:

 (1) Pass it a pointer to the folio pointer, allowing it to unlock and put
     the folio prior to doing the stuff it wants to do, provided it clears
     the folio pointer.

 (2) Change the return values such that 0 with folio pointer set means
     continue, 0 with folio pointer cleared means re-get and all error
     codes indicating an error (no special treatment for -EAGAIN).

[ bagasdotme: use Sphinx code text syntax for *foliop pointer ]

Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/56423
Link: https://lore.kernel.org/r/cf169f43-8ee7-8697-25da-0204d1b4343e@redhat.com
Co-developed-by: David Howells &lt;dhowells@redhat.com&gt;
Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Signed-off-by: Bagas Sanjaya &lt;bagasdotme@gmail.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
check_write_begin() will unlock and put the folio when return
non-zero.  So we should avoid unlocking and putting it twice in
netfs layer.

Change the way -&gt;check_write_begin() works in the following two ways:

 (1) Pass it a pointer to the folio pointer, allowing it to unlock and put
     the folio prior to doing the stuff it wants to do, provided it clears
     the folio pointer.

 (2) Change the return values such that 0 with folio pointer set means
     continue, 0 with folio pointer cleared means re-get and all error
     codes indicating an error (no special treatment for -EAGAIN).

[ bagasdotme: use Sphinx code text syntax for *foliop pointer ]

Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/56423
Link: https://lore.kernel.org/r/cf169f43-8ee7-8697-25da-0204d1b4343e@redhat.com
Co-developed-by: David Howells &lt;dhowells@redhat.com&gt;
Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Signed-off-by: Bagas Sanjaya &lt;bagasdotme@gmail.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>netfs: Rename the netfs_io_request cleanup op and give it an op pointer</title>
<updated>2022-06-10T19:55:21+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2022-02-25T11:19:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=40a81101202300df7db273f77dda9fbe6271b1d2'/>
<id>40a81101202300df7db273f77dda9fbe6271b1d2</id>
<content type='text'>
The netfs_io_request cleanup op is now always in a position to be given a
pointer to a netfs_io_request struct, so this can be passed in instead of
the mapping and private data arguments (both of which are included in the
struct).

So rename the -&gt;cleanup op to -&gt;free_request (to match -&gt;init_request) and
pass in the I/O pointer.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
cc: linux-cachefs@redhat.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The netfs_io_request cleanup op is now always in a position to be given a
pointer to a netfs_io_request struct, so this can be passed in instead of
the mapping and private data arguments (both of which are included in the
struct).

So rename the -&gt;cleanup op to -&gt;free_request (to match -&gt;init_request) and
pass in the I/O pointer.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
cc: linux-cachefs@redhat.com
</pre>
</div>
</content>
</entry>
<entry>
<title>netfs: Further cleanups after struct netfs_inode wrapper introduced</title>
<updated>2022-06-10T19:55:21+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-06-09T22:04:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e81fb4198e27925b151aad1450e0fd607d6733f8'/>
<id>e81fb4198e27925b151aad1450e0fd607d6733f8</id>
<content type='text'>
Change the signature of netfs helper functions to take a struct netfs_inode
pointer rather than a struct inode pointer where appropriate, thereby
relieving the need for the network filesystem to convert its internal inode
format down to the VFS inode only for netfslib to bounce it back up.  For
type safety, it's better not to do that (and it's less typing too).

Give netfs_write_begin() an extra argument to pass in a pointer to the
netfs_inode struct rather than deriving it internally from the file
pointer.  Note that the -&gt;write_begin() and -&gt;write_end() ops are intended
to be replaced in the future by netfslib code that manages this without the
need to call in twice for each page.

netfs_readpage() and similar are intended to be pointed at directly by the
address_space_operations table, so must stick to the signature dictated by
the function pointers there.

Changes
=======
- Updated the kerneldoc comments and documentation [DH].

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/CAHk-=wgkwKyNmNdKpQkqZ6DnmUL-x9hp0YBnUGjaPFEAdxDTbw@mail.gmail.com/
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Change the signature of netfs helper functions to take a struct netfs_inode
pointer rather than a struct inode pointer where appropriate, thereby
relieving the need for the network filesystem to convert its internal inode
format down to the VFS inode only for netfslib to bounce it back up.  For
type safety, it's better not to do that (and it's less typing too).

Give netfs_write_begin() an extra argument to pass in a pointer to the
netfs_inode struct rather than deriving it internally from the file
pointer.  Note that the -&gt;write_begin() and -&gt;write_end() ops are intended
to be replaced in the future by netfslib code that manages this without the
need to call in twice for each page.

netfs_readpage() and similar are intended to be pointed at directly by the
address_space_operations table, so must stick to the signature dictated by
the function pointers there.

Changes
=======
- Updated the kerneldoc comments and documentation [DH].

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/CAHk-=wgkwKyNmNdKpQkqZ6DnmUL-x9hp0YBnUGjaPFEAdxDTbw@mail.gmail.com/
</pre>
</div>
</content>
</entry>
<entry>
<title>netfs: Fix gcc-12 warning by embedding vfs inode in netfs_i_context</title>
<updated>2022-06-09T20:55:00+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2022-06-09T20:46:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=874c8ca1e60b2c564a48f7e7acc40d328d5c8733'/>
<id>874c8ca1e60b2c564a48f7e7acc40d328d5c8733</id>
<content type='text'>
While randstruct was satisfied with using an open-coded "void *" offset
cast for the netfs_i_context &lt;-&gt; inode casting, __builtin_object_size() as
used by FORTIFY_SOURCE was not as easily fooled.  This was causing the
following complaint[1] from gcc v12:

  In file included from include/linux/string.h:253,
                   from include/linux/ceph/ceph_debug.h:7,
                   from fs/ceph/inode.c:2:
  In function 'fortify_memset_chk',
      inlined from 'netfs_i_context_init' at include/linux/netfs.h:326:2,
      inlined from 'ceph_alloc_inode' at fs/ceph/inode.c:463:2:
  include/linux/fortify-string.h:242:25: warning: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Wattribute-warning]
    242 |                         __write_overflow_field(p_size_field, size);
        |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fix this by embedding a struct inode into struct netfs_i_context (which
should perhaps be renamed to struct netfs_inode).  The struct inode
vfs_inode fields are then removed from the 9p, afs, ceph and cifs inode
structs and vfs_inode is then simply changed to "netfs.inode" in those
filesystems.

Further, rename netfs_i_context to netfs_inode, get rid of the
netfs_inode() function that converted a netfs_i_context pointer to an
inode pointer (that can now be done with &amp;ctx-&gt;inode) and rename the
netfs_i_context() function to netfs_inode() (which is now a wrapper
around container_of()).

Most of the changes were done with:

  perl -p -i -e 's/vfs_inode/netfs.inode/'g \
        `git grep -l 'vfs_inode' -- fs/{9p,afs,ceph,cifs}/*.[ch]`

Kees suggested doing it with a pair structure[2] and a special
declarator to insert that into the network filesystem's inode
wrapper[3], but I think it's cleaner to embed it - and then it doesn't
matter if struct randomisation reorders things.

Dave Chinner suggested using a filesystem-specific VFS_I() function in
each filesystem to convert that filesystem's own inode wrapper struct
into the VFS inode struct[4].

Version #2:
 - Fix a couple of missed name changes due to a disabled cifs option.
 - Rename nfs_i_context to nfs_inode
 - Use "netfs" instead of "nic" as the member name in per-fs inode wrapper
   structs.

[ This also undoes commit 507160f46c55 ("netfs: gcc-12: temporarily
  disable '-Wattribute-warning' for now") that is no longer needed ]

Fixes: bc899ee1c898 ("netfs: Add a netfs inode context")
Reported-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Reviewed-by: Xiubo Li &lt;xiubli@redhat.com&gt;
cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
cc: Eric Van Hensbergen &lt;ericvh@gmail.com&gt;
cc: Latchesar Ionkov &lt;lucho@ionkov.net&gt;
cc: Dominique Martinet &lt;asmadeus@codewreck.org&gt;
cc: Christian Schoenebeck &lt;linux_oss@crudebyte.com&gt;
cc: Marc Dionne &lt;marc.dionne@auristor.com&gt;
cc: Ilya Dryomov &lt;idryomov@gmail.com&gt;
cc: Steve French &lt;smfrench@gmail.com&gt;
cc: William Kucharski &lt;william.kucharski@oracle.com&gt;
cc: "Matthew Wilcox (Oracle)" &lt;willy@infradead.org&gt;
cc: Dave Chinner &lt;david@fromorbit.com&gt;
cc: linux-doc@vger.kernel.org
cc: v9fs-developer@lists.sourceforge.net
cc: linux-afs@lists.infradead.org
cc: ceph-devel@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: samba-technical@lists.samba.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-hardening@vger.kernel.org
Link: https://lore.kernel.org/r/d2ad3a3d7bdd794c6efb562d2f2b655fb67756b9.camel@kernel.org/ [1]
Link: https://lore.kernel.org/r/20220517210230.864239-1-keescook@chromium.org/ [2]
Link: https://lore.kernel.org/r/20220518202212.2322058-1-keescook@chromium.org/ [3]
Link: https://lore.kernel.org/r/20220524101205.GI2306852@dread.disaster.area/ [4]
Link: https://lore.kernel.org/r/165296786831.3591209.12111293034669289733.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/165305805651.4094995.7763502506786714216.stgit@warthog.procyon.org.uk # v2
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
While randstruct was satisfied with using an open-coded "void *" offset
cast for the netfs_i_context &lt;-&gt; inode casting, __builtin_object_size() as
used by FORTIFY_SOURCE was not as easily fooled.  This was causing the
following complaint[1] from gcc v12:

  In file included from include/linux/string.h:253,
                   from include/linux/ceph/ceph_debug.h:7,
                   from fs/ceph/inode.c:2:
  In function 'fortify_memset_chk',
      inlined from 'netfs_i_context_init' at include/linux/netfs.h:326:2,
      inlined from 'ceph_alloc_inode' at fs/ceph/inode.c:463:2:
  include/linux/fortify-string.h:242:25: warning: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Wattribute-warning]
    242 |                         __write_overflow_field(p_size_field, size);
        |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fix this by embedding a struct inode into struct netfs_i_context (which
should perhaps be renamed to struct netfs_inode).  The struct inode
vfs_inode fields are then removed from the 9p, afs, ceph and cifs inode
structs and vfs_inode is then simply changed to "netfs.inode" in those
filesystems.

Further, rename netfs_i_context to netfs_inode, get rid of the
netfs_inode() function that converted a netfs_i_context pointer to an
inode pointer (that can now be done with &amp;ctx-&gt;inode) and rename the
netfs_i_context() function to netfs_inode() (which is now a wrapper
around container_of()).

Most of the changes were done with:

  perl -p -i -e 's/vfs_inode/netfs.inode/'g \
        `git grep -l 'vfs_inode' -- fs/{9p,afs,ceph,cifs}/*.[ch]`

Kees suggested doing it with a pair structure[2] and a special
declarator to insert that into the network filesystem's inode
wrapper[3], but I think it's cleaner to embed it - and then it doesn't
matter if struct randomisation reorders things.

Dave Chinner suggested using a filesystem-specific VFS_I() function in
each filesystem to convert that filesystem's own inode wrapper struct
into the VFS inode struct[4].

Version #2:
 - Fix a couple of missed name changes due to a disabled cifs option.
 - Rename nfs_i_context to nfs_inode
 - Use "netfs" instead of "nic" as the member name in per-fs inode wrapper
   structs.

[ This also undoes commit 507160f46c55 ("netfs: gcc-12: temporarily
  disable '-Wattribute-warning' for now") that is no longer needed ]

Fixes: bc899ee1c898 ("netfs: Add a netfs inode context")
Reported-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Reviewed-by: Xiubo Li &lt;xiubli@redhat.com&gt;
cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
cc: Eric Van Hensbergen &lt;ericvh@gmail.com&gt;
cc: Latchesar Ionkov &lt;lucho@ionkov.net&gt;
cc: Dominique Martinet &lt;asmadeus@codewreck.org&gt;
cc: Christian Schoenebeck &lt;linux_oss@crudebyte.com&gt;
cc: Marc Dionne &lt;marc.dionne@auristor.com&gt;
cc: Ilya Dryomov &lt;idryomov@gmail.com&gt;
cc: Steve French &lt;smfrench@gmail.com&gt;
cc: William Kucharski &lt;william.kucharski@oracle.com&gt;
cc: "Matthew Wilcox (Oracle)" &lt;willy@infradead.org&gt;
cc: Dave Chinner &lt;david@fromorbit.com&gt;
cc: linux-doc@vger.kernel.org
cc: v9fs-developer@lists.sourceforge.net
cc: linux-afs@lists.infradead.org
cc: ceph-devel@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: samba-technical@lists.samba.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-hardening@vger.kernel.org
Link: https://lore.kernel.org/r/d2ad3a3d7bdd794c6efb562d2f2b655fb67756b9.camel@kernel.org/ [1]
Link: https://lore.kernel.org/r/20220517210230.864239-1-keescook@chromium.org/ [2]
Link: https://lore.kernel.org/r/20220518202212.2322058-1-keescook@chromium.org/ [3]
Link: https://lore.kernel.org/r/20220524101205.GI2306852@dread.disaster.area/ [4]
Link: https://lore.kernel.org/r/165296786831.3591209.12111293034669289733.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/165305805651.4094995.7763502506786714216.stgit@warthog.procyon.org.uk # v2
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'ceph-for-5.19-rc1' of https://github.com/ceph/ceph-client</title>
<updated>2022-06-02T15:59:39+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-06-02T15:59:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=17d8e3d90b6989419806c1926b894d7d7483a25b'/>
<id>17d8e3d90b6989419806c1926b894d7d7483a25b</id>
<content type='text'>
Pull ceph updates from Ilya Dryomov:
 "A big pile of assorted fixes and improvements for the filesystem with
  nothing in particular standing out, except perhaps that the fact that
  the MDS never really maintained atime was made official and thus it's
  no longer updated on the client either.

  We also have a MAINTAINERS update: Jeff is transitioning his
  filesystem maintainership duties to Xiubo"

* tag 'ceph-for-5.19-rc1' of https://github.com/ceph/ceph-client: (23 commits)
  MAINTAINERS: move myself from ceph "Maintainer" to "Reviewer"
  ceph: fix decoding of client session messages flags
  ceph: switch TASK_INTERRUPTIBLE to TASK_KILLABLE
  ceph: remove redundant variable ino
  ceph: try to queue a writeback if revoking fails
  ceph: fix statfs for subdir mounts
  ceph: fix possible deadlock when holding Fwb to get inline_data
  ceph: redirty the page for writepage on failure
  ceph: try to choose the auth MDS if possible for getattr
  ceph: disable updating the atime since cephfs won't maintain it
  ceph: flush the mdlog for filesystem sync
  ceph: rename unsafe_request_wait()
  libceph: use swap() macro instead of taking tmp variable
  ceph: fix statx AT_STATX_DONT_SYNC vs AT_STATX_FORCE_SYNC check
  ceph: no need to invalidate the fscache twice
  ceph: replace usage of found with dedicated list iterator variable
  ceph: use dedicated list iterator variable
  ceph: update the dlease for the hashed dentry when removing
  ceph: stop retrying the request when exceeding 256 times
  ceph: stop forwarding the request when exceeding 256 times
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull ceph updates from Ilya Dryomov:
 "A big pile of assorted fixes and improvements for the filesystem with
  nothing in particular standing out, except perhaps that the fact that
  the MDS never really maintained atime was made official and thus it's
  no longer updated on the client either.

  We also have a MAINTAINERS update: Jeff is transitioning his
  filesystem maintainership duties to Xiubo"

* tag 'ceph-for-5.19-rc1' of https://github.com/ceph/ceph-client: (23 commits)
  MAINTAINERS: move myself from ceph "Maintainer" to "Reviewer"
  ceph: fix decoding of client session messages flags
  ceph: switch TASK_INTERRUPTIBLE to TASK_KILLABLE
  ceph: remove redundant variable ino
  ceph: try to queue a writeback if revoking fails
  ceph: fix statfs for subdir mounts
  ceph: fix possible deadlock when holding Fwb to get inline_data
  ceph: redirty the page for writepage on failure
  ceph: try to choose the auth MDS if possible for getattr
  ceph: disable updating the atime since cephfs won't maintain it
  ceph: flush the mdlog for filesystem sync
  ceph: rename unsafe_request_wait()
  libceph: use swap() macro instead of taking tmp variable
  ceph: fix statx AT_STATX_DONT_SYNC vs AT_STATX_FORCE_SYNC check
  ceph: no need to invalidate the fscache twice
  ceph: replace usage of found with dedicated list iterator variable
  ceph: use dedicated list iterator variable
  ceph: update the dlease for the hashed dentry when removing
  ceph: stop retrying the request when exceeding 256 times
  ceph: stop forwarding the request when exceeding 256 times
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: fix possible deadlock when holding Fwb to get inline_data</title>
<updated>2022-05-25T18:45:14+00:00</updated>
<author>
<name>Xiubo Li</name>
<email>xiubli@redhat.com</email>
</author>
<published>2022-04-25T08:08:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=825978fd6a0defc3c29d8a38b6cea76a0938d21e'/>
<id>825978fd6a0defc3c29d8a38b6cea76a0938d21e</id>
<content type='text'>
1, mount with wsync.
2, create a file with O_RDWR, and the request was sent to mds.0:

   ceph_atomic_open()--&gt;
     ceph_mdsc_do_request(openc)
     finish_open(file, dentry, ceph_open)--&gt;
       ceph_open()--&gt;
         ceph_init_file()--&gt;
           ceph_init_file_info()--&gt;
             ceph_uninline_data()--&gt;
             {
               ...
               if (inline_version == 1 || /* initial version, no data */
                   inline_version == CEPH_INLINE_NONE)
                     goto out_unlock;
               ...
             }

The inline_version will be 1, which is the initial version for the
new create file. And here the ci-&gt;i_inline_version will keep with 1,
it's buggy.

3, buffer write to the file immediately:

   ceph_write_iter()--&gt;
     ceph_get_caps(file, need=Fw, want=Fb, ...);
     generic_perform_write()--&gt;
       a_ops-&gt;write_begin()--&gt;
         ceph_write_begin()--&gt;
           netfs_write_begin()--&gt;
             netfs_begin_read()--&gt;
               netfs_rreq_submit_slice()--&gt;
                 netfs_read_from_server()--&gt;
                   rreq-&gt;netfs_ops-&gt;issue_read()--&gt;
                     ceph_netfs_issue_read()--&gt;
                     {
                       ...
                       if (ci-&gt;i_inline_version != CEPH_INLINE_NONE &amp;&amp;
                           ceph_netfs_issue_op_inline(subreq))
                         return;
                       ...
                     }
     ceph_put_cap_refs(ci, Fwb);

The ceph_netfs_issue_op_inline() will send a getattr(Fsr) request to
mds.1.

4, then the mds.1 will request the rd lock for CInode::filelock from
the auth mds.0, the mds.0 will do the CInode::filelock state transation
from excl --&gt; sync, but it need to revoke the Fxwb caps back from the
clients.

While the kernel client has aleady held the Fwb caps and waiting for
the getattr(Fsr).

It's deadlock!

URL: https://tracker.ceph.com/issues/55377
Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
1, mount with wsync.
2, create a file with O_RDWR, and the request was sent to mds.0:

   ceph_atomic_open()--&gt;
     ceph_mdsc_do_request(openc)
     finish_open(file, dentry, ceph_open)--&gt;
       ceph_open()--&gt;
         ceph_init_file()--&gt;
           ceph_init_file_info()--&gt;
             ceph_uninline_data()--&gt;
             {
               ...
               if (inline_version == 1 || /* initial version, no data */
                   inline_version == CEPH_INLINE_NONE)
                     goto out_unlock;
               ...
             }

The inline_version will be 1, which is the initial version for the
new create file. And here the ci-&gt;i_inline_version will keep with 1,
it's buggy.

3, buffer write to the file immediately:

   ceph_write_iter()--&gt;
     ceph_get_caps(file, need=Fw, want=Fb, ...);
     generic_perform_write()--&gt;
       a_ops-&gt;write_begin()--&gt;
         ceph_write_begin()--&gt;
           netfs_write_begin()--&gt;
             netfs_begin_read()--&gt;
               netfs_rreq_submit_slice()--&gt;
                 netfs_read_from_server()--&gt;
                   rreq-&gt;netfs_ops-&gt;issue_read()--&gt;
                     ceph_netfs_issue_read()--&gt;
                     {
                       ...
                       if (ci-&gt;i_inline_version != CEPH_INLINE_NONE &amp;&amp;
                           ceph_netfs_issue_op_inline(subreq))
                         return;
                       ...
                     }
     ceph_put_cap_refs(ci, Fwb);

The ceph_netfs_issue_op_inline() will send a getattr(Fsr) request to
mds.1.

4, then the mds.1 will request the rd lock for CInode::filelock from
the auth mds.0, the mds.0 will do the CInode::filelock state transation
from excl --&gt; sync, but it need to revoke the Fxwb caps back from the
clients.

While the kernel client has aleady held the Fwb caps and waiting for
the getattr(Fsr).

It's deadlock!

URL: https://tracker.ceph.com/issues/55377
Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: redirty the page for writepage on failure</title>
<updated>2022-05-25T18:45:14+00:00</updated>
<author>
<name>Xiubo Li</name>
<email>xiubli@redhat.com</email>
</author>
<published>2022-04-24T09:35:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3459bd0c55ed28f86a1fd2e01e376492cb6d4f91'/>
<id>3459bd0c55ed28f86a1fd2e01e376492cb6d4f91</id>
<content type='text'>
When run out of memories we should redirty the page before failing
the writepage. Or we will hit BUG_ON(folio_get_private(folio)) in
ceph_dirty_folio().

URL: https://tracker.ceph.com/issues/55421
Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When run out of memories we should redirty the page before failing
the writepage. Or we will hit BUG_ON(folio_get_private(folio)) in
ceph_dirty_folio().

URL: https://tracker.ceph.com/issues/55421
Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: try to choose the auth MDS if possible for getattr</title>
<updated>2022-05-25T18:45:14+00:00</updated>
<author>
<name>Xiubo Li</name>
<email>xiubli@redhat.com</email>
</author>
<published>2022-04-21T03:26:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=5eed80fba65cd707075892450bc5d6bd464862a0'/>
<id>5eed80fba65cd707075892450bc5d6bd464862a0</id>
<content type='text'>
If any 'x' caps is issued we can just choose the auth MDS instead
of the random replica MDSes. Because only when the Locker is in
LOCK_EXEC state will the loner client could get the 'x' caps. And
if we send the getattr requests to any replica MDS it must auth pin
and tries to rdlock from the auth MDS, and then the auth MDS need
to do the Locker state transition to LOCK_SYNC. And after that the
lock state will change back.

This cost much when doing the Locker state transition and usually
will need to revoke caps from clients.

URL: https://tracker.ceph.com/issues/55240
Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If any 'x' caps is issued we can just choose the auth MDS instead
of the random replica MDSes. Because only when the Locker is in
LOCK_EXEC state will the loner client could get the 'x' caps. And
if we send the getattr requests to any replica MDS it must auth pin
and tries to rdlock from the auth MDS, and then the auth MDS need
to do the Locker state transition to LOCK_SYNC. And after that the
lock state will change back.

This cost much when doing the Locker state transition and usually
will need to revoke caps from clients.

URL: https://tracker.ceph.com/issues/55240
Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: disable updating the atime since cephfs won't maintain it</title>
<updated>2022-05-25T18:45:14+00:00</updated>
<author>
<name>Xiubo Li</name>
<email>xiubli@redhat.com</email>
</author>
<published>2022-04-20T05:13:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f7a2d0688a3b2bb4769402b4c962f54f7b0fc23c'/>
<id>f7a2d0688a3b2bb4769402b4c962f54f7b0fc23c</id>
<content type='text'>
Since CephFS makes no attempt to maintain atime, we shouldn't
try to update it in mmap and generic read cases and ignore updating
it in direct and sync read cases.

And even we update it in mmap and generic read cases we will drop
it and won't sync it to MDS. And we are seeing the atime will be
updated and then dropped to the floor again and again.

URL: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/VSJM7T4CS5TDRFF6XFPIYMHP75K73PZ6/
Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Acked-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since CephFS makes no attempt to maintain atime, we shouldn't
try to update it in mmap and generic read cases and ignore updating
it in direct and sync read cases.

And even we update it in mmap and generic read cases we will drop
it and won't sync it to MDS. And we are seeing the atime will be
updated and then dropped to the floor again and again.

URL: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/VSJM7T4CS5TDRFF6XFPIYMHP75K73PZ6/
Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Acked-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecache</title>
<updated>2022-05-25T02:55:07+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-05-25T02:55:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=fdaf9a5840acaab18694a19e0eb0aa51162eeeed'/>
<id>fdaf9a5840acaab18694a19e0eb0aa51162eeeed</id>
<content type='text'>
Pull page cache updates from Matthew Wilcox:

 - Appoint myself page cache maintainer

 - Fix how scsicam uses the page cache

 - Use the memalloc_nofs_save() API to replace AOP_FLAG_NOFS

 - Remove the AOP flags entirely

 - Remove pagecache_write_begin() and pagecache_write_end()

 - Documentation updates

 - Convert several address_space operations to use folios:
     - is_dirty_writeback
     - readpage becomes read_folio
     - releasepage becomes release_folio
     - freepage becomes free_folio

 - Change filler_t to require a struct file pointer be the first
   argument like -&gt;read_folio

* tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecache: (107 commits)
  nilfs2: Fix some kernel-doc comments
  Appoint myself page cache maintainer
  fs: Remove aops-&gt;freepage
  secretmem: Convert to free_folio
  nfs: Convert to free_folio
  orangefs: Convert to free_folio
  fs: Add free_folio address space operation
  fs: Convert drop_buffers() to use a folio
  fs: Change try_to_free_buffers() to take a folio
  jbd2: Convert release_buffer_page() to use a folio
  jbd2: Convert jbd2_journal_try_to_free_buffers to take a folio
  reiserfs: Convert release_buffer_page() to use a folio
  fs: Remove last vestiges of releasepage
  ubifs: Convert to release_folio
  reiserfs: Convert to release_folio
  orangefs: Convert to release_folio
  ocfs2: Convert to release_folio
  nilfs2: Remove comment about releasepage
  nfs: Convert to release_folio
  jfs: Convert to release_folio
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull page cache updates from Matthew Wilcox:

 - Appoint myself page cache maintainer

 - Fix how scsicam uses the page cache

 - Use the memalloc_nofs_save() API to replace AOP_FLAG_NOFS

 - Remove the AOP flags entirely

 - Remove pagecache_write_begin() and pagecache_write_end()

 - Documentation updates

 - Convert several address_space operations to use folios:
     - is_dirty_writeback
     - readpage becomes read_folio
     - releasepage becomes release_folio
     - freepage becomes free_folio

 - Change filler_t to require a struct file pointer be the first
   argument like -&gt;read_folio

* tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecache: (107 commits)
  nilfs2: Fix some kernel-doc comments
  Appoint myself page cache maintainer
  fs: Remove aops-&gt;freepage
  secretmem: Convert to free_folio
  nfs: Convert to free_folio
  orangefs: Convert to free_folio
  fs: Add free_folio address space operation
  fs: Convert drop_buffers() to use a folio
  fs: Change try_to_free_buffers() to take a folio
  jbd2: Convert release_buffer_page() to use a folio
  jbd2: Convert jbd2_journal_try_to_free_buffers to take a folio
  reiserfs: Convert release_buffer_page() to use a folio
  fs: Remove last vestiges of releasepage
  ubifs: Convert to release_folio
  reiserfs: Convert to release_folio
  orangefs: Convert to release_folio
  ocfs2: Convert to release_folio
  nilfs2: Remove comment about releasepage
  nfs: Convert to release_folio
  jfs: Convert to release_folio
  ...
</pre>
</div>
</content>
</entry>
</feed>
