<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/ceph/addr.c, branch v6.11</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>9p: Fix DIO read through netfs</title>
<updated>2024-08-13T11:53:09+00:00</updated>
<author>
<name>Dominique Martinet</name>
<email>asmadeus@codewreck.org</email>
</author>
<published>2024-08-08T13:29:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e3786b29c54cdae3490b07180a54e2461f42144c'/>
<id>e3786b29c54cdae3490b07180a54e2461f42144c</id>
<content type='text'>
If a program is watching a file on a 9p mount, it won't see any change in
size if the file being exported by the server is changed directly in the
source filesystem, presumably because 9p doesn't have change notifications,
and because netfs skips the reads if the file is empty.

Fix this by attempting to read the full size specified when a DIO read is
requested (such as when 9p is operating in unbuffered mode) and dealing
with a short read if the EOF was less than the expected read.

To make this work, filesystems using netfslib must not set
NETFS_SREQ_CLEAR_TAIL if performing a DIO read where that read hit the EOF.
I don't want to mandatorily clear this flag in netfslib for DIO because,
say, ceph might make a read from an object that is not completely filled,
but does not reside at the end of file - and so we need to clear the
excess.

This can be tested by watching an empty file over 9p within a VM (such as
in the ktest framework):

        while true; do read content; if [ -n "$content" ]; then echo $content; break; fi; done &lt; /host/tmp/foo

then writing something into the empty file.  The watcher should immediately
display the file content and break out of the loop.  Without this fix, it
remains in the loop indefinitely.

Fixes: 80105ed2fd27 ("9p: Use netfslib read/write_iter")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218916
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Link: https://lore.kernel.org/r/1229195.1723211769@warthog.procyon.org.uk
cc: Eric Van Hensbergen &lt;ericvh@kernel.org&gt;
cc: Latchesar Ionkov &lt;lucho@ionkov.net&gt;
cc: Christian Schoenebeck &lt;linux_oss@crudebyte.com&gt;
cc: Marc Dionne &lt;marc.dionne@auristor.com&gt;
cc: Ilya Dryomov &lt;idryomov@gmail.com&gt;
cc: Steve French &lt;sfrench@samba.org&gt;
cc: Paulo Alcantara &lt;pc@manguebit.com&gt;
cc: Trond Myklebust &lt;trond.myklebust@hammerspace.com&gt;
cc: v9fs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: ceph-devel@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: linux-nfs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Dominique Martinet &lt;asmadeus@codewreck.org&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If a program is watching a file on a 9p mount, it won't see any change in
size if the file being exported by the server is changed directly in the
source filesystem, presumably because 9p doesn't have change notifications,
and because netfs skips the reads if the file is empty.

Fix this by attempting to read the full size specified when a DIO read is
requested (such as when 9p is operating in unbuffered mode) and dealing
with a short read if the EOF was less than the expected read.

To make this work, filesystems using netfslib must not set
NETFS_SREQ_CLEAR_TAIL if performing a DIO read where that read hit the EOF.
I don't want to mandatorily clear this flag in netfslib for DIO because,
say, ceph might make a read from an object that is not completely filled,
but does not reside at the end of file - and so we need to clear the
excess.

This can be tested by watching an empty file over 9p within a VM (such as
in the ktest framework):

        while true; do read content; if [ -n "$content" ]; then echo $content; break; fi; done &lt; /host/tmp/foo

then writing something into the empty file.  The watcher should immediately
display the file content and break out of the loop.  Without this fix, it
remains in the loop indefinitely.

Fixes: 80105ed2fd27 ("9p: Use netfslib read/write_iter")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218916
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Link: https://lore.kernel.org/r/1229195.1723211769@warthog.procyon.org.uk
cc: Eric Van Hensbergen &lt;ericvh@kernel.org&gt;
cc: Latchesar Ionkov &lt;lucho@ionkov.net&gt;
cc: Christian Schoenebeck &lt;linux_oss@crudebyte.com&gt;
cc: Marc Dionne &lt;marc.dionne@auristor.com&gt;
cc: Ilya Dryomov &lt;idryomov@gmail.com&gt;
cc: Steve French &lt;sfrench@samba.org&gt;
cc: Paulo Alcantara &lt;pc@manguebit.com&gt;
cc: Trond Myklebust &lt;trond.myklebust@hammerspace.com&gt;
cc: v9fs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: ceph-devel@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: linux-nfs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Dominique Martinet &lt;asmadeus@codewreck.org&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>netfs: Fix handling of USE_PGPRIV2 and WRITE_TO_CACHE flags</title>
<updated>2024-08-12T20:03:27+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2024-08-07T18:38:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7b589a9b45ae32aa9d7bece597490e141198d7a6'/>
<id>7b589a9b45ae32aa9d7bece597490e141198d7a6</id>
<content type='text'>
The NETFS_RREQ_USE_PGPRIV2 and NETFS_RREQ_WRITE_TO_CACHE flags aren't used
correctly.  The problem is that we try to set them up in the request
initialisation, but we the cache may be in the process of setting up still,
and so the state may not be correct.  Further, we secondarily sample the
cache state and make contradictory decisions later.

The issue arises because we set up the cache resources, which allows the
cache's -&gt;prepare_read() to switch on NETFS_SREQ_COPY_TO_CACHE - which
triggers cache writing even if we didn't set the flags when allocating.

Fix this in the following way:

 (1) Drop NETFS_ICTX_USE_PGPRIV2 and instead set NETFS_RREQ_USE_PGPRIV2 in
     -&gt;init_request() rather than trying to juggle that in
     netfs_alloc_request().

 (2) Repurpose NETFS_RREQ_USE_PGPRIV2 to merely indicate that if caching is
     to be done, then PG_private_2 is to be used rather than only setting
     it if we decide to cache and then having netfs_rreq_unlock_folios()
     set the non-PG_private_2 writeback-to-cache if it wasn't set.

 (3) Split netfs_rreq_unlock_folios() into two functions, one of which
     contains the deprecated code for using PG_private_2 to avoid
     accidentally doing the writeback path - and always use it if
     USE_PGPRIV2 is set.

 (4) As NETFS_ICTX_USE_PGPRIV2 is removed, make netfs_write_begin() always
     wait for PG_private_2.  This function is deprecated and only used by
     ceph anyway, and so label it so.

 (5) Drop the NETFS_RREQ_WRITE_TO_CACHE flag and use
     fscache_operation_valid() on the cache_resources instead.  This has
     the advantage of picking up the result of netfs_begin_cache_read() and
     fscache_begin_write_operation() - which are called after the object is
     initialised and will wait for the cache to come to a usable state.

Just reverting ae678317b95e[1] isn't a sufficient fix, so this need to be
applied on top of that.  Without this as well, things like:

 rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: {

and:

 WARNING: CPU: 13 PID: 3621 at fs/ceph/caps.c:3386

may happen, along with some UAFs due to PG_private_2 not getting used to
wait on writeback completion.

Fixes: 2ff1e97587f4 ("netfs: Replace PG_fscache by setting folio-&gt;private and marking dirty")
Reported-by: Max Kellermann &lt;max.kellermann@ionos.com&gt;
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
cc: Ilya Dryomov &lt;idryomov@gmail.com&gt;
cc: Xiubo Li &lt;xiubli@redhat.com&gt;
cc: Hristo Venev &lt;hristo@venev.name&gt;
cc: Jeff Layton &lt;jlayton@kernel.org&gt;
cc: Matthew Wilcox &lt;willy@infradead.org&gt;
cc: ceph-devel@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/3575457.1722355300@warthog.procyon.org.uk/ [1]
Link: https://lore.kernel.org/r/1173209.1723152682@warthog.procyon.org.uk
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The NETFS_RREQ_USE_PGPRIV2 and NETFS_RREQ_WRITE_TO_CACHE flags aren't used
correctly.  The problem is that we try to set them up in the request
initialisation, but we the cache may be in the process of setting up still,
and so the state may not be correct.  Further, we secondarily sample the
cache state and make contradictory decisions later.

The issue arises because we set up the cache resources, which allows the
cache's -&gt;prepare_read() to switch on NETFS_SREQ_COPY_TO_CACHE - which
triggers cache writing even if we didn't set the flags when allocating.

Fix this in the following way:

 (1) Drop NETFS_ICTX_USE_PGPRIV2 and instead set NETFS_RREQ_USE_PGPRIV2 in
     -&gt;init_request() rather than trying to juggle that in
     netfs_alloc_request().

 (2) Repurpose NETFS_RREQ_USE_PGPRIV2 to merely indicate that if caching is
     to be done, then PG_private_2 is to be used rather than only setting
     it if we decide to cache and then having netfs_rreq_unlock_folios()
     set the non-PG_private_2 writeback-to-cache if it wasn't set.

 (3) Split netfs_rreq_unlock_folios() into two functions, one of which
     contains the deprecated code for using PG_private_2 to avoid
     accidentally doing the writeback path - and always use it if
     USE_PGPRIV2 is set.

 (4) As NETFS_ICTX_USE_PGPRIV2 is removed, make netfs_write_begin() always
     wait for PG_private_2.  This function is deprecated and only used by
     ceph anyway, and so label it so.

 (5) Drop the NETFS_RREQ_WRITE_TO_CACHE flag and use
     fscache_operation_valid() on the cache_resources instead.  This has
     the advantage of picking up the result of netfs_begin_cache_read() and
     fscache_begin_write_operation() - which are called after the object is
     initialised and will wait for the cache to come to a usable state.

Just reverting ae678317b95e[1] isn't a sufficient fix, so this need to be
applied on top of that.  Without this as well, things like:

 rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: {

and:

 WARNING: CPU: 13 PID: 3621 at fs/ceph/caps.c:3386

may happen, along with some UAFs due to PG_private_2 not getting used to
wait on writeback completion.

Fixes: 2ff1e97587f4 ("netfs: Replace PG_fscache by setting folio-&gt;private and marking dirty")
Reported-by: Max Kellermann &lt;max.kellermann@ionos.com&gt;
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
cc: Ilya Dryomov &lt;idryomov@gmail.com&gt;
cc: Xiubo Li &lt;xiubli@redhat.com&gt;
cc: Hristo Venev &lt;hristo@venev.name&gt;
cc: Jeff Layton &lt;jlayton@kernel.org&gt;
cc: Matthew Wilcox &lt;willy@infradead.org&gt;
cc: ceph-devel@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/3575457.1722355300@warthog.procyon.org.uk/ [1]
Link: https://lore.kernel.org/r/1173209.1723152682@warthog.procyon.org.uk
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>netfs, ceph: Revert "netfs: Remove deprecated use of PG_private_2 as a second writeback flag"</title>
<updated>2024-08-12T20:03:27+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2024-07-30T16:01:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8e5ced7804cb9184c4a23f8054551240562a8eda'/>
<id>8e5ced7804cb9184c4a23f8054551240562a8eda</id>
<content type='text'>
This reverts commit ae678317b95e760607c7b20b97c9cd4ca9ed6e1a.

Revert the patch that removes the deprecated use of PG_private_2 in
netfslib for the moment as Ceph is actually still using this to track
data copied to the cache.

Fixes: ae678317b95e ("netfs: Remove deprecated use of PG_private_2 as a second writeback flag")
Reported-by: Max Kellermann &lt;max.kellermann@ionos.com&gt;
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
cc: Ilya Dryomov &lt;idryomov@gmail.com&gt;
cc: Xiubo Li &lt;xiubli@redhat.com&gt;
cc: Jeff Layton &lt;jlayton@kernel.org&gt;
cc: Matthew Wilcox &lt;willy@infradead.org&gt;
cc: ceph-devel@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
https: //lore.kernel.org/r/3575457.1722355300@warthog.procyon.org.uk
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This reverts commit ae678317b95e760607c7b20b97c9cd4ca9ed6e1a.

Revert the patch that removes the deprecated use of PG_private_2 in
netfslib for the moment as Ceph is actually still using this to track
data copied to the cache.

Fixes: ae678317b95e ("netfs: Remove deprecated use of PG_private_2 as a second writeback flag")
Reported-by: Max Kellermann &lt;max.kellermann@ionos.com&gt;
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
cc: Ilya Dryomov &lt;idryomov@gmail.com&gt;
cc: Xiubo Li &lt;xiubli@redhat.com&gt;
cc: Jeff Layton &lt;jlayton@kernel.org&gt;
cc: Matthew Wilcox &lt;willy@infradead.org&gt;
cc: ceph-devel@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
https: //lore.kernel.org/r/3575457.1722355300@warthog.procyon.org.uk
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>netfs: Switch to using unsigned long long rather than loff_t</title>
<updated>2024-05-01T17:07:35+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2024-03-18T16:57:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7ba167c4c73ed96eb002c98a9d7d49317dfb0191'/>
<id>7ba167c4c73ed96eb002c98a9d7d49317dfb0191</id>
<content type='text'>
Switch to using unsigned long long rather than loff_t in netfslib to avoid
problems with the sign flipping in the maths when we're dealing with the
byte at position 0x7fffffffffffffff.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
cc: Ilya Dryomov &lt;idryomov@gmail.com&gt;
cc: Xiubo Li &lt;xiubli@redhat.com&gt;
cc: netfs@lists.linux.dev
cc: ceph-devel@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Switch to using unsigned long long rather than loff_t in netfslib to avoid
problems with the sign flipping in the maths when we're dealing with the
byte at position 0x7fffffffffffffff.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
cc: Ilya Dryomov &lt;idryomov@gmail.com&gt;
cc: Xiubo Li &lt;xiubli@redhat.com&gt;
cc: netfs@lists.linux.dev
cc: ceph-devel@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
</pre>
</div>
</content>
</entry>
<entry>
<title>netfs: Remove deprecated use of PG_private_2 as a second writeback flag</title>
<updated>2024-04-29T14:01:43+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2023-11-24T13:02:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ae678317b95e760607c7b20b97c9cd4ca9ed6e1a'/>
<id>ae678317b95e760607c7b20b97c9cd4ca9ed6e1a</id>
<content type='text'>
Remove the deprecated use of PG_private_2 in netfslib.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Remove the deprecated use of PG_private_2 in netfslib.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: Remove the PG_fscache alias for PG_private_2</title>
<updated>2024-04-29T14:01:42+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2024-03-19T11:13:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2e9d7e4b984a61823c41ba65e1b58b98ca9912bb'/>
<id>2e9d7e4b984a61823c41ba65e1b58b98ca9912bb</id>
<content type='text'>
Remove the PG_fscache alias for PG_private_2 and use the latter directly.
Use of this flag for marking pages undergoing writing to the cache should
be considered deprecated and the folios should be marked dirty instead and
the write done in -&gt;writepages().

Note that PG_private_2 itself should be considered deprecated and up for
future removal by the MM folks too.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
cc: Ilya Dryomov &lt;idryomov@gmail.com&gt;
cc: Xiubo Li &lt;xiubli@redhat.com&gt;
cc: Steve French &lt;sfrench@samba.org&gt;
cc: Paulo Alcantara &lt;pc@manguebit.com&gt;
cc: Ronnie Sahlberg &lt;ronniesahlberg@gmail.com&gt;
cc: Shyam Prasad N &lt;sprasad@microsoft.com&gt;
cc: Tom Talpey &lt;tom@talpey.com&gt;
cc: Bharath SM &lt;bharathsm@microsoft.com&gt;
cc: Trond Myklebust &lt;trond.myklebust@hammerspace.com&gt;
cc: Anna Schumaker &lt;anna@kernel.org&gt;
cc: netfs@lists.linux.dev
cc: ceph-devel@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: linux-nfs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Remove the PG_fscache alias for PG_private_2 and use the latter directly.
Use of this flag for marking pages undergoing writing to the cache should
be considered deprecated and the folios should be marked dirty instead and
the write done in -&gt;writepages().

Note that PG_private_2 itself should be considered deprecated and up for
future removal by the MM folks too.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
cc: Ilya Dryomov &lt;idryomov@gmail.com&gt;
cc: Xiubo Li &lt;xiubli@redhat.com&gt;
cc: Steve French &lt;sfrench@samba.org&gt;
cc: Paulo Alcantara &lt;pc@manguebit.com&gt;
cc: Ronnie Sahlberg &lt;ronniesahlberg@gmail.com&gt;
cc: Shyam Prasad N &lt;sprasad@microsoft.com&gt;
cc: Tom Talpey &lt;tom@talpey.com&gt;
cc: Bharath SM &lt;bharathsm@microsoft.com&gt;
cc: Trond Myklebust &lt;trond.myklebust@hammerspace.com&gt;
cc: Anna Schumaker &lt;anna@kernel.org&gt;
cc: netfs@lists.linux.dev
cc: ceph-devel@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: linux-nfs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
</pre>
</div>
</content>
</entry>
<entry>
<title>netfs: Replace PG_fscache by setting folio-&gt;private and marking dirty</title>
<updated>2024-04-29T14:01:42+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2024-03-19T10:00:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2ff1e97587f4d398686f52c07afde3faf3da4e5c'/>
<id>2ff1e97587f4d398686f52c07afde3faf3da4e5c</id>
<content type='text'>
When dirty data is being written to the cache, setting/waiting on/clearing
the fscache flag is always done in tandem with setting/waiting on/clearing
the writeback flag.  The netfslib buffered write routines wait on and set
both flags and the write request cleanup clears both flags, so the fscache
flag is almost superfluous.

The reason it isn't superfluous is because the fscache flag is also used to
indicate that data just read from the server is being written to the cache.
The flag is used to prevent a race involving overlapping direct-I/O writes
to the cache.

Change this to indicate that a page is in need of being copied to the cache
by placing a magic value in folio-&gt;private and marking the folios dirty.
Then when the writeback code sees a folio marked in this way, it only
writes it to the cache and not to the server.

If a folio that has this magic value set is modified, the value is just
replaced and the folio will then be uplodaded too.

With this, PG_fscache is no longer required by the netfslib core, 9p and
afs.

Ceph and nfs, however, still need to use the old PG_fscache-based tracking.
To deal with this, a flag, NETFS_ICTX_USE_PGPRIV2, now has to be set on the
flags in the netfs_inode struct for those filesystems.  This reenables the
use of PG_fscache in that inode.  9p and afs use the netfslib write helpers
so get switched over; cifs, for the moment, does page-by-page manual access
to the cache, so doesn't use PG_fscache and is unaffected.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
cc: Eric Van Hensbergen &lt;ericvh@kernel.org&gt;
cc: Latchesar Ionkov &lt;lucho@ionkov.net&gt;
cc: Dominique Martinet &lt;asmadeus@codewreck.org&gt;
cc: Christian Schoenebeck &lt;linux_oss@crudebyte.com&gt;
cc: Marc Dionne &lt;marc.dionne@auristor.com&gt;
cc: Ilya Dryomov &lt;idryomov@gmail.com&gt;
cc: Xiubo Li &lt;xiubli@redhat.com&gt;
cc: Steve French &lt;sfrench@samba.org&gt;
cc: Paulo Alcantara &lt;pc@manguebit.com&gt;
cc: Ronnie Sahlberg &lt;ronniesahlberg@gmail.com&gt;
cc: Shyam Prasad N &lt;sprasad@microsoft.com&gt;
cc: Tom Talpey &lt;tom@talpey.com&gt;
cc: Bharath SM &lt;bharathsm@microsoft.com&gt;
cc: Trond Myklebust &lt;trond.myklebust@hammerspace.com&gt;
cc: Anna Schumaker &lt;anna@kernel.org&gt;
cc: netfs@lists.linux.dev
cc: v9fs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: ceph-devel@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: linux-nfs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When dirty data is being written to the cache, setting/waiting on/clearing
the fscache flag is always done in tandem with setting/waiting on/clearing
the writeback flag.  The netfslib buffered write routines wait on and set
both flags and the write request cleanup clears both flags, so the fscache
flag is almost superfluous.

The reason it isn't superfluous is because the fscache flag is also used to
indicate that data just read from the server is being written to the cache.
The flag is used to prevent a race involving overlapping direct-I/O writes
to the cache.

Change this to indicate that a page is in need of being copied to the cache
by placing a magic value in folio-&gt;private and marking the folios dirty.
Then when the writeback code sees a folio marked in this way, it only
writes it to the cache and not to the server.

If a folio that has this magic value set is modified, the value is just
replaced and the folio will then be uplodaded too.

With this, PG_fscache is no longer required by the netfslib core, 9p and
afs.

Ceph and nfs, however, still need to use the old PG_fscache-based tracking.
To deal with this, a flag, NETFS_ICTX_USE_PGPRIV2, now has to be set on the
flags in the netfs_inode struct for those filesystems.  This reenables the
use of PG_fscache in that inode.  9p and afs use the netfslib write helpers
so get switched over; cifs, for the moment, does page-by-page manual access
to the cache, so doesn't use PG_fscache and is unaffected.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
cc: Eric Van Hensbergen &lt;ericvh@kernel.org&gt;
cc: Latchesar Ionkov &lt;lucho@ionkov.net&gt;
cc: Dominique Martinet &lt;asmadeus@codewreck.org&gt;
cc: Christian Schoenebeck &lt;linux_oss@crudebyte.com&gt;
cc: Marc Dionne &lt;marc.dionne@auristor.com&gt;
cc: Ilya Dryomov &lt;idryomov@gmail.com&gt;
cc: Xiubo Li &lt;xiubli@redhat.com&gt;
cc: Steve French &lt;sfrench@samba.org&gt;
cc: Paulo Alcantara &lt;pc@manguebit.com&gt;
cc: Ronnie Sahlberg &lt;ronniesahlberg@gmail.com&gt;
cc: Shyam Prasad N &lt;sprasad@microsoft.com&gt;
cc: Tom Talpey &lt;tom@talpey.com&gt;
cc: Bharath SM &lt;bharathsm@microsoft.com&gt;
cc: Trond Myklebust &lt;trond.myklebust@hammerspace.com&gt;
cc: Anna Schumaker &lt;anna@kernel.org&gt;
cc: netfs@lists.linux.dev
cc: v9fs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: ceph-devel@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: linux-nfs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
</pre>
</div>
</content>
</entry>
<entry>
<title>ceph: redirty page before returning AOP_WRITEPAGE_ACTIVATE</title>
<updated>2024-04-11T17:17:02+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2024-03-24T22:21:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b372e96bd0a32729d55d27f613c8bc80708a82e1'/>
<id>b372e96bd0a32729d55d27f613c8bc80708a82e1</id>
<content type='text'>
The page has been marked clean before writepage is called.  If we don't
redirty it before postponing the write, it might never get written.

Cc: stable@vger.kernel.org
Fixes: 503d4fa6ee28 ("ceph: remove reliance on bdi congestion")
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Reviewed-by: Xiubo Li &lt;xiubli@redhat.org&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The page has been marked clean before writepage is called.  If we don't
redirty it before postponing the write, it might never get written.

Cc: stable@vger.kernel.org
Fixes: 503d4fa6ee28 ("ceph: remove reliance on bdi congestion")
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Reviewed-by: Xiubo Li &lt;xiubli@redhat.org&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'ceph-for-6.8-rc1' of https://github.com/ceph/ceph-client</title>
<updated>2024-01-19T17:58:55+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-01-19T17:58:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=556e2d17cae620d549c5474b1ece053430cd50bc'/>
<id>556e2d17cae620d549c5474b1ece053430cd50bc</id>
<content type='text'>
Pull ceph updates from Ilya Dryomov:
 "Assorted CephFS fixes and cleanups with nothing standing out"

* tag 'ceph-for-6.8-rc1' of https://github.com/ceph/ceph-client:
  ceph: get rid of passing callbacks in __dentry_leases_walk()
  ceph: d_obtain_{alias,root}(ERR_PTR(...)) will do the right thing
  ceph: fix invalid pointer access if get_quota_realm return ERR_PTR
  ceph: remove duplicated code in ceph_netfs_issue_read()
  ceph: send oldest_client_tid when renewing caps
  ceph: rename create_session_open_msg() to create_session_full_msg()
  ceph: select FS_ENCRYPTION_ALGS if FS_ENCRYPTION
  ceph: fix deadlock or deadcode of misusing dget()
  ceph: try to allocate a smaller extent map for sparse read
  libceph: remove MAX_EXTENTS check for sparse reads
  ceph: reinitialize mds feature bit even when session in open
  ceph: skip reconnecting if MDS is not ready
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull ceph updates from Ilya Dryomov:
 "Assorted CephFS fixes and cleanups with nothing standing out"

* tag 'ceph-for-6.8-rc1' of https://github.com/ceph/ceph-client:
  ceph: get rid of passing callbacks in __dentry_leases_walk()
  ceph: d_obtain_{alias,root}(ERR_PTR(...)) will do the right thing
  ceph: fix invalid pointer access if get_quota_realm return ERR_PTR
  ceph: remove duplicated code in ceph_netfs_issue_read()
  ceph: send oldest_client_tid when renewing caps
  ceph: rename create_session_open_msg() to create_session_full_msg()
  ceph: select FS_ENCRYPTION_ALGS if FS_ENCRYPTION
  ceph: fix deadlock or deadcode of misusing dget()
  ceph: try to allocate a smaller extent map for sparse read
  libceph: remove MAX_EXTENTS check for sparse reads
  ceph: reinitialize mds feature bit even when session in open
  ceph: skip reconnecting if MDS is not ready
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'vfs-6.8.netfs' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs</title>
<updated>2024-01-19T17:10:23+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-01-19T17:10:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=16df6e07d6a88dc3049a5674654ed44dfbc74d81'/>
<id>16df6e07d6a88dc3049a5674654ed44dfbc74d81</id>
<content type='text'>
Pull netfs updates from Christian Brauner:
 "This extends the netfs helper library that network filesystems can use
  to replace their own implementations. Both afs and 9p are ported. cifs
  is ready as well but the patches are way bigger and will be routed
  separately once this is merged. That will remove lots of code as well.

  The overal goal is to get high-level I/O and knowledge of the page
  cache and ouf of the filesystem drivers. This includes knowledge about
  the existence of pages and folios

  The pull request converts afs and 9p. This removes about 800 lines of
  code from afs and 300 from 9p. For 9p it is now possible to do writes
  in larger than a page chunks. Additionally, multipage folio support
  can be turned on for 9p. Separate patches exist for cifs removing
  another 2000+ lines. I've included detailed information in the
  individual pulls I took.

  Summary:

   - Add NFS-style (and Ceph-style) locking around DIO vs buffered I/O
     calls to prevent these from happening at the same time.

   - Support for direct and unbuffered I/O.

   - Support for write-through caching in the page cache.

   - O_*SYNC and RWF_*SYNC writes use write-through rather than writing
     to the page cache and then flushing afterwards.

   - Support for write-streaming.

   - Support for write grouping.

   - Skip reads for which the server could only return zeros or EOF.

   - The fscache module is now part of the netfs library and the
     corresponding maintainer entry is updated.

   - Some helpers from the fscache subsystem are renamed to mark them as
     belonging to the netfs library.

   - Follow-up fixes for the netfs library.

   - Follow-up fixes for the 9p conversion"

* tag 'vfs-6.8.netfs' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (50 commits)
  netfs: Fix wrong #ifdef hiding wait
  cachefiles: Fix signed/unsigned mixup
  netfs: Fix the loop that unmarks folios after writing to the cache
  netfs: Fix interaction between write-streaming and cachefiles culling
  netfs: Count DIO writes
  netfs: Mark netfs_unbuffered_write_iter_locked() static
  netfs: Fix proc/fs/fscache symlink to point to "netfs" not "../netfs"
  netfs: Rearrange netfs_io_subrequest to put request pointer first
  9p: Use length of data written to the server in preference to error
  9p: Do a couple of cleanups
  9p: Fix initialisation of netfs_inode for 9p
  cachefiles: Fix __cachefiles_prepare_write()
  9p: Use netfslib read/write_iter
  afs: Use the netfs write helpers
  netfs: Export the netfs_sreq tracepoint
  netfs: Optimise away reads above the point at which there can be no data
  netfs: Implement a write-through caching option
  netfs: Provide a launder_folio implementation
  netfs: Provide a writepages implementation
  netfs, cachefiles: Pass upper bound length to allow expansion
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull netfs updates from Christian Brauner:
 "This extends the netfs helper library that network filesystems can use
  to replace their own implementations. Both afs and 9p are ported. cifs
  is ready as well but the patches are way bigger and will be routed
  separately once this is merged. That will remove lots of code as well.

  The overal goal is to get high-level I/O and knowledge of the page
  cache and ouf of the filesystem drivers. This includes knowledge about
  the existence of pages and folios

  The pull request converts afs and 9p. This removes about 800 lines of
  code from afs and 300 from 9p. For 9p it is now possible to do writes
  in larger than a page chunks. Additionally, multipage folio support
  can be turned on for 9p. Separate patches exist for cifs removing
  another 2000+ lines. I've included detailed information in the
  individual pulls I took.

  Summary:

   - Add NFS-style (and Ceph-style) locking around DIO vs buffered I/O
     calls to prevent these from happening at the same time.

   - Support for direct and unbuffered I/O.

   - Support for write-through caching in the page cache.

   - O_*SYNC and RWF_*SYNC writes use write-through rather than writing
     to the page cache and then flushing afterwards.

   - Support for write-streaming.

   - Support for write grouping.

   - Skip reads for which the server could only return zeros or EOF.

   - The fscache module is now part of the netfs library and the
     corresponding maintainer entry is updated.

   - Some helpers from the fscache subsystem are renamed to mark them as
     belonging to the netfs library.

   - Follow-up fixes for the netfs library.

   - Follow-up fixes for the 9p conversion"

* tag 'vfs-6.8.netfs' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (50 commits)
  netfs: Fix wrong #ifdef hiding wait
  cachefiles: Fix signed/unsigned mixup
  netfs: Fix the loop that unmarks folios after writing to the cache
  netfs: Fix interaction between write-streaming and cachefiles culling
  netfs: Count DIO writes
  netfs: Mark netfs_unbuffered_write_iter_locked() static
  netfs: Fix proc/fs/fscache symlink to point to "netfs" not "../netfs"
  netfs: Rearrange netfs_io_subrequest to put request pointer first
  9p: Use length of data written to the server in preference to error
  9p: Do a couple of cleanups
  9p: Fix initialisation of netfs_inode for 9p
  cachefiles: Fix __cachefiles_prepare_write()
  9p: Use netfslib read/write_iter
  afs: Use the netfs write helpers
  netfs: Export the netfs_sreq tracepoint
  netfs: Optimise away reads above the point at which there can be no data
  netfs: Implement a write-through caching option
  netfs: Provide a launder_folio implementation
  netfs: Provide a writepages implementation
  netfs, cachefiles: Pass upper bound length to allow expansion
  ...
</pre>
</div>
</content>
</entry>
</feed>
