<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/fs/fuse/inode.c, branch linux-6.3.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>Merge tag 'fuse-update-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse</title>
<updated>2023-02-27T17:53:58+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-02-27T17:53:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d40b2f4c94f221bd5aab205f945e6f88d3df0929'/>
<id>d40b2f4c94f221bd5aab205f945e6f88d3df0929</id>
<content type='text'>
Pull fuse updates from Miklos Szeredi:

 - Fix regression in fileattr permission checking

 - Fix possible hang during PID namespace destruction

 - Add generic support for request extensions

 - Add supplementary group list extension

 - Add limited support for supplying supplementary groups in create
   requests

 - Documentation fixes

* tag 'fuse-update-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: add inode/permission checks to fileattr_get/fileattr_set
  fuse: fix all W=1 kernel-doc warnings
  fuse: in fuse_flush only wait if someone wants the return code
  fuse: optional supplementary group in create requests
  fuse: add request extension
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull fuse updates from Miklos Szeredi:

 - Fix regression in fileattr permission checking

 - Fix possible hang during PID namespace destruction

 - Add generic support for request extensions

 - Add supplementary group list extension

 - Add limited support for supplying supplementary groups in create
   requests

 - Documentation fixes

* tag 'fuse-update-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: add inode/permission checks to fileattr_get/fileattr_set
  fuse: fix all W=1 kernel-doc warnings
  fuse: in fuse_flush only wait if someone wants the return code
  fuse: optional supplementary group in create requests
  fuse: add request extension
</pre>
</div>
</content>
</entry>
<entry>
<title>fuse: optional supplementary group in create requests</title>
<updated>2023-01-26T16:10:38+00:00</updated>
<author>
<name>Miklos Szeredi</name>
<email>mszeredi@redhat.com</email>
</author>
<published>2022-11-10T14:46:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=8ed7cb3f279fe67a93f407ee2ec3ea661a483a65'/>
<id>8ed7cb3f279fe67a93f407ee2ec3ea661a483a65</id>
<content type='text'>
Permission to create an object (create, mkdir, symlink, mknod) needs to
take supplementary groups into account.

Add a supplementary group request extension.  This can contain an arbitrary
number of group IDs and can be added to any request.  This extension is not
added to any request by default.

Add FUSE_CREATE_SUPP_GROUP init flag to enable supplementary group info in
creation requests.  This adds just a single supplementary group that
matches the parent group in the case described above.  In other cases the
extension is not added.

Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Permission to create an object (create, mkdir, symlink, mknod) needs to
take supplementary groups into account.

Add a supplementary group request extension.  This can contain an arbitrary
number of group IDs and can be added to any request.  This extension is not
added to any request by default.

Add FUSE_CREATE_SUPP_GROUP init flag to enable supplementary group info in
creation requests.  This adds just a single supplementary group that
matches the parent group in the case described above.  In other cases the
extension is not added.

Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fuse: fixes after adapting to new posix acl api</title>
<updated>2023-01-24T15:33:37+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2023-01-20T11:55:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=facd61053cff100973921d4d45d47cf53c747ec6'/>
<id>facd61053cff100973921d4d45d47cf53c747ec6</id>
<content type='text'>
This cycle we ported all filesystems to the new posix acl api. While
looking at further simplifications in this area to remove the last
remnants of the generic dummy posix acl handlers we realized that we
regressed fuse daemons that don't set FUSE_POSIX_ACL but still make use
of posix acls.

With the change to a dedicated posix acl api interacting with posix acls
doesn't go through the old xattr codepaths anymore and instead only
relies the get acl and set acl inode operations.

Before this change fuse daemons that don't set FUSE_POSIX_ACL were able
to get and set posix acl albeit with two caveats. First, that posix acls
aren't cached. And second, that they aren't used for permission checking
in the vfs.

We regressed that use-case as we currently refuse to retrieve any posix
acls if they aren't enabled via FUSE_POSIX_ACL. So older fuse daemons
would see a change in behavior.

We can restore the old behavior in multiple ways. We could change the
new posix acl api and look for a dedicated xattr handler and if we find
one prefer that over the dedicated posix acl api. That would break the
consistency of the new posix acl api so we would very much prefer not to
do that.

We could introduce a new ACL_*_CACHE sentinel that would instruct the
vfs permission checking codepath to not call into the filesystem and
ignore acls.

But a more straightforward fix for v6.2 is to do the same thing that
Overlayfs does and give fuse a separate get acl method for permission
checking. Overlayfs uses this to express different needs for vfs
permission lookup and acl based retrieval via the regular system call
path as well. Let fuse do the same for now. This way fuse can continue
to refuse to retrieve posix acls for daemons that don't set
FUSE_POSXI_ACL for permission checking while allowing a fuse server to
retrieve it via the usual system calls.

In the future, we could extend the get acl inode operation to not just
pass a simple boolean to indicate rcu lookup but instead make it a flag
argument. Then in addition to passing the information that this is an
rcu lookup to the filesystem we could also introduce a flag that tells
the filesystem that this is a request from the vfs to use these acls for
permission checking. Then fuse could refuse the get acl request for
permission checking when the daemon doesn't have FUSE_POSIX_ACL set in
the same get acl method. This would also help Overlayfs and allow us to
remove the second method for it as well.

But since that change is more invasive as we need to update the get acl
inode operation for multiple filesystems we should not do this as a fix
for v6.2. Instead we will do this for the v6.3 merge window.

Fwiw, since posix acls are now always correctly translated in the new
posix acl api we could also allow them to be used for daemons without
FUSE_POSIX_ACL that are not mounted on the host. But this is behavioral
change and again if dones should be done for v6.3. For now, let's just
restore the original behavior.

A nice side-effect of this change is that for fuse daemons with and
without FUSE_POSIX_ACL the same code is used for posix acls in a
backwards compatible way. This also means we can remove the legacy xattr
handlers completely. We've also added comments to explain the expected
behavior for daemons without FUSE_POSIX_ACL into the code.

Fixes: 318e66856dde ("xattr: use posix acl api")
Signed-off-by: Seth Forshee (Digital Ocean) &lt;sforshee@kernel.org&gt;
Reviewed-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
Signed-off-by: Christian Brauner (Microsoft) &lt;brauner@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This cycle we ported all filesystems to the new posix acl api. While
looking at further simplifications in this area to remove the last
remnants of the generic dummy posix acl handlers we realized that we
regressed fuse daemons that don't set FUSE_POSIX_ACL but still make use
of posix acls.

With the change to a dedicated posix acl api interacting with posix acls
doesn't go through the old xattr codepaths anymore and instead only
relies the get acl and set acl inode operations.

Before this change fuse daemons that don't set FUSE_POSIX_ACL were able
to get and set posix acl albeit with two caveats. First, that posix acls
aren't cached. And second, that they aren't used for permission checking
in the vfs.

We regressed that use-case as we currently refuse to retrieve any posix
acls if they aren't enabled via FUSE_POSIX_ACL. So older fuse daemons
would see a change in behavior.

We can restore the old behavior in multiple ways. We could change the
new posix acl api and look for a dedicated xattr handler and if we find
one prefer that over the dedicated posix acl api. That would break the
consistency of the new posix acl api so we would very much prefer not to
do that.

We could introduce a new ACL_*_CACHE sentinel that would instruct the
vfs permission checking codepath to not call into the filesystem and
ignore acls.

But a more straightforward fix for v6.2 is to do the same thing that
Overlayfs does and give fuse a separate get acl method for permission
checking. Overlayfs uses this to express different needs for vfs
permission lookup and acl based retrieval via the regular system call
path as well. Let fuse do the same for now. This way fuse can continue
to refuse to retrieve posix acls for daemons that don't set
FUSE_POSXI_ACL for permission checking while allowing a fuse server to
retrieve it via the usual system calls.

In the future, we could extend the get acl inode operation to not just
pass a simple boolean to indicate rcu lookup but instead make it a flag
argument. Then in addition to passing the information that this is an
rcu lookup to the filesystem we could also introduce a flag that tells
the filesystem that this is a request from the vfs to use these acls for
permission checking. Then fuse could refuse the get acl request for
permission checking when the daemon doesn't have FUSE_POSIX_ACL set in
the same get acl method. This would also help Overlayfs and allow us to
remove the second method for it as well.

But since that change is more invasive as we need to update the get acl
inode operation for multiple filesystems we should not do this as a fix
for v6.2. Instead we will do this for the v6.3 merge window.

Fwiw, since posix acls are now always correctly translated in the new
posix acl api we could also allow them to be used for daemons without
FUSE_POSIX_ACL that are not mounted on the host. But this is behavioral
change and again if dones should be done for v6.3. For now, let's just
restore the original behavior.

A nice side-effect of this change is that for fuse daemons with and
without FUSE_POSIX_ACL the same code is used for posix acls in a
backwards compatible way. This also means we can remove the legacy xattr
handlers completely. We've also added comments to explain the expected
behavior for daemons without FUSE_POSIX_ACL into the code.

Fixes: 318e66856dde ("xattr: use posix acl api")
Signed-off-by: Seth Forshee (Digital Ocean) &lt;sforshee@kernel.org&gt;
Reviewed-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
Signed-off-by: Christian Brauner (Microsoft) &lt;brauner@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fuse: retire block-device-based superblock on force unmount</title>
<updated>2022-07-27T09:30:31+00:00</updated>
<author>
<name>Daniil Lunev</name>
<email>dlunev@chromium.org</email>
</author>
<published>2022-07-27T06:44:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=247861c325c2e4f5ad3c2f9a77ab9d85d15cbcfc'/>
<id>247861c325c2e4f5ad3c2f9a77ab9d85d15cbcfc</id>
<content type='text'>
Force unmount of FUSE severes the connection with the user space, even if
there are still open files.  Subsequent remount tries to re-use the
superblock held by the open files, which is meaningless in the FUSE case
after disconnect - reused super block doesn't have userspace counterpart
attached to it and is incapable of doing any IO.

This patch adds the functionality only for the block-device-based supers,
since the primary use case of the feature is to gracefully handle force
unmount of external devices, mounted with FUSE.  This can be further
extended to cover all superblocks, if the need arises.

Signed-off-by: Daniil Lunev &lt;dlunev@chromium.org&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Force unmount of FUSE severes the connection with the user space, even if
there are still open files.  Subsequent remount tries to re-use the
superblock held by the open files, which is meaningless in the FUSE case
after disconnect - reused super block doesn't have userspace counterpart
attached to it and is incapable of doing any IO.

This patch adds the functionality only for the block-device-based supers,
since the primary use case of the feature is to gracefully handle force
unmount of external devices, mounted with FUSE.  This can be further
extended to cover all superblocks, if the need arises.

Signed-off-by: Daniil Lunev &lt;dlunev@chromium.org&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fuse: limit nsec</title>
<updated>2022-07-21T14:06:18+00:00</updated>
<author>
<name>Miklos Szeredi</name>
<email>mszeredi@redhat.com</email>
</author>
<published>2022-07-21T14:06:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=47912eaa061a6a81e4aa790591a1874c650733c0'/>
<id>47912eaa061a6a81e4aa790591a1874c650733c0</id>
<content type='text'>
Limit nanoseconds to 0..999999999.

Fixes: d8a5ba45457e ("[PATCH] FUSE - core")
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Limit nanoseconds to 0..999999999.

Fixes: d8a5ba45457e ("[PATCH] FUSE - core")
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fs: allocate inode by using alloc_inode_sb()</title>
<updated>2022-03-22T22:57:03+00:00</updated>
<author>
<name>Muchun Song</name>
<email>songmuchun@bytedance.com</email>
</author>
<published>2022-03-22T21:41:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=fd60b28842df833477c42da6a6d63d0d114a5fcc'/>
<id>fd60b28842df833477c42da6a6d63d0d114a5fcc</id>
<content type='text'>
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() of all filesystems to alloc_inode_sb().

Link: https://lkml.kernel.org/r/20220228122126.37293-5-songmuchun@bytedance.com
Signed-off-by: Muchun Song &lt;songmuchun@bytedance.com&gt;
Acked-by: Theodore Ts'o &lt;tytso@mit.edu&gt;		[ext4]
Acked-by: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Cc: Alex Shi &lt;alexs@kernel.org&gt;
Cc: Anna Schumaker &lt;Anna.Schumaker@Netapp.com&gt;
Cc: Chao Yu &lt;chao@kernel.org&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Fam Zheng &lt;fam.zheng@bytedance.com&gt;
Cc: Jaegeuk Kim &lt;jaegeuk@kernel.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Kari Argillander &lt;kari.argillander@gmail.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Qi Zheng &lt;zhengqi.arch@bytedance.com&gt;
Cc: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Trond Myklebust &lt;trond.myklebust@hammerspace.com&gt;
Cc: Vladimir Davydov &lt;vdavydov.dev@gmail.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Wei Yang &lt;richard.weiyang@gmail.com&gt;
Cc: Xiongchun Duan &lt;duanxiongchun@bytedance.com&gt;
Cc: Yang Shi &lt;shy828301@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() of all filesystems to alloc_inode_sb().

Link: https://lkml.kernel.org/r/20220228122126.37293-5-songmuchun@bytedance.com
Signed-off-by: Muchun Song &lt;songmuchun@bytedance.com&gt;
Acked-by: Theodore Ts'o &lt;tytso@mit.edu&gt;		[ext4]
Acked-by: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Cc: Alex Shi &lt;alexs@kernel.org&gt;
Cc: Anna Schumaker &lt;Anna.Schumaker@Netapp.com&gt;
Cc: Chao Yu &lt;chao@kernel.org&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Fam Zheng &lt;fam.zheng@bytedance.com&gt;
Cc: Jaegeuk Kim &lt;jaegeuk@kernel.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Kari Argillander &lt;kari.argillander@gmail.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Qi Zheng &lt;zhengqi.arch@bytedance.com&gt;
Cc: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Trond Myklebust &lt;trond.myklebust@hammerspace.com&gt;
Cc: Vladimir Davydov &lt;vdavydov.dev@gmail.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Wei Yang &lt;richard.weiyang@gmail.com&gt;
Cc: Xiongchun Duan &lt;duanxiongchun@bytedance.com&gt;
Cc: Yang Shi &lt;shy828301@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fuse: move FUSE_SUPER_MAGIC definition to magic.h</title>
<updated>2022-02-21T13:57:26+00:00</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@kernel.org</email>
</author>
<published>2022-01-10T23:52:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=c086df4902573e2f06c6a2a83452c13a8bc603f5'/>
<id>c086df4902573e2f06c6a2a83452c13a8bc603f5</id>
<content type='text'>
...to help userland apps that need to identify FUSE mounts.

Signed-off-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
...to help userland apps that need to identify FUSE mounts.

Signed-off-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fuse: mark inode DONT_CACHE when per inode DAX hint changes</title>
<updated>2021-12-14T10:09:37+00:00</updated>
<author>
<name>Jeffle Xu</name>
<email>jefflexu@linux.alibaba.com</email>
</author>
<published>2021-11-25T07:05:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=c3cb6f935e322fa183988032e318b293d9e4fe53'/>
<id>c3cb6f935e322fa183988032e318b293d9e4fe53</id>
<content type='text'>
When the per inode DAX hint changes while the file is still *opened*, it
is quite complicated and maybe fragile to dynamically change the DAX
state.

Hence mark the inode and corresponding dentries as DONE_CACHE once the
per inode DAX hint changes, so that the inode instance will be evicted
and freed as soon as possible once the file is closed and the last
reference to the inode is put. And then when the file gets reopened next
time, the new instantiated inode will reflect the new DAX state.

In summary, when the per inode DAX hint changes for an *opened* file, the
DAX state of the file won't be updated until this file is closed and
reopened later.

Signed-off-by: Jeffle Xu &lt;jefflexu@linux.alibaba.com&gt;
Reviewed-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When the per inode DAX hint changes while the file is still *opened*, it
is quite complicated and maybe fragile to dynamically change the DAX
state.

Hence mark the inode and corresponding dentries as DONE_CACHE once the
per inode DAX hint changes, so that the inode instance will be evicted
and freed as soon as possible once the file is closed and the last
reference to the inode is put. And then when the file gets reopened next
time, the new instantiated inode will reflect the new DAX state.

In summary, when the per inode DAX hint changes for an *opened* file, the
DAX state of the file won't be updated until this file is closed and
reopened later.

Signed-off-by: Jeffle Xu &lt;jefflexu@linux.alibaba.com&gt;
Reviewed-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fuse: negotiate per inode DAX in FUSE_INIT</title>
<updated>2021-12-14T10:09:37+00:00</updated>
<author>
<name>Jeffle Xu</name>
<email>jefflexu@linux.alibaba.com</email>
</author>
<published>2021-11-25T07:05:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=2ee019fadcca343c3deea6a1767965bdf23fc3d0'/>
<id>2ee019fadcca343c3deea6a1767965bdf23fc3d0</id>
<content type='text'>
Among the FUSE_INIT phase, client shall advertise per inode DAX if it's
mounted with "dax=inode". Then server is aware that client is in per
inode DAX mode, and will construct per-inode DAX attribute accordingly.

Server shall also advertise support for per inode DAX. If server doesn't
support it while client is mounted with "dax=inode", client will
silently fallback to "dax=never" since "dax=inode" is advisory only.

Signed-off-by: Jeffle Xu &lt;jefflexu@linux.alibaba.com&gt;
Reviewed-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Among the FUSE_INIT phase, client shall advertise per inode DAX if it's
mounted with "dax=inode". Then server is aware that client is in per
inode DAX mode, and will construct per-inode DAX attribute accordingly.

Server shall also advertise support for per inode DAX. If server doesn't
support it while client is mounted with "dax=inode", client will
silently fallback to "dax=never" since "dax=inode" is advisory only.

Signed-off-by: Jeffle Xu &lt;jefflexu@linux.alibaba.com&gt;
Reviewed-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fuse: enable per inode DAX</title>
<updated>2021-12-14T10:09:37+00:00</updated>
<author>
<name>Jeffle Xu</name>
<email>jefflexu@linux.alibaba.com</email>
</author>
<published>2021-11-25T07:05:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=93a497b9ad695bb2f38a302c5b29dbc9b555ff3f'/>
<id>93a497b9ad695bb2f38a302c5b29dbc9b555ff3f</id>
<content type='text'>
DAX may be limited in some specific situation. When the number of usable
DAX windows is under watermark, the recalim routine will be triggered to
reclaim some DAX windows. It may have a negative impact on the
performance, since some processes may need to wait for DAX windows to be
recalimed and reused then. To mitigate the performance degradation, the
overall DAX window need to be expanded larger.

However, simply expanding the DAX window may not be a good deal in some
scenario. To maintain one DAX window chunk (i.e., 2MB in size), 32KB
(512 * 64 bytes) memory footprint will be consumed for page descriptors
inside guest, which is greater than the memory footprint if it uses
guest page cache when DAX disabled. Thus it'd better disable DAX for
those files smaller than 32KB, to reduce the demand for DAX window and
thus avoid the unworthy memory overhead.

Per inode DAX feature is introduced to address this issue, by offering a
finer grained control for dax to users, trying to achieve a balance
between performance and memory overhead.

The FUSE_ATTR_DAX flag in FUSE_LOOKUP reply is used to indicate whether
DAX should be enabled or not for corresponding file. Currently the state
whether DAX is enabled or not for the file is initialized only when
inode is instantiated.

Signed-off-by: Jeffle Xu &lt;jefflexu@linux.alibaba.com&gt;
Reviewed-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
DAX may be limited in some specific situation. When the number of usable
DAX windows is under watermark, the recalim routine will be triggered to
reclaim some DAX windows. It may have a negative impact on the
performance, since some processes may need to wait for DAX windows to be
recalimed and reused then. To mitigate the performance degradation, the
overall DAX window need to be expanded larger.

However, simply expanding the DAX window may not be a good deal in some
scenario. To maintain one DAX window chunk (i.e., 2MB in size), 32KB
(512 * 64 bytes) memory footprint will be consumed for page descriptors
inside guest, which is greater than the memory footprint if it uses
guest page cache when DAX disabled. Thus it'd better disable DAX for
those files smaller than 32KB, to reduce the demand for DAX window and
thus avoid the unworthy memory overhead.

Per inode DAX feature is introduced to address this issue, by offering a
finer grained control for dax to users, trying to achieve a balance
between performance and memory overhead.

The FUSE_ATTR_DAX flag in FUSE_LOOKUP reply is used to indicate whether
DAX should be enabled or not for corresponding file. Currently the state
whether DAX is enabled or not for the file is initialized only when
inode is instantiated.

Signed-off-by: Jeffle Xu &lt;jefflexu@linux.alibaba.com&gt;
Reviewed-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
