<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/fuse, branch v5.2-rc2</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>treewide: Add SPDX license identifier - Makefile/Kconfig</title>
<updated>2019-05-21T08:50:46+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2019-05-19T12:07:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ec8f24b7faaf3d4799a7c3f4c1b87f6b02778ad1'/>
<id>ec8f24b7faaf3d4799a7c3f4c1b87f6b02778ad1</id>
<content type='text'>
Add SPDX license identifiers to all Make/Kconfig files which:

 - Have no license information of any form

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

  GPL-2.0-only

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add SPDX license identifiers to all Make/Kconfig files which:

 - Have no license information of any form

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

  GPL-2.0-only

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'fuse-update-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse</title>
<updated>2019-05-14T15:59:14+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2019-05-14T15:59:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4856118f4953627e9a087253766b9e7361f5f4a0'/>
<id>4856118f4953627e9a087253766b9e7361f5f4a0</id>
<content type='text'>
Pull fuse update from Miklos Szeredi:
 "Add more caching controls for userspace filesystems to use, as well as
  bug fixes and cleanups"

* tag 'fuse-update-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: clean up fuse_alloc_inode
  fuse: Add ioctl flag for x32 compat ioctl
  fuse: Convert fusectl to use the new mount API
  fuse: fix changelog entry for protocol 7.9
  fuse: fix changelog entry for protocol 7.12
  fuse: document fuse_fsync_in.fsync_flags
  fuse: Add FOPEN_STREAM to use stream_open()
  fuse: require /dev/fuse reads to have enough buffer capacity
  fuse: retrieve: cap requested size to negotiated max_write
  fuse: allow filesystems to have precise control over data cache
  fuse: convert printk -&gt; pr_*
  fuse: honor RLIMIT_FSIZE in fuse_file_fallocate
  fuse: fix writepages on 32bit
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull fuse update from Miklos Szeredi:
 "Add more caching controls for userspace filesystems to use, as well as
  bug fixes and cleanups"

* tag 'fuse-update-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: clean up fuse_alloc_inode
  fuse: Add ioctl flag for x32 compat ioctl
  fuse: Convert fusectl to use the new mount API
  fuse: fix changelog entry for protocol 7.9
  fuse: fix changelog entry for protocol 7.12
  fuse: document fuse_fsync_in.fsync_flags
  fuse: Add FOPEN_STREAM to use stream_open()
  fuse: require /dev/fuse reads to have enough buffer capacity
  fuse: retrieve: cap requested size to negotiated max_write
  fuse: allow filesystems to have precise control over data cache
  fuse: convert printk -&gt; pr_*
  fuse: honor RLIMIT_FSIZE in fuse_file_fallocate
  fuse: fix writepages on 32bit
</pre>
</div>
</content>
</entry>
<entry>
<title>fuse: clean up fuse_alloc_inode</title>
<updated>2019-05-08T11:58:29+00:00</updated>
<author>
<name>zhangliguang</name>
<email>zhangliguang@linux.alibaba.com</email>
</author>
<published>2019-05-06T08:52:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9031a69cf9f024a3040c0ed8b8ab01aecd196388'/>
<id>9031a69cf9f024a3040c0ed8b8ab01aecd196388</id>
<content type='text'>
This patch cleans up fuse_alloc_inode function, just simply the code, no
logic change.

Signed-off-by: zhangliguang &lt;zhangliguang@linux.alibaba.com&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch cleans up fuse_alloc_inode function, just simply the code, no
logic change.

Signed-off-by: zhangliguang &lt;zhangliguang@linux.alibaba.com&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'work.icache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs</title>
<updated>2019-05-07T17:57:05+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2019-05-07T17:57:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=168e153d5ebbdd6a3fa85db1cc4879ed4b7030e0'/>
<id>168e153d5ebbdd6a3fa85db1cc4879ed4b7030e0</id>
<content type='text'>
Pull vfs inode freeing updates from Al Viro:
 "Introduction of separate method for RCU-delayed part of
  -&gt;destroy_inode() (if any).

  Pretty much as posted, except that destroy_inode() stashes
  -&gt;free_inode into the victim (anon-unioned with -&gt;i_fops) before
  scheduling i_callback() and the last two patches (sockfs conversion
  and folding struct socket_wq into struct socket) are excluded - that
  pair should go through netdev once davem reopens his tree"

* 'work.icache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (58 commits)
  orangefs: make use of -&gt;free_inode()
  shmem: make use of -&gt;free_inode()
  hugetlb: make use of -&gt;free_inode()
  overlayfs: make use of -&gt;free_inode()
  jfs: switch to -&gt;free_inode()
  fuse: switch to -&gt;free_inode()
  ext4: make use of -&gt;free_inode()
  ecryptfs: make use of -&gt;free_inode()
  ceph: use -&gt;free_inode()
  btrfs: use -&gt;free_inode()
  afs: switch to use of -&gt;free_inode()
  dax: make use of -&gt;free_inode()
  ntfs: switch to -&gt;free_inode()
  securityfs: switch to -&gt;free_inode()
  apparmor: switch to -&gt;free_inode()
  rpcpipe: switch to -&gt;free_inode()
  bpf: switch to -&gt;free_inode()
  mqueue: switch to -&gt;free_inode()
  ufs: switch to -&gt;free_inode()
  coda: switch to -&gt;free_inode()
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull vfs inode freeing updates from Al Viro:
 "Introduction of separate method for RCU-delayed part of
  -&gt;destroy_inode() (if any).

  Pretty much as posted, except that destroy_inode() stashes
  -&gt;free_inode into the victim (anon-unioned with -&gt;i_fops) before
  scheduling i_callback() and the last two patches (sockfs conversion
  and folding struct socket_wq into struct socket) are excluded - that
  pair should go through netdev once davem reopens his tree"

* 'work.icache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (58 commits)
  orangefs: make use of -&gt;free_inode()
  shmem: make use of -&gt;free_inode()
  hugetlb: make use of -&gt;free_inode()
  overlayfs: make use of -&gt;free_inode()
  jfs: switch to -&gt;free_inode()
  fuse: switch to -&gt;free_inode()
  ext4: make use of -&gt;free_inode()
  ecryptfs: make use of -&gt;free_inode()
  ceph: use -&gt;free_inode()
  btrfs: use -&gt;free_inode()
  afs: switch to use of -&gt;free_inode()
  dax: make use of -&gt;free_inode()
  ntfs: switch to -&gt;free_inode()
  securityfs: switch to -&gt;free_inode()
  apparmor: switch to -&gt;free_inode()
  rpcpipe: switch to -&gt;free_inode()
  bpf: switch to -&gt;free_inode()
  mqueue: switch to -&gt;free_inode()
  ufs: switch to -&gt;free_inode()
  coda: switch to -&gt;free_inode()
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>fuse: switch to -&gt;free_inode()</title>
<updated>2019-05-02T02:43:26+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2019-04-15T23:37:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9baf28bbfea165a62211dd90986d993d77631372'/>
<id>9baf28bbfea165a62211dd90986d993d77631372</id>
<content type='text'>
fuse_destroy_inode() is gone - sanity checks that need the stack
trace of the caller get moved into -&gt;evict_inode(), the rest joins
the RCU-delayed part which becomes -&gt;free_inode().

While we are at it, don't just pass the address of what happens
to be the first member of structure to kmem_cache_free() -
get_fuse_inode() is there for purpose and it gives the proper
container_of() use.  No behaviour change, but verifying correctness
is easier that way.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
fuse_destroy_inode() is gone - sanity checks that need the stack
trace of the caller get moved into -&gt;evict_inode(), the rest joins
the RCU-delayed part which becomes -&gt;free_inode().

While we are at it, don't just pass the address of what happens
to be the first member of structure to kmem_cache_free() -
get_fuse_inode() is there for purpose and it gives the proper
container_of() use.  No behaviour change, but verifying correctness
is easier that way.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fuse: Add ioctl flag for x32 compat ioctl</title>
<updated>2019-04-24T15:05:07+00:00</updated>
<author>
<name>Ian Abbott</name>
<email>abbotti@mev.co.uk</email>
</author>
<published>2019-04-24T14:14:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6407f44aaf2a39b5ccbb1cc1d342b906dcfa8a87'/>
<id>6407f44aaf2a39b5ccbb1cc1d342b906dcfa8a87</id>
<content type='text'>
Currently, a CUSE server running on a 64-bit kernel can tell when an ioctl
request comes from a process running a 32-bit ABI, but cannot tell whether
the requesting process is using legacy IA32 emulation or x32 ABI.  In
particular, the server does not know the size of the client process's
`time_t` type.

For 64-bit kernels, the `FUSE_IOCTL_COMPAT` and `FUSE_IOCTL_32BIT` flags
are currently set in the ioctl input request (`struct fuse_ioctl_in` member
`flags`) for a 32-bit requesting process.  This patch defines a new flag
`FUSE_IOCTL_COMPAT_X32` and sets it if the 32-bit requesting process is
using the x32 ABI.  This allows the server process to distinguish between
requests coming from client processes using IA32 emulation or the x32 ABI
and so infer the size of the client process's `time_t` type and any other
IA32/x32 differences.

Signed-off-by: Ian Abbott &lt;abbotti@mev.co.uk&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently, a CUSE server running on a 64-bit kernel can tell when an ioctl
request comes from a process running a 32-bit ABI, but cannot tell whether
the requesting process is using legacy IA32 emulation or x32 ABI.  In
particular, the server does not know the size of the client process's
`time_t` type.

For 64-bit kernels, the `FUSE_IOCTL_COMPAT` and `FUSE_IOCTL_32BIT` flags
are currently set in the ioctl input request (`struct fuse_ioctl_in` member
`flags`) for a 32-bit requesting process.  This patch defines a new flag
`FUSE_IOCTL_COMPAT_X32` and sets it if the 32-bit requesting process is
using the x32 ABI.  This allows the server process to distinguish between
requests coming from client processes using IA32 emulation or the x32 ABI
and so infer the size of the client process's `time_t` type and any other
IA32/x32 differences.

Signed-off-by: Ian Abbott &lt;abbotti@mev.co.uk&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fuse: Convert fusectl to use the new mount API</title>
<updated>2019-04-24T15:05:07+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2019-03-27T23:44:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=29cc02d949b19fdeba9de9f54b2641005f5865c6'/>
<id>29cc02d949b19fdeba9de9f54b2641005f5865c6</id>
<content type='text'>
Convert the fusectl filesystem to the new internal mount API as the old
one will be obsoleted and removed.  This allows greater flexibility in
communication of mount parameters between userspace, the VFS and the
filesystem.

See Documentation/filesystems/mount_api.txt for more information.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Convert the fusectl filesystem to the new internal mount API as the old
one will be obsoleted and removed.  This allows greater flexibility in
communication of mount parameters between userspace, the VFS and the
filesystem.

See Documentation/filesystems/mount_api.txt for more information.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fuse: document fuse_fsync_in.fsync_flags</title>
<updated>2019-04-24T15:05:07+00:00</updated>
<author>
<name>Alan Somers</name>
<email>asomers@FreeBSD.org</email>
</author>
<published>2019-04-19T21:42:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=154603fe3ec4620a4c229a127ddbccf5c69f9463'/>
<id>154603fe3ec4620a4c229a127ddbccf5c69f9463</id>
<content type='text'>
The FUSE_FSYNC_DATASYNC flag was introduced by commit b6aeadeda22a
("[PATCH] FUSE - file operations") as a magic number.  No new values have
been added to fsync_flags since.

Signed-off-by: Alan Somers &lt;asomers@FreeBSD.org&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The FUSE_FSYNC_DATASYNC flag was introduced by commit b6aeadeda22a
("[PATCH] FUSE - file operations") as a magic number.  No new values have
been added to fsync_flags since.

Signed-off-by: Alan Somers &lt;asomers@FreeBSD.org&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fuse: Add FOPEN_STREAM to use stream_open()</title>
<updated>2019-04-24T15:05:07+00:00</updated>
<author>
<name>Kirill Smelkov</name>
<email>kirr@nexedi.com</email>
</author>
<published>2019-04-24T07:13:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=bbd84f33652f852ce5992d65db4d020aba21f882'/>
<id>bbd84f33652f852ce5992d65db4d020aba21f882</id>
<content type='text'>
Starting from commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per
POSIX") files opened even via nonseekable_open gate read and write via lock
and do not allow them to be run simultaneously. This can create read vs
write deadlock if a filesystem is trying to implement a socket-like file
which is intended to be simultaneously used for both read and write from
filesystem client.  See commit 10dce8af3422 ("fs: stream_open - opener for
stream-like files so that read and write can run simultaneously without
deadlock") for details and e.g. commit 581d21a2d02a ("xenbus: fix deadlock
on writes to /proc/xen/xenbus") for a similar deadlock example on
/proc/xen/xenbus.

To avoid such deadlock it was tempting to adjust fuse_finish_open to use
stream_open instead of nonseekable_open on just FOPEN_NONSEEKABLE flags,
but grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE,
and in particular GVFS which actually uses offset in its read and write
handlers

	https://codesearch.debian.net/search?q=-%3Enonseekable+%3D
	https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080
	https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346
	https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481

so if we would do such a change it will break a real user.

Add another flag (FOPEN_STREAM) for filesystem servers to indicate that the
opened handler is having stream-like semantics; does not use file position
and thus the kernel is free to issue simultaneous read and write request on
opened file handle.

This patch together with stream_open() should be added to stable kernels
starting from v3.14+. This will allow to patch OSSPD and other FUSE
filesystems that provide stream-like files to return FOPEN_STREAM |
FOPEN_NONSEEKABLE in open handler and this way avoid the deadlock on all
kernel versions. This should work because fuse_finish_open ignores unknown
open flags returned from a filesystem and so passing FOPEN_STREAM to a
kernel that is not aware of this flag cannot hurt. In turn the kernel that
is not aware of FOPEN_STREAM will be &lt; v3.14 where just FOPEN_NONSEEKABLE
is sufficient to implement streams without read vs write deadlock.

Cc: stable@vger.kernel.org # v3.14+
Signed-off-by: Kirill Smelkov &lt;kirr@nexedi.com&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Starting from commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per
POSIX") files opened even via nonseekable_open gate read and write via lock
and do not allow them to be run simultaneously. This can create read vs
write deadlock if a filesystem is trying to implement a socket-like file
which is intended to be simultaneously used for both read and write from
filesystem client.  See commit 10dce8af3422 ("fs: stream_open - opener for
stream-like files so that read and write can run simultaneously without
deadlock") for details and e.g. commit 581d21a2d02a ("xenbus: fix deadlock
on writes to /proc/xen/xenbus") for a similar deadlock example on
/proc/xen/xenbus.

To avoid such deadlock it was tempting to adjust fuse_finish_open to use
stream_open instead of nonseekable_open on just FOPEN_NONSEEKABLE flags,
but grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE,
and in particular GVFS which actually uses offset in its read and write
handlers

	https://codesearch.debian.net/search?q=-%3Enonseekable+%3D
	https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080
	https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346
	https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481

so if we would do such a change it will break a real user.

Add another flag (FOPEN_STREAM) for filesystem servers to indicate that the
opened handler is having stream-like semantics; does not use file position
and thus the kernel is free to issue simultaneous read and write request on
opened file handle.

This patch together with stream_open() should be added to stable kernels
starting from v3.14+. This will allow to patch OSSPD and other FUSE
filesystems that provide stream-like files to return FOPEN_STREAM |
FOPEN_NONSEEKABLE in open handler and this way avoid the deadlock on all
kernel versions. This should work because fuse_finish_open ignores unknown
open flags returned from a filesystem and so passing FOPEN_STREAM to a
kernel that is not aware of this flag cannot hurt. In turn the kernel that
is not aware of FOPEN_STREAM will be &lt; v3.14 where just FOPEN_NONSEEKABLE
is sufficient to implement streams without read vs write deadlock.

Cc: stable@vger.kernel.org # v3.14+
Signed-off-by: Kirill Smelkov &lt;kirr@nexedi.com&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fuse: require /dev/fuse reads to have enough buffer capacity</title>
<updated>2019-04-24T15:05:07+00:00</updated>
<author>
<name>Kirill Smelkov</name>
<email>kirr@nexedi.com</email>
</author>
<published>2019-03-27T10:15:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d4b13963f217dd947da5c0cabd1569e914d21699'/>
<id>d4b13963f217dd947da5c0cabd1569e914d21699</id>
<content type='text'>
A FUSE filesystem server queues /dev/fuse sys_read calls to get
filesystem requests to handle. It does not know in advance what would be
that request as it can be anything that client issues - LOOKUP, READ,
WRITE, ... Many requests are short and retrieve data from the
filesystem. However WRITE and NOTIFY_REPLY write data into filesystem.

Before getting into operation phase, FUSE filesystem server and kernel
client negotiate what should be the maximum write size the client will
ever issue. After negotiation the contract in between server/client is
that the filesystem server then should queue /dev/fuse sys_read calls with
enough buffer capacity to receive any client request - WRITE in
particular, while FUSE client should not, in particular, send WRITE
requests with &gt; negotiated max_write payload. FUSE client in kernel and
libfuse historically reserve 4K for request header. This way the
contract is that filesystem server should queue sys_reads with
4K+max_write buffer.

If the filesystem server does not follow this contract, what can happen
is that fuse_dev_do_read will see that request size is &gt; buffer size,
and then it will return EIO to client who issued the request but won't
indicate in any way that there is a problem to filesystem server.
This can be hard to diagnose because for some requests, e.g. for
NOTIFY_REPLY which mimics WRITE, there is no client thread that is
waiting for request completion and that EIO goes nowhere, while on
filesystem server side things look like the kernel is not replying back
after successful NOTIFY_RETRIEVE request made by the server.

We can make the problem easy to diagnose if we indicate via error return to
filesystem server when it is violating the contract.  This should not
practically cause problems because if a filesystem server is using shorter
buffer, writes to it were already very likely to cause EIO, and if the
filesystem is read-only it should be too following FUSE_MIN_READ_BUFFER
minimum buffer size.

Please see [1] for context where the problem of stuck filesystem was hit
for real (because kernel client was incorrectly sending more than
max_write data with NOTIFY_REPLY; see also previous patch), how the
situation was traced and for more involving patch that did not make it
into the tree.

[1] https://marc.info/?l=linux-fsdevel&amp;m=155057023600853&amp;w=2

Signed-off-by: Kirill Smelkov &lt;kirr@nexedi.com&gt;
Cc: Han-Wen Nienhuys &lt;hanwen@google.com&gt;
Cc: Jakob Unterwurzacher &lt;jakobunt@gmail.com&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A FUSE filesystem server queues /dev/fuse sys_read calls to get
filesystem requests to handle. It does not know in advance what would be
that request as it can be anything that client issues - LOOKUP, READ,
WRITE, ... Many requests are short and retrieve data from the
filesystem. However WRITE and NOTIFY_REPLY write data into filesystem.

Before getting into operation phase, FUSE filesystem server and kernel
client negotiate what should be the maximum write size the client will
ever issue. After negotiation the contract in between server/client is
that the filesystem server then should queue /dev/fuse sys_read calls with
enough buffer capacity to receive any client request - WRITE in
particular, while FUSE client should not, in particular, send WRITE
requests with &gt; negotiated max_write payload. FUSE client in kernel and
libfuse historically reserve 4K for request header. This way the
contract is that filesystem server should queue sys_reads with
4K+max_write buffer.

If the filesystem server does not follow this contract, what can happen
is that fuse_dev_do_read will see that request size is &gt; buffer size,
and then it will return EIO to client who issued the request but won't
indicate in any way that there is a problem to filesystem server.
This can be hard to diagnose because for some requests, e.g. for
NOTIFY_REPLY which mimics WRITE, there is no client thread that is
waiting for request completion and that EIO goes nowhere, while on
filesystem server side things look like the kernel is not replying back
after successful NOTIFY_RETRIEVE request made by the server.

We can make the problem easy to diagnose if we indicate via error return to
filesystem server when it is violating the contract.  This should not
practically cause problems because if a filesystem server is using shorter
buffer, writes to it were already very likely to cause EIO, and if the
filesystem is read-only it should be too following FUSE_MIN_READ_BUFFER
minimum buffer size.

Please see [1] for context where the problem of stuck filesystem was hit
for real (because kernel client was incorrectly sending more than
max_write data with NOTIFY_REPLY; see also previous patch), how the
situation was traced and for more involving patch that did not make it
into the tree.

[1] https://marc.info/?l=linux-fsdevel&amp;m=155057023600853&amp;w=2

Signed-off-by: Kirill Smelkov &lt;kirr@nexedi.com&gt;
Cc: Han-Wen Nienhuys &lt;hanwen@google.com&gt;
Cc: Jakob Unterwurzacher &lt;jakobunt@gmail.com&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
