<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/aio.c, branch v4.9</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>aio: fix freeze protection of aio writes</title>
<updated>2016-10-30T17:09:42+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2016-10-30T16:42:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=70fe2f48152e60664809e2fed76bbb50c9fa2aa3'/>
<id>70fe2f48152e60664809e2fed76bbb50c9fa2aa3</id>
<content type='text'>
Currently we dropped freeze protection of aio writes just after IO was
submitted. Thus aio write could be in flight while the filesystem was
frozen and that could result in unexpected situation like aio completion
wanting to convert extent type on frozen filesystem. Testcase from
Dmitry triggering this is like:

for ((i=0;i&lt;60;i++));do fsfreeze -f /mnt ;sleep 1;fsfreeze -u /mnt;done &amp;
fio --bs=4k --ioengine=libaio --iodepth=128 --size=1g --direct=1 \
    --runtime=60 --filename=/mnt/file --name=rand-write --rw=randwrite

Fix the problem by dropping freeze protection only once IO is completed
in aio_complete().

Reported-by: Dmitry Monakhov &lt;dmonakhov@openvz.org&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
[hch: forward ported on top of various VFS and aio changes]
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently we dropped freeze protection of aio writes just after IO was
submitted. Thus aio write could be in flight while the filesystem was
frozen and that could result in unexpected situation like aio completion
wanting to convert extent type on frozen filesystem. Testcase from
Dmitry triggering this is like:

for ((i=0;i&lt;60;i++));do fsfreeze -f /mnt ;sleep 1;fsfreeze -u /mnt;done &amp;
fio --bs=4k --ioengine=libaio --iodepth=128 --size=1g --direct=1 \
    --runtime=60 --filename=/mnt/file --name=rand-write --rw=randwrite

Fix the problem by dropping freeze protection only once IO is completed
in aio_complete().

Reported-by: Dmitry Monakhov &lt;dmonakhov@openvz.org&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
[hch: forward ported on top of various VFS and aio changes]
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fs: remove aio_run_iocb</title>
<updated>2016-10-30T17:09:42+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2016-10-30T16:42:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=89319d31d2d097da8e27fb0e0ae9d532f4f16827'/>
<id>89319d31d2d097da8e27fb0e0ae9d532f4f16827</id>
<content type='text'>
Pass the ABI iocb structure to aio_setup_rw and let it handle the
non-vectored I/O case as well.  With that and a new helper for the AIO
return value handling we can now define new aio_read and aio_write
helpers that implement reads and writes in a self-contained way without
duplicating too much code.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pass the ABI iocb structure to aio_setup_rw and let it handle the
non-vectored I/O case as well.  With that and a new helper for the AIO
return value handling we can now define new aio_read and aio_write
helpers that implement reads and writes in a self-contained way without
duplicating too much code.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fs: remove the never implemented aio_fsync file operation</title>
<updated>2016-10-30T17:09:42+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2016-10-30T16:42:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=723c038475b78edc9327eb952f95f9881cc9d79d'/>
<id>723c038475b78edc9327eb952f95f9881cc9d79d</id>
<content type='text'>
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>aio: hold an extra file reference over AIO read/write operations</title>
<updated>2016-10-30T17:09:42+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2016-10-30T16:42:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0b944d3a4bba6b25f43aed530f4fa85c04d162a6'/>
<id>0b944d3a4bba6b25f43aed530f4fa85c04d162a6</id>
<content type='text'>
Otherwise we might dereference an already freed file and/or inode
when aio_complete is called before we return from the read_iter or
write_iter method.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Otherwise we might dereference an already freed file and/or inode
when aio_complete is called before we return from the read_iter or
write_iter method.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fs/aio.c: eliminate redundant loads in put_aio_ring_file</title>
<updated>2016-09-28T01:45:46+00:00</updated>
<author>
<name>Rasmus Villemoes</name>
<email>linux@rasmusvillemoes.dk</email>
</author>
<published>2016-09-14T22:25:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=de04e76935ad5985d318fbce298a17e9dd2092b7'/>
<id>de04e76935ad5985d318fbce298a17e9dd2092b7</id>
<content type='text'>
Using a local variable we can prevent gcc from reloading
aio_ring_file-&gt;f_inode-&gt;i_mapping twice, eliminating 2x2 dependent
loads.

Signed-off-by: Rasmus Villemoes &lt;linux@rasmusvillemoes.dk&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Using a local variable we can prevent gcc from reloading
aio_ring_file-&gt;f_inode-&gt;i_mapping twice, eliminating 2x2 dependent
loads.

Signed-off-by: Rasmus Villemoes &lt;linux@rasmusvillemoes.dk&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>aio: mark AIO pseudo-fs noexec</title>
<updated>2016-09-15T22:49:28+00:00</updated>
<author>
<name>Jann Horn</name>
<email>jann@thejh.net</email>
</author>
<published>2016-09-15T22:31:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=22f6b4d34fcf039c63a94e7670e0da24f8575a5a'/>
<id>22f6b4d34fcf039c63a94e7670e0da24f8575a5a</id>
<content type='text'>
This ensures that do_mmap() won't implicitly make AIO memory mappings
executable if the READ_IMPLIES_EXEC personality flag is set.  Such
behavior is problematic because the security_mmap_file LSM hook doesn't
catch this case, potentially permitting an attacker to bypass a W^X
policy enforced by SELinux.

I have tested the patch on my machine.

To test the behavior, compile and run this:

    #define _GNU_SOURCE
    #include &lt;unistd.h&gt;
    #include &lt;sys/personality.h&gt;
    #include &lt;linux/aio_abi.h&gt;
    #include &lt;err.h&gt;
    #include &lt;stdlib.h&gt;
    #include &lt;stdio.h&gt;
    #include &lt;sys/syscall.h&gt;

    int main(void) {
        personality(READ_IMPLIES_EXEC);
        aio_context_t ctx = 0;
        if (syscall(__NR_io_setup, 1, &amp;ctx))
            err(1, "io_setup");

        char cmd[1000];
        sprintf(cmd, "cat /proc/%d/maps | grep -F '/[aio]'",
            (int)getpid());
        system(cmd);
        return 0;
    }

In the output, "rw-s" is good, "rwxs" is bad.

Signed-off-by: Jann Horn &lt;jann@thejh.net&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This ensures that do_mmap() won't implicitly make AIO memory mappings
executable if the READ_IMPLIES_EXEC personality flag is set.  Such
behavior is problematic because the security_mmap_file LSM hook doesn't
catch this case, potentially permitting an attacker to bypass a W^X
policy enforced by SELinux.

I have tested the patch on my machine.

To test the behavior, compile and run this:

    #define _GNU_SOURCE
    #include &lt;unistd.h&gt;
    #include &lt;sys/personality.h&gt;
    #include &lt;linux/aio_abi.h&gt;
    #include &lt;err.h&gt;
    #include &lt;stdlib.h&gt;
    #include &lt;stdio.h&gt;
    #include &lt;sys/syscall.h&gt;

    int main(void) {
        personality(READ_IMPLIES_EXEC);
        aio_context_t ctx = 0;
        if (syscall(__NR_io_setup, 1, &amp;ctx))
            err(1, "io_setup");

        char cmd[1000];
        sprintf(cmd, "cat /proc/%d/maps | grep -F '/[aio]'",
            (int)getpid());
        system(cmd);
        return 0;
    }

In the output, "rw-s" is good, "rwxs" is bad.

Signed-off-by: Jann Horn &lt;jann@thejh.net&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>aio: make aio_setup_ring killable</title>
<updated>2016-05-24T00:04:14+00:00</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.com</email>
</author>
<published>2016-05-23T23:25:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=013373e8b86735086ac86aff782637f7e58ff84d'/>
<id>013373e8b86735086ac86aff782637f7e58ff84d</id>
<content type='text'>
aio_setup_ring waits for mmap_sem in writable mode.  If the waiting task
gets killed by the oom killer it would block oom_reaper from
asynchronous address space reclaim and reduce the chances of timely OOM
resolving.  Wait for the lock in the killable mode and return with EINTR
if the task got killed while waiting.  This will also expedite the
return to the userspace and do_exit.

Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Benamin LaHaise &lt;bcrl@kvack.org&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
aio_setup_ring waits for mmap_sem in writable mode.  If the waiting task
gets killed by the oom killer it would block oom_reaper from
asynchronous address space reclaim and reduce the chances of timely OOM
resolving.  Wait for the lock in the killable mode and return with EINTR
if the task got killed while waiting.  This will also expedite the
return to the userspace and do_exit.

Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Benamin LaHaise &lt;bcrl@kvack.org&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>aio: remove a pointless assignment</title>
<updated>2016-04-03T23:51:33+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2016-04-01T01:34:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2958ec177e400be1e26fc37e1759f84fa2c1761c'/>
<id>2958ec177e400be1e26fc37e1759f84fa2c1761c</id>
<content type='text'>
the value is never used after that point

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
the value is never used after that point

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: move -&gt;mremap() from file_operations to vm_operations_struct</title>
<updated>2015-09-04T23:54:41+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2015-09-04T22:48:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=5477e70a6420a6b7ca96c8e21413ee1c96a84260'/>
<id>5477e70a6420a6b7ca96c8e21413ee1c96a84260</id>
<content type='text'>
vma-&gt;vm_ops-&gt;mremap() looks more natural and clean in move_vma(), and this
way -&gt;mremap() can have more users.  Say, vdso.

While at it, s/aio_ring_remap/aio_ring_mremap/.

Note: this is the minimal change before -&gt;mremap() finds another user in
file_operations; this method should have more arguments, and it can be
used to kill arch_remap().

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Pavel Emelyanov &lt;xemul@parallels.com&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Cc: Laurent Dufour &lt;ldufour@linux.vnet.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
vma-&gt;vm_ops-&gt;mremap() looks more natural and clean in move_vma(), and this
way -&gt;mremap() can have more users.  Say, vdso.

While at it, s/aio_ring_remap/aio_ring_mremap/.

Note: this is the minimal change before -&gt;mremap() finds another user in
file_operations; this method should have more arguments, and it can be
used to kill arch_remap().

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Pavel Emelyanov &lt;xemul@parallels.com&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Cc: Laurent Dufour &lt;ldufour@linux.vnet.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs</title>
<updated>2015-04-17T03:27:56+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-04-17T03:27:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4fc8adcfec3da639da76e8314c9ccefe5bf9a045'/>
<id>4fc8adcfec3da639da76e8314c9ccefe5bf9a045</id>
<content type='text'>
Pull third hunk of vfs changes from Al Viro:
 "This contains the -&gt;direct_IO() changes from Omar + saner
  generic_write_checks() + dealing with fcntl()/{read,write}() races
  (mirroring O_APPEND/O_DIRECT into iocb-&gt;ki_flags and instead of
  repeatedly looking at -&gt;f_flags, which can be changed by fcntl(2),
  check -&gt;ki_flags - which cannot) + infrastructure bits for dhowells'
  d_inode annotations + Christophs switch of /dev/loop to
  vfs_iter_write()"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (30 commits)
  block: loop: switch to VFS ITER_BVEC
  configfs: Fix inconsistent use of file_inode() vs file-&gt;f_path.dentry-&gt;d_inode
  VFS: Make pathwalk use d_is_reg() rather than S_ISREG()
  VFS: Fix up debugfs to use d_is_dir() in place of S_ISDIR()
  VFS: Combine inode checks with d_is_negative() and d_is_positive() in pathwalk
  NFS: Don't use d_inode as a variable name
  VFS: Impose ordering on accesses of d_inode and d_flags
  VFS: Add owner-filesystem positive/negative dentry checks
  nfs: generic_write_checks() shouldn't be done on swapout...
  ocfs2: use __generic_file_write_iter()
  mirror O_APPEND and O_DIRECT into iocb-&gt;ki_flags
  switch generic_write_checks() to iocb and iter
  ocfs2: move generic_write_checks() before the alignment checks
  ocfs2_file_write_iter: stop messing with ppos
  udf_file_write_iter: reorder and simplify
  fuse: -&gt;direct_IO() doesn't need generic_write_checks()
  ext4_file_write_iter: move generic_write_checks() up
  xfs_file_aio_write_checks: switch to iocb/iov_iter
  generic_write_checks(): drop isblk argument
  blkdev_write_iter: expand generic_file_checks() call in there
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull third hunk of vfs changes from Al Viro:
 "This contains the -&gt;direct_IO() changes from Omar + saner
  generic_write_checks() + dealing with fcntl()/{read,write}() races
  (mirroring O_APPEND/O_DIRECT into iocb-&gt;ki_flags and instead of
  repeatedly looking at -&gt;f_flags, which can be changed by fcntl(2),
  check -&gt;ki_flags - which cannot) + infrastructure bits for dhowells'
  d_inode annotations + Christophs switch of /dev/loop to
  vfs_iter_write()"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (30 commits)
  block: loop: switch to VFS ITER_BVEC
  configfs: Fix inconsistent use of file_inode() vs file-&gt;f_path.dentry-&gt;d_inode
  VFS: Make pathwalk use d_is_reg() rather than S_ISREG()
  VFS: Fix up debugfs to use d_is_dir() in place of S_ISDIR()
  VFS: Combine inode checks with d_is_negative() and d_is_positive() in pathwalk
  NFS: Don't use d_inode as a variable name
  VFS: Impose ordering on accesses of d_inode and d_flags
  VFS: Add owner-filesystem positive/negative dentry checks
  nfs: generic_write_checks() shouldn't be done on swapout...
  ocfs2: use __generic_file_write_iter()
  mirror O_APPEND and O_DIRECT into iocb-&gt;ki_flags
  switch generic_write_checks() to iocb and iter
  ocfs2: move generic_write_checks() before the alignment checks
  ocfs2_file_write_iter: stop messing with ppos
  udf_file_write_iter: reorder and simplify
  fuse: -&gt;direct_IO() doesn't need generic_write_checks()
  ext4_file_write_iter: move generic_write_checks() up
  xfs_file_aio_write_checks: switch to iocb/iov_iter
  generic_write_checks(): drop isblk argument
  blkdev_write_iter: expand generic_file_checks() call in there
  ...
</pre>
</div>
</content>
</entry>
</feed>
