<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/aio.c, branch v4.16</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>fs/aio: Use RCU accessors for kioctx_table-&gt;table[]</title>
<updated>2018-03-14T19:10:17+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2018-03-14T19:10:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d0264c01e7587001a8c4608a5d1818dba9a4c11a'/>
<id>d0264c01e7587001a8c4608a5d1818dba9a4c11a</id>
<content type='text'>
While converting ioctx index from a list to a table, db446a08c23d
("aio: convert the ioctx list to table lookup v3") missed tagging
kioctx_table-&gt;table[] as an array of RCU pointers and using the
appropriate RCU accessors.  This introduces a small window in the
lookup path where init and access may race.

Mark kioctx_table-&gt;table[] with __rcu and use the approriate RCU
accessors when using the field.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Jann Horn &lt;jannh@google.com&gt;
Fixes: db446a08c23d ("aio: convert the ioctx list to table lookup v3")
Cc: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: stable@vger.kernel.org # v3.12+
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
While converting ioctx index from a list to a table, db446a08c23d
("aio: convert the ioctx list to table lookup v3") missed tagging
kioctx_table-&gt;table[] as an array of RCU pointers and using the
appropriate RCU accessors.  This introduces a small window in the
lookup path where init and access may race.

Mark kioctx_table-&gt;table[] with __rcu and use the approriate RCU
accessors when using the field.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Jann Horn &lt;jannh@google.com&gt;
Fixes: db446a08c23d ("aio: convert the ioctx list to table lookup v3")
Cc: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: stable@vger.kernel.org # v3.12+
</pre>
</div>
</content>
</entry>
<entry>
<title>fs/aio: Add explicit RCU grace period when freeing kioctx</title>
<updated>2018-03-14T19:10:17+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2018-03-14T19:10:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a6d7cff472eea87d96899a20fa718d2bab7109f3'/>
<id>a6d7cff472eea87d96899a20fa718d2bab7109f3</id>
<content type='text'>
While fixing refcounting, e34ecee2ae79 ("aio: Fix a trinity splat")
incorrectly removed explicit RCU grace period before freeing kioctx.
The intention seems to be depending on the internal RCU grace periods
of percpu_ref; however, percpu_ref uses a different flavor of RCU,
sched-RCU.  This can lead to kioctx being freed while RCU read
protected dereferences are still in progress.

Fix it by updating free_ioctx() to go through call_rcu() explicitly.

v2: Comment added to explain double bouncing.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Jann Horn &lt;jannh@google.com&gt;
Fixes: e34ecee2ae79 ("aio: Fix a trinity splat")
Cc: Kent Overstreet &lt;kent.overstreet@gmail.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: stable@vger.kernel.org # v3.13+
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
While fixing refcounting, e34ecee2ae79 ("aio: Fix a trinity splat")
incorrectly removed explicit RCU grace period before freeing kioctx.
The intention seems to be depending on the internal RCU grace periods
of percpu_ref; however, percpu_ref uses a different flavor of RCU,
sched-RCU.  This can lead to kioctx being freed while RCU read
protected dereferences are still in progress.

Fix it by updating free_ioctx() to go through call_rcu() explicitly.

v2: Comment added to explain double bouncing.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Jann Horn &lt;jannh@google.com&gt;
Fixes: e34ecee2ae79 ("aio: Fix a trinity splat")
Cc: Kent Overstreet &lt;kent.overstreet@gmail.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: stable@vger.kernel.org # v3.13+
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs</title>
<updated>2017-11-17T19:54:55+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2017-11-17T19:54:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=93f30c73ecd0281cf3685ef0e4e384980a176176'/>
<id>93f30c73ecd0281cf3685ef0e4e384980a176176</id>
<content type='text'>
Pull compat and uaccess updates from Al Viro:

 - {get,put}_compat_sigset() series

 - assorted compat ioctl stuff

 - more set_fs() elimination

 - a few more timespec64 conversions

 - several removals of pointless access_ok() in places where it was
   followed only by non-__ variants of primitives

* 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (24 commits)
  coredump: call do_unlinkat directly instead of sys_unlink
  fs: expose do_unlinkat for built-in callers
  ext4: take handling of EXT4_IOC_GROUP_ADD into a helper, get rid of set_fs()
  ipmi: get rid of pointless access_ok()
  pi433: sanitize ioctl
  cxlflash: get rid of pointless access_ok()
  mtdchar: get rid of pointless access_ok()
  r128: switch compat ioctls to drm_ioctl_kernel()
  selection: get rid of field-by-field copyin
  VT_RESIZEX: get rid of field-by-field copyin
  i2c compat ioctls: move to -&gt;compat_ioctl()
  sched_rr_get_interval(): move compat to native, get rid of set_fs()
  mips: switch to {get,put}_compat_sigset()
  sparc: switch to {get,put}_compat_sigset()
  s390: switch to {get,put}_compat_sigset()
  ppc: switch to {get,put}_compat_sigset()
  parisc: switch to {get,put}_compat_sigset()
  get_compat_sigset()
  get rid of {get,put}_compat_itimerspec()
  io_getevents: Use timespec64 to represent timeouts
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull compat and uaccess updates from Al Viro:

 - {get,put}_compat_sigset() series

 - assorted compat ioctl stuff

 - more set_fs() elimination

 - a few more timespec64 conversions

 - several removals of pointless access_ok() in places where it was
   followed only by non-__ variants of primitives

* 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (24 commits)
  coredump: call do_unlinkat directly instead of sys_unlink
  fs: expose do_unlinkat for built-in callers
  ext4: take handling of EXT4_IOC_GROUP_ADD into a helper, get rid of set_fs()
  ipmi: get rid of pointless access_ok()
  pi433: sanitize ioctl
  cxlflash: get rid of pointless access_ok()
  mtdchar: get rid of pointless access_ok()
  r128: switch compat ioctls to drm_ioctl_kernel()
  selection: get rid of field-by-field copyin
  VT_RESIZEX: get rid of field-by-field copyin
  i2c compat ioctls: move to -&gt;compat_ioctl()
  sched_rr_get_interval(): move compat to native, get rid of set_fs()
  mips: switch to {get,put}_compat_sigset()
  sparc: switch to {get,put}_compat_sigset()
  s390: switch to {get,put}_compat_sigset()
  ppc: switch to {get,put}_compat_sigset()
  parisc: switch to {get,put}_compat_sigset()
  get_compat_sigset()
  get rid of {get,put}_compat_itimerspec()
  io_getevents: Use timespec64 to represent timeouts
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()</title>
<updated>2017-10-25T09:01:08+00:00</updated>
<author>
<name>Mark Rutland</name>
<email>mark.rutland@arm.com</email>
</author>
<published>2017-10-23T21:07:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6aa7de059173a986114ac43b8f50b297a86f09a8'/>
<id>6aa7de059173a986114ac43b8f50b297a86f09a8</id>
<content type='text'>
Please do not apply this to mainline directly, instead please re-run the
coccinelle script shown below and apply its output.

For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't harmful, and changing them results in
churn.

However, for some features, the read/write distinction is critical to
correct operation. To distinguish these cases, separate read/write
accessors must be used. This patch migrates (most) remaining
ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
coccinelle script:

----
// Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
// WRITE_ONCE()

// $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch

virtual patch

@ depends on patch @
expression E1, E2;
@@

- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)

@ depends on patch @
expression E;
@@

- ACCESS_ONCE(E)
+ READ_ONCE(E)
----

Signed-off-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: davem@davemloft.net
Cc: linux-arch@vger.kernel.org
Cc: mpe@ellerman.id.au
Cc: shuah@kernel.org
Cc: snitzer@redhat.com
Cc: thor.thayer@linux.intel.com
Cc: tj@kernel.org
Cc: viro@zeniv.linux.org.uk
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Please do not apply this to mainline directly, instead please re-run the
coccinelle script shown below and apply its output.

For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't harmful, and changing them results in
churn.

However, for some features, the read/write distinction is critical to
correct operation. To distinguish these cases, separate read/write
accessors must be used. This patch migrates (most) remaining
ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
coccinelle script:

----
// Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
// WRITE_ONCE()

// $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch

virtual patch

@ depends on patch @
expression E1, E2;
@@

- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)

@ depends on patch @
expression E;
@@

- ACCESS_ONCE(E)
+ READ_ONCE(E)
----

Signed-off-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: davem@davemloft.net
Cc: linux-arch@vger.kernel.org
Cc: mpe@ellerman.id.au
Cc: shuah@kernel.org
Cc: snitzer@redhat.com
Cc: thor.thayer@linux.intel.com
Cc: tj@kernel.org
Cc: viro@zeniv.linux.org.uk
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>io_getevents: Use timespec64 to represent timeouts</title>
<updated>2017-09-19T21:55:59+00:00</updated>
<author>
<name>Deepa Dinamani</name>
<email>deepa.kernel@gmail.com</email>
</author>
<published>2017-08-05T04:12:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=fa2e62a54003419b06f1de7836dca51b368d0872'/>
<id>fa2e62a54003419b06f1de7836dca51b368d0872</id>
<content type='text'>
struct timespec is not y2038 safe. Use y2038 safe
struct timespec64 to represent timeouts.
The system call interface itself will be changed as
part of different series.

Timeouts will not really need more than 32 bits.
But, replacing these with timespec64 helps verification
of a y2038 safe kernel by getting rid of timespec
internally.

Signed-off-by: Deepa Dinamani &lt;deepa.kernel@gmail.com&gt;
Cc: linux-aio@kvack.org
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
struct timespec is not y2038 safe. Use y2038 safe
struct timespec64 to represent timeouts.
The system call interface itself will be changed as
part of different series.

Timeouts will not really need more than 32 bits.
But, replacing these with timespec64 helps verification
of a y2038 safe kernel by getting rid of timespec
internally.

Signed-off-by: Deepa Dinamani &lt;deepa.kernel@gmail.com&gt;
Cc: linux-aio@kvack.org
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'work.read_write' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs</title>
<updated>2017-09-15T02:29:55+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2017-09-15T02:29:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e253d98f5babbec7e6ced810f7335b265a7f7e83'/>
<id>e253d98f5babbec7e6ced810f7335b265a7f7e83</id>
<content type='text'>
Pull nowait read support from Al Viro:
 "Support IOCB_NOWAIT for buffered reads and block devices"

* 'work.read_write' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  block_dev: support RFW_NOWAIT on block device nodes
  fs: support RWF_NOWAIT for buffered reads
  fs: support IOCB_NOWAIT in generic_file_buffered_read
  fs: pass iocb to do_generic_file_read
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull nowait read support from Al Viro:
 "Support IOCB_NOWAIT for buffered reads and block devices"

* 'work.read_write' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  block_dev: support RFW_NOWAIT on block device nodes
  fs: support RWF_NOWAIT for buffered reads
  fs: support IOCB_NOWAIT in generic_file_buffered_read
  fs: pass iocb to do_generic_file_read
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/migrate: new migrate mode MIGRATE_SYNC_NO_COPY</title>
<updated>2017-09-09T01:26:46+00:00</updated>
<author>
<name>Jérôme Glisse</name>
<email>jglisse@redhat.com</email>
</author>
<published>2017-09-08T23:12:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2916ecc0f9d435d849c98f4da50e453124c87531'/>
<id>2916ecc0f9d435d849c98f4da50e453124c87531</id>
<content type='text'>
Introduce a new migration mode that allow to offload the copy to a device
DMA engine.  This changes the workflow of migration and not all
address_space migratepage callback can support this.

This is intended to be use by migrate_vma() which itself is use for thing
like HMM (see include/linux/hmm.h).

No additional per-filesystem migratepage testing is needed.  I disables
MIGRATE_SYNC_NO_COPY in all problematic migratepage() callback and i
added comment in those to explain why (part of this patch).  The commit
message is unclear it should say that any callback that wish to support
this new mode need to be aware of the difference in the migration flow
from other mode.

Some of these callbacks do extra locking while copying (aio, zsmalloc,
balloon, ...) and for DMA to be effective you want to copy multiple
pages in one DMA operations.  But in the problematic case you can not
easily hold the extra lock accross multiple call to this callback.

Usual flow is:

For each page {
 1 - lock page
 2 - call migratepage() callback
 3 - (extra locking in some migratepage() callback)
 4 - migrate page state (freeze refcount, update page cache, buffer
     head, ...)
 5 - copy page
 6 - (unlock any extra lock of migratepage() callback)
 7 - return from migratepage() callback
 8 - unlock page
}

The new mode MIGRATE_SYNC_NO_COPY:
 1 - lock multiple pages
For each page {
 2 - call migratepage() callback
 3 - abort in all problematic migratepage() callback
 4 - migrate page state (freeze refcount, update page cache, buffer
     head, ...)
} // finished all calls to migratepage() callback
 5 - DMA copy multiple pages
 6 - unlock all the pages

To support MIGRATE_SYNC_NO_COPY in the problematic case we would need a
new callback migratepages() (for instance) that deals with multiple
pages in one transaction.

Because the problematic cases are not important for current usage I did
not wanted to complexify this patchset even more for no good reason.

Link: http://lkml.kernel.org/r/20170817000548.32038-14-jglisse@redhat.com
Signed-off-by: Jérôme Glisse &lt;jglisse@redhat.com&gt;
Cc: Aneesh Kumar &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Balbir Singh &lt;bsingharora@gmail.com&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: David Nellans &lt;dnellans@nvidia.com&gt;
Cc: Evgeny Baskakov &lt;ebaskakov@nvidia.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: John Hubbard &lt;jhubbard@nvidia.com&gt;
Cc: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Mark Hairgrove &lt;mhairgrove@nvidia.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Cc: Sherry Cheung &lt;SCheung@nvidia.com&gt;
Cc: Subhash Gutti &lt;sgutti@nvidia.com&gt;
Cc: Vladimir Davydov &lt;vdavydov.dev@gmail.com&gt;
Cc: Bob Liu &lt;liubo95@huawei.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Introduce a new migration mode that allow to offload the copy to a device
DMA engine.  This changes the workflow of migration and not all
address_space migratepage callback can support this.

This is intended to be use by migrate_vma() which itself is use for thing
like HMM (see include/linux/hmm.h).

No additional per-filesystem migratepage testing is needed.  I disables
MIGRATE_SYNC_NO_COPY in all problematic migratepage() callback and i
added comment in those to explain why (part of this patch).  The commit
message is unclear it should say that any callback that wish to support
this new mode need to be aware of the difference in the migration flow
from other mode.

Some of these callbacks do extra locking while copying (aio, zsmalloc,
balloon, ...) and for DMA to be effective you want to copy multiple
pages in one DMA operations.  But in the problematic case you can not
easily hold the extra lock accross multiple call to this callback.

Usual flow is:

For each page {
 1 - lock page
 2 - call migratepage() callback
 3 - (extra locking in some migratepage() callback)
 4 - migrate page state (freeze refcount, update page cache, buffer
     head, ...)
 5 - copy page
 6 - (unlock any extra lock of migratepage() callback)
 7 - return from migratepage() callback
 8 - unlock page
}

The new mode MIGRATE_SYNC_NO_COPY:
 1 - lock multiple pages
For each page {
 2 - call migratepage() callback
 3 - abort in all problematic migratepage() callback
 4 - migrate page state (freeze refcount, update page cache, buffer
     head, ...)
} // finished all calls to migratepage() callback
 5 - DMA copy multiple pages
 6 - unlock all the pages

To support MIGRATE_SYNC_NO_COPY in the problematic case we would need a
new callback migratepages() (for instance) that deals with multiple
pages in one transaction.

Because the problematic cases are not important for current usage I did
not wanted to complexify this patchset even more for no good reason.

Link: http://lkml.kernel.org/r/20170817000548.32038-14-jglisse@redhat.com
Signed-off-by: Jérôme Glisse &lt;jglisse@redhat.com&gt;
Cc: Aneesh Kumar &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Balbir Singh &lt;bsingharora@gmail.com&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: David Nellans &lt;dnellans@nvidia.com&gt;
Cc: Evgeny Baskakov &lt;ebaskakov@nvidia.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: John Hubbard &lt;jhubbard@nvidia.com&gt;
Cc: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Mark Hairgrove &lt;mhairgrove@nvidia.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Cc: Sherry Cheung &lt;SCheung@nvidia.com&gt;
Cc: Subhash Gutti &lt;sgutti@nvidia.com&gt;
Cc: Vladimir Davydov &lt;vdavydov.dev@gmail.com&gt;
Cc: Bob Liu &lt;liubo95@huawei.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fs: aio: fix the increment of aio-nr and counting against aio-max-nr</title>
<updated>2017-09-07T16:28:28+00:00</updated>
<author>
<name>Mauricio Faria de Oliveira</name>
<email>mauricfo@linux.vnet.ibm.com</email>
</author>
<published>2017-07-05T13:53:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2a8a98673c13cb2a61a6476153acf8344adfa992'/>
<id>2a8a98673c13cb2a61a6476153acf8344adfa992</id>
<content type='text'>
Currently, aio-nr is incremented in steps of 'num_possible_cpus() * 8'
for io_setup(nr_events, ..) with 'nr_events &lt; num_possible_cpus() * 4':

    ioctx_alloc()
    ...
        nr_events = max(nr_events, num_possible_cpus() * 4);
        nr_events *= 2;
    ...
        ctx-&gt;max_reqs = nr_events;
    ...
        aio_nr += ctx-&gt;max_reqs;
    ....

This limits the number of aio contexts actually available to much less
than aio-max-nr, and is increasingly worse with greater number of CPUs.

For example, with 64 CPUs, only 256 aio contexts are actually available
(with aio-max-nr = 65536) because the increment is 512 in that scenario.

Note: 65536 [max aio contexts] / (64*4*2) [increment per aio context]
is 128, but make it 256 (double) as counting against 'aio-max-nr * 2':

    ioctx_alloc()
    ...
        if (aio_nr + nr_events &gt; (aio_max_nr * 2UL) ||
        ...
            goto err_ctx;
    ...

This patch uses the original value of nr_events (from userspace) to
increment aio-nr and count against aio-max-nr, which resolves those.

Signed-off-by: Mauricio Faria de Oliveira &lt;mauricfo@linux.vnet.ibm.com&gt;
Reported-by: Lekshmi C. Pillai &lt;lekshmi.cpillai@in.ibm.com&gt;
Tested-by: Lekshmi C. Pillai &lt;lekshmi.cpillai@in.ibm.com&gt;
Tested-by: Paul Nguyen &lt;nguyenp@us.ibm.com&gt;
Reviewed-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Signed-off-by: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently, aio-nr is incremented in steps of 'num_possible_cpus() * 8'
for io_setup(nr_events, ..) with 'nr_events &lt; num_possible_cpus() * 4':

    ioctx_alloc()
    ...
        nr_events = max(nr_events, num_possible_cpus() * 4);
        nr_events *= 2;
    ...
        ctx-&gt;max_reqs = nr_events;
    ...
        aio_nr += ctx-&gt;max_reqs;
    ....

This limits the number of aio contexts actually available to much less
than aio-max-nr, and is increasingly worse with greater number of CPUs.

For example, with 64 CPUs, only 256 aio contexts are actually available
(with aio-max-nr = 65536) because the increment is 512 in that scenario.

Note: 65536 [max aio contexts] / (64*4*2) [increment per aio context]
is 128, but make it 256 (double) as counting against 'aio-max-nr * 2':

    ioctx_alloc()
    ...
        if (aio_nr + nr_events &gt; (aio_max_nr * 2UL) ||
        ...
            goto err_ctx;
    ...

This patch uses the original value of nr_events (from userspace) to
increment aio-nr and count against aio-max-nr, which resolves those.

Signed-off-by: Mauricio Faria de Oliveira &lt;mauricfo@linux.vnet.ibm.com&gt;
Reported-by: Lekshmi C. Pillai &lt;lekshmi.cpillai@in.ibm.com&gt;
Tested-by: Lekshmi C. Pillai &lt;lekshmi.cpillai@in.ibm.com&gt;
Tested-by: Paul Nguyen &lt;nguyenp@us.ibm.com&gt;
Reviewed-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Signed-off-by: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fs: support RWF_NOWAIT for buffered reads</title>
<updated>2017-09-04T23:04:23+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2017-08-29T14:13:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=91f9943e1c7b6638f27312d03fe71fcc67b23571'/>
<id>91f9943e1c7b6638f27312d03fe71fcc67b23571</id>
<content type='text'>
This is based on the old idea and code from Milosz Tanski.  With the aio
nowait code it becomes mostly trivial now.  Buffered writes continue to
return -EOPNOTSUPP if RWF_NOWAIT is passed.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This is based on the old idea and code from Milosz Tanski.  With the aio
nowait code it becomes mostly trivial now.  Buffered writes continue to
return -EOPNOTSUPP if RWF_NOWAIT is passed.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fs: add O_DIRECT and aio support for sending down write life time hints</title>
<updated>2017-06-27T18:05:36+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2017-06-27T17:01:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=45d06cf701a3866e0d246789039a46370af60223'/>
<id>45d06cf701a3866e0d246789039a46370af60223</id>
<content type='text'>
Reviewed-by: Andreas Dilger &lt;adilger@dilger.ca&gt;
Reviewed-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Reviewed-by: Andreas Dilger &lt;adilger@dilger.ca&gt;
Reviewed-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
</feed>
