<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/aio.c, branch v3.15</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>aio: fix potential leak in aio_run_iocb().</title>
<updated>2014-05-01T12:37:43+00:00</updated>
<author>
<name>Leon Yu</name>
<email>chianglungyu@gmail.com</email>
</author>
<published>2014-05-01T03:31:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=754320d6e166d3a12cb4810a452bde00afbd4e9a'/>
<id>754320d6e166d3a12cb4810a452bde00afbd4e9a</id>
<content type='text'>
iovec should be reclaimed whenever caller of rw_copy_check_uvector() returns,
but it doesn't hold when failure happens right after aio_setup_vectored_rw().

Fix that in a such way to avoid hairy goto.

Signed-off-by: Leon Yu &lt;chianglungyu@gmail.com&gt;
Signed-off-by: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
Cc: stable@vger.kernel.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
iovec should be reclaimed whenever caller of rw_copy_check_uvector() returns,
but it doesn't hold when failure happens right after aio_setup_vectored_rw().

Fix that in a such way to avoid hairy goto.

Signed-off-by: Leon Yu &lt;chianglungyu@gmail.com&gt;
Signed-off-by: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
Cc: stable@vger.kernel.org
</pre>
</div>
</content>
</entry>
<entry>
<title>aio: block io_destroy() until all context requests are completed</title>
<updated>2014-04-16T17:38:04+00:00</updated>
<author>
<name>Anatol Pomozov</name>
<email>anatol.pomozov@gmail.com</email>
</author>
<published>2014-04-15T18:31:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e02ba72aabfade4c9cd6e3263e9b57bf890ad25c'/>
<id>e02ba72aabfade4c9cd6e3263e9b57bf890ad25c</id>
<content type='text'>
deletes aio context and all resources related to. It makes sense that
no IO operations connected to the context should be running after the context
is destroyed. As we removed io_context we have no chance to
get requests status or call io_getevents().

man page for io_destroy says that this function may block until
all context's requests are completed. Before kernel 3.11 io_destroy()
blocked indeed, but since aio refactoring in 3.11 it is not true anymore.

Here is a pseudo-code that shows a testcase for a race condition discovered
in 3.11:

  initialize io_context
  io_submit(read to buffer)
  io_destroy()

  // context is destroyed so we can free the resources
  free(buffers);

  // if the buffer is allocated by some other user he'll be surprised
  // to learn that the buffer still filled by an outstanding operation
  // from the destroyed io_context

The fix is straight-forward - add a completion struct and wait on it
in io_destroy, complete() should be called when number of in-fligh requests
reaches zero.

If two or more io_destroy() called for the same context simultaneously then
only the first one waits for IO completion, other calls behaviour is undefined.

Tested: ran http://pastebin.com/LrPsQ4RL testcase for several hours and
  do not see the race condition anymore.

Signed-off-by: Anatol Pomozov &lt;anatol.pomozov@gmail.com&gt;
Signed-off-by: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
deletes aio context and all resources related to. It makes sense that
no IO operations connected to the context should be running after the context
is destroyed. As we removed io_context we have no chance to
get requests status or call io_getevents().

man page for io_destroy says that this function may block until
all context's requests are completed. Before kernel 3.11 io_destroy()
blocked indeed, but since aio refactoring in 3.11 it is not true anymore.

Here is a pseudo-code that shows a testcase for a race condition discovered
in 3.11:

  initialize io_context
  io_submit(read to buffer)
  io_destroy()

  // context is destroyed so we can free the resources
  free(buffers);

  // if the buffer is allocated by some other user he'll be surprised
  // to learn that the buffer still filled by an outstanding operation
  // from the destroyed io_context

The fix is straight-forward - add a completion struct and wait on it
in io_destroy, complete() should be called when number of in-fligh requests
reaches zero.

If two or more io_destroy() called for the same context simultaneously then
only the first one waits for IO completion, other calls behaviour is undefined.

Tested: ran http://pastebin.com/LrPsQ4RL testcase for several hours and
  do not see the race condition anymore.

Signed-off-by: Anatol Pomozov &lt;anatol.pomozov@gmail.com&gt;
Signed-off-by: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>aio: v4 ensure access to ctx-&gt;ring_pages is correctly serialised for migration</title>
<updated>2014-03-28T14:14:45+00:00</updated>
<author>
<name>Benjamin LaHaise</name>
<email>bcrl@kvack.org</email>
</author>
<published>2014-03-28T14:14:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=fa8a53c39f3fdde98c9eace6a9b412143f0f6ed6'/>
<id>fa8a53c39f3fdde98c9eace6a9b412143f0f6ed6</id>
<content type='text'>
As reported by Tang Chen, Gu Zheng and Yasuaki Isimatsu, the following issues
exist in the aio ring page migration support.

As a result, for example, we have the following problem:

            thread 1                      |              thread 2
                                          |
aio_migratepage()                         |
 |-&gt; take ctx-&gt;completion_lock            |
 |-&gt; migrate_page_copy(new, old)          |
 |   *NOW*, ctx-&gt;ring_pages[idx] == old   |
                                          |
                                          |    *NOW*, ctx-&gt;ring_pages[idx] == old
                                          |    aio_read_events_ring()
                                          |     |-&gt; ring = kmap_atomic(ctx-&gt;ring_pages[0])
                                          |     |-&gt; ring-&gt;head = head;          *HERE, write to the old ring page*
                                          |     |-&gt; kunmap_atomic(ring);
                                          |
 |-&gt; ctx-&gt;ring_pages[idx] = new           |
 |   *BUT NOW*, the content of            |
 |    ring_pages[idx] is old.             |
 |-&gt; release ctx-&gt;completion_lock         |

As above, the new ring page will not be updated.

Fix this issue, as well as prevent races in aio_ring_setup() by holding
the ring_lock mutex during kioctx setup and page migration.  This avoids
the overhead of taking another spinlock in aio_read_events_ring() as Tang's
and Gu's original fix did, pushing the overhead into the migration code.

Note that to handle the nesting of ring_lock inside of mmap_sem, the
migratepage operation uses mutex_trylock().  Page migration is not a 100%
critical operation in this case, so the ocassional failure can be
tolerated.  This issue was reported by Sasha Levin.

Based on feedback from Linus, avoid the extra taking of ctx-&gt;completion_lock.
Instead, make page migration fully serialised by mapping-&gt;private_lock, and
have aio_free_ring() simply disconnect the kioctx from the mapping by calling
put_aio_ring_file() before touching ctx-&gt;ring_pages[].  This simplifies the
error handling logic in aio_migratepage(), and should improve robustness.

v4: always do mutex_unlock() in cases when kioctx setup fails.

Reported-by: Yasuaki Ishimatsu &lt;isimatu.yasuaki@jp.fujitsu.com&gt;
Reported-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Signed-off-by: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
Cc: Tang Chen &lt;tangchen@cn.fujitsu.com&gt;
Cc: Gu Zheng &lt;guz.fnst@cn.fujitsu.com&gt;
Cc: stable@vger.kernel.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
As reported by Tang Chen, Gu Zheng and Yasuaki Isimatsu, the following issues
exist in the aio ring page migration support.

As a result, for example, we have the following problem:

            thread 1                      |              thread 2
                                          |
aio_migratepage()                         |
 |-&gt; take ctx-&gt;completion_lock            |
 |-&gt; migrate_page_copy(new, old)          |
 |   *NOW*, ctx-&gt;ring_pages[idx] == old   |
                                          |
                                          |    *NOW*, ctx-&gt;ring_pages[idx] == old
                                          |    aio_read_events_ring()
                                          |     |-&gt; ring = kmap_atomic(ctx-&gt;ring_pages[0])
                                          |     |-&gt; ring-&gt;head = head;          *HERE, write to the old ring page*
                                          |     |-&gt; kunmap_atomic(ring);
                                          |
 |-&gt; ctx-&gt;ring_pages[idx] = new           |
 |   *BUT NOW*, the content of            |
 |    ring_pages[idx] is old.             |
 |-&gt; release ctx-&gt;completion_lock         |

As above, the new ring page will not be updated.

Fix this issue, as well as prevent races in aio_ring_setup() by holding
the ring_lock mutex during kioctx setup and page migration.  This avoids
the overhead of taking another spinlock in aio_read_events_ring() as Tang's
and Gu's original fix did, pushing the overhead into the migration code.

Note that to handle the nesting of ring_lock inside of mmap_sem, the
migratepage operation uses mutex_trylock().  Page migration is not a 100%
critical operation in this case, so the ocassional failure can be
tolerated.  This issue was reported by Sasha Levin.

Based on feedback from Linus, avoid the extra taking of ctx-&gt;completion_lock.
Instead, make page migration fully serialised by mapping-&gt;private_lock, and
have aio_free_ring() simply disconnect the kioctx from the mapping by calling
put_aio_ring_file() before touching ctx-&gt;ring_pages[].  This simplifies the
error handling logic in aio_migratepage(), and should improve robustness.

v4: always do mutex_unlock() in cases when kioctx setup fails.

Reported-by: Yasuaki Ishimatsu &lt;isimatu.yasuaki@jp.fujitsu.com&gt;
Reported-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Signed-off-by: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
Cc: Tang Chen &lt;tangchen@cn.fujitsu.com&gt;
Cc: Gu Zheng &lt;guz.fnst@cn.fujitsu.com&gt;
Cc: stable@vger.kernel.org
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge git://git.kvack.org/~bcrl/aio-next</title>
<updated>2013-12-22T19:03:49+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2013-12-22T19:03:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a8472b4bb1aaa421b10f2cc40f70097e67f971ae'/>
<id>a8472b4bb1aaa421b10f2cc40f70097e67f971ae</id>
<content type='text'>
Pull AIO leak fixes from Ben LaHaise:
 "I've put these two patches plus Linus's change through a round of
  tests, and it passes millions of iterations of the aio numa
  migratepage test, as well as a number of repetitions of a few simple
  read and write tests.

  The first patch fixes the memory leak Kent introduced, while the
  second patch makes aio_migratepage() much more paranoid and robust"

* git://git.kvack.org/~bcrl/aio-next:
  aio/migratepages: make aio migrate pages sane
  aio: fix kioctx leak introduced by "aio: Fix a trinity splat"
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull AIO leak fixes from Ben LaHaise:
 "I've put these two patches plus Linus's change through a round of
  tests, and it passes millions of iterations of the aio numa
  migratepage test, as well as a number of repetitions of a few simple
  read and write tests.

  The first patch fixes the memory leak Kent introduced, while the
  second patch makes aio_migratepage() much more paranoid and robust"

* git://git.kvack.org/~bcrl/aio-next:
  aio/migratepages: make aio migrate pages sane
  aio: fix kioctx leak introduced by "aio: Fix a trinity splat"
</pre>
</div>
</content>
</entry>
<entry>
<title>aio: clean up and fix aio_setup_ring page mapping</title>
<updated>2013-12-22T19:03:08+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2013-12-19T20:11:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3dc9acb67600393249a795934ccdfc291a200e6b'/>
<id>3dc9acb67600393249a795934ccdfc291a200e6b</id>
<content type='text'>
Since commit 36bc08cc01709 ("fs/aio: Add support to aio ring pages
migration") the aio ring setup code has used a special per-ring backing
inode for the page allocations, rather than just using random anonymous
pages.

However, rather than remembering the pages as it allocated them, it
would allocate the pages, insert them into the file mapping (dirty, so
that they couldn't be free'd), and then forget about them.  And then to
look them up again, it would mmap the mapping, and then use
"get_user_pages()" to get back an array of the pages we just created.

Now, not only is that incredibly inefficient, it also leaked all the
pages if the mmap failed (which could happen due to excessive number of
mappings, for example).

So clean it all up, making it much more straightforward.  Also remove
some left-overs of the previous (broken) mm_populate() usage that was
removed in commit d6c355c7dabc ("aio: fix race in ring buffer page
lookup introduced by page migration support") but left the pointless and
now misleading MAP_POPULATE flag around.

Tested-and-acked-by: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since commit 36bc08cc01709 ("fs/aio: Add support to aio ring pages
migration") the aio ring setup code has used a special per-ring backing
inode for the page allocations, rather than just using random anonymous
pages.

However, rather than remembering the pages as it allocated them, it
would allocate the pages, insert them into the file mapping (dirty, so
that they couldn't be free'd), and then forget about them.  And then to
look them up again, it would mmap the mapping, and then use
"get_user_pages()" to get back an array of the pages we just created.

Now, not only is that incredibly inefficient, it also leaked all the
pages if the mmap failed (which could happen due to excessive number of
mappings, for example).

So clean it all up, making it much more straightforward.  Also remove
some left-overs of the previous (broken) mm_populate() usage that was
removed in commit d6c355c7dabc ("aio: fix race in ring buffer page
lookup introduced by page migration support") but left the pointless and
now misleading MAP_POPULATE flag around.

Tested-and-acked-by: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>aio/migratepages: make aio migrate pages sane</title>
<updated>2013-12-21T22:56:08+00:00</updated>
<author>
<name>Benjamin LaHaise</name>
<email>bcrl@kvack.org</email>
</author>
<published>2013-12-21T22:56:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8e321fefb0e60bae4e2a28d20fc4fa30758d27c6'/>
<id>8e321fefb0e60bae4e2a28d20fc4fa30758d27c6</id>
<content type='text'>
The arbitrary restriction on page counts offered by the core
migrate_page_move_mapping() code results in rather suspicious looking
fiddling with page reference counts in the aio_migratepage() operation.
To fix this, make migrate_page_move_mapping() take an extra_count parameter
that allows aio to tell the code about its own reference count on the page
being migrated.

While cleaning up aio_migratepage(), make it validate that the old page
being passed in is actually what aio_migratepage() expects to prevent
misbehaviour in the case of races.

Signed-off-by: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The arbitrary restriction on page counts offered by the core
migrate_page_move_mapping() code results in rather suspicious looking
fiddling with page reference counts in the aio_migratepage() operation.
To fix this, make migrate_page_move_mapping() take an extra_count parameter
that allows aio to tell the code about its own reference count on the page
being migrated.

While cleaning up aio_migratepage(), make it validate that the old page
being passed in is actually what aio_migratepage() expects to prevent
misbehaviour in the case of races.

Signed-off-by: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>aio: fix kioctx leak introduced by "aio: Fix a trinity splat"</title>
<updated>2013-12-21T20:57:09+00:00</updated>
<author>
<name>Benjamin LaHaise</name>
<email>bcrl@kvack.org</email>
</author>
<published>2013-12-21T20:49:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1881686f842065d2f92ec9c6424830ffc17d23b0'/>
<id>1881686f842065d2f92ec9c6424830ffc17d23b0</id>
<content type='text'>
e34ecee2ae791df674dfb466ce40692ca6218e43 reworked the percpu reference
counting to correct a bug trinity found.  Unfortunately, the change lead
to kioctxes being leaked because there was no final reference count to
put.  Add that reference count back in to fix things.

Signed-off-by: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
Cc: stable@vger.kernel.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
e34ecee2ae791df674dfb466ce40692ca6218e43 reworked the percpu reference
counting to correct a bug trinity found.  Unfortunately, the change lead
to kioctxes being leaked because there was no final reference count to
put.  Add that reference count back in to fix things.

Signed-off-by: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
Cc: stable@vger.kernel.org
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge git://git.kvack.org/~bcrl/aio-next</title>
<updated>2013-12-06T16:32:59+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2013-12-06T16:32:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c537aba00e3f1df8ce6c7c9fcb98b82c0c2d1d2c'/>
<id>c537aba00e3f1df8ce6c7c9fcb98b82c0c2d1d2c</id>
<content type='text'>
Pull aio fix from Benjamin LaHaise:
 "AIO fix from Gu Zheng that fixes a GPF that Dave Jones uncovered with
  trinity"

* git://git.kvack.org/~bcrl/aio-next:
  aio: clean up aio ring in the fail path
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull aio fix from Benjamin LaHaise:
 "AIO fix from Gu Zheng that fixes a GPF that Dave Jones uncovered with
  trinity"

* git://git.kvack.org/~bcrl/aio-next:
  aio: clean up aio ring in the fail path
</pre>
</div>
</content>
</entry>
<entry>
<title>aio: clean up aio ring in the fail path</title>
<updated>2013-12-06T15:22:55+00:00</updated>
<author>
<name>Gu Zheng</name>
<email>guz.fnst@cn.fujitsu.com</email>
</author>
<published>2013-12-04T10:19:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d1b9432712a25eeb06114fb4b587133525a47de5'/>
<id>d1b9432712a25eeb06114fb4b587133525a47de5</id>
<content type='text'>
Clean up the aio ring file in the fail path of aio_setup_ring
and ioctx_alloc. And maybe it can fix the GPF issue reported by
Dave Jones:
https://lkml.org/lkml/2013/11/25/898

Signed-off-by: Gu Zheng &lt;guz.fnst@cn.fujitsu.com&gt;
Signed-off-by: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Clean up the aio ring file in the fail path of aio_setup_ring
and ioctx_alloc. And maybe it can fix the GPF issue reported by
Dave Jones:
https://lkml.org/lkml/2013/11/25/898

Signed-off-by: Gu Zheng &lt;guz.fnst@cn.fujitsu.com&gt;
Signed-off-by: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge git://git.kvack.org/~bcrl/aio-next</title>
<updated>2013-11-22T16:42:14+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2013-11-22T16:42:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d0f278c1dd0175093ed37ce132395dc689e6987e'/>
<id>d0f278c1dd0175093ed37ce132395dc689e6987e</id>
<content type='text'>
Pull aio fixes from Benjamin LaHaise.

* git://git.kvack.org/~bcrl/aio-next:
  aio: nullify aio-&gt;ring_pages after freeing it
  aio: prevent double free in ioctx_alloc
  aio: Fix a trinity splat
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull aio fixes from Benjamin LaHaise.

* git://git.kvack.org/~bcrl/aio-next:
  aio: nullify aio-&gt;ring_pages after freeing it
  aio: prevent double free in ioctx_alloc
  aio: Fix a trinity splat
</pre>
</div>
</content>
</entry>
</feed>
