<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/fs/aio.c, branch linux-3.12.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>AIO: properly check iovec sizes</title>
<updated>2016-02-24T09:23:18+00:00</updated>
<author>
<name>Greg Kroah-Hartman</name>
<email>gregkh@linuxfoundation.org</email>
</author>
<published>2016-02-20T01:36:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=2698377daeb469a9d68021979d2e506922f788da'/>
<id>2698377daeb469a9d68021979d2e506922f788da</id>
<content type='text'>
In Linus's tree, the iovec code has been reworked massively, but in
older kernels the AIO layer should be checking this before passing the
request on to other layers.

Many thanks to Ben Hawkes of Google Project Zero for pointing out the
issue.

Reported-by: Ben Hawkes &lt;hawkes@google.com&gt;
Acked-by: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
Tested-by: Willy Tarreau &lt;w@1wt.eu&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;


</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In Linus's tree, the iovec code has been reworked massively, but in
older kernels the AIO layer should be checking this before passing the
request on to other layers.

Many thanks to Ben Hawkes of Google Project Zero for pointing out the
issue.

Reported-by: Ben Hawkes &lt;hawkes@google.com&gt;
Acked-by: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
Tested-by: Willy Tarreau &lt;w@1wt.eu&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;


</pre>
</div>
</content>
</entry>
<entry>
<title>aio: fix reqs_available handling</title>
<updated>2015-09-02T16:20:16+00:00</updated>
<author>
<name>Benjamin LaHaise</name>
<email>bcrl@kvack.org</email>
</author>
<published>2014-08-24T17:14:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=09c59fc80e03795406c648c28dba4aa1a365bc0e'/>
<id>09c59fc80e03795406c648c28dba4aa1a365bc0e</id>
<content type='text'>
commit d856f32a86b2b015ab180ab7a55e455ed8d3ccc5 upstream.

As reported by Dan Aloni, commit f8567a3845ac ("aio: fix aio request
leak when events are reaped by userspace") introduces a regression when
user code attempts to perform io_submit() with more events than are
available in the ring buffer.  Reverting that commit would reintroduce a
regression when user space event reaping is used.

Fixing this bug is a bit more involved than the previous attempts to fix
this regression.  Since we do not have a single point at which we can
count events as being reaped by user space and io_getevents(), we have
to track event completion by looking at the number of events left in the
event ring.  So long as there are as many events in the ring buffer as
there have been completion events generated, we cannot call
put_reqs_available().  The code to check for this is now placed in
refill_reqs_available().
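
The accounting rule above can be sketched in a few lines. This is only an
illustrative model in Python, not the kernel code: the name slots_to_return
and its parameters are hypothetical, and it assumes tail already lies within
the ring while head may hold any value because user space can write to it.

```python
def slots_to_return(nr_events, head, tail, completed_events):
    """Return how many request slots may be handed back to the pool.

    Model only: completions whose events still sit unreaped in the ring
    must keep their slots reserved, so only the surplus is returned.
    """
    # Clamp head into the ring, since user space is able to write to it.
    head %= nr_events
    # Events still in the ring, handling wrap-around of the tail.
    events_in_ring = (tail - head) % nr_events
    # Only completions beyond the unreaped events free up slots.
    return max(completed_events - events_in_ring, 0)


# Three events unreaped, three completions counted: nothing to return.
print(slots_to_return(8, 2, 5, 3))
# Five completions counted, three still unreaped: two slots come back.
print(slots_to_return(8, 2, 5, 5))
```

In this model, slots are handed back only once the ring holds fewer events
than there have been completions, which is the condition the text describes.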

A test program from Dan and modified by me for verifying this bug is available
at http://www.kvack.org/~bcrl/20140824-aio_bug.c .

Reported-by: Dan Aloni &lt;dan@kernelim.com&gt;
Signed-off-by: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
Acked-by: Dan Aloni &lt;dan@kernelim.com&gt;
Cc: Kent Overstreet &lt;kmo@daterainc.com&gt;
Cc: Mateusz Guzik &lt;mguzik@redhat.com&gt;
Cc: Petr Matousek &lt;pmatouse@redhat.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit d856f32a86b2b015ab180ab7a55e455ed8d3ccc5 upstream.

As reported by Dan Aloni, commit f8567a3845ac ("aio: fix aio request
leak when events are reaped by userspace") introduces a regression when
user code attempts to perform io_submit() with more events than are
available in the ring buffer.  Reverting that commit would reintroduce a
regression when user space event reaping is used.

Fixing this bug is a bit more involved than the previous attempts to fix
this regression.  Since we do not have a single point at which we can
count events as being reaped by user space and io_getevents(), we have
to track event completion by looking at the number of events left in the
event ring.  So long as there are as many events in the ring buffer as
there have been completion events generated, we cannot call
put_reqs_available().  The code to check for this is now placed in
refill_reqs_available().

A test program from Dan and modified by me for verifying this bug is available
at http://www.kvack.org/~bcrl/20140824-aio_bug.c .

Reported-by: Dan Aloni &lt;dan@kernelim.com&gt;
Signed-off-by: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
Acked-by: Dan Aloni &lt;dan@kernelim.com&gt;
Cc: Kent Overstreet &lt;kmo@daterainc.com&gt;
Cc: Mateusz Guzik &lt;mguzik@redhat.com&gt;
Cc: Petr Matousek &lt;pmatouse@redhat.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>aio: fix serial draining in exit_aio()</title>
<updated>2015-05-26T12:33:45+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@fb.com</email>
</author>
<published>2015-04-15T17:17:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=0b8b97704fbe442b3bba7a2c7eba9113122abafd'/>
<id>0b8b97704fbe442b3bba7a2c7eba9113122abafd</id>
<content type='text'>
commit dc48e56d761610da4ea1088d1bea0a030b8e3e43 upstream.

exit_aio() currently serializes killing io contexts. Each context
killing ends up having to do percpu_ref_kill(), which in turn has
to wait for an RCU grace period. This can take a long time, depending
on the number of contexts. And there's no point in doing them serially,
when we could be waiting for all of them in one fell swoop.
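
The idea of starting every teardown first and then waiting once can be
modelled as follows. This is only a user-space sketch in Python, assuming a
timer stands in for the RCU grace period; kill_all and its helpers are
hypothetical names, not kernel functions.

```python
import threading


def kill_all(contexts, grace_period=0.01):
    """Start teardown of every context, then wait once for all of them."""
    freed = []
    lock = threading.Lock()
    all_done = threading.Event()
    remaining = [len(contexts)]

    def finish(ctx):
        # Runs after the simulated grace period for one context.
        with lock:
            freed.append(ctx)
            remaining[0] -= 1
            if remaining[0] == 0:
                all_done.set()

    # Kick off every teardown first; the grace periods overlap.
    for ctx in contexts:
        threading.Timer(grace_period, finish, args=(ctx,)).start()

    # A single wait covers all contexts instead of one wait per context.
    if contexts:
        all_done.wait()
    return sorted(freed)


print(kill_all(["ctx0", "ctx1", "ctx2"]))
```

With a serial loop the total wait would grow with the number of contexts;
here it stays close to a single grace period.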

This patch makes my fio thread offload test case exit in 0.2s instead
of almost 6s.

Reviewed-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit dc48e56d761610da4ea1088d1bea0a030b8e3e43 upstream.

exit_aio() currently serializes killing io contexts. Each context
killing ends up having to do percpu_ref_kill(), which in turn has
to wait for an RCU grace period. This can take a long time, depending
on the number of contexts. And there's no point in doing them serially,
when we could be waiting for all of them in one fell swoop.

This patch makes my fio thread offload test case exit in 0.2s instead
of almost 6s.

Reviewed-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>aio: change exit_aio() to load mm-&gt;ioctx_table once and avoid rcu_read_lock()</title>
<updated>2015-05-26T12:33:44+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2014-04-30T17:02:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=eb18d4eed036ccf29e4c979d46661c16753f940c'/>
<id>eb18d4eed036ccf29e4c979d46661c16753f940c</id>
<content type='text'>
commit 4b70ac5fd9b58bfaa5f25b4ea48f528aefbf3308 upstream.

On 04/30, Benjamin LaHaise wrote:
&gt;
&gt; &gt; -		ctx-&gt;mmap_size = 0;
&gt; &gt; -
&gt; &gt; -		kill_ioctx(mm, ctx, NULL);
&gt; &gt; +		if (ctx) {
&gt; &gt; +			ctx-&gt;mmap_size = 0;
&gt; &gt; +			kill_ioctx(mm, ctx, NULL);
&gt; &gt; +		}
&gt;
&gt; Rather than indenting and moving the two lines changing mmap_size and the
&gt; kill_ioctx() call, why not just do "if (!ctx) ... continue;"?  That reduces
&gt; the number of lines changed and avoids excessive indentation.

OK. To me the code looks better/simpler with "if (ctx)", but this is subjective
of course, I won't argue.

The patch still removes the empty line between mmap_size = 0 and kill_ioctx(),
we reset mmap_size only for kill_ioctx(). But feel free to remove this change.

-------------------------------------------------------------------------------
Subject: [PATCH v3 1/2] aio: change exit_aio() to load mm-&gt;ioctx_table once and avoid rcu_read_lock()

1. We can read -&gt;ioctx_table only once and we do not need rcu_read_lock()
   or even rcu_dereference().

   This mm has no users, nobody else can play with -&gt;ioctx_table. Otherwise
   the code is buggy anyway, if we need rcu_read_lock() in a loop because
   -&gt;ioctx_table can be updated then kfree(table) is obviously wrong.

2. Update the comment. "exit_mmap(mm) is coming" is a good reason to avoid
   munmap(), but another reason is that we simply can't do vm_munmap() unless
   current-&gt;mm == mm and this is not true in general, the caller is mmput().

3. We do not really need to nullify mm-&gt;ioctx_table before return, probably
   the current code does this to catch the potential problems. But in this
   case RCU_INIT_POINTER(NULL) looks better.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 4b70ac5fd9b58bfaa5f25b4ea48f528aefbf3308 upstream.

On 04/30, Benjamin LaHaise wrote:
&gt;
&gt; &gt; -		ctx-&gt;mmap_size = 0;
&gt; &gt; -
&gt; &gt; -		kill_ioctx(mm, ctx, NULL);
&gt; &gt; +		if (ctx) {
&gt; &gt; +			ctx-&gt;mmap_size = 0;
&gt; &gt; +			kill_ioctx(mm, ctx, NULL);
&gt; &gt; +		}
&gt;
&gt; Rather than indenting and moving the two lines changing mmap_size and the
&gt; kill_ioctx() call, why not just do "if (!ctx) ... continue;"?  That reduces
&gt; the number of lines changed and avoids excessive indentation.

OK. To me the code looks better/simpler with "if (ctx)", but this is subjective
of course, I won't argue.

The patch still removes the empty line between mmap_size = 0 and kill_ioctx(),
we reset mmap_size only for kill_ioctx(). But feel free to remove this change.

-------------------------------------------------------------------------------
Subject: [PATCH v3 1/2] aio: change exit_aio() to load mm-&gt;ioctx_table once and avoid rcu_read_lock()

1. We can read -&gt;ioctx_table only once and we do not need rcu_read_lock()
   or even rcu_dereference().

   This mm has no users, nobody else can play with -&gt;ioctx_table. Otherwise
   the code is buggy anyway, if we need rcu_read_lock() in a loop because
   -&gt;ioctx_table can be updated then kfree(table) is obviously wrong.

2. Update the comment. "exit_mmap(mm) is coming" is a good reason to avoid
   munmap(), but another reason is that we simply can't do vm_munmap() unless
   current-&gt;mm == mm and this is not true in general, the caller is mmput().

3. We do not really need to nullify mm-&gt;ioctx_table before return, probably
   the current code does this to catch the potential problems. But in this
   case RCU_INIT_POINTER(NULL) looks better.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ioctx_alloc(): fix vma (and file) leak on failure</title>
<updated>2015-04-22T06:58:45+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2015-04-06T21:57:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f93249794846f36c887803570f524968568c76ee'/>
<id>f93249794846f36c887803570f524968568c76ee</id>
<content type='text'>
commit deeb8525f9bcea60f5e86521880c1161de7a5829 upstream.

If we fail past the aio_setup_ring(), we need to destroy the
mapping.  We don't need to care about anybody having found ctx,
or added requests to it, since the last failure exit is exactly
the failure to make ctx visible to lookups.

Reproducer (based on one by Joe Mario &lt;jmario@redhat.com&gt;):

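#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;unistd.h&gt;
#include &lt;libaio.h&gt;

void count(char *p)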
{
	char s[80];
	printf("%s: ", p);
	fflush(stdout);
	sprintf(s, "/bin/cat /proc/%d/maps|/bin/fgrep -c '/[aio] (deleted)'", getpid());
	system(s);
}

int main()
{
	io_context_t *ctx;
	int created, limit, i, destroyed;
	FILE *f;

	count("before");
	if ((f = fopen("/proc/sys/fs/aio-max-nr", "r")) == NULL)
		perror("opening aio-max-nr");
	else if (fscanf(f, "%d", &amp;limit) != 1)
		fprintf(stderr, "can't parse aio-max-nr\n");
	else if ((ctx = calloc(limit, sizeof(io_context_t))) == NULL)
		perror("allocating io_context_t array");
	else {
		for (i = 0, created = 0; i &lt; limit; i++) {
			if (io_setup(1000, ctx + created) == 0)
				created++;
		}
		for (i = 0, destroyed = 0; i &lt; created; i++)
			if (io_destroy(ctx[i]) == 0)
				destroyed++;
		printf("created %d, failed %d, destroyed %d\n",
			created, limit - created, destroyed);
		count("after");
	}
}

Found-by: Joe Mario &lt;jmario@redhat.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit deeb8525f9bcea60f5e86521880c1161de7a5829 upstream.

If we fail past the aio_setup_ring(), we need to destroy the
mapping.  We don't need to care about anybody having found ctx,
or added requests to it, since the last failure exit is exactly
the failure to make ctx visible to lookups.

Reproducer (based on one by Joe Mario &lt;jmario@redhat.com&gt;):

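#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;unistd.h&gt;
#include &lt;libaio.h&gt;

void count(char *p)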
{
	char s[80];
	printf("%s: ", p);
	fflush(stdout);
	sprintf(s, "/bin/cat /proc/%d/maps|/bin/fgrep -c '/[aio] (deleted)'", getpid());
	system(s);
}

int main()
{
	io_context_t *ctx;
	int created, limit, i, destroyed;
	FILE *f;

	count("before");
	if ((f = fopen("/proc/sys/fs/aio-max-nr", "r")) == NULL)
		perror("opening aio-max-nr");
	else if (fscanf(f, "%d", &amp;limit) != 1)
		fprintf(stderr, "can't parse aio-max-nr\n");
	else if ((ctx = calloc(limit, sizeof(io_context_t))) == NULL)
		perror("allocating io_context_t array");
	else {
		for (i = 0, created = 0; i &lt; limit; i++) {
			if (io_setup(1000, ctx + created) == 0)
				created++;
		}
		for (i = 0, destroyed = 0; i &lt; created; i++)
			if (io_destroy(ctx[i]) == 0)
				destroyed++;
		printf("created %d, failed %d, destroyed %d\n",
			created, limit - created, destroyed);
		count("after");
	}
}

Found-by: Joe Mario &lt;jmario@redhat.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>aio: fix incorrect dirty pages accounting when truncating AIO ring buffer</title>
<updated>2014-12-06T14:18:19+00:00</updated>
<author>
<name>Gu Zheng</name>
<email>guz.fnst@cn.fujitsu.com</email>
</author>
<published>2014-11-06T09:46:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=cab86d8c6534900243994bb0558e66a41fcbc39d'/>
<id>cab86d8c6534900243994bb0558e66a41fcbc39d</id>
<content type='text'>
commit 835f252c6debd204fcd607c79975089b1ecd3472 upstream.

https://bugzilla.kernel.org/show_bug.cgi?id=86831

Markus reported that shutting down mysqld (with AIO support, on an
ext3-formatted hard drive) leads to a negative number of dirty pages (the
counter underruns). The negative number results in a drastic reduction of
write performance because the page cache is not used: the kernel thinks
that 2 ^ 32 dirty pages are still outstanding.

Adding a warn trace in __dec_zone_state catches this easily:

static inline void __dec_zone_state(struct zone *zone,
				    enum zone_stat_item item)
{
     atomic_long_dec(&amp;zone-&gt;vm_stat[item]);
+    WARN_ON_ONCE(item == NR_FILE_DIRTY &amp;&amp;
+		 atomic_long_read(&amp;zone-&gt;vm_stat[item]) &lt; 0);
     atomic_long_dec(&amp;vm_stat[item]);
}

[   21.341632] ------------[ cut here ]------------
[   21.346294] WARNING: CPU: 0 PID: 309 at include/linux/vmstat.h:242
cancel_dirty_page+0x164/0x224()
[   21.355296] Modules linked in: wutbox_cp sata_mv
[   21.359968] CPU: 0 PID: 309 Comm: kworker/0:1 Not tainted 3.14.21-WuT #80
[   21.366793] Workqueue: events free_ioctx
[   21.370760] [&lt;c0016a64&gt;] (unwind_backtrace) from [&lt;c0012f88&gt;]
(show_stack+0x20/0x24)
[   21.378562] [&lt;c0012f88&gt;] (show_stack) from [&lt;c03f8ccc&gt;]
(dump_stack+0x24/0x28)
[   21.385840] [&lt;c03f8ccc&gt;] (dump_stack) from [&lt;c0023ae4&gt;]
(warn_slowpath_common+0x84/0x9c)
[   21.393976] [&lt;c0023ae4&gt;] (warn_slowpath_common) from [&lt;c0023bb8&gt;]
(warn_slowpath_null+0x2c/0x34)
[   21.402800] [&lt;c0023bb8&gt;] (warn_slowpath_null) from [&lt;c00c0688&gt;]
(cancel_dirty_page+0x164/0x224)
[   21.411524] [&lt;c00c0688&gt;] (cancel_dirty_page) from [&lt;c00c080c&gt;]
(truncate_inode_page+0x8c/0x158)
[   21.420272] [&lt;c00c080c&gt;] (truncate_inode_page) from [&lt;c00c0a94&gt;]
(truncate_inode_pages_range+0x11c/0x53c)
[   21.429890] [&lt;c00c0a94&gt;] (truncate_inode_pages_range) from
[&lt;c00c0f6c&gt;] (truncate_pagecache+0x88/0xac)
[   21.439252] [&lt;c00c0f6c&gt;] (truncate_pagecache) from [&lt;c00c0fec&gt;]
(truncate_setsize+0x5c/0x74)
[   21.447731] [&lt;c00c0fec&gt;] (truncate_setsize) from [&lt;c013b3a8&gt;]
(put_aio_ring_file.isra.14+0x34/0x90)
[   21.456826] [&lt;c013b3a8&gt;] (put_aio_ring_file.isra.14) from
[&lt;c013b424&gt;] (aio_free_ring+0x20/0xcc)
[   21.465660] [&lt;c013b424&gt;] (aio_free_ring) from [&lt;c013b4f4&gt;]
(free_ioctx+0x24/0x44)
[   21.473190] [&lt;c013b4f4&gt;] (free_ioctx) from [&lt;c003d8d8&gt;]
(process_one_work+0x134/0x47c)
[   21.481132] [&lt;c003d8d8&gt;] (process_one_work) from [&lt;c003e988&gt;]
(worker_thread+0x130/0x414)
[   21.489350] [&lt;c003e988&gt;] (worker_thread) from [&lt;c00448ac&gt;]
(kthread+0xd4/0xec)
[   21.496621] [&lt;c00448ac&gt;] (kthread) from [&lt;c000ec18&gt;]
(ret_from_fork+0x14/0x20)
[   21.503884] ---[ end trace 79c4bf42c038c9a1 ]---

The cause is that we set the aio ring file pages as *DIRTY* via SetPageDirty
(which bypasses the VFS dirty pages increment) at init time, and the aio fs
uses *default_backing_dev_info* as its backing dev, which does not disable
the dirty pages accounting capability.
So truncating the aio ring file contributes to dirty pages accounting (a VFS
dirty pages decrement with no matching increment), and the counter underruns.
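
The imbalance can be modelled with a toy counter. This is a hedged
illustration in Python rather than kernel code: DirtyCounter and its methods
are hypothetical names, and the 32-bit view assumes a machine with a 32-bit
long, such as the ARM board in the trace above.

```python
class DirtyCounter:
    """Toy model of a dirty page counter on a 32-bit machine."""

    def __init__(self):
        self.value = 0

    def account_dirty(self):
        # The accounted path: the counter goes up when a page is dirtied.
        self.value += 1

    def cancel_dirty(self):
        # The truncate path: the counter goes down for each dirty page.
        self.value -= 1

    def as_unsigned32(self):
        # How the same bits read when treated as an unsigned 32-bit value.
        return self.value % 2**32


counter = DirtyCounter()
# Ring pages are flagged dirty directly, so account_dirty() is never called.
ring_pages = 1
# Truncation still cancels the dirty state of each page and decrements.
for _ in range(ring_pages):
    counter.cancel_dirty()
print(counter.value, counter.as_unsigned32())
```

A single unmatched decrement is enough to make the counter look like
roughly 2 ^ 32 outstanding dirty pages when read as unsigned.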

The original goal was to keep these pages in memory (so that they cannot be
reclaimed or swapped) for their whole lifetime by marking them dirty. But on
reflection, we have already pinned the pages by elevating their refcount,
which already achieves the goal, so the SetPageDirty seems unnecessary.

To fix the issue, use __set_page_dirty_no_writeback instead of the nop
.set_page_dirty, and drop the SetPageDirty (don't manually set the dirty
flags, don't disable set_page_dirty(), rely on default behaviour).

With the above change, the dirty pages accounting works correctly. But as we
know, the aio fs is an anonymous one, which should never cause any real
write-back, so we can ignore the dirty pages (write back) accounting by
disabling the dirty pages (write back) accounting capability. So we introduce
an aio private backing dev info (with the ACCT_DIRTY/WRITEBACK/ACCT_WB
capabilities disabled) to replace the default one.

Reported-by: Markus Königshaus &lt;m.koenigshaus@wut.de&gt;
Signed-off-by: Gu Zheng &lt;guz.fnst@cn.fujitsu.com&gt;
Acked-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 835f252c6debd204fcd607c79975089b1ecd3472 upstream.

https://bugzilla.kernel.org/show_bug.cgi?id=86831

Markus reported that shutting down mysqld (with AIO support, on an
ext3-formatted hard drive) leads to a negative number of dirty pages (the
counter underruns). The negative number results in a drastic reduction of
write performance because the page cache is not used: the kernel thinks
that 2 ^ 32 dirty pages are still outstanding.

Adding a warn trace in __dec_zone_state catches this easily:

static inline void __dec_zone_state(struct zone *zone,
				    enum zone_stat_item item)
{
     atomic_long_dec(&amp;zone-&gt;vm_stat[item]);
+    WARN_ON_ONCE(item == NR_FILE_DIRTY &amp;&amp;
+		 atomic_long_read(&amp;zone-&gt;vm_stat[item]) &lt; 0);
     atomic_long_dec(&amp;vm_stat[item]);
}

[   21.341632] ------------[ cut here ]------------
[   21.346294] WARNING: CPU: 0 PID: 309 at include/linux/vmstat.h:242
cancel_dirty_page+0x164/0x224()
[   21.355296] Modules linked in: wutbox_cp sata_mv
[   21.359968] CPU: 0 PID: 309 Comm: kworker/0:1 Not tainted 3.14.21-WuT #80
[   21.366793] Workqueue: events free_ioctx
[   21.370760] [&lt;c0016a64&gt;] (unwind_backtrace) from [&lt;c0012f88&gt;]
(show_stack+0x20/0x24)
[   21.378562] [&lt;c0012f88&gt;] (show_stack) from [&lt;c03f8ccc&gt;]
(dump_stack+0x24/0x28)
[   21.385840] [&lt;c03f8ccc&gt;] (dump_stack) from [&lt;c0023ae4&gt;]
(warn_slowpath_common+0x84/0x9c)
[   21.393976] [&lt;c0023ae4&gt;] (warn_slowpath_common) from [&lt;c0023bb8&gt;]
(warn_slowpath_null+0x2c/0x34)
[   21.402800] [&lt;c0023bb8&gt;] (warn_slowpath_null) from [&lt;c00c0688&gt;]
(cancel_dirty_page+0x164/0x224)
[   21.411524] [&lt;c00c0688&gt;] (cancel_dirty_page) from [&lt;c00c080c&gt;]
(truncate_inode_page+0x8c/0x158)
[   21.420272] [&lt;c00c080c&gt;] (truncate_inode_page) from [&lt;c00c0a94&gt;]
(truncate_inode_pages_range+0x11c/0x53c)
[   21.429890] [&lt;c00c0a94&gt;] (truncate_inode_pages_range) from
[&lt;c00c0f6c&gt;] (truncate_pagecache+0x88/0xac)
[   21.439252] [&lt;c00c0f6c&gt;] (truncate_pagecache) from [&lt;c00c0fec&gt;]
(truncate_setsize+0x5c/0x74)
[   21.447731] [&lt;c00c0fec&gt;] (truncate_setsize) from [&lt;c013b3a8&gt;]
(put_aio_ring_file.isra.14+0x34/0x90)
[   21.456826] [&lt;c013b3a8&gt;] (put_aio_ring_file.isra.14) from
[&lt;c013b424&gt;] (aio_free_ring+0x20/0xcc)
[   21.465660] [&lt;c013b424&gt;] (aio_free_ring) from [&lt;c013b4f4&gt;]
(free_ioctx+0x24/0x44)
[   21.473190] [&lt;c013b4f4&gt;] (free_ioctx) from [&lt;c003d8d8&gt;]
(process_one_work+0x134/0x47c)
[   21.481132] [&lt;c003d8d8&gt;] (process_one_work) from [&lt;c003e988&gt;]
(worker_thread+0x130/0x414)
[   21.489350] [&lt;c003e988&gt;] (worker_thread) from [&lt;c00448ac&gt;]
(kthread+0xd4/0xec)
[   21.496621] [&lt;c00448ac&gt;] (kthread) from [&lt;c000ec18&gt;]
(ret_from_fork+0x14/0x20)
[   21.503884] ---[ end trace 79c4bf42c038c9a1 ]---

The cause is that we set the aio ring file pages as *DIRTY* via SetPageDirty
(which bypasses the VFS dirty pages increment) at init time, and the aio fs
uses *default_backing_dev_info* as its backing dev, which does not disable
the dirty pages accounting capability.
So truncating the aio ring file contributes to dirty pages accounting (a VFS
dirty pages decrement with no matching increment), and the counter underruns.

The original goal was to keep these pages in memory (so that they cannot be
reclaimed or swapped) for their whole lifetime by marking them dirty. But on
reflection, we have already pinned the pages by elevating their refcount,
which already achieves the goal, so the SetPageDirty seems unnecessary.

To fix the issue, use __set_page_dirty_no_writeback instead of the nop
.set_page_dirty, and drop the SetPageDirty (don't manually set the dirty
flags, don't disable set_page_dirty(), rely on default behaviour).

With the above change, the dirty pages accounting works correctly. But as we
know, the aio fs is an anonymous one, which should never cause any real
write-back, so we can ignore the dirty pages (write back) accounting by
disabling the dirty pages (write back) accounting capability. So we introduce
an aio private backing dev info (with the ACCT_DIRTY/WRITEBACK/ACCT_WB
capabilities disabled) to replace the default one.

Reported-by: Markus Königshaus &lt;m.koenigshaus@wut.de&gt;
Signed-off-by: Gu Zheng &lt;guz.fnst@cn.fujitsu.com&gt;
Acked-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>aio: block exit_aio() until all context requests are completed</title>
<updated>2014-10-13T13:41:41+00:00</updated>
<author>
<name>Gu Zheng</name>
<email>guz.fnst@cn.fujitsu.com</email>
</author>
<published>2014-09-03T09:45:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=88e350c62c63bb4ae5887bfbbad88e3354e7ce6f'/>
<id>88e350c62c63bb4ae5887bfbbad88e3354e7ce6f</id>
<content type='text'>
commit 6098b45b32e6baeacc04790773ced9340601d511 upstream.

It seems that exit_aio() also needs to wait for all iocbs to complete (like
io_destroy), but we missed the wait step in the current implementation, so
fix it in the same way as we did in io_destroy.

Signed-off-by: Gu Zheng &lt;guz.fnst@cn.fujitsu.com&gt;
Signed-off-by: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 6098b45b32e6baeacc04790773ced9340601d511 upstream.

It seems that exit_aio() also needs to wait for all iocbs to complete (like
io_destroy), but we missed the wait step in the current implementation, so
fix it in the same way as we did in io_destroy.

Signed-off-by: Gu Zheng &lt;guz.fnst@cn.fujitsu.com&gt;
Signed-off-by: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>aio: add missing smp_rmb() in read_events_ring</title>
<updated>2014-09-26T09:23:43+00:00</updated>
<author>
<name>Jeff Moyer</name>
<email>jmoyer@redhat.com</email>
</author>
<published>2014-09-02T17:17:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=0fbdd4f77c9051e4f23b943aa8d373d151679f0d'/>
<id>0fbdd4f77c9051e4f23b943aa8d373d151679f0d</id>
<content type='text'>
commit 2ff396be602f10b5eab8e73b24f20348fa2de159 upstream.

We ran into a case on ppc64 running mariadb where io_getevents would
return zeroed out I/O events.  After adding instrumentation, it became
clear that there was some missing synchronization between reading the
tail pointer and the events themselves.  This small patch fixes the
problem in testing.

Thanks to Zach for helping to look into this, and suggesting the fix.

Signed-off-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Signed-off-by: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 2ff396be602f10b5eab8e73b24f20348fa2de159 upstream.

We ran into a case on ppc64 running mariadb where io_getevents would
return zeroed out I/O events.  After adding instrumentation, it became
clear that there was some missing synchronization between reading the
tail pointer and the events themselves.  This small patch fixes the
problem in testing.

Thanks to Zach for helping to look into this, and suggesting the fix.

Signed-off-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Signed-off-by: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>aio: protect reqs_available updates from changes in interrupt handlers</title>
<updated>2014-07-29T15:01:48+00:00</updated>
<author>
<name>Benjamin LaHaise</name>
<email>bcrl@kvack.org</email>
</author>
<published>2014-07-14T16:49:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=60714352c491ab477ae767582d18059e2534c5a7'/>
<id>60714352c491ab477ae767582d18059e2534c5a7</id>
<content type='text'>
commit 263782c1c95bbddbb022dc092fd89a36bb8d5577 upstream.

As of commit f8567a3845ac05bb28f3c1b478ef752762bd39ef it is now possible to
have put_reqs_available() called from irq context.  While put_reqs_available()
is per cpu, it did not protect itself from interrupts on the same CPU.  This
led to aio_complete() corrupting the available io requests count when run
under heavy O_DIRECT workloads, as reported by Robert Elliott.  Fix this by
disabling irq updates around the per cpu batch updates of reqs_available.

Many thanks to Robert and folks for testing and tracking this down.

Reported-by: Robert Elliot &lt;Elliott@hp.com&gt;
Tested-by: Robert Elliot &lt;Elliott@hp.com&gt;
Signed-off-by: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;, Christoph Hellwig &lt;hch@infradead.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 263782c1c95bbddbb022dc092fd89a36bb8d5577 upstream.

As of commit f8567a3845ac05bb28f3c1b478ef752762bd39ef it is now possible to
have put_reqs_available() called from irq context.  While put_reqs_available()
is per cpu, it did not protect itself from interrupts on the same CPU.  This
led to aio_complete() corrupting the available io requests count when run
under heavy O_DIRECT workloads, as reported by Robert Elliott.  Fix this by
disabling irq updates around the per cpu batch updates of reqs_available.

Many thanks to Robert and folks for testing and tracking this down.

Reported-by: Robert Elliot &lt;Elliott@hp.com&gt;
Tested-by: Robert Elliot &lt;Elliott@hp.com&gt;
Signed-off-by: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;, Christoph Hellwig &lt;hch@infradead.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Revert "aio: fix kernel memory disclosure in io_getevents() introduced in v3.10"</title>
<updated>2014-07-14T13:21:39+00:00</updated>
<author>
<name>Jiri Slaby</name>
<email>jslaby@suse.cz</email>
</author>
<published>2014-07-14T13:20:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=48e8cad86bb1241c08bdaa80db022c25068ff8e0'/>
<id>48e8cad86bb1241c08bdaa80db022c25068ff8e0</id>
<content type='text'>
This reverts commit 0e2e24e5dc6eb6f0698e9dc97e652f132b885624, which
was applied twice mistakenly. The first one is
bee3f7b8188d4b2a5dfaeb2eb4a68d99f67daecf.

Reported-by: Gu Zheng &lt;guz.fnst@cn.fujitsu.com&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
Cc: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
Cc: Mateusz Guzik &lt;mguzik@redhat.com&gt;
Cc: Petr Matousek &lt;pmatouse@redhat.com&gt;
Cc: Kent Overstreet &lt;kmo@daterainc.com&gt;
Cc: Jeff Moyer &lt;jmoyer@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This reverts commit 0e2e24e5dc6eb6f0698e9dc97e652f132b885624, which
was applied twice mistakenly. The first one is
bee3f7b8188d4b2a5dfaeb2eb4a68d99f67daecf.

Reported-by: Gu Zheng &lt;guz.fnst@cn.fujitsu.com&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
Cc: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
Cc: Mateusz Guzik &lt;mguzik@redhat.com&gt;
Cc: Petr Matousek &lt;pmatouse@redhat.com&gt;
Cc: Kent Overstreet &lt;kmo@daterainc.com&gt;
Cc: Jeff Moyer &lt;jmoyer@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
