<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/aio.c, branch v4.0</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>ioctx_alloc(): fix vma (and file) leak on failure</title>
<updated>2015-04-06T21:57:44+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2015-04-06T21:57:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=deeb8525f9bcea60f5e86521880c1161de7a5829'/>
<id>deeb8525f9bcea60f5e86521880c1161de7a5829</id>
<content type='text'>
If we fail past the aio_setup_ring(), we need to destroy the
mapping.  We don't need to care about anybody having found ctx,
or added requests to it, since the last failure exit is exactly
the failure to make ctx visible to lookups.

Reproducer (based on one by Joe Mario &lt;jmario@redhat.com&gt;):

void count(char *p)
{
	char s[80];
	printf("%s: ", p);
	fflush(stdout);
	sprintf(s, "/bin/cat /proc/%d/maps|/bin/fgrep -c '/[aio] (deleted)'", getpid());
	system(s);
}

int main()
{
	io_context_t *ctx;
	int created, limit, i, destroyed;
	FILE *f;

	count("before");
	if ((f = fopen("/proc/sys/fs/aio-max-nr", "r")) == NULL)
		perror("opening aio-max-nr");
	else if (fscanf(f, "%d", &amp;limit) != 1)
		fprintf(stderr, "can't parse aio-max-nr\n");
	else if ((ctx = calloc(limit, sizeof(io_context_t))) == NULL)
		perror("allocating aio_context_t array");
	else {
		for (i = 0, created = 0; i &lt; limit; i++) {
			if (io_setup(1000, ctx + created) == 0)
				created++;
		}
		for (i = 0, destroyed = 0; i &lt; created; i++)
			if (io_destroy(ctx[i]) == 0)
				destroyed++;
		printf("created %d, failed %d, destroyed %d\n",
			created, limit - created, destroyed);
		count("after");
	}
}

Found-by: Joe Mario &lt;jmario@redhat.com&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If we fail past the aio_setup_ring(), we need to destroy the
mapping.  We don't need to care about anybody having found ctx,
or added requests to it, since the last failure exit is exactly
the failure to make ctx visible to lookups.

Reproducer (based on one by Joe Mario &lt;jmario@redhat.com&gt;):

void count(char *p)
{
	char s[80];
	printf("%s: ", p);
	fflush(stdout);
	sprintf(s, "/bin/cat /proc/%d/maps|/bin/fgrep -c '/[aio] (deleted)'", getpid());
	system(s);
}

int main()
{
	io_context_t *ctx;
	int created, limit, i, destroyed;
	FILE *f;

	count("before");
	if ((f = fopen("/proc/sys/fs/aio-max-nr", "r")) == NULL)
		perror("opening aio-max-nr");
	else if (fscanf(f, "%d", &amp;limit) != 1)
		fprintf(stderr, "can't parse aio-max-nr\n");
	else if ((ctx = calloc(limit, sizeof(io_context_t))) == NULL)
		perror("allocating aio_context_t array");
	else {
		for (i = 0, created = 0; i &lt; limit; i++) {
			if (io_setup(1000, ctx + created) == 0)
				created++;
		}
		for (i = 0, destroyed = 0; i &lt; created; i++)
			if (io_destroy(ctx[i]) == 0)
				destroyed++;
		printf("created %d, failed %d, destroyed %d\n",
			created, limit - created, destroyed);
		count("after");
	}
}

Found-by: Joe Mario &lt;jmario@redhat.com&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fix mremap() vs. ioctx_kill() race</title>
<updated>2015-04-06T21:50:59+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2015-04-06T21:48:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b2edffdd912b4205899a8efa0974dfbbc3216109'/>
<id>b2edffdd912b4205899a8efa0974dfbbc3216109</id>
<content type='text'>
teach -&gt;mremap() method to return an error and have it fail for
aio mappings in process of being killed

Note that in case of -&gt;mremap() failure we need to undo move_page_tables()
we'd already done; we could call -&gt;mremap() first, but then the failure of
move_page_tables() would require undoing whatever _successful_ -&gt;mremap()
has done, which would be a lot more headache in general.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
teach -&gt;mremap() method to return an error and have it fail for
aio mappings in process of being killed

Note that in case of -&gt;mremap() failure we need to undo move_page_tables()
we'd already done; we could call -&gt;mremap() first, but then the failure of
move_page_tables() would require undoing whatever _successful_ -&gt;mremap()
has done, which would be a lot more headache in general.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fs/aio.c: Remove duplicate function name in pr_debug messages</title>
<updated>2015-02-20T09:56:44+00:00</updated>
<author>
<name>Kinglong Mee</name>
<email>kinglongmee@gmail.com</email>
</author>
<published>2015-02-04T13:15:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=acd88d4e1af3c6c55f679c202cd517dff7ea9c6f'/>
<id>acd88d4e1af3c6c55f679c202cd517dff7ea9c6f</id>
<content type='text'>
Have defined pr_fmt as below in fs/aio.c, so remove duplicate
function name in pr_debug message.

#define pr_fmt(fmt) "%s: " fmt, __func__

Signed-off-by: Kinglong Mee &lt;kinglongmee@gmail.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Have defined pr_fmt as below in fs/aio.c, so remove duplicate
function name in pr_debug message.

#define pr_fmt(fmt) "%s: " fmt, __func__

Signed-off-by: Kinglong Mee &lt;kinglongmee@gmail.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-3.20/bdi' of git://git.kernel.dk/linux-block</title>
<updated>2015-02-12T21:50:21+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-02-12T21:50:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6bec0035286119eefc32a5b1102127e6a4032cb2'/>
<id>6bec0035286119eefc32a5b1102127e6a4032cb2</id>
<content type='text'>
Pull backing device changes from Jens Axboe:
 "This contains a cleanup of how the backing device is handled, in
  preparation for a rework of the life time rules.  In this part, the
  most important change is to split the unrelated nommu mmap flags from
  it, but also removing a backing_dev_info pointer from the
  address_space (and inode), and a cleanup of other various minor bits.

  Christoph did all the work here, I just fixed an oops with pages that
  have a swap backing.  Arnd fixed a missing export, and Oleg killed the
  lustre backing_dev_info from staging.  Last patch was from Al,
  unexporting parts that are now no longer needed outside"

* 'for-3.20/bdi' of git://git.kernel.dk/linux-block:
  Make super_blocks and sb_lock static
  mtd: export new mtd_mmap_capabilities
  fs: make inode_to_bdi() handle NULL inode
  staging/lustre/llite: get rid of backing_dev_info
  fs: remove default_backing_dev_info
  fs: don't reassign dirty inodes to default_backing_dev_info
  nfs: don't call bdi_unregister
  ceph: remove call to bdi_unregister
  fs: remove mapping-&gt;backing_dev_info
  fs: export inode_to_bdi and use it in favor of mapping-&gt;backing_dev_info
  nilfs2: set up s_bdi like the generic mount_bdev code
  block_dev: get bdev inode bdi directly from the block device
  block_dev: only write bdev inode on close
  fs: introduce f_op-&gt;mmap_capabilities for nommu mmap support
  fs: kill BDI_CAP_SWAP_BACKED
  fs: deduplicate noop_backing_dev_info
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull backing device changes from Jens Axboe:
 "This contains a cleanup of how the backing device is handled, in
  preparation for a rework of the life time rules.  In this part, the
  most important change is to split the unrelated nommu mmap flags from
  it, but also removing a backing_dev_info pointer from the
  address_space (and inode), and a cleanup of other various minor bits.

  Christoph did all the work here, I just fixed an oops with pages that
  have a swap backing.  Arnd fixed a missing export, and Oleg killed the
  lustre backing_dev_info from staging.  Last patch was from Al,
  unexporting parts that are now no longer needed outside"

* 'for-3.20/bdi' of git://git.kernel.dk/linux-block:
  Make super_blocks and sb_lock static
  mtd: export new mtd_mmap_capabilities
  fs: make inode_to_bdi() handle NULL inode
  staging/lustre/llite: get rid of backing_dev_info
  fs: remove default_backing_dev_info
  fs: don't reassign dirty inodes to default_backing_dev_info
  nfs: don't call bdi_unregister
  ceph: remove call to bdi_unregister
  fs: remove mapping-&gt;backing_dev_info
  fs: export inode_to_bdi and use it in favor of mapping-&gt;backing_dev_info
  nilfs2: set up s_bdi like the generic mount_bdev code
  block_dev: get bdev inode bdi directly from the block device
  block_dev: only write bdev inode on close
  fs: introduce f_op-&gt;mmap_capabilities for nommu mmap support
  fs: kill BDI_CAP_SWAP_BACKED
  fs: deduplicate noop_backing_dev_info
</pre>
</div>
</content>
</entry>
<entry>
<title>aio: annotate aio_read_event_ring for sleep patterns</title>
<updated>2015-02-04T00:29:05+00:00</updated>
<author>
<name>Dave Chinner</name>
<email>dchinner@redhat.com</email>
</author>
<published>2015-02-04T00:29:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9c9ce763b114e608b5cf3aa6cb69ad9f6e8b6adf'/>
<id>9c9ce763b114e608b5cf3aa6cb69ad9f6e8b6adf</id>
<content type='text'>
Under CONFIG_DEBUG_ATOMIC_SLEEP=y, aio_read_event_ring() will throw
warnings like the following due to being called from wait_event
context:

 WARNING: CPU: 0 PID: 16006 at kernel/sched/core.c:7300 __might_sleep+0x7f/0x90()
 do not call blocking ops when !TASK_RUNNING; state=1 set at [&lt;ffffffff810d85a3&gt;] prepare_to_wait_event+0x63/0x110
 Modules linked in:
 CPU: 0 PID: 16006 Comm: aio-dio-fcntl-r Not tainted 3.19.0-rc6-dgc+ #705
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  ffffffff821c0372 ffff88003c117cd8 ffffffff81daf2bd 000000000000d8d8
  ffff88003c117d28 ffff88003c117d18 ffffffff8109beda ffff88003c117cf8
  ffffffff821c115e 0000000000000061 0000000000000000 00007ffffe4aa300
 Call Trace:
  [&lt;ffffffff81daf2bd&gt;] dump_stack+0x4c/0x65
  [&lt;ffffffff8109beda&gt;] warn_slowpath_common+0x8a/0xc0
  [&lt;ffffffff8109bf56&gt;] warn_slowpath_fmt+0x46/0x50
  [&lt;ffffffff810d85a3&gt;] ? prepare_to_wait_event+0x63/0x110
  [&lt;ffffffff810d85a3&gt;] ? prepare_to_wait_event+0x63/0x110
  [&lt;ffffffff810bdfcf&gt;] __might_sleep+0x7f/0x90
  [&lt;ffffffff81db8344&gt;] mutex_lock+0x24/0x45
  [&lt;ffffffff81216b7c&gt;] aio_read_events+0x4c/0x290
  [&lt;ffffffff81216fac&gt;] read_events+0x1ec/0x220
  [&lt;ffffffff810d8650&gt;] ? prepare_to_wait_event+0x110/0x110
  [&lt;ffffffff810fdb10&gt;] ? hrtimer_get_res+0x50/0x50
  [&lt;ffffffff8121899d&gt;] SyS_io_getevents+0x4d/0xb0
  [&lt;ffffffff81dba5a9&gt;] system_call_fastpath+0x12/0x17
 ---[ end trace bde69eaf655a4fea ]---

There is not actually a bug here, so annotate the code to tell the
debug logic that everything is just fine and not to fire a false
positive.

Signed-off-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Signed-off-by: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Under CONFIG_DEBUG_ATOMIC_SLEEP=y, aio_read_event_ring() will throw
warnings like the following due to being called from wait_event
context:

 WARNING: CPU: 0 PID: 16006 at kernel/sched/core.c:7300 __might_sleep+0x7f/0x90()
 do not call blocking ops when !TASK_RUNNING; state=1 set at [&lt;ffffffff810d85a3&gt;] prepare_to_wait_event+0x63/0x110
 Modules linked in:
 CPU: 0 PID: 16006 Comm: aio-dio-fcntl-r Not tainted 3.19.0-rc6-dgc+ #705
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  ffffffff821c0372 ffff88003c117cd8 ffffffff81daf2bd 000000000000d8d8
  ffff88003c117d28 ffff88003c117d18 ffffffff8109beda ffff88003c117cf8
  ffffffff821c115e 0000000000000061 0000000000000000 00007ffffe4aa300
 Call Trace:
  [&lt;ffffffff81daf2bd&gt;] dump_stack+0x4c/0x65
  [&lt;ffffffff8109beda&gt;] warn_slowpath_common+0x8a/0xc0
  [&lt;ffffffff8109bf56&gt;] warn_slowpath_fmt+0x46/0x50
  [&lt;ffffffff810d85a3&gt;] ? prepare_to_wait_event+0x63/0x110
  [&lt;ffffffff810d85a3&gt;] ? prepare_to_wait_event+0x63/0x110
  [&lt;ffffffff810bdfcf&gt;] __might_sleep+0x7f/0x90
  [&lt;ffffffff81db8344&gt;] mutex_lock+0x24/0x45
  [&lt;ffffffff81216b7c&gt;] aio_read_events+0x4c/0x290
  [&lt;ffffffff81216fac&gt;] read_events+0x1ec/0x220
  [&lt;ffffffff810d8650&gt;] ? prepare_to_wait_event+0x110/0x110
  [&lt;ffffffff810fdb10&gt;] ? hrtimer_get_res+0x50/0x50
  [&lt;ffffffff8121899d&gt;] SyS_io_getevents+0x4d/0xb0
  [&lt;ffffffff81dba5a9&gt;] system_call_fastpath+0x12/0x17
 ---[ end trace bde69eaf655a4fea ]---

There is not actually a bug here, so annotate the code to tell the
debug logic that everything is just fine and not to fire a false
positive.

Signed-off-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Signed-off-by: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fs: remove mapping-&gt;backing_dev_info</title>
<updated>2015-01-20T21:03:05+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2015-01-14T09:42:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b83ae6d421435c6204150300f1c25bfbd39cd62b'/>
<id>b83ae6d421435c6204150300f1c25bfbd39cd62b</id>
<content type='text'>
Now that we never use the backing_dev_info pointer in struct address_space
we can simply remove it and save 4 to 8 bytes in every inode.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Acked-by: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Reviewed-by: Tejun Heo &lt;tj@kernel.org&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Now that we never use the backing_dev_info pointer in struct address_space
we can simply remove it and save 4 to 8 bytes in every inode.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Acked-by: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Reviewed-by: Tejun Heo &lt;tj@kernel.org&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fs: introduce f_op-&gt;mmap_capabilities for nommu mmap support</title>
<updated>2015-01-20T21:02:58+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2015-01-14T09:42:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b4caecd48005fbed3949dde6c1cb233142fd69e9'/>
<id>b4caecd48005fbed3949dde6c1cb233142fd69e9</id>
<content type='text'>
Since "BDI: Provide backing device capability information [try #3]" the
backing_dev_info structure also provides flags for the kind of mmap
operation available in a nommu environment, which is entirely unrelated
to it's original purpose.

Introduce a new nommu-only file operation to provide this information to
the nommu mmap code instead.  Splitting this from the backing_dev_info
structure allows to remove lots of backing_dev_info instance that aren't
otherwise needed, and entirely gets rid of the concept of providing a
backing_dev_info for a character device.  It also removes the need for
the mtd_inodefs filesystem.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Brian Norris &lt;computersforpeace@gmail.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since "BDI: Provide backing device capability information [try #3]" the
backing_dev_info structure also provides flags for the kind of mmap
operation available in a nommu environment, which is entirely unrelated
to it's original purpose.

Introduce a new nommu-only file operation to provide this information to
the nommu mmap code instead.  Splitting this from the backing_dev_info
structure allows to remove lots of backing_dev_info instance that aren't
otherwise needed, and entirely gets rid of the concept of providing a
backing_dev_info for a character device.  It also removes the need for
the mtd_inodefs filesystem.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Brian Norris &lt;computersforpeace@gmail.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>aio: Skip timer for io_getevents if timeout=0</title>
<updated>2014-12-13T22:50:20+00:00</updated>
<author>
<name>Fam Zheng</name>
<email>famz@redhat.com</email>
</author>
<published>2014-11-06T12:44:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=5f785de588735306ec4d7c875caf9d28481c8b21'/>
<id>5f785de588735306ec4d7c875caf9d28481c8b21</id>
<content type='text'>
In this case, it is basically a polling. Let's not involve timer at all
because that would hurt performance for application event loops.

In an arbitrary test I've done, io_getevents syscall elapsed time
reduces from 50000+ nanoseconds to a few hundereds.

Signed-off-by: Fam Zheng &lt;famz@redhat.com&gt;
Signed-off-by: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In this case, it is basically a polling. Let's not involve timer at all
because that would hurt performance for application event loops.

In an arbitrary test I've done, io_getevents syscall elapsed time
reduces from 50000+ nanoseconds to a few hundereds.

Signed-off-by: Fam Zheng &lt;famz@redhat.com&gt;
Signed-off-by: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>aio: Make it possible to remap aio ring</title>
<updated>2014-12-13T22:49:50+00:00</updated>
<author>
<name>Pavel Emelyanov</name>
<email>xemul@parallels.com</email>
</author>
<published>2014-09-18T15:56:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e4a0d3e720e7e508749c1439b5ba3aff56c92976'/>
<id>e4a0d3e720e7e508749c1439b5ba3aff56c92976</id>
<content type='text'>
There are actually two issues this patch addresses. Let me start with
the one I tried to solve in the beginning.

So, in the checkpoint-restore project (criu) we try to dump tasks'
state and restore one back exactly as it was. One of the tasks' state
bits is rings set up with io_setup() call. There's (almost) no problems
in dumping them, there's a problem restoring them -- if I dump a task
with aio ring originally mapped at address A, I want to restore one
back at exactly the same address A. Unfortunately, the io_setup() does
not allow for that -- it mmaps the ring at whatever place mm finds
appropriate (it calls do_mmap_pgoff() with zero address and without
the MAP_FIXED flag).

To make restore possible I'm going to mremap() the freshly created ring
into the address A (under which it was seen before dump). The problem is
that the ring's virtual address is passed back to the user-space as the
context ID and this ID is then used as search key by all the other io_foo()
calls. Reworking this ID to be just some integer doesn't seem to work, as
this value is already used by libaio as a pointer using which this library
accesses memory for aio meta-data.

So, to make restore work we need to make sure that

a) ring is mapped at desired virtual address
b) kioctx-&gt;user_id matches this value

Having said that, the patch makes mremap() on aio region update the
kioctx's user_id and mmap_base values.

Here appears the 2nd issue I mentioned in the beginning of this mail.
If (regardless of the C/R dances I do) someone creates an io context
with io_setup(), then mremap()-s the ring and then destroys the context,
the kill_ioctx() routine will call munmap() on wrong (old) address.
This will result in a) aio ring remaining in memory and b) some other
vma get unexpectedly unmapped.

What do you think?

Signed-off-by: Pavel Emelyanov &lt;xemul@parallels.com&gt;
Acked-by: Dmitry Monakhov &lt;dmonakhov@openvz.org&gt;
Signed-off-by: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There are actually two issues this patch addresses. Let me start with
the one I tried to solve in the beginning.

So, in the checkpoint-restore project (criu) we try to dump tasks'
state and restore one back exactly as it was. One of the tasks' state
bits is rings set up with io_setup() call. There's (almost) no problems
in dumping them, there's a problem restoring them -- if I dump a task
with aio ring originally mapped at address A, I want to restore one
back at exactly the same address A. Unfortunately, the io_setup() does
not allow for that -- it mmaps the ring at whatever place mm finds
appropriate (it calls do_mmap_pgoff() with zero address and without
the MAP_FIXED flag).

To make restore possible I'm going to mremap() the freshly created ring
into the address A (under which it was seen before dump). The problem is
that the ring's virtual address is passed back to the user-space as the
context ID and this ID is then used as search key by all the other io_foo()
calls. Reworking this ID to be just some integer doesn't seem to work, as
this value is already used by libaio as a pointer using which this library
accesses memory for aio meta-data.

So, to make restore work we need to make sure that

a) ring is mapped at desired virtual address
b) kioctx-&gt;user_id matches this value

Having said that, the patch makes mremap() on aio region update the
kioctx's user_id and mmap_base values.

Here appears the 2nd issue I mentioned in the beginning of this mail.
If (regardless of the C/R dances I do) someone creates an io context
with io_setup(), then mremap()-s the ring and then destroys the context,
the kill_ioctx() routine will call munmap() on wrong (old) address.
This will result in a) aio ring remaining in memory and b) some other
vma get unexpectedly unmapped.

What do you think?

Signed-off-by: Pavel Emelyanov &lt;xemul@parallels.com&gt;
Acked-by: Dmitry Monakhov &lt;dmonakhov@openvz.org&gt;
Signed-off-by: Benjamin LaHaise &lt;bcrl@kvack.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge git://git.kvack.org/~bcrl/aio-fixes</title>
<updated>2014-11-26T02:55:44+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2014-11-26T02:55:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=277f850fbc4a0c732311af2b9a2d28d29a3c5582'/>
<id>277f850fbc4a0c732311af2b9a2d28d29a3c5582</id>
<content type='text'>
Pull aio fix from Ben LaHaise:
 "Dirty page accounting fix for aio"

* git://git.kvack.org/~bcrl/aio-fixes:
  aio: fix uncorrent dirty pages accouting when truncating AIO ring buffer
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull aio fix from Ben LaHaise:
 "Dirty page accounting fix for aio"

* git://git.kvack.org/~bcrl/aio-fixes:
  aio: fix uncorrent dirty pages accouting when truncating AIO ring buffer
</pre>
</div>
</content>
</entry>
</feed>
