<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/mm, branch v3.0-rc5</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>memcg: fix direct softlimit reclaim to be called in limit path</title>
<updated>2011-06-28T01:00:13+00:00</updated>
<author>
<name>KAMEZAWA Hiroyuki</name>
<email>kamezawa.hiroyu@jp.fujitsu.com</email>
</author>
<published>2011-06-27T23:18:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ac34a1a3c39da0a1b9188d12a9ce85506364ed2a'/>
<id>ac34a1a3c39da0a1b9188d12a9ce85506364ed2a</id>
<content type='text'>
Commit d149e3b25d7c ("memcg: add the soft_limit reclaim in global direct
reclaim") adds a softlimit hook to shrink_zones().  By this, soft limit
is called as

   try_to_free_pages()
       do_try_to_free_pages()
           shrink_zones()
               mem_cgroup_soft_limit_reclaim()

Then, direct reclaim is memcg softlimit hint aware, now.

But, the memory cgroup's "limit" path can call softlimit shrinker.

   try_to_free_mem_cgroup_pages()
       do_try_to_free_pages()
           shrink_zones()
               mem_cgroup_soft_limit_reclaim()

This will cause a global reclaim when a memcg hits limit.

This is bug. soft_limit_reclaim() should be called when
scanning_global_lru(sc) == true.

And the commit adds a variable "total_scanned" for counting softlimit
scanned pages....it's not "total".  This patch removes the variable and
update sc-&gt;nr_scanned instead of it.  This will affect shrink_slab()'s
scan condition but, global LRU is scanned by softlimit and I think this
change makes sense.

TODO: avoid too much scanning of a zone when softlimit did enough work.

Signed-off-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Cc: Ying Han &lt;yinghan@google.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit d149e3b25d7c ("memcg: add the soft_limit reclaim in global direct
reclaim") adds a softlimit hook to shrink_zones().  By this, soft limit
is called as

   try_to_free_pages()
       do_try_to_free_pages()
           shrink_zones()
               mem_cgroup_soft_limit_reclaim()

Then, direct reclaim is memcg softlimit hint aware, now.

But, the memory cgroup's "limit" path can call softlimit shrinker.

   try_to_free_mem_cgroup_pages()
       do_try_to_free_pages()
           shrink_zones()
               mem_cgroup_soft_limit_reclaim()

This will cause a global reclaim when a memcg hits limit.

This is bug. soft_limit_reclaim() should be called when
scanning_global_lru(sc) == true.

And the commit adds a variable "total_scanned" for counting softlimit
scanned pages....it's not "total".  This patch removes the variable and
update sc-&gt;nr_scanned instead of it.  This will affect shrink_slab()'s
scan condition but, global LRU is scanned by softlimit and I think this
change makes sense.

TODO: avoid too much scanning of a zone when softlimit did enough work.

Signed-off-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Cc: Ying Han &lt;yinghan@google.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: fix assertion mapping-&gt;nrpages == 0 in end_writeback()</title>
<updated>2011-06-28T01:00:13+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2011-06-27T23:18:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=08142579b6ca35883c1ed066a2681de6f6917062'/>
<id>08142579b6ca35883c1ed066a2681de6f6917062</id>
<content type='text'>
Under heavy memory and filesystem load, users observe the assertion
mapping-&gt;nrpages == 0 in end_writeback() trigger.  This can be caused by
page reclaim reclaiming the last page from a mapping in the following
race:

	CPU0				CPU1
  ...
  shrink_page_list()
    __remove_mapping()
      __delete_from_page_cache()
        radix_tree_delete()
					evict_inode()
					  truncate_inode_pages()
					    truncate_inode_pages_range()
					      pagevec_lookup() - finds nothing
					  end_writeback()
					    mapping-&gt;nrpages != 0 -&gt; BUG
        page-&gt;mapping = NULL
        mapping-&gt;nrpages--

Fix the problem by doing a reliable check of mapping-&gt;nrpages under
mapping-&gt;tree_lock in end_writeback().

Analyzed by Jay &lt;jinshan.xiong@whamcloud.com&gt;, lost in LKML, and dug out
by Miklos Szeredi &lt;mszeredi@suse.de&gt;.

Cc: Jay &lt;jinshan.xiong@whamcloud.com&gt;
Cc: Miklos Szeredi &lt;mszeredi@suse.de&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: &lt;stable@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Under heavy memory and filesystem load, users observe the assertion
mapping-&gt;nrpages == 0 in end_writeback() trigger.  This can be caused by
page reclaim reclaiming the last page from a mapping in the following
race:

	CPU0				CPU1
  ...
  shrink_page_list()
    __remove_mapping()
      __delete_from_page_cache()
        radix_tree_delete()
					evict_inode()
					  truncate_inode_pages()
					    truncate_inode_pages_range()
					      pagevec_lookup() - finds nothing
					  end_writeback()
					    mapping-&gt;nrpages != 0 -&gt; BUG
        page-&gt;mapping = NULL
        mapping-&gt;nrpages--

Fix the problem by doing a reliable check of mapping-&gt;nrpages under
mapping-&gt;tree_lock in end_writeback().

Analyzed by Jay &lt;jinshan.xiong@whamcloud.com&gt;, lost in LKML, and dug out
by Miklos Szeredi &lt;mszeredi@suse.de&gt;.

Cc: Jay &lt;jinshan.xiong@whamcloud.com&gt;
Cc: Miklos Szeredi &lt;mszeredi@suse.de&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: &lt;stable@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/memory-failure.c: fix spinlock vs mutex order</title>
<updated>2011-06-28T01:00:13+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2011-06-27T23:18:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9b679320a5fbf46454011e5c62e0b8991b0956d1'/>
<id>9b679320a5fbf46454011e5c62e0b8991b0956d1</id>
<content type='text'>
We cannot take a mutex while holding a spinlock, so flip the order and
fix the locking documentation.

Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Acked-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We cannot take a mutex while holding a spinlock, so flip the order and
fix the locking documentation.

Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Acked-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tmpfs: add shmem_read_mapping_page_gfp</title>
<updated>2011-06-28T01:00:12+00:00</updated>
<author>
<name>Hugh Dickins</name>
<email>hughd@google.com</email>
</author>
<published>2011-06-27T23:18:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d9d90e5eb70e09903dadff42099b6c948f814050'/>
<id>d9d90e5eb70e09903dadff42099b6c948f814050</id>
<content type='text'>
Although it is used (by i915) on nothing but tmpfs, read_cache_page_gfp()
is unsuited to tmpfs, because it inserts a page into pagecache before
calling the filesystem's -&gt;readpage: tmpfs may have pages in swapcache
which only it knows how to locate and switch to filecache.

At present tmpfs provides a -&gt;readpage method, and copes with this by
copying pages; but soon we can simplify it by removing its -&gt;readpage.
Provide shmem_read_mapping_page_gfp() now, ready for that transition,

Export shmem_read_mapping_page_gfp() and add it to list in shmem_fs.h,
with shmem_read_mapping_page() inline for the common mapping_gfp case.

(shmem_read_mapping_page_gfp or shmem_read_cache_page_gfp? Generally the
read_mapping_page functions use the mapping's -&gt;readpage, and the
read_cache_page functions use the supplied filler, so I think
read_cache_page_gfp was slightly misnamed.)

Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Although it is used (by i915) on nothing but tmpfs, read_cache_page_gfp()
is unsuited to tmpfs, because it inserts a page into pagecache before
calling the filesystem's -&gt;readpage: tmpfs may have pages in swapcache
which only it knows how to locate and switch to filecache.

At present tmpfs provides a -&gt;readpage method, and copes with this by
copying pages; but soon we can simplify it by removing its -&gt;readpage.
Provide shmem_read_mapping_page_gfp() now, ready for that transition,

Export shmem_read_mapping_page_gfp() and add it to list in shmem_fs.h,
with shmem_read_mapping_page() inline for the common mapping_gfp case.

(shmem_read_mapping_page_gfp or shmem_read_cache_page_gfp? Generally the
read_mapping_page functions use the mapping's -&gt;readpage, and the
read_cache_page functions use the supplied filler, so I think
read_cache_page_gfp was slightly misnamed.)

Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tmpfs: take control of its truncate_range</title>
<updated>2011-06-28T01:00:12+00:00</updated>
<author>
<name>Hugh Dickins</name>
<email>hughd@google.com</email>
</author>
<published>2011-06-27T23:18:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=94c1e62df4494b79782cb9c7279f827212d1de70'/>
<id>94c1e62df4494b79782cb9c7279f827212d1de70</id>
<content type='text'>
2.6.35's new truncate convention gave tmpfs the opportunity to control
its file truncation, no longer enforced from outside by vmtruncate().
We shall want to build upon that, to handle pagecache and swap together.

Slightly redefine the -&gt;truncate_range interface: let it now be called
between the unmap_mapping_range()s, with the filesystem responsible for
doing the truncate_inode_pages_range() from it - just as the filesystem
is nowadays responsible for doing that from its -&gt;setattr.

Let's rename shmem_notify_change() to shmem_setattr().  Instead of
calling the generic truncate_setsize(), bring that code in so we can
call shmem_truncate_range() - which will later be updated to perform its
own variant of truncate_inode_pages_range().

Remove the punch_hole unmap_mapping_range() from shmem_truncate_range():
now that the COW's unmap_mapping_range() comes after -&gt;truncate_range,
there is no need to call it a third time.

Export shmem_truncate_range() and add it to the list in shmem_fs.h, so
that i915_gem_object_truncate() can call it explicitly in future; get
this patch in first, then update drm/i915 once this is available (until
then, i915 will just be doing the truncate_inode_pages() twice).

Though introduced five years ago, no other filesystem is implementing
-&gt;truncate_range, and its only other user is madvise(,,MADV_REMOVE): we
expect to convert it to fallocate(,FALLOC_FL_PUNCH_HOLE,,) shortly,
whereupon -&gt;truncate_range can be removed from inode_operations -
shmem_truncate_range() will help i915 across that transition too.

Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
2.6.35's new truncate convention gave tmpfs the opportunity to control
its file truncation, no longer enforced from outside by vmtruncate().
We shall want to build upon that, to handle pagecache and swap together.

Slightly redefine the -&gt;truncate_range interface: let it now be called
between the unmap_mapping_range()s, with the filesystem responsible for
doing the truncate_inode_pages_range() from it - just as the filesystem
is nowadays responsible for doing that from its -&gt;setattr.

Let's rename shmem_notify_change() to shmem_setattr().  Instead of
calling the generic truncate_setsize(), bring that code in so we can
call shmem_truncate_range() - which will later be updated to perform its
own variant of truncate_inode_pages_range().

Remove the punch_hole unmap_mapping_range() from shmem_truncate_range():
now that the COW's unmap_mapping_range() comes after -&gt;truncate_range,
there is no need to call it a third time.

Export shmem_truncate_range() and add it to the list in shmem_fs.h, so
that i915_gem_object_truncate() can call it explicitly in future; get
this patch in first, then update drm/i915 once this is available (until
then, i915 will just be doing the truncate_inode_pages() twice).

Though introduced five years ago, no other filesystem is implementing
-&gt;truncate_range, and its only other user is madvise(,,MADV_REMOVE): we
expect to convert it to fallocate(,FALLOC_FL_PUNCH_HOLE,,) shortly,
whereupon -&gt;truncate_range can be removed from inode_operations -
shmem_truncate_range() will help i915 across that transition too.

Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: move shmem prototypes to shmem_fs.h</title>
<updated>2011-06-28T01:00:12+00:00</updated>
<author>
<name>Hugh Dickins</name>
<email>hughd@google.com</email>
</author>
<published>2011-06-27T23:18:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=072441e21ddcd1140606b7d4ef6eab579a86b0b3'/>
<id>072441e21ddcd1140606b7d4ef6eab579a86b0b3</id>
<content type='text'>
Before adding any more global entry points into shmem.c, gather such
prototypes into shmem_fs.h.  Remove mm's own declarations from swap.h,
but for now leave the ones in mm.h: because shmem_file_setup() and
shmem_zero_setup() are called from various places, and we should not
force other subsystems to update immediately.

Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Before adding any more global entry points into shmem.c, gather such
prototypes into shmem_fs.h.  Remove mm's own declarations from swap.h,
but for now leave the ones in mm.h: because shmem_file_setup() and
shmem_zero_setup() are called from various places, and we should not
force other subsystems to update immediately.

Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: move vmtruncate_range to truncate.c</title>
<updated>2011-06-28T01:00:12+00:00</updated>
<author>
<name>Hugh Dickins</name>
<email>hughd@google.com</email>
</author>
<published>2011-06-27T23:18:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=5b8ba10198a109f8a02380648c5d29000caa9c55'/>
<id>5b8ba10198a109f8a02380648c5d29000caa9c55</id>
<content type='text'>
You would expect to find vmtruncate_range() next to vmtruncate() in
mm/truncate.c: move it there.

Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Acked-by: Christoph Hellwig &lt;hch@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
You would expect to find vmtruncate_range() next to vmtruncate() in
mm/truncate.c: move it there.

Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Acked-by: Christoph Hellwig &lt;hch@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm, hotplug: protect zonelist building with zonelists_mutex</title>
<updated>2011-06-23T04:06:48+00:00</updated>
<author>
<name>David Rientjes</name>
<email>rientjes@google.com</email>
</author>
<published>2011-06-23T01:13:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f957db4fcdd8f03e186aa8f041f4049e76ab741c'/>
<id>f957db4fcdd8f03e186aa8f041f4049e76ab741c</id>
<content type='text'>
Commit 959ecc48fc75 ("mm/memory_hotplug.c: fix building of node hotplug
zonelist") does not protect the build_all_zonelists() call with
zonelists_mutex as needed.  This can lead to races in constructing
zonelist ordering if a concurrent build is underway.  Protecting this
with lock_memory_hotplug() is insufficient since zonelists can be
rebuild though sysfs as well.

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Reviewed-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 959ecc48fc75 ("mm/memory_hotplug.c: fix building of node hotplug
zonelist") does not protect the build_all_zonelists() call with
zonelists_mutex as needed.  This can lead to races in constructing
zonelist ordering if a concurrent build is underway.  Protecting this
with lock_memory_hotplug() is insufficient since zonelists can be
rebuild though sysfs as well.

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Reviewed-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm, hotplug: fix error handling in mem_online_node()</title>
<updated>2011-06-23T04:06:47+00:00</updated>
<author>
<name>David Rientjes</name>
<email>rientjes@google.com</email>
</author>
<published>2011-06-23T01:13:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7553e8f2d5161a2b7a9b7a9f37be1b77e735552f'/>
<id>7553e8f2d5161a2b7a9b7a9f37be1b77e735552f</id>
<content type='text'>
The error handling in mem_online_node() is incorrect: hotadd_new_pgdat()
returns NULL if the new pgdat could not have been allocated and a pointer
to it otherwise.

mem_online_node() should fail if hotadd_new_pgdat() fails, not the
inverse.  This fixes an issue when memoryless nodes are not onlined and
their sysfs interface is not registered when their first cpu is brought
up.

The bug was introduced by commit cf23422b9d76 ("cpu/mem hotplug: enable
CPUs online before local memory online") iow v2.6.35.

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Reviewed-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The error handling in mem_online_node() is incorrect: hotadd_new_pgdat()
returns NULL if the new pgdat could not have been allocated and a pointer
to it otherwise.

mem_online_node() should fail if hotadd_new_pgdat() fails, not the
inverse.  This fixes an issue when memoryless nodes are not onlined and
their sysfs interface is not registered when their first cpu is brought
up.

The bug was introduced by commit cf23422b9d76 ("cpu/mem hotplug: enable
CPUs online before local memory online") iow v2.6.35.

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Reviewed-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: avoid anon_vma_chain allocation under anon_vma lock</title>
<updated>2011-06-18T02:24:11+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2011-06-18T02:05:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=dd34739c03f2f9a79403d33419c2e61e11b4c403'/>
<id>dd34739c03f2f9a79403d33419c2e61e11b4c403</id>
<content type='text'>
Hugh Dickins points out that lockdep (correctly) spots a potential
deadlock on the anon_vma lock, because we now do a GFP_KERNEL allocation
of anon_vma_chain while doing anon_vma_clone().  The problem is that
page reclaim will want to take the anon_vma lock of any anonymous pages
that it will try to reclaim.

So re-organize the code in anon_vma_clone() slightly: first do just a
GFP_NOWAIT allocation, which will usually work fine.  But if that fails,
let's just drop the lock and re-do the allocation, now with GFP_KERNEL.

End result: not only do we avoid the locking problem, this also ends up
getting better concurrency in case the allocation does need to block.
Tim Chen reports that with all these anon_vma locking tweaks, we're now
almost back up to the spinlock performance.

Reported-and-tested-by: Hugh Dickins &lt;hughd@google.com&gt;
Tested-by: Tim Chen &lt;tim.c.chen@linux.intel.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Hugh Dickins points out that lockdep (correctly) spots a potential
deadlock on the anon_vma lock, because we now do a GFP_KERNEL allocation
of anon_vma_chain while doing anon_vma_clone().  The problem is that
page reclaim will want to take the anon_vma lock of any anonymous pages
that it will try to reclaim.

So re-organize the code in anon_vma_clone() slightly: first do just a
GFP_NOWAIT allocation, which will usually work fine.  But if that fails,
let's just drop the lock and re-do the allocation, now with GFP_KERNEL.

End result: not only do we avoid the locking problem, this also ends up
getting better concurrency in case the allocation does need to block.
Tim Chen reports that with all these anon_vma locking tweaks, we're now
almost back up to the spinlock performance.

Reported-and-tested-by: Hugh Dickins &lt;hughd@google.com&gt;
Tested-by: Tim Chen &lt;tim.c.chen@linux.intel.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
