<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/mm/page_alloc.c, branch v3.7-rc5</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>mm, numa: avoid setting zone_reclaim_mode unless a node is sufficiently distant</title>
<updated>2012-10-25T21:37:53+00:00</updated>
<author>
<name>David Rientjes</name>
<email>rientjes@google.com</email>
</author>
<published>2012-10-25T20:38:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6b187d0260b6cd1d0904309f32659b7ed5948af8'/>
<id>6b187d0260b6cd1d0904309f32659b7ed5948af8</id>
<content type='text'>
Commit 957f822a0ab9 ("mm, numa: reclaim from all nodes within reclaim
distance") caused zone_reclaim_mode to be set for all systems where two
nodes are within RECLAIM_DISTANCE of each other.  This is the opposite
of what we actually want: zone_reclaim_mode should be set if two nodes
are sufficiently distant.

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Reported-by: Julian Wollrath &lt;jwollrath@web.de&gt;
Tested-by: Julian Wollrath &lt;jwollrath@web.de&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Patrik Kullman &lt;patrik.kullman@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 957f822a0ab9 ("mm, numa: reclaim from all nodes within reclaim
distance") caused zone_reclaim_mode to be set for all systems where two
nodes are within RECLAIM_DISTANCE of each other.  This is the opposite
of what we actually want: zone_reclaim_mode should be set if two nodes
are sufficiently distant.

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Reported-by: Julian Wollrath &lt;jwollrath@web.de&gt;
Tested-by: Julian Wollrath &lt;jwollrath@web.de&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Patrik Kullman &lt;patrik.kullman@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/page_alloc.c:alloc_contig_range(): return early for err path</title>
<updated>2012-10-25T21:37:52+00:00</updated>
<author>
<name>Bob Liu</name>
<email>lliubbo@gmail.com</email>
</author>
<published>2012-10-25T20:37:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=86a595f961360c9c31d00fb111dd09e5d5ba9a40'/>
<id>86a595f961360c9c31d00fb111dd09e5d5ba9a40</id>
<content type='text'>
If start_isolate_page_range() failed, unset_migratetype_isolate() has been
done inside it.

Signed-off-by: Bob Liu &lt;lliubbo@gmail.com&gt;
Cc: Ni zhan Chen &lt;nizhan.chen@gmail.com&gt;
Cc: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If start_isolate_page_range() failed, unset_migratetype_isolate() has been
done inside it.

Signed-off-by: Bob Liu &lt;lliubbo@gmail.com&gt;
Cc: Ni zhan Chen &lt;nizhan.chen@gmail.com&gt;
Cc: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cma: decrease cc.nr_migratepages after reclaiming pagelist</title>
<updated>2012-10-09T07:23:01+00:00</updated>
<author>
<name>Minchan Kim</name>
<email>minchan@kernel.org</email>
</author>
<published>2012-10-08T23:33:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=beb51eaa88238daba698ad837222ad277d440b6d'/>
<id>beb51eaa88238daba698ad837222ad277d440b6d</id>
<content type='text'>
reclaim_clean_pages_from_list() reclaims clean pages before migration so
cc.nr_migratepages should be updated.  Currently, there is no problem but
it can be wrong if we try to use the value in future.

Signed-off-by: Minchan Kim &lt;minchan@kernel.org&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Michal Nazarewicz &lt;mina86@mina86.com&gt;
Cc: Bartlomiej Zolnierkiewicz &lt;b.zolnierkie@samsung.com&gt;
Cc: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
reclaim_clean_pages_from_list() reclaims clean pages before migration so
cc.nr_migratepages should be updated.  Currently, there is no problem but
it can be wrong if we try to use the value in future.

Signed-off-by: Minchan Kim &lt;minchan@kernel.org&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Michal Nazarewicz &lt;mina86@mina86.com&gt;
Cc: Bartlomiej Zolnierkiewicz &lt;b.zolnierkie@samsung.com&gt;
Cc: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>CMA: migrate mlocked pages</title>
<updated>2012-10-09T07:23:00+00:00</updated>
<author>
<name>Minchan Kim</name>
<email>minchan@kernel.org</email>
</author>
<published>2012-10-08T23:33:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e46a28790e594c0876d1a84270926abf75460f61'/>
<id>e46a28790e594c0876d1a84270926abf75460f61</id>
<content type='text'>
Presently CMA cannot migrate mlocked pages so it ends up failing to allocate
contiguous memory space.

This patch makes mlocked pages be migrated out.  Of course, it can affect
realtime processes but in CMA usecase, contiguous memory allocation failing
is far worse than access latency to an mlocked page being variable while
CMA is running.  If someone wants to make the system realtime, he shouldn't
enable CMA because stalls can still happen at random times.

[akpm@linux-foundation.org: tweak comment text, per Mel]
Signed-off-by: Minchan Kim &lt;minchan@kernel.org&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Michal Nazarewicz &lt;mina86@mina86.com&gt;
Cc: Bartlomiej Zolnierkiewicz &lt;b.zolnierkie@samsung.com&gt;
Cc: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Presently CMA cannot migrate mlocked pages so it ends up failing to allocate
contiguous memory space.

This patch makes mlocked pages be migrated out.  Of course, it can affect
realtime processes but in CMA usecase, contiguous memory allocation failing
is far worse than access latency to an mlocked page being variable while
CMA is running.  If someone wants to make the system realtime, he shouldn't
enable CMA because stalls can still happen at random times.

[akpm@linux-foundation.org: tweak comment text, per Mel]
Signed-off-by: Minchan Kim &lt;minchan@kernel.org&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Michal Nazarewicz &lt;mina86@mina86.com&gt;
Cc: Bartlomiej Zolnierkiewicz &lt;b.zolnierkie@samsung.com&gt;
Cc: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>memory-hotplug: fix zone stat mismatch</title>
<updated>2012-10-09T07:22:59+00:00</updated>
<author>
<name>Minchan Kim</name>
<email>minchan@kernel.org</email>
</author>
<published>2012-10-08T23:33:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=5a883813845a2bb5ed2bd8c9240736c0740b156f'/>
<id>5a883813845a2bb5ed2bd8c9240736c0740b156f</id>
<content type='text'>
During memory-hotplug, I found NR_ISOLATED_[ANON|FILE] are increasing,
causing the kernel to hang.  When the system doesn't have enough free
pages, it enters reclaim but never reclaim any pages due to
too_many_isolated()==true and loops forever.

The cause is that when we do memory-hotadd after memory-remove,
__zone_pcp_update() clears a zone's ZONE_STAT_ITEMS in setup_pageset()
although the vm_stat_diff of all CPUs still have values.

In addtion, when we offline all pages of the zone, we reset them in
zone_pcp_reset without draining so we loss some zone stat item.

Reviewed-by: Wen Congyang &lt;wency@cn.fujitsu.com&gt;
Signed-off-by: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Kamezawa Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Yasuaki Ishimatsu &lt;isimatu.yasuaki@jp.fujitsu.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
During memory-hotplug, I found NR_ISOLATED_[ANON|FILE] are increasing,
causing the kernel to hang.  When the system doesn't have enough free
pages, it enters reclaim but never reclaim any pages due to
too_many_isolated()==true and loops forever.

The cause is that when we do memory-hotadd after memory-remove,
__zone_pcp_update() clears a zone's ZONE_STAT_ITEMS in setup_pageset()
although the vm_stat_diff of all CPUs still have values.

In addtion, when we offline all pages of the zone, we reset them in
zone_pcp_reset without draining so we loss some zone stat item.

Reviewed-by: Wen Congyang &lt;wency@cn.fujitsu.com&gt;
Signed-off-by: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Kamezawa Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Yasuaki Ishimatsu &lt;isimatu.yasuaki@jp.fujitsu.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm, numa: reclaim from all nodes within reclaim distance</title>
<updated>2012-10-09T07:22:56+00:00</updated>
<author>
<name>David Rientjes</name>
<email>rientjes@google.com</email>
</author>
<published>2012-10-08T23:33:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=957f822a0ab95e88b146638bad6209bbc315bedd'/>
<id>957f822a0ab95e88b146638bad6209bbc315bedd</id>
<content type='text'>
RECLAIM_DISTANCE represents the distance between nodes at which it is
deemed too costly to allocate from; it's preferred to try to reclaim from
a local zone before falling back to allocating on a remote node with such
a distance.

To do this, zone_reclaim_mode is set if the distance between any two
nodes on the system is greather than this distance.  This, however, ends
up causing the page allocator to reclaim from every zone regardless of
its affinity.

What we really want is to reclaim only from zones that are closer than
RECLAIM_DISTANCE.  This patch adds a nodemask to each node that
represents the set of nodes that are within this distance.  During the
zone iteration, if the bit for a zone's node is set for the local node,
then reclaim is attempted; otherwise, the zone is skipped.

[akpm@linux-foundation.org: fix CONFIG_NUMA=n build]
Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
RECLAIM_DISTANCE represents the distance between nodes at which it is
deemed too costly to allocate from; it's preferred to try to reclaim from
a local zone before falling back to allocating on a remote node with such
a distance.

To do this, zone_reclaim_mode is set if the distance between any two
nodes on the system is greather than this distance.  This, however, ends
up causing the page allocator to reclaim from every zone regardless of
its affinity.

What we really want is to reclaim only from zones that are closer than
RECLAIM_DISTANCE.  This patch adds a nodemask to each node that
represents the set of nodes that are within this distance.  During the
zone iteration, if the bit for a zone's node is set for the local node,
then reclaim is attempted; otherwise, the zone is skipped.

[akpm@linux-foundation.org: fix CONFIG_NUMA=n build]
Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: remove free_page_mlock</title>
<updated>2012-10-09T07:22:56+00:00</updated>
<author>
<name>Hugh Dickins</name>
<email>hughd@google.com</email>
</author>
<published>2012-10-08T23:33:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a0c5e813f087dffc0d9b173d2e7d3328b1482fd5'/>
<id>a0c5e813f087dffc0d9b173d2e7d3328b1482fd5</id>
<content type='text'>
We should not be seeing non-0 unevictable_pgs_mlockfreed any longer.  So
remove free_page_mlock() from the page freeing paths: __PG_MLOCKED is
already in PAGE_FLAGS_CHECK_AT_FREE, so free_pages_check() will now be
checking it, reporting "BUG: Bad page state" if it's ever found set.
Comment UNEVICTABLE_MLOCKFREED and unevictable_pgs_mlockfreed always 0.

Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Acked-by: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michel Lespinasse &lt;walken@google.com&gt;
Cc: Ying Han &lt;yinghan@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We should not be seeing non-0 unevictable_pgs_mlockfreed any longer.  So
remove free_page_mlock() from the page freeing paths: __PG_MLOCKED is
already in PAGE_FLAGS_CHECK_AT_FREE, so free_pages_check() will now be
checking it, reporting "BUG: Bad page state" if it's ever found set.
Comment UNEVICTABLE_MLOCKFREED and unevictable_pgs_mlockfreed always 0.

Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Acked-by: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michel Lespinasse &lt;walken@google.com&gt;
Cc: Ying Han &lt;yinghan@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: fix-up zone present pages</title>
<updated>2012-10-09T07:22:54+00:00</updated>
<author>
<name>Jianguo Wu</name>
<email>wujianguo@huawei.com</email>
</author>
<published>2012-10-08T23:33:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125'/>
<id>7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125</id>
<content type='text'>
I think zone-&gt;present_pages indicates pages that buddy system can management,
it should be:

	zone-&gt;present_pages = spanned pages - absent pages - bootmem pages,

but is now:
	zone-&gt;present_pages = spanned pages - absent pages - memmap pages.

spanned pages: total size, including holes.
absent pages: holes.
bootmem pages: pages used in system boot, managed by bootmem allocator.
memmap pages: pages used by page structs.

This may cause zone-&gt;present_pages less than it should be.  For example,
numa node 1 has ZONE_NORMAL and ZONE_MOVABLE, it's memmap and other
bootmem will be allocated from ZONE_MOVABLE, so ZONE_NORMAL's
present_pages should be spanned pages - absent pages, but now it also
minus memmap pages(free_area_init_core), which are actually allocated from
ZONE_MOVABLE.  When offlining all memory of a zone, this will cause
zone-&gt;present_pages less than 0, because present_pages is unsigned long
type, it is actually a very large integer, it indirectly caused
zone-&gt;watermark[WMARK_MIN] becomes a large
integer(setup_per_zone_wmarks()), than cause totalreserve_pages become a
large integer(calculate_totalreserve_pages()), and finally cause memory
allocating failure when fork process(__vm_enough_memory()).

[root@localhost ~]# dmesg
-bash: fork: Cannot allocate memory

I think the bug described in

  http://marc.info/?l=linux-mm&amp;m=134502182714186&amp;w=2

is also caused by wrong zone present pages.

This patch intends to fix-up zone-&gt;present_pages when memory are freed to
buddy system on x86_64 and IA64 platforms.

Signed-off-by: Jianguo Wu &lt;wujianguo@huawei.com&gt;
Signed-off-by: Jiang Liu &lt;jiang.liu@huawei.com&gt;
Reported-by: Petr Tesarik &lt;ptesarik@suse.cz&gt;
Tested-by: Petr Tesarik &lt;ptesarik@suse.cz&gt;
Cc: "Luck, Tony" &lt;tony.luck@intel.com&gt;
Cc: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: Yinghai Lu &lt;yinghai@kernel.org&gt;
Cc: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
I think zone-&gt;present_pages indicates pages that buddy system can management,
it should be:

	zone-&gt;present_pages = spanned pages - absent pages - bootmem pages,

but is now:
	zone-&gt;present_pages = spanned pages - absent pages - memmap pages.

spanned pages: total size, including holes.
absent pages: holes.
bootmem pages: pages used in system boot, managed by bootmem allocator.
memmap pages: pages used by page structs.

This may cause zone-&gt;present_pages less than it should be.  For example,
numa node 1 has ZONE_NORMAL and ZONE_MOVABLE, it's memmap and other
bootmem will be allocated from ZONE_MOVABLE, so ZONE_NORMAL's
present_pages should be spanned pages - absent pages, but now it also
minus memmap pages(free_area_init_core), which are actually allocated from
ZONE_MOVABLE.  When offlining all memory of a zone, this will cause
zone-&gt;present_pages less than 0, because present_pages is unsigned long
type, it is actually a very large integer, it indirectly caused
zone-&gt;watermark[WMARK_MIN] becomes a large
integer(setup_per_zone_wmarks()), than cause totalreserve_pages become a
large integer(calculate_totalreserve_pages()), and finally cause memory
allocating failure when fork process(__vm_enough_memory()).

[root@localhost ~]# dmesg
-bash: fork: Cannot allocate memory

I think the bug described in

  http://marc.info/?l=linux-mm&amp;m=134502182714186&amp;w=2

is also caused by wrong zone present pages.

This patch intends to fix-up zone-&gt;present_pages when memory are freed to
buddy system on x86_64 and IA64 platforms.

Signed-off-by: Jianguo Wu &lt;wujianguo@huawei.com&gt;
Signed-off-by: Jiang Liu &lt;jiang.liu@huawei.com&gt;
Reported-by: Petr Tesarik &lt;ptesarik@suse.cz&gt;
Tested-by: Petr Tesarik &lt;ptesarik@suse.cz&gt;
Cc: "Luck, Tony" &lt;tony.luck@intel.com&gt;
Cc: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: Yinghai Lu &lt;yinghai@kernel.org&gt;
Cc: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/page_alloc: refactor out __alloc_contig_migrate_alloc()</title>
<updated>2012-10-09T07:22:52+00:00</updated>
<author>
<name>Minchan Kim</name>
<email>minchan@kernel.org</email>
</author>
<published>2012-10-08T23:32:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=723a0644a7255f532575fd43245f9ef976491328'/>
<id>723a0644a7255f532575fd43245f9ef976491328</id>
<content type='text'>
__alloc_contig_migrate_alloc() can be used by memory-hotplug so refactor
it out (move + rename as a common name) into page_isolation.c.

[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Kamezawa Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Reviewed-by: Yasuaki Ishimatsu &lt;isimatu.yasuaki@jp.fujitsu.com&gt;
Acked-by: Michal Nazarewicz &lt;mina86@mina86.com&gt;
Cc: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Cc: Wen Congyang &lt;wency@cn.fujitsu.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
__alloc_contig_migrate_alloc() can be used by memory-hotplug so refactor
it out (move + rename as a common name) into page_isolation.c.

[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Kamezawa Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Reviewed-by: Yasuaki Ishimatsu &lt;isimatu.yasuaki@jp.fujitsu.com&gt;
Acked-by: Michal Nazarewicz &lt;mina86@mina86.com&gt;
Cc: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Cc: Wen Congyang &lt;wency@cn.fujitsu.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: compaction: clear PG_migrate_skip based on compaction and reclaim activity</title>
<updated>2012-10-09T07:22:51+00:00</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@suse.de</email>
</author>
<published>2012-10-08T23:32:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=62997027ca5b3d4618198ed8b1aba40b61b1137b'/>
<id>62997027ca5b3d4618198ed8b1aba40b61b1137b</id>
<content type='text'>
Compaction caches if a pageblock was scanned and no pages were isolated so
that the pageblocks can be skipped in the future to reduce scanning.  This
information is not cleared by the page allocator based on activity due to
the impact it would have to the page allocator fast paths.  Hence there is
a requirement that something clear the cache or pageblocks will be skipped
forever.  Currently the cache is cleared if there were a number of recent
allocation failures and it has not been cleared within the last 5 seconds.
Time-based decisions like this are terrible as they have no relationship
to VM activity and is basically a big hammer.

Unfortunately, accurate heuristics would add cost to some hot paths so
this patch implements a rough heuristic.  There are two cases where the
cache is cleared.

1. If a !kswapd process completes a compaction cycle (migrate and free
   scanner meet), the zone is marked compact_blockskip_flush. When kswapd
   goes to sleep, it will clear the cache. This is expected to be the
   common case where the cache is cleared. It does not really matter if
   kswapd happens to be asleep or going to sleep when the flag is set as
   it will be woken on the next allocation request.

2. If there have been multiple failures recently and compaction just
   finished being deferred then a process will clear the cache and start a
   full scan.  This situation happens if there are multiple high-order
   allocation requests under heavy memory pressure.

The clearing of the PG_migrate_skip bits and other scans is inherently
racy but the race is harmless.  For allocations that can fail such as THP,
they will simply fail.  For requests that cannot fail, they will retry the
allocation.  Tests indicated that scanning rates were roughly similar to
when the time-based heuristic was used and the allocation success rates
were similar.

Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Richard Davies &lt;richard@arachsys.com&gt;
Cc: Shaohua Li &lt;shli@kernel.org&gt;
Cc: Avi Kivity &lt;avi@redhat.com&gt;
Cc: Rafael Aquini &lt;aquini@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Compaction caches if a pageblock was scanned and no pages were isolated so
that the pageblocks can be skipped in the future to reduce scanning.  This
information is not cleared by the page allocator based on activity due to
the impact it would have to the page allocator fast paths.  Hence there is
a requirement that something clear the cache or pageblocks will be skipped
forever.  Currently the cache is cleared if there were a number of recent
allocation failures and it has not been cleared within the last 5 seconds.
Time-based decisions like this are terrible as they have no relationship
to VM activity and is basically a big hammer.

Unfortunately, accurate heuristics would add cost to some hot paths so
this patch implements a rough heuristic.  There are two cases where the
cache is cleared.

1. If a !kswapd process completes a compaction cycle (migrate and free
   scanner meet), the zone is marked compact_blockskip_flush. When kswapd
   goes to sleep, it will clear the cache. This is expected to be the
   common case where the cache is cleared. It does not really matter if
   kswapd happens to be asleep or going to sleep when the flag is set as
   it will be woken on the next allocation request.

2. If there have been multiple failures recently and compaction just
   finished being deferred then a process will clear the cache and start a
   full scan.  This situation happens if there are multiple high-order
   allocation requests under heavy memory pressure.

The clearing of the PG_migrate_skip bits and other scans is inherently
racy but the race is harmless.  For allocations that can fail such as THP,
they will simply fail.  For requests that cannot fail, they will retry the
allocation.  Tests indicated that scanning rates were roughly similar to
when the time-based heuristic was used and the allocation success rates
were similar.

Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Richard Davies &lt;richard@arachsys.com&gt;
Cc: Shaohua Li &lt;shli@kernel.org&gt;
Cc: Avi Kivity &lt;avi@redhat.com&gt;
Cc: Rafael Aquini &lt;aquini@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
