linux.git/mm/page_alloc.c, branch v3.14-rc2

mm: show message when updating min_free_kbytes in thp

2014-01-24T00:36:52+00:00

min_free_kbytes may be raised during THP's initialization.  Sometimes
this changes a value that was set by the user.  Showing a message when
that happens clears up the confusion.

Only show this message when changing a value which was set by the user
according to Michal Hocko's suggestion.

Show the old value of min_free_kbytes according to Dave Hansen's
suggestion.  This gives the user a chance to restore the old value of
min_free_kbytes.
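
The behavior described above can be sketched in user space roughly as
follows.  This is a minimal illustration, not the kernel code: the two
globals are stand-ins, the convention that -1 means "never set by the
user" is an assumption of the sketch, and raise_min_free_kbytes() is a
hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-ins for kernel state; -1 means "never set by the user" (assumed). */
static int min_free_kbytes = 1024;
static int user_min_free_kbytes = -1;

/*
 * Raise min_free_kbytes to recommended_min if it is currently lower.
 * A message is printed only when a user-set value is being overridden.
 * Returns 1 if a message was printed, 0 otherwise.
 */
static int raise_min_free_kbytes(int recommended_min)
{
	int printed = 0;

	assert(recommended_min >= 0);

	if (recommended_min > min_free_kbytes) {
		if (user_min_free_kbytes >= 0) {
			/* Include the old value so the user can restore it. */
			printf("raising min_free_kbytes from %d to %d\n",
			       min_free_kbytes, recommended_min);
			printed = 1;
		}
		min_free_kbytes = recommended_min;
	}
	return printed;
}
```

With the default state, raise_min_free_kbytes(2048) raises the value
silently; once user_min_free_kbytes is non-negative, a later raise also
prints the old and new values.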

Signed-off-by: Han Pingtian 
Reviewed-by: Michal Hocko 
Cc: David Rientjes 
Cc: Mel Gorman 
Cc: Dave Hansen 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: prevent setting of a value less than 0 to min_free_kbytes

2014-01-24T00:36:52+00:00

If you echo -1 > /proc/sys/vm/min_free_kbytes, the system will hang.
Changing proc_dointvec() to proc_dointvec_minmax() in
min_free_kbytes_sysctl_handler() prevents this from happening.

mhocko said:

: You can still do echo $BIG_VALUE > /proc/sys/vm/min_free_kbytes and make
: your machine unusable but I agree that proc_dointvec_minmax is more
: suitable here as we already have:
:
: 	.proc_handler   = min_free_kbytes_sysctl_handler,
: 	.extra1         = &zero,
:
: It used to work properly but then 6fce56ec91b5 ("sysctl: Remove references
: to ctl_name and strategy from the generic sysctl table") has removed
: sysctl_intvec strategy and so extra1 is ignored.
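
The difference between an unbounded and a bounded integer handler can
be illustrated with a small user-space sketch.  parse_int_minmax() is a
hypothetical helper, not a kernel function; it only mimics the idea of
rejecting a write that falls outside [min, max], with min playing the
role of extra1.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse a decimal integer and accept it only if min <= value <= max.
 * Returns 0 and stores the value on success, -EINVAL otherwise.
 * *out is left untouched on failure, so the old setting survives.
 */
static int parse_int_minmax(const char *buf, int min, int max, int *out)
{
	char *end;
	long val;

	assert(buf != NULL && out != NULL);

	errno = 0;
	val = strtol(buf, &end, 10);
	if (end == buf || errno == ERANGE)
		return -EINVAL;		/* not a number, or overflow */
	if (val < min || val > max)
		return -EINVAL;		/* outside the allowed range */

	*out = (int)val;
	return 0;
}
```

Called with min = 0 and max = INT_MAX, the sketch rejects "-1" and
keeps the previous value, while a large positive value is still
accepted, which matches the caveat in the quote above.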

Signed-off-by: Han Pingtian 
Acked-by: Michal Hocko 
Acked-by: David Rientjes 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE

2014-01-24T00:36:50+00:00

Most of the VM_BUG_ON assertions are performed on a page.  Usually, when
one of these assertions fails we'll get a BUG_ON with a call stack and
the registers.

Based on recent requests to add a small piece of code that dumps the
page at various VM_BUG_ON sites, I've noticed that the page dump is
quite useful to people debugging issues in mm.

This patch adds a VM_BUG_ON_PAGE(cond, page) which, beyond doing what
VM_BUG_ON() does, also dumps the page before executing the actual
BUG_ON.
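
The shape of such a macro can be sketched in user space as below.  All
names here (struct page_sketch, dump_page_sketch(),
CHECK_PAGE_SKETCH()) are hypothetical stand-ins and abort() takes the
place of BUG(); the kernel's actual definition may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct page with a few fields. */
struct page_sketch {
	unsigned long flags;
	int count;
	int mapcount;
};

static int pages_dumped;	/* counts dumps, for illustration only */

/* Print the interesting fields of the page to stderr. */
static void dump_page_sketch(const struct page_sketch *page)
{
	assert(page != NULL);
	fprintf(stderr, "page:%p count:%d mapcount:%d flags:%#lx\n",
		(const void *)page, page->count, page->mapcount, page->flags);
	pages_dumped++;
}

/* When cond is true, dump the page first and only then abort. */
#define CHECK_PAGE_SKETCH(cond, page)			\
	do {						\
		if (cond) {				\
			dump_page_sketch(page);		\
			abort();			\
		}					\
	} while (0)
```

A call such as CHECK_PAGE_SKETCH(p.count < 0, &p) costs one branch when
the condition is false, and leaves the page state in the log right
before the abort when it is true.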

[akpm@linux-foundation.org: fix up includes]
Signed-off-by: Sasha Levin 
Cc: "Kirill A. Shutemov" 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: print more details for bad_page()

2014-01-24T00:36:50+00:00

bad_page() is cool in that it prints out a bunch of data about the page.
But, I can never remember which page flags are good and which are bad,
or whether ->index or ->mapping is required to be NULL.

This patch allows bad/dump_page() callers to specify a string about why
they are dumping the page and adds explanation strings to a number of
places.  It also adds a 'bad_flags' argument to bad_page(), which it
then dumps out separately from the flags which are actually set.

This way, the messages will show specifically why the page was bad and,
if a page flag combination was the problem, *specifically* which flags
it is complaining about.
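
A minimal user-space sketch of the idea: the caller passes a reason
string plus a mask of flags it considers bad, and the report prints the
offending subset separately from the full flag word.  The flag bits,
the message wording and report_bad_page() are assumptions made for
illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical flag bits; the real page flags differ. */
#define PG_LOCKED_SKETCH	(1UL << 0)
#define PG_DIRTY_SKETCH		(1UL << 1)
#define PG_LRU_SKETCH		(1UL << 2)

/*
 * Report why a page is bad.  Prints all flags that are set, the
 * caller's reason, and then only those set flags that are in bad_flags.
 * Returns the offending subset.
 */
static unsigned long report_bad_page(unsigned long flags, const char *reason,
				     unsigned long bad_flags)
{
	unsigned long offending = flags & bad_flags;

	assert(reason != NULL);

	fprintf(stderr, "page flags: %#lx\n", flags);
	fprintf(stderr, "page dumped because: %s\n", reason);
	if (offending)
		fprintf(stderr, "bad because of flags: %#lx\n", offending);
	return offending;
}
```

If a page has the locked and dirty bits set and the caller treats
locked and LRU as bad, only the locked bit is reported as offending.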

[akpm@linux-foundation.org: switch to pr_alert]
Signed-off-by: Dave Hansen 
Reviewed-by: Christoph Lameter 
Cc: Andi Kleen 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, page_alloc: warn for non-blockable __GFP_NOFAIL allocation failure

2014-01-22T00:19:49+00:00

__GFP_NOFAIL may return NULL when coupled with GFP_NOWAIT or GFP_ATOMIC.

Luckily, nothing currently does such craziness.  So instead of causing
such allocations to loop (potentially forever), we maintain the current
behavior and also warn about the new users of the deprecated flag.
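
The decision described above can be sketched as follows.  The flag bits
and may_retry_alloc() are hypothetical; the sketch only shows that a
caller which cannot block gets a failure rather than a loop, with a
one-time warning if it also asked for no-fail semantics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical flag bits standing in for the real gfp flags. */
#define GFP_WAIT_SKETCH		(1u << 0)	/* caller may block */
#define GFP_NOFAIL_SKETCH	(1u << 1)	/* caller cannot handle failure */

static int nofail_warnings;	/* number of warnings emitted so far */

/*
 * Called after the cheap allocation attempts have failed.
 * Returns true if the allocator may go on to blocking reclaim and
 * retry, false if it must give up and return NULL to the caller.
 */
static bool may_retry_alloc(unsigned int gfp_mask)
{
	static bool warned;

	/* Only the two sketch flags are understood here. */
	assert((gfp_mask & ~(GFP_WAIT_SKETCH | GFP_NOFAIL_SKETCH)) == 0);

	if (!(gfp_mask & GFP_WAIT_SKETCH)) {
		/* Cannot block: fail, but warn once about NOFAIL misuse. */
		if ((gfp_mask & GFP_NOFAIL_SKETCH) && !warned) {
			warned = true;
			nofail_warnings++;
			fprintf(stderr,
				"warning: non-blockable NOFAIL allocation failed\n");
		}
		return false;
	}
	return true;
}
```

The warning fires once rather than on every failure, so a misbehaving
caller is flagged without flooding the log.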

Suggested-by: Andrew Morton 
Signed-off-by: David Rientjes 
Cc: Mel Gorman 
Cc: Michal Hocko 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: compaction: encapsulate defer reset logic

2014-01-22T00:19:48+00:00

Currently there are several functions to manipulate the deferred
compaction state variables.  The remaining case where the variables are
touched directly is when a successful allocation occurs in direct
compaction, or is expected to be successful in the future by kswapd.
Here, the lowest order that is expected to fail is updated, and in the
case of successful allocation, the deferred status and counter are reset
completely.

Create a new function compaction_defer_reset() to encapsulate this
functionality and make it easier to understand the code.  No functional
change.

Signed-off-by: Vlastimil Babka 
Acked-by: Mel Gorman 
Reviewed-by: Rik van Riel 
Cc: Joonsoo Kim 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/page_alloc.c: use memblock apis for early memory allocations

2014-01-22T00:19:47+00:00

Switch to memblock interfaces for the early memory allocator instead of
the bootmem allocator.  There is no functional change in behavior
compared to the current code from the bootmem users' point of view.

Archs already converted to NO_BOOTMEM now directly use memblock
interfaces instead of bootmem wrappers built on top of memblock.  For
the archs which still use bootmem, these new APIs just fall back to the
existing bootmem APIs.

Signed-off-by: Grygorii Strashko 
Signed-off-by: Santosh Shilimkar 
Cc: Yinghai Lu 
Cc: Tejun Heo 
Cc: "Rafael J. Wysocki" 
Cc: Arnd Bergmann 
Cc: Christoph Lameter 
Cc: Greg Kroah-Hartman 
Cc: H. Peter Anvin 
Cc: Johannes Weiner 
Cc: KAMEZAWA Hiroyuki 
Cc: Konrad Rzeszutek Wilk 
Cc: Michal Hocko 
Cc: Paul Walmsley 
Cc: Pavel Machek 
Cc: Russell King 
Cc: Tony Lindgren 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

x86, numa, acpi, memory-hotplug: make movable_node have higher priority

2014-01-22T00:19:45+00:00

If users specify the original movablecore=nn@ss boot option, the kernel
will arrange [ss, ss+nn) as ZONE_MOVABLE.  The kernelcore=nn@ss boot
option is similar except it specifies ZONE_NORMAL ranges.

Now, if users specify "movable_node" on the kernel command line, the
kernel will arrange hotpluggable memory in SRAT as ZONE_MOVABLE.  If
users do this, all the other movablecore=nn@ss and kernelcore=nn@ss
options are ignored.

For those who don't want this, just specify nothing.  The kernel will
act as before.

Signed-off-by: Tang Chen 
Signed-off-by: Zhang Yanfei 
Reviewed-by: Wanpeng Li 
Cc: "H. Peter Anvin" 
Cc: "Rafael J . Wysocki" 
Cc: Chen Tang 
Cc: Gong Chen 
Cc: Ingo Molnar 
Cc: Jiang Liu 
Cc: Johannes Weiner 
Cc: Lai Jiangshan 
Cc: Larry Woodman 
Cc: Len Brown 
Cc: Liu Jiang 
Cc: Mel Gorman 
Cc: Michal Nazarewicz 
Cc: Minchan Kim 
Cc: Prarit Bhargava 
Cc: Rik van Riel 
Cc: Taku Izumi 
Cc: Tejun Heo 
Cc: Thomas Gleixner 
Cc: Thomas Renninger 
Cc: Toshi Kani 
Cc: Vasilis Liaskovitis 
Cc: Wen Congyang 
Cc: Yasuaki Ishimatsu 
Cc: Yinghai Lu 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, show_mem: remove SHOW_MEM_FILTER_PAGE_COUNT

2014-01-22T00:19:44+00:00

Commit 4b59e6c47309 ("mm, show_mem: suppress page counts in
non-blockable contexts") introduced SHOW_MEM_FILTER_PAGE_COUNT to
suppress PFN walks on large memory machines.  Commit c78e93630d15 ("mm:
do not walk all of system memory during show_mem") avoided a PFN walk in
the generic show_mem helper which removes the requirement for
SHOW_MEM_FILTER_PAGE_COUNT in that case.

This patch removes PFN walkers from the arch-specific implementations
that report on a per-node or per-zone granularity.  ARM and unicore32
still do a PFN walk as they report memory usage on each bank which is a
much finer granularity where the debugging information may still be of
use.  As the remaining arches doing PFN walks have relatively small
amounts of memory, this patch simply removes SHOW_MEM_FILTER_PAGE_COUNT.

[akpm@linux-foundation.org: fix parisc]
Signed-off-by: Mel Gorman 
Acked-by: David Rientjes 
Cc: Tony Luck 
Cc: Russell King 
Cc: James Bottomley 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: get rid of unnecessary pageblock scanning in setup_zone_migrate_reserve

2014-01-22T00:19:43+00:00

Yasuaki Ishimatsu reported that memory hot-add spent more than 5 _hours_
on a 9TB memory machine because onlining memory sections is too slow.
We found that setup_zone_migrate_reserve spent >90% of the time.

The problem is that setup_zone_migrate_reserve scans all pageblocks
unconditionally, but this is only necessary if the number of reserved
blocks was reduced (i.e.  memory hot remove).

Moreover, the maximum number of MIGRATE_RESERVE pageblocks per zone is
currently 2.  This means that the number of reserved pageblocks is
almost always unchanged.

This patch adds zone->nr_migrate_reserve_block to maintain the number of
MIGRATE_RESERVE pageblocks, which reduces the overhead of
setup_zone_migrate_reserve dramatically.  The following table shows the
time taken to online a memory section.

  Amount of memory          | 128GB | 192GB | 256GB |
  ---------------------------------------------------
  linux-3.12                |  23.9 |  31.4 |  44.5 |
  This patch                |   8.3 |   8.3 |   8.6 |
  Mel's proposal patch (*1) |  10.9 |  19.2 |  31.3 |
  ---------------------------------------------------
                                       (milliseconds)

  128GB : 4 nodes and each node has 32GB of memory
  192GB : 6 nodes and each node has 32GB of memory
  256GB : 8 nodes and each node has 32GB of memory

  (*1) Mel proposed his idea in the following thread.
       https://lkml.org/lkml/2013/10/30/272

[akpm@linux-foundation.org: tweak comment]
Signed-off-by: KOSAKI Motohiro 
Signed-off-by: Yasuaki Ishimatsu 
Reported-by: Yasuaki Ishimatsu 
Tested-by: Yasuaki Ishimatsu 
Cc: Mel Gorman 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds