linux-stable.git/mm/memblock.c, branch linux-4.2.y

mm: page_alloc: pass PFN to __free_pages_bootmem

2015-07-01T02:44:55+00:00

__free_pages_bootmem prepares a page for release to the buddy allocator
and assumes that the struct page is initialised.  Parallel initialisation
of struct pages defers initialisation and __free_pages_bootmem can be
called for struct pages that cannot yet map struct page to PFN.  This
patch passes PFN to __free_pages_bootmem with no other functional change.

Signed-off-by: Mel Gorman 
Tested-by: Nate Zimmer 
Tested-by: Waiman Long 
Tested-by: Daniel J Blueman 
Acked-by: Pekka Enberg 
Cc: Robin Holt 
Cc: Nate Zimmer 
Cc: Dave Hansen 
Cc: Waiman Long 
Cc: Scott Norton 
Cc: "Luck, Tony" 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Thomas Gleixner 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

memblock: introduce a for_each_reserved_mem_region iterator

2015-07-01T02:44:55+00:00

Struct page initialisation had been identified as one of the reasons why
large machines take a long time to boot. Patches were posted a long time ago
to defer initialisation of struct pages until they were first used.  This was
rejected on the grounds it should not be necessary to hurt the fast paths. This series
reuses much of the work from that time but defers the initialisation of
memory to kswapd so that one thread per node initialises memory local to
that node.

After applying the series and setting the appropriate Kconfig variable I
see this in the boot log on a 64G machine

[    7.383764] kswapd 0 initialised deferred memory in 188ms
[    7.404253] kswapd 1 initialised deferred memory in 208ms
[    7.411044] kswapd 3 initialised deferred memory in 216ms
[    7.411551] kswapd 2 initialised deferred memory in 216ms

On a 1TB machine, I see

[    8.406511] kswapd 3 initialised deferred memory in 1116ms
[    8.428518] kswapd 1 initialised deferred memory in 1140ms
[    8.435977] kswapd 0 initialised deferred memory in 1148ms
[    8.437416] kswapd 2 initialised deferred memory in 1148ms

Once booted the machine appears to work as normal. Boot times were measured
from the time shutdown was called until ssh was available again.  In the
64G case, the boot time savings are negligible. On the 1TB machine, the
savings were 16 seconds.

Nate Zimmer said:

: On an older 8 TB box with lots and lots of cpus the boot time, as
: measured from grub to login prompt, improved from 1484 seconds to
: exactly 1000 seconds.

Waiman Long said:

: I ran a bootup timing test on a 12-TB 16-socket IvyBridge-EX system.  From
: grub menu to ssh login, the bootup time was 453s before the patch and 265s
: after the patch - a saving of 188s (42%).

Daniel Blueman said:

: On a 7TB, 1728-core NumaConnect system with 108 NUMA nodes, we're seeing
: stock 4.0 boot in 7136s.  This drops to 2159s, or a 70% reduction with
: this patchset.  Non-temporal PMD init (https://lkml.org/lkml/2015/4/23/350)
: drops this to 1045s.

This patch (of 13):

As part of initializing struct pages in 2MiB chunks, we noticed that at
the end of free_all_bootmem(), there was nothing which had forced the
reserved/allocated 4KiB pages to be initialized.

This helper function will be used for that expansion.
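
A minimal userspace sketch of what such an iterator can look like.  The
names (struct region, struct region_list, next_reserved_region,
for_each_reserved_region, reserved_bytes) are hypothetical stand-ins and
not the kernel's definitions; the sketch only shows the pattern of a
for-loop macro driven by a "next" helper and an end-of-iteration sentinel.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

/* Hypothetical stand-ins for the kernel's region bookkeeping. */
struct region {
	phys_addr_t base;
	phys_addr_t size;
};

struct region_list {
	size_t cnt;
	const struct region *regions;
};

#define IDX_DONE UINT64_MAX	/* sentinel: iteration has finished */

/* Report region *idx as an inclusive [start, end] range and advance *idx. */
static void next_reserved_region(uint64_t *idx, const struct region_list *rl,
				 phys_addr_t *start, phys_addr_t *end)
{
	if (*idx < rl->cnt) {
		const struct region *r = &rl->regions[*idx];

		*start = r->base;
		*end = r->base + r->size - 1;
		*idx += 1;
		return;
	}
	*idx = IDX_DONE;
}

/* Walk every reserved region; the helper fills in *p_start and *p_end. */
#define for_each_reserved_region(i, rl, p_start, p_end)			\
	for ((i) = 0, next_reserved_region(&(i), (rl), (p_start), (p_end)); \
	     (i) != IDX_DONE;						\
	     next_reserved_region(&(i), (rl), (p_start), (p_end)))

/* Example consumer: total bytes covered by all reserved regions. */
static phys_addr_t reserved_bytes(const struct region_list *rl)
{
	phys_addr_t start, end, total = 0;
	uint64_t i;

	for_each_reserved_region(i, rl, &start, &end)
		total += end - start + 1;
	return total;
}
```

A caller that needs to touch every reserved range, for example to
initialise the struct pages behind it, would put that work in the loop
body in place of the running total.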

Signed-off-by: Robin Holt 
Signed-off-by: Nate Zimmer 
Signed-off-by: Mel Gorman 
Tested-by: Nate Zimmer 
Tested-by: Waiman Long 
Tested-by: Daniel J Blueman 
Acked-by: Pekka Enberg 
Cc: Robin Holt 
Cc: Dave Hansen 
Cc: Waiman Long 
Cc: Scott Norton 
Cc: "Luck, Tony" 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Thomas Gleixner 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/memblock: allocate boot time data structures from mirrored memory

2015-06-25T00:49:45+00:00

Try to allocate all boot time kernel data structures from mirrored
memory.

If we run out of mirrored memory print warnings, but fall back to using
non-mirrored memory to make sure that we still boot.
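
The fallback described above can be sketched in userspace C as follows.
This is an illustration under stated assumptions, not the kernel code:
struct region, REGION_MIRROR, find_first_fit() and find_prefer_mirror()
are hypothetical names, the search is a plain first fit, and an address
of 0 is used to mean "nothing found".

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t phys_addr_t;

/* Hypothetical attribute bits for this sketch. */
enum { REGION_NONE = 0x0, REGION_MIRROR = 0x2 };

struct region {
	phys_addr_t base;
	phys_addr_t size;
	unsigned long flags;
};

/*
 * First fit: return the base of the first region that is large enough
 * and, if REGION_MIRROR is requested, is mirrored.  Returns 0 if none.
 */
static phys_addr_t find_first_fit(const struct region *r, size_t n,
				  phys_addr_t size, unsigned long want)
{
	for (size_t i = 0; i < n; i++) {
		if ((want & REGION_MIRROR) && !(r[i].flags & REGION_MIRROR))
			continue;
		if (r[i].size >= size)
			return r[i].base;
	}
	return 0;
}

/* Prefer mirrored memory; warn and retry without the flag on failure. */
static phys_addr_t find_prefer_mirror(const struct region *r, size_t n,
				      phys_addr_t size, int *used_mirror)
{
	phys_addr_t addr = find_first_fit(r, n, size, REGION_MIRROR);

	if (addr) {
		*used_mirror = 1;
		return addr;
	}
	fprintf(stderr, "warning: no %llu bytes of mirrored memory, falling back\n",
		(unsigned long long)size);
	*used_mirror = 0;
	return find_first_fit(r, n, size, REGION_NONE);
}
```

The warning is printed only on the fallback path, so a boot log without
it suggests every request was satisfied from mirrored ranges.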

By number of bytes, most of what we allocate at boot time is the page
structures.  64 bytes per 4K page on x86_64 ...  or about 1.5% of total
system memory.  For workloads where the bulk of memory is allocated to
applications this may represent a useful improvement to system
availability since 1.5% of total memory might be a third of the memory
allocated to the kernel.
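
The arithmetic behind the figure: 64 bytes per 4096-byte page is
64/4096 = 1.5625%, quoted above as about 1.5%.  A small sketch that
computes the same thing; the function names are made up for this example.

```c
#include <stdint.h>

/* Bytes of struct page metadata needed to describe total_bytes of RAM. */
static uint64_t memmap_bytes(uint64_t total_bytes, uint64_t page_size,
			     uint64_t struct_page_size)
{
	return total_bytes / page_size * struct_page_size;
}

/* The same overhead as a percentage of the memory being described. */
static double memmap_percent(uint64_t page_size, uint64_t struct_page_size)
{
	return 100.0 * (double)struct_page_size / (double)page_size;
}
```

With these inputs a machine with 1 TiB of memory needs 16 GiB of page
structures, which is the amount that would have to fit in mirrored
memory.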

Signed-off-by: Tony Luck 
Cc: Xishi Qiu 
Cc: Hanjun Guo 
Cc: Xiexiuqi 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: "H. Peter Anvin" 
Cc: Yinghai Lu 
Cc: Naoya Horiguchi 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/memblock: add extra "flags" to memblock to allow selection of memory based on attribute

2015-06-25T00:49:44+00:00

Some high end Intel Xeon systems report uncorrectable memory errors as a
recoverable machine check.  Linux has included code for some time to
process these and just signal the affected processes (or even recover
completely if the error was in a read only page that can be replaced by
reading from disk).

But we have no recovery path for errors encountered during kernel code
execution.  Except for some very specific cases, we are unlikely to ever
be able to recover.

Enter memory mirroring. Actually 3rd generation of memory mirroring.

Gen1: All memory is mirrored
	Pro: No s/w enabling - h/w just gets good data from other side of the
	     mirror
	Con: Halves effective memory capacity available to OS/applications

Gen2: Partial memory mirror - just mirror memory behind some memory controllers
	Pro: Keep more of the capacity
	Con: Nightmare to enable. Have to choose between allocating from
	     mirrored memory for safety vs. NUMA local memory for performance

Gen3: Address range partial memory mirror - some mirror on each memory
      controller
	Pro: Can tune the amount of mirror and keep NUMA performance
	Con: I have to write memory management code to implement

The current plan is just to use mirrored memory for kernel allocations.
This has been broken into two phases:

1) This patch series - find the mirrored memory, use it for boot time
   allocations

2) Wade into mm/page_alloc.c and define a ZONE_MIRROR to pick up the
   unused mirrored memory from mm/memblock.c and only give it out to
   select kernel allocations (this is still being scoped because
   page_alloc.c is scary).

This patch (of 3):

Add extra "flags" to memblock to allow selection of memory based on
attribute.  No functional changes.

Signed-off-by: Tony Luck 
Cc: Xishi Qiu 
Cc: Hanjun Guo 
Cc: Xiexiuqi 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: "H. Peter Anvin" 
Cc: Yinghai Lu 
Cc: Naoya Horiguchi 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/memblock.c: add debug output for memblock_add()

2015-04-15T23:35:19+00:00

memblock_reserve() calls memblock_reserve_region() which prints debugging
information if 'memblock=debug' was passed on the command line.  This
patch adds the same behaviour for the memblock_add() function.
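
A rough userspace sketch of debug output gated on a command-line option.
Everything here (region_debug, region_dbg, parse_region_option,
region_add, and the message format) is a hypothetical stand-in chosen for
illustration; it is not the kernel's implementation.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static int region_debug;	/* hypothetical stand-in for the debug switch */

/* Print only when debugging has been switched on. */
#define region_dbg(...)						\
	do {							\
		if (region_debug)				\
			fprintf(stderr, __VA_ARGS__);		\
	} while (0)

/* Enable debugging when the option value contains "debug". */
static int parse_region_option(const char *value)
{
	if (value && strstr(value, "debug"))
		region_debug = 1;
	return region_debug;
}

/* Sketch of an "add" entry point that logs the range before doing work. */
static int region_add(uint64_t base, uint64_t size)
{
	region_dbg("region_add: [%#018llx-%#018llx]\n",
		   (unsigned long long)base,
		   (unsigned long long)(base + size - 1));
	/* ... a real implementation would record the range here ... */
	return 0;
}
```

Because the check lives in the macro, the add and reserve paths can
share it and stay silent unless the option was given.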

[akpm@linux-foundation.org: s/memblock_memory/memblock_add/ in message]
Signed-off-by: Alexander Kuleshov 
Cc: Martin Schwidefsky 
Cc: Philipp Hachtmann 
Cc: Fabian Frederick 
Cc: Catalin Marinas 
Cc: Emil Medve 
Cc: Akinobu Mita 
Cc: Tang Chen 
Cc: Tony Luck 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/memblock.c: rename local variable of memblock_type to `type'

2015-04-14T23:49:00+00:00

A small cleanup.  Seems in e3239ff9 ("memblock: Rename memblock_region to
memblock_type and memblock_property to memblock_region") this one was
missed.

Signed-off-by: Baoquan He 
Cc: Benjamin Herrenschmidt 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/memblock.c: refactor functions to set/clear MEMBLOCK_HOTPLUG

2014-12-13T20:42:46+00:00

There is a lot of duplication in the rubric around actually setting or
clearing a mem region flag.  Create a new helper function to do this and
reduce each of memblock_mark_hotplug() and memblock_clear_hotplug() to a
single line.
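
The shape of that refactoring, sketched in userspace C.  The names
region_setclr_flag, region_mark_hotplug, region_clear_hotplug and
REGION_HOTPLUG are hypothetical, and the sketch makes one simplifying
assumption: it only touches regions that lie entirely inside the
requested range and never splits a region at the range boundaries.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

enum { REGION_HOTPLUG = 0x1 };	/* hypothetical flag bit */

struct region {
	phys_addr_t base;
	phys_addr_t size;
	unsigned long flags;
};

/*
 * Set (set != 0) or clear (set == 0) 'flag' on every region lying
 * entirely inside [base, base + size).  Partly overlapping regions are
 * left alone in this sketch.  Returns the number of regions changed.
 */
static size_t region_setclr_flag(struct region *r, size_t n,
				 phys_addr_t base, phys_addr_t size,
				 int set, unsigned long flag)
{
	size_t touched = 0;

	for (size_t i = 0; i < n; i++) {
		if (r[i].base < base || r[i].base + r[i].size > base + size)
			continue;
		if (set)
			r[i].flags |= flag;
		else
			r[i].flags &= ~flag;
		touched++;
	}
	return touched;
}

/* With the helper in place, each public function is a single line. */
static size_t region_mark_hotplug(struct region *r, size_t n,
				  phys_addr_t base, phys_addr_t size)
{
	return region_setclr_flag(r, n, base, size, 1, REGION_HOTPLUG);
}

static size_t region_clear_hotplug(struct region *r, size_t n,
				   phys_addr_t base, phys_addr_t size)
{
	return region_setclr_flag(r, n, base, size, 0, REGION_HOTPLUG);
}
```

Adding another flag then only needs a new flag bit and another pair of
one-line wrappers.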

This will be useful if someone were to add a new mem region flag - which
I hope to be doing some day soon. But it looks like a plausible cleanup
even without that - so I'd like to get it out of the way now.

Signed-off-by: Tony Luck 
Cc: Santosh Shilimkar 
Cc: Tang Chen 
Cc: Grygorii Strashko 
Cc: Zhang Yanfei 
Cc: Philipp Hachtmann 
Cc: Yinghai Lu 
Cc: Emil Medve 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mem-hotplug: let memblock skip the hotpluggable memory regions in __next_mem_range()

2014-09-10T22:42:12+00:00

Let memblock skip the hotpluggable memory regions in __next_mem_range().
This prevents memblock from allocating hotpluggable memory for the kernel
early in boot.  The code is the same as in __next_mem_range_rev().

Clear the hotpluggable flag before releasing free pages to the buddy
allocator.  If we don't clear the hotpluggable flag in
free_low_memory_core_early(), memory marked with the hotpluggable flag
will not be freed to the buddy allocator, because __next_mem_range() will
skip it.

free_low_memory_core_early
	for_each_free_mem_range
		for_each_mem_range
			__next_mem_range
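
A small userspace sketch of why the flag has to be cleared first.  The
types and names (struct region, REGION_HOTPLUG, usable_bytes,
clear_all_hotplug) are hypothetical; the sketch only models an iterator
that skips flagged regions and is not the kernel code.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

enum { REGION_HOTPLUG = 0x1 };	/* hypothetical flag bit */

struct region {
	phys_addr_t base;
	phys_addr_t size;
	unsigned long flags;
};

/*
 * Total bytes an iterator would hand out.  When skip_hotplug is set,
 * regions still marked hotpluggable are passed over, as in early boot.
 */
static phys_addr_t usable_bytes(const struct region *r, size_t n,
				int skip_hotplug)
{
	phys_addr_t total = 0;

	for (size_t i = 0; i < n; i++) {
		if (skip_hotplug && (r[i].flags & REGION_HOTPLUG))
			continue;
		total += r[i].size;
	}
	return total;
}

/* Drop the hotplug mark so a later walk sees every region. */
static void clear_all_hotplug(struct region *r, size_t n)
{
	for (size_t i = 0; i < n; i++)
		r[i].flags &= ~(unsigned long)REGION_HOTPLUG;
}
```

Walking the list before clear_all_hotplug() misses the flagged region;
walking it afterwards covers all of the memory, which is the state
needed when pages are released to the page allocator.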

[akpm@linux-foundation.org: fix warning]
Signed-off-by: Xishi Qiu 
Cc: Tejun Heo 
Cc: Tang Chen 
Cc: Zhang Yanfei 
Cc: Wen Congyang 
Cc: "Rafael J. Wysocki" 
Cc: "H. Peter Anvin" 
Cc: Wu Fengguang 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

memblock, memhotplug: fix wrong type in memblock_find_in_range_node().

2014-08-29T23:28:15+00:00

In memblock_find_in_range_node(), we defined ret as int.  But it should
be phys_addr_t because it is used to store the return value from
__memblock_find_range_bottom_up().

The bug has not been triggered because when allocating low memory near
the kernel end, the "int ret" won't turn out to be negative.  Once we
start allocating memory on other nodes, the "int ret" can become
negative, and the kernel will then panic.

A simple way to reproduce this: comment out the following code in
numa_init(),

        memblock_set_bottom_up(false);

and the kernel won't boot.
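
The type problem can be reproduced in userspace.  This sketch assumes a
platform where int is 32 bits and phys_addr_t is 64 bits; the function
names are made up, and the out-of-range conversion to int is
implementation-defined in C (common compilers truncate to the low 32
bits).

```c
#include <stdint.h>

typedef uint64_t phys_addr_t;

/* Buggy pattern: the found address passes through an int on its way back. */
static phys_addr_t return_via_int(phys_addr_t found)
{
	int ret = (int)found;		/* high bits are lost here */

	return (phys_addr_t)ret;	/* a negative ret is sign-extended */
}

/* Fixed pattern: the local has the same type as the value it stores. */
static phys_addr_t return_via_phys_addr(phys_addr_t found)
{
	phys_addr_t ret = found;

	return ret;
}
```

A low address such as 0x1000000 survives the round trip through int,
which matches the observation that allocations near the kernel end did
not trigger the bug.  An address such as 0x880000000 does not: on such
platforms it typically truncates to 0x80000000, which is negative when
read as an int.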

Reported-by: Xishi Qiu 
Signed-off-by: Tang Chen 
Tested-by: Xishi Qiu 
Cc: 	[3.13+]
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/memblock.c: call kmemleak directly from memblock_(alloc|free)

2014-06-06T23:08:17+00:00

Kmemleak could ignore memory blocks allocated via memblock_alloc()
leading to false positives during scanning.  This patch adds the
corresponding callbacks and removes kmemleak_free_* calls in
mm/nobootmem.c to avoid duplication.

The kmemleak_alloc() in mm/nobootmem.c is kept since
__alloc_memory_core_early() does not use memblock_alloc() directly.

Signed-off-by: Catalin Marinas 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds