linux-stable.git/mm/compaction.c, branch v4.2.1

mm/compaction.c: fix "suitable_migration_target() unused" warning

2015-04-15T23:35:20+00:00

mm/compaction.c:250:13: warning: 'suitable_migration_target' defined but not used [-Wunused-function]

Reported-by: Fengguang Wu 
Cc: Vlastimil Babka 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/compaction: reset compaction scanner positions

2015-04-15T23:35:17+00:00

When the compaction is activated via /proc/sys/vm/compact_memory it would
better scan the whole zone.  And some platforms, for instance ARM, have
the start_pfn of a zone at zero.  Therefore the first try to compact via
/proc doesn't work.  It needs to reset the compaction scanner position
first.

Signed-off-by: Gioh Kim 
Acked-by: Vlastimil Babka 
Acked-by: David Rientjes 
Cc: Joonsoo Kim 
Cc: Mel Gorman 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: allow compaction of unevictable pages

2015-04-15T23:35:17+00:00

Currently, pages which are marked as unevictable are protected from
compaction, but not from other types of migration.  The POSIX real time
extension explicitly states that mlock() will prevent a major page
fault, but the spirit of this is that mlock() should give a process the
ability to control sources of latency, including minor page faults.
However, the mlock manpage only explicitly says that a locked page will
not be written to swap and this can cause some confusion.  The
compaction code today does not give a developer who wants to avoid swap
but wants to have large contiguous areas available any method to achieve
this state.  This patch introduces a sysctl for controlling compaction
behavior with respect to the unevictable lru.  Users who demand no page
faults after a page is present can set compact_unevictable_allowed to 0
and users who need the large contiguous areas can enable compaction on
locked memory by leaving the default value of 1.

To illustrate this problem I wrote a quick test program that mmaps a
large number of 1MB files filled with random data.  These maps are
created locked and read only.  Then every other mmap is unmapped and I
attempt to allocate huge pages to the static huge page pool.  When the
compact_unevictable_allowed sysctl is 0, I cannot allocate hugepages
after fragmenting memory.  When the value is set to 1, allocations
succeed.

Signed-off-by: Eric B Munson 
Acked-by: Michal Hocko 
Acked-by: Vlastimil Babka 
Acked-by: Christoph Lameter 
Acked-by: David Rientjes 
Acked-by: Rik van Riel 
Cc: Vlastimil Babka 
Cc: Thomas Gleixner 
Cc: Christoph Lameter 
Cc: Peter Zijlstra 
Cc: Mel Gorman 
Cc: David Rientjes 
Cc: Michal Hocko 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/compaction: enhance compaction finish condition

2015-04-14T23:49:01+00:00

Compaction has anti fragmentation algorithm.  It is that freepage should
be more than pageblock order to finish the compaction if we don't find any
freepage in requested migratetype buddy list.  This is for mitigating
fragmentation, but, there is a lack of migratetype consideration and it is
too excessive compared to page allocator's anti fragmentation algorithm.

Not considering migratetype would cause premature finish of compaction.
For example, if allocation request is for unmovable migratetype, freepage
with CMA migratetype doesn't help that allocation and compaction should
not be stopped.  But, current logic regards this situation as compaction
is no longer needed, so finish the compaction.

Secondly, condition is too excessive compared to page allocator's logic.
We can steal freepage from other migratetype and change pageblock
migratetype on more relaxed conditions in page allocator.  This is
designed to prevent fragmentation and we can use it here.  Imposing hard
constraint only to the compaction doesn't help much in this case since
page allocator would cause fragmentation again.

To solve these problems, this patch borrows anti fragmentation logic from
page allocator.  It will reduce premature compaction finish in some cases
and reduce excessive compaction work.

stress-highalloc test in mmtests with non movable order 7 allocation shows
considerable increase of compaction success rate.

Compaction success rate (Compaction success * 100 / Compaction stalls, %)
31.82 : 42.20

I tested it on non-reboot 5 runs stress-highalloc benchmark and found that
there is no more degradation on allocation success rate than before.  That
roughly means that this patch doesn't result in more fragmentations.

Vlastimil suggests additional idea that we only test for fallbacks when
migration scanner has scanned a whole pageblock.  It looked good for
fragmentation because chance of stealing increase due to making more free
pages in certain pageblock.  So, I tested it, but, it results in decreased
compaction success rate, roughly 38.00.  I guess the reason that if system
is low memory condition, watermark check could be failed due to not enough
order 0 free page and so, sometimes, we can't reach a fallback check
although migrate_pfn is aligned to pageblock_nr_pages.  I can insert code
to cope with this situation but it makes code more complicated so I don't
include his idea at this patch.

[akpm@linux-foundation.org: fix CONFIG_CMA=n build]
Signed-off-by: Joonsoo Kim 
Acked-by: Vlastimil Babka 
Cc: Mel Gorman 
Cc: David Rientjes 
Cc: Rik van Riel 
Cc: Zhang Yanfei 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: page_alloc: add kasan hooks on alloc and free paths

2015-02-14T05:21:41+00:00

Add kernel address sanitizer hooks to mark allocated page's addresses as
accessible in corresponding shadow region.  Mark freed pages as
inaccessible.

Signed-off-by: Andrey Ryabinin 
Cc: Dmitry Vyukov 
Cc: Konstantin Serebryany 
Cc: Dmitry Chernenkov 
Signed-off-by: Andrey Konovalov 
Cc: Yuri Gribov 
Cc: Konstantin Khlebnikov 
Cc: Sasha Levin 
Cc: Christoph Lameter 
Cc: Joonsoo Kim 
Cc: Dave Hansen 
Cc: Andi Kleen 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: "H. Peter Anvin" 
Cc: Christoph Lameter 
Cc: Pekka Enberg 
Cc: David Rientjes 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: fix negative nr_isolated counts

2015-02-13T02:54:11+00:00

The vmstat interfaces are good at hiding negative counts (at least when
CONFIG_SMP); but if you peer behind the curtain, you find that
nr_isolated_anon and nr_isolated_file soon go negative, and grow ever
more negative: so they can absorb larger and larger numbers of isolated
pages, yet still appear to be zero.

I'm happy to avoid a congestion_wait() when too_many_isolated() myself;
but I guess it's there for a good reason, in which case we ought to get
too_many_isolated() working again.

The imbalance comes from isolate_migratepages()'s ISOLATE_ABORT case:
putback_movable_pages() decrements the NR_ISOLATED counts, but we forgot
to call acct_isolated() to increment them.

It is possible that the bug whcih this patch fixes could cause OOM kills
when the system still has a lot of reclaimable page cache.

Fixes: edc2ca612496 ("mm, compaction: move pageblock checks up from isolate_migratepages_range()")
Signed-off-by: Hugh Dickins 
Acked-by: Vlastimil Babka 
Acked-by: Joonsoo Kim 
Cc: 	[3.18+]
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/compaction: stop the isolation when we isolate enough freepage

2015-02-13T02:54:10+00:00

Currently, freepage isolation in one pageblock doesn't consider how many
freepages we isolate. When I traced flow of compaction, compaction
sometimes isolates more than 256 freepages to migrate just 32 pages.

In this patch, freepage isolation is stopped at the point that we
have more isolated freepage than isolated page for migration. This
results in slowing down free page scanner and make compaction success
rate higher.

stress-highalloc test in mmtests with non movable order 7 allocation shows
increase of compaction success rate.

Compaction success rate (Compaction success * 100 / Compaction stalls, %)
27.13 : 31.82

pfn where both scanners meets on compaction complete
(separate test due to enormous tracepoint buffer)
(zone_start=4096, zone_end=1048576)
586034 : 654378

In fact, I didn't fully understand why this patch results in such good
result. There was a guess that not used freepages are released to pcp list
and on next compaction trial we won't isolate them again so compaction
success rate would decrease. To prevent this effect, I tested with adding
pcp drain code on release_freepages(), but, it has no good effect.

Anyway, this patch reduces waste time to isolate unneeded freepages so
seems reasonable.

Vlastimil said:

: I briefly tried it on top of the pivot-changing series and with order-9
: allocations it reduced free page scanned counter by almost 10%.  No effect
: on success rates (maybe because pivot changing already took care of the
: scanners meeting problem) but the scanning reduction is good on its own.
:
: It also explains why e14c720efdd7 ("mm, compaction: remember position
: within pageblock in free pages scanner") had less than expected
: improvements.  It would only actually stop within pageblock in case of
: async compaction detecting contention.  I guess that's also why the
: infinite loop problem fixed by 1d5bfe1ffb5b affected so relatively few
: people.

Signed-off-by: Joonsoo Kim 
Acked-by: Vlastimil Babka 
Tested-by: Vlastimil Babka 
Reviewed-by: Zhang Yanfei 
Cc: Mel Gorman 
Cc: David Rientjes 
Cc: Rik van Riel 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/compaction: fix wrong order check in compact_finished()

2015-02-13T02:54:10+00:00

What we want to check here is whether there is highorder freepage in buddy
list of other migratetype in order to steal it without fragmentation.
But, current code just checks cc->order which means allocation request
order.  So, this is wrong.

Without this fix, non-movable synchronous compaction below pageblock order
would not stopped until compaction is complete, because migratetype of
most pageblocks are movable and high order freepage made by compaction is
usually on movable type buddy list.

There is some report related to this bug. See below link.

  http://www.spinics.net/lists/linux-mm/msg81666.html

Although the issued system still has load spike comes from compaction,
this makes that system completely stable and responsive according to his
report.

stress-highalloc test in mmtests with non movable order 7 allocation
doesn't show any notable difference in allocation success rate, but, it
shows more compaction success rate.

Compaction success rate (Compaction success * 100 / Compaction stalls, %)
18.47 : 28.94

Fixes: 1fb3f8ca0e92 ("mm: compaction: capture a suitable high-order page immediately when it is made available")
Signed-off-by: Joonsoo Kim 
Acked-by: Vlastimil Babka 
Reviewed-by: Zhang Yanfei 
Cc: Mel Gorman 
Cc: David Rientjes 
Cc: Rik van Riel 
Cc: 	[3.7+]
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/compaction: add tracepoint to observe behaviour of compaction defer

2015-02-12T01:06:04+00:00

Compaction deferring logic is heavy hammer that block the way to the
compaction.  It doesn't consider overall system state, so it could prevent
user from doing compaction falsely.  In other words, even if system has
enough range of memory to compact, compaction would be skipped due to
compaction deferring logic.  This patch add new tracepoint to understand
work of deferring logic.  This will also help to check compaction success
and fail.

Signed-off-by: Joonsoo Kim 
Cc: Vlastimil Babka 
Cc: Mel Gorman 
Cc: David Rientjes 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/compaction: more trace to understand when/why compaction start/finish

2015-02-12T01:06:04+00:00

It is not well analyzed that when/why compaction start/finish or not.
With these new tracepoints, we can know much more about start/finish
reason of compaction.  I can find following bug with these tracepoint.

http://www.spinics.net/lists/linux-mm/msg81582.html

Signed-off-by: Joonsoo Kim 
Cc: Vlastimil Babka 
Cc: Mel Gorman 
Cc: David Rientjes 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds