linux-stable.git/mm/compaction.c, branch v3.18.76

mm, compaction: abort free scanner if split fails

2016-07-12T12:47:12+00:00

[ Upstream commit 284f69fb49e2e385203f52441b324b9a68461d6b ]

[ Upstream commit a4f04f2c6955aff5e2c08dcb40aca247ff4d7370 ]

If the memory compaction free scanner cannot successfully split a free
page (only possible due to per-zone low watermark), terminate the free
scanner rather than continuing to scan memory needlessly.  If the
watermark is insufficient for a free page of order <= cc->order, then
terminate the scanner since all future splits will also likely fail.

This prevents the compaction freeing scanner from scanning all memory on
very large zones (very noticeable for zones > 128GB, for instance) when
all splits will likely fail while holding zone->lock.

compaction_alloc() iterating a 128GB zone has been benchmarked to take
over 400ms on some systems whereas any free page isolated and ready to
be split ends up failing in split_free_page() because of the low
watermark check and thus the iteration continues.

The next time compaction occurs, the freeing scanner will likely start
at the end of the zone again since no success was made previously and we
get the same lengthy iteration until the zone is brought above the low
watermark.  All thp page faults can take >400ms in such a state without
this fix.

Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1606211820350.97086@chino.kir.corp.google.com
Signed-off-by: David Rientjes 
Acked-by: Vlastimil Babka 
Cc: Minchan Kim 
Cc: Joonsoo Kim 
Cc: Mel Gorman 
Cc: Hugh Dickins 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Sasha Levin

mm, compaction: skip compound pages by order in free scanner

2016-07-12T12:47:12+00:00

[ Upstream commit 683854270f84daa09baffe2b21d64ec88c614fa9 ]

[ Upstream commit 9fcd6d2e052eef525e94a9ae58dbe7ed4df4f5a7 ]

The compaction free scanner is looking for PageBuddy() pages and
skipping all others.  For large compound pages such as THP or hugetlbfs,
we can save a lot of iterations if we skip them at once using their
compound_order().  This is generally unsafe and we can read a bogus
value of order due to a race, but if we are careful, the only danger is
skipping too much.

When tested with stress-highalloc from mmtests on 4GB system with 1GB
hugetlbfs pages, the vmstat compact_free_scanned count decreased by at
least 15%.

Signed-off-by: Vlastimil Babka 
Cc: Minchan Kim 
Cc: Mel Gorman 
Acked-by: Joonsoo Kim 
Acked-by: Michal Nazarewicz 
Cc: Naoya Horiguchi 
Cc: Christoph Lameter 
Cc: Rik van Riel 
Cc: David Rientjes 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Sasha Levin

mm, cma: prevent nr_isolated_* counters from going negative

2016-05-18T00:11:06+00:00

[ Upstream commit 14af4a5e9b26ad251f81c174e8a43f3e179434a5 ]

/proc/sys/vm/stat_refresh warns nr_isolated_anon and nr_isolated_file go
increasingly negative under compaction: which would add delay when
should be none, or no delay when should delay.  The bug in compaction
was due to a recent mmotm patch, but much older instance of the bug was
also noticed in isolate_migratepages_range() which is used for CMA and
gigantic hugepage allocations.

The bug is caused by putback_movable_pages() in an error path
decrementing the isolated counters without them being previously
incremented by acct_isolated().  Fix isolate_migratepages_range() by
removing the error-path putback, thus reaching acct_isolated() with
migratepages still isolated, and leaving putback to caller like most
other places do.

Fixes: edc2ca612496 ("mm, compaction: move pageblock checks up from isolate_migratepages_range()")
[vbabka@suse.cz: expanded the changelog]
Signed-off-by: Hugh Dickins 
Signed-off-by: Vlastimil Babka 
Acked-by: Joonsoo Kim 
Cc: Michal Hocko 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 

Signed-off-by: Sasha Levin

mm: fix negative nr_isolated counts

2015-03-14T19:37:16+00:00

commit ff59909a077b3c51c168cb658601c6b63136a347 upstream.

The vmstat interfaces are good at hiding negative counts (at least when
CONFIG_SMP); but if you peer behind the curtain, you find that
nr_isolated_anon and nr_isolated_file soon go negative, and grow ever
more negative: so they can absorb larger and larger numbers of isolated
pages, yet still appear to be zero.

I'm happy to avoid a congestion_wait() when too_many_isolated() myself;
but I guess it's there for a good reason, in which case we ought to get
too_many_isolated() working again.

The imbalance comes from isolate_migratepages()'s ISOLATE_ABORT case:
putback_movable_pages() decrements the NR_ISOLATED counts, but we forgot
to call acct_isolated() to increment them.

It is possible that the bug whcih this patch fixes could cause OOM kills
when the system still has a lot of reclaimable page cache.

Fixes: edc2ca612496 ("mm, compaction: move pageblock checks up from isolate_migratepages_range()")
Signed-off-by: Hugh Dickins 
Acked-by: Vlastimil Babka 
Acked-by: Joonsoo Kim 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Sasha Levin

mm/compaction: fix wrong order check in compact_finished()

2015-03-14T19:37:16+00:00

commit 372549c2a3778fd3df445819811c944ad54609ca upstream.

What we want to check here is whether there is highorder freepage in buddy
list of other migratetype in order to steal it without fragmentation.
But, current code just checks cc->order which means allocation request
order.  So, this is wrong.

Without this fix, non-movable synchronous compaction below pageblock order
would not stopped until compaction is complete, because migratetype of
most pageblocks are movable and high order freepage made by compaction is
usually on movable type buddy list.

There is some report related to this bug. See below link.

  http://www.spinics.net/lists/linux-mm/msg81666.html

Although the issued system still has load spike comes from compaction,
this makes that system completely stable and responsive according to his
report.

stress-highalloc test in mmtests with non movable order 7 allocation
doesn't show any notable difference in allocation success rate, but, it
shows more compaction success rate.

Compaction success rate (Compaction success * 100 / Compaction stalls, %)
18.47 : 28.94

Fixes: 1fb3f8ca0e92 ("mm: compaction: capture a suitable high-order page immediately when it is made available")
Signed-off-by: Joonsoo Kim 
Acked-by: Vlastimil Babka 
Reviewed-by: Zhang Yanfei 
Cc: Mel Gorman 
Cc: David Rientjes 
Cc: Rik van Riel 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Sasha Levin

mm, compaction: prevent infinite loop in compact_zone

2014-11-14T00:17:06+00:00

Several people have reported occasionally seeing processes stuck in
compact_zone(), even triggering soft lockups, in 3.18-rc2+.

Testing a revert of commit e14c720efdd7 ("mm, compaction: remember
position within pageblock in free pages scanner") fixed the issue,
although the stuck processes do not appear to involve the free scanner.

Finally, by code inspection, the bug was found in isolate_migratepages()
which uses a slightly different condition to detect if the migration and
free scanners have met, than compact_finished().  That has not been a
problem until commit e14c720efdd7 allowed the free scanner position
between individual invocations to be in the middle of a pageblock.

In a relatively rare case, the migration scanner position can end up at
the beginning of a pageblock, with the free scanner position in the
middle of the same pageblock.  If it's the migration scanner's turn,
isolate_migratepages() exits immediately (without updating the
position), while compact_finished() decides to continue compaction,
resulting in a potentially infinite loop.  The system can recover only
if another process creates enough high-order pages to make the watermark
checks in compact_finished() pass.

This patch fixes the immediate problem by bumping the migration
scanner's position to meet the free scanner in isolate_migratepages(),
when both are within the same pageblock.  This causes compact_finished()
to terminate properly.  A more robust check in compact_finished() is
planned as a cleanup for better future maintainability.

Fixes: e14c720efdd73 ("mm, compaction: remember position within pageblock in free pages scanner)
Signed-off-by: Vlastimil Babka 
Reported-by: P. Christeas 
Tested-by: P. Christeas 
Link: http://marc.info/?l=linux-mm&m=141508604232522&w=2
Reported-by: Norbert Preining 
Tested-by: Norbert Preining 
Link: https://lkml.org/lkml/2014/11/4/904
Reported-by: Pavel Machek 
Link: https://lkml.org/lkml/2014/11/7/164
Cc: Joonsoo Kim 
Cc: David Rientjes 
Cc: Mel Gorman 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/compaction: skip the range until proper target pageblock is met

2014-11-14T00:17:05+00:00

Commit 7d49d8868336 ("mm, compaction: reduce zone checking frequency in
the migration scanner") has a side-effect that changes the iteration
range calculation.  Before the change, block_end_pfn is calculated using
start_pfn, but now it blindly adds pageblock_nr_pages to the previous
value.

This causes the problem that isolation_start_pfn is larger than
block_end_pfn when we isolate the page with more than pageblock order.
In this case, isolation would fail due to an invalid range parameter.

To prevent this, this patch implements skipping the range until a proper
target pageblock is met.  Without this patch, CMA with more than
pageblock order always fails but with this patch it will succeed.

Signed-off-by: Joonsoo Kim 
Cc: Vlastimil Babka 
Cc: Minchan Kim 
Cc: Michal Nazarewicz 
Cc: Naoya Horiguchi 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/compaction.c: avoid premature range skip in isolate_migratepages_range

2014-10-29T23:33:13+00:00

Commit edc2ca612496 ("mm, compaction: move pageblock checks up from
isolate_migratepages_range()") commonizes isolate_migratepages variants
and make them use isolate_migratepages_block().

isolate_migratepages_block() could stop the execution when enough pages
are isolated, but, there is no code in isolate_migratepages_range() to
handle this case.  In the result, even if isolate_migratepages_block()
returns prematurely without checking all pages in the range,

isolate_migratepages_block() is called repeately on the following
pageblock and some pages in the previous range are skipped to check.
Then, CMA is failed frequently due to this fact.

To fix this problem, this patch let isolate_migratepages_range() know
the situation that enough pages are isolated and stop the isolation in
that case.

Note that isolate_migratepages() has no such problem, because, it always
stops the isolation after just one call of isolate_migratepages_block().

Signed-off-by: Joonsoo Kim 
Acked-by: Vlastimil Babka 
Cc: David Rientjes 
Cc: Minchan Kim 
Cc: Michal Nazarewicz 
Cc: Naoya Horiguchi 
Cc: Christoph Lameter 
Cc: Rik van Riel 
Cc: Mel Gorman 
Cc: Zhang Yanfei 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/balloon_compaction: redesign ballooned pages management

2014-10-10T02:26:01+00:00

Sasha Levin reported KASAN splash inside isolate_migratepages_range().
Problem is in the function __is_movable_balloon_page() which tests
AS_BALLOON_MAP in page->mapping->flags.  This function has no protection
against anonymous pages.  As result it tried to check address space flags
inside struct anon_vma.

Further investigation shows more problems in current implementation:

* Special branch in __unmap_and_move() never works:
  balloon_page_movable() checks page flags and page_count.  In
  __unmap_and_move() page is locked, reference counter is elevated, thus
  balloon_page_movable() always fails.  As a result execution goes to the
  normal migration path.  virtballoon_migratepage() returns
  MIGRATEPAGE_BALLOON_SUCCESS instead of MIGRATEPAGE_SUCCESS,
  move_to_new_page() thinks this is an error code and assigns
  newpage->mapping to NULL.  Newly migrated page lose connectivity with
  balloon an all ability for further migration.

* lru_lock erroneously required in isolate_migratepages_range() for
  isolation ballooned page.  This function releases lru_lock periodically,
  this makes migration mostly impossible for some pages.

* balloon_page_dequeue have a tight race with balloon_page_isolate:
  balloon_page_isolate could be executed in parallel with dequeue between
  picking page from list and locking page_lock.  Race is rare because they
  use trylock_page() for locking.

This patch fixes all of them.

Instead of fake mapping with special flag this patch uses special state of
page->_mapcount: PAGE_BALLOON_MAPCOUNT_VALUE = -256.  Buddy allocator uses
PAGE_BUDDY_MAPCOUNT_VALUE = -128 for similar purpose.  Storing mark
directly in struct page makes everything safer and easier.

PagePrivate is used to mark pages present in page list (i.e.  not
isolated, like PageLRU for normal pages).  It replaces special rules for
reference counter and makes balloon migration similar to migration of
normal pages.  This flag is protected by page_lock together with link to
the balloon device.

Signed-off-by: Konstantin Khlebnikov 
Reported-by: Sasha Levin 
Link: http://lkml.kernel.org/p/53E6CEAA.9020105@oracle.com
Cc: Rafael Aquini 
Cc: Andrey Ryabinin 
Cc: 	[3.8+]
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/compaction.c: fix warning of 'flags' may be used uninitialized

2014-10-10T02:25:57+00:00

C      mm/compaction.o
mm/compaction.c: In function isolate_freepages_block:
mm/compaction.c:364:37: warning: flags may be used uninitialized in this function [-Wmaybe-uninitialized]
       && compact_unlock_should_abort(&cc->zone->lock, flags,
                                     ^

Signed-off-by: Xiubo Li 
Cc: Vlastimil Babka 
Cc: Mel Gorman 
Cc: David Rientjes 
Cc: Minchan Kim 
Cc: Arnd Bergmann 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds