linux.git/mm/compaction.c, branch v4.7-rc4

mm/compaction.c: fix zoneindex in kcompactd()

2016-05-21T00:58:30+00:00

While testing the kcompactd in my platform 3G MEM only DMA ZONE.  I
found the kcompactd never wakeup.  It seems the zoneindex has already
minus 1 before.  So the traverse here should be <=.

It fixes a regression where kswapd could previously compact, but
kcompactd not.  Not a crash fix though.

[akpm@linux-foundation.org: fix kcompactd_do_work() as well, per Hugh]
Link: http://lkml.kernel.org/r/1463659121-84124-1-git-send-email-puck.chen@hisilicon.com
Fixes: accf62422b3a ("mm, kswapd: replace kswapd compaction with waking up kcompactd")
Signed-off-by: Chen Feng 
Acked-by: Vlastimil Babka 
Cc: Hugh Dickins 
Cc: Michal Hocko 
Cc: Kirill A. Shutemov 
Cc: Johannes Weiner 
Cc: Tejun Heo 
Cc: Zhuangluan Su 
Cc: Yiping Xu 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, oom, compaction: prevent from should_compact_retry looping for ever for costly orders

2016-05-21T00:58:30+00:00

"mm: consider compaction feedback also for costly allocation" has
removed the upper bound for the reclaim/compaction retries based on the
number of reclaimed pages for costly orders.  While this is desirable
the patch did miss a mis interaction between reclaim, compaction and the
retry logic.  The direct reclaim tries to get zones over min watermark
while compaction backs off and returns COMPACT_SKIPPED when all zones
are below low watermark + 1<
Acked-by: Hillf Danton 
Acked-by: Vlastimil Babka 
Cc: David Rientjes 
Cc: Johannes Weiner 
Cc: Joonsoo Kim 
Cc: Mel Gorman 
Cc: Tetsuo Handa 
Cc: Vladimir Davydov 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, compaction: distinguish between full and partial COMPACT_COMPLETE

2016-05-21T00:58:30+00:00

COMPACT_COMPLETE now means that compaction and free scanner met.  This
is not very useful information if somebody just wants to use this
feedback and make any decisions based on that.  The current caller might
be a poor guy who just happened to scan tiny portion of the zone and
that could be the reason no suitable pages were compacted.  Make sure we
distinguish the full and partial zone walks.

Consumers should treat COMPACT_PARTIAL_SKIPPED as a potential success
and be optimistic in retrying.

The existing users of COMPACT_COMPLETE are conservatively changed to use
COMPACT_PARTIAL_SKIPPED as well but some of them should be probably
reconsidered and only defer the compaction only for COMPACT_COMPLETE
with the new semantic.

This patch shouldn't introduce any functional changes.

Signed-off-by: Michal Hocko 
Acked-by: Vlastimil Babka 
Acked-by: Hillf Danton 
Cc: David Rientjes 
Cc: Johannes Weiner 
Cc: Joonsoo Kim 
Cc: Mel Gorman 
Cc: Tetsuo Handa 
Cc: Vladimir Davydov 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, compaction: distinguish COMPACT_DEFERRED from COMPACT_SKIPPED

2016-05-21T00:58:30+00:00

try_to_compact_pages() can currently return COMPACT_SKIPPED even when
the compaction is defered for some zone just because zone DMA is skipped
in 99% of cases due to watermark checks.  This makes COMPACT_DEFERRED
basically unusable for the page allocator as a feedback mechanism.

Make sure we distinguish those two states properly and switch their
ordering in the enum.  This would mean that the COMPACT_SKIPPED will be
returned only when all eligible zones are skipped.

As a result COMPACT_DEFERRED handling for THP in __alloc_pages_slowpath
will be more precise and we would bail out rather than reclaim.

Signed-off-by: Michal Hocko 
Acked-by: Vlastimil Babka 
Acked-by: Hillf Danton 
Cc: David Rientjes 
Cc: Johannes Weiner 
Cc: Joonsoo Kim 
Cc: Mel Gorman 
Cc: Tetsuo Handa 
Cc: Vladimir Davydov 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, compaction: cover all compaction mode in compact_zone

2016-05-21T00:58:30+00:00

The compiler is complaining after "mm, compaction: change COMPACT_
constants into enum"

  mm/compaction.c: In function `compact_zone':
  mm/compaction.c:1350:2: warning: enumeration value `COMPACT_DEFERRED' not handled in switch [-Wswitch]
    switch (ret) {
    ^
  mm/compaction.c:1350:2: warning: enumeration value `COMPACT_COMPLETE' not handled in switch [-Wswitch]
  mm/compaction.c:1350:2: warning: enumeration value `COMPACT_NO_SUITABLE_PAGE' not handled in switch [-Wswitch]
  mm/compaction.c:1350:2: warning: enumeration value `COMPACT_NOT_SUITABLE_ZONE' not handled in switch [-Wswitch]
  mm/compaction.c:1350:2: warning: enumeration value `COMPACT_CONTENDED' not handled in switch [-Wswitch]

compaction_suitable is allowed to return only COMPACT_PARTIAL,
COMPACT_SKIPPED and COMPACT_CONTINUE so other cases are simply
impossible.  Put a VM_BUG_ON to catch an impossible return value.

Signed-off-by: Michal Hocko 
Acked-by: Vlastimil Babka 
Acked-by: Hillf Danton 
Cc: David Rientjes 
Cc: Johannes Weiner 
Cc: Joonsoo Kim 
Cc: Mel Gorman 
Cc: Tetsuo Handa 
Cc: Vladimir Davydov 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, compaction: change COMPACT_ constants into enum

2016-05-21T00:58:30+00:00

Compaction code is doing weird dances between COMPACT_FOO -> int ->
unsigned long

But there doesn't seem to be any reason for that.  All functions which
return/use one of those constants are not expecting any other value so it
really makes sense to define an enum for them and make it clear that no
other values are expected.

This is a pure cleanup and shouldn't introduce any functional changes.

Signed-off-by: Michal Hocko 
Acked-by: Vlastimil Babka 
Acked-by: Hillf Danton 
Cc: David Rientjes 
Cc: Johannes Weiner 
Cc: Joonsoo Kim 
Cc: Mel Gorman 
Cc: Tetsuo Handa 
Cc: Vladimir Davydov 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, page_alloc: remove field from alloc_context

2016-05-20T02:12:14+00:00

The classzone_idx can be inferred from preferred_zoneref so remove the
unnecessary field and save stack space.

Signed-off-by: Mel Gorman 
Cc: Vlastimil Babka 
Cc: Jesper Dangaard Brouer 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, page_alloc: convert alloc_flags to unsigned

2016-05-20T02:12:14+00:00

alloc_flags is a bitmask of flags but it is signed which does not
necessarily generate the best code depending on the compiler.  Even
without an impact, it makes more sense that this be unsigned.

Signed-off-by: Mel Gorman 
Acked-by: Vlastimil Babka 
Cc: Jesper Dangaard Brouer 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, compaction: skip blocks where isolation fails in async direct compaction

2016-05-20T02:12:14+00:00

The goal of direct compaction is to quickly make a high-order page
available for the pending allocation.  Within an aligned block of pages
of desired order, a single allocated page that cannot be isolated for
migration means that the block cannot fully merge to a buddy page that
would satisfy the allocation request.  Therefore we can reduce the
allocation stall by skipping the rest of the block immediately on
isolation failure.  For async compaction, this also means a higher
chance of succeeding until it detects contention.

We however shouldn't completely sacrifice the second objective of
compaction, which is to reduce overal long-term memory fragmentation.
As a compromise, perform the eager skipping only in direct async
compaction, while sync compaction (including kcompactd) remains
thorough.

Testing was done using stress-highalloc from mmtests, configured for
order-4 GFP_KERNEL allocations:

                                 4.6-rc1               4.6-rc1
                                  before                 after
  Success 1 Min         24.00 (  0.00%)       27.00 (-12.50%)
  Success 1 Mean        30.20 (  0.00%)       31.60 ( -4.64%)
  Success 1 Max         37.00 (  0.00%)       35.00 (  5.41%)
  Success 2 Min         42.00 (  0.00%)       32.00 ( 23.81%)
  Success 2 Mean        44.00 (  0.00%)       44.80 ( -1.82%)
  Success 2 Max         48.00 (  0.00%)       52.00 ( -8.33%)
  Success 3 Min         91.00 (  0.00%)       92.00 ( -1.10%)
  Success 3 Mean        92.20 (  0.00%)       92.80 ( -0.65%)
  Success 3 Max         94.00 (  0.00%)       93.00 (  1.06%)

We can see that success rates are unaffected by the skipping.

                4.6-rc1     4.6-rc1
                 before       after
  User         2587.42     2566.53
  System        482.89      471.20
  Elapsed      1395.68     1382.00

Times are not so useful metric for this benchmark as main portion is the
interfering kernel builds, but results do hint at reduced system times.

                                      4.6-rc1     4.6-rc1
                                       before       after
  Direct pages scanned                163614      159608
  Kswapd pages scanned               2070139     2078790
  Kswapd pages reclaimed             2061707     2069757
  Direct pages reclaimed              163354      159505

Reduced direct reclaim was unintended, but could be explained by more
successful first attempt at (async) direct compaction, which is
attempted before the first reclaim attempt in __alloc_pages_slowpath().

  Compaction stalls                    33052       39853
  Compaction success                   12121       19773
  Compaction failures                  20931       20079

Compaction is indeed more successful, and thus less likely to get
deferred, so there are also more direct compaction stalls.

  Page migrate success               3781876     3326819
  Page migrate failure                 45817       41774
  Compaction pages isolated          7868232     6941457
  Compaction migrate scanned       168160492   127269354
  Compaction migrate prescanned            0           0
  Compaction free scanned         2522142582  2326342620
  Compaction free direct alloc             0           0
  Compaction free dir. all. miss           0           0
  Compaction cost                       5252        4476

The patch reduces migration scanned pages by 25% thanks to the eager
skipping.

[hughd@google.com: prevent nr_isolated_* from going negative]
Signed-off-by: Vlastimil Babka 
Signed-off-by: Hugh Dickins 
Cc: Joonsoo Kim 
Cc: Mel Gorman 
Cc: Rik van Riel 
Cc: David Rientjes 
Cc: Minchan Kim 
Cc: Michal Hocko 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, compaction: reduce spurious pcplist drains

2016-05-20T02:12:14+00:00

Compaction drains the local pcplists each time migration scanner moves
away from a cc->order aligned block where it isolated pages for
migration, so that the pages freed by migrations can merge into higher
orders.

The detection is currently coarser than it could be.  The
cc->last_migrated_pfn variable should track the lowest pfn that was
isolated for migration.  But it is set to the pfn where
isolate_migratepages_block() starts scanning, which is typically the
first pfn of the pageblock.  There, the scanner might fail to isolate
several order-aligned blocks, and then isolate COMPACT_CLUSTER_MAX in
another block.  This would cause the pcplists drain to be performed,
although the scanner didn't yet finish the block where it isolated from.

This patch thus makes cc->last_migrated_pfn handling more accurate by
setting it to the pfn of an actually isolated page in
isolate_migratepages_block().  Although practical effects of this patch
are likely low, it arguably makes the intent of the code more obvious.
Also the next patch will make async direct compaction skip blocks more
aggressively, and draining pcplists due to skipped blocks is wasteful.

Signed-off-by: Vlastimil Babka 
Cc: Joonsoo Kim 
Cc: Mel Gorman 
Cc: Rik van Riel 
Cc: David Rientjes 
Cc: Minchan Kim 
Cc: Michal Hocko 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds