linux.git/mm/compaction.c, branch v4.12

mm, compaction: finish whole pageblock to reduce fragmentation

2017-05-09T00:15:10+00:00

The main goal of direct compaction is to form a high-order page for
allocation, but it should also help against long-term fragmentation when
possible.

Most lower-than-pageblock-order compactions are for non-movable
allocations, which means that if we compact in a movable pageblock and
terminate as soon as we create the high-order page, it's unlikely that
the fallback heuristics will claim the whole block.  Instead there might
be a single unmovable page in a pageblock full of movable pages, and the
next unmovable allocation might pick another pageblock and increase
long-term fragmentation.

To help against such scenarios, this patch changes the termination
criteria for compaction so that the current pageblock is finished even
though the high-order page already exists.  Note that the high-order page
might also have been formed elsewhere in the zone due to parallel
activity, but this patch doesn't try to detect that.

This is only done with sync compaction, because async compaction is
limited to pageblocks of the same migratetype, where it cannot result in
a migratetype fallback.  (Async compaction also eagerly skips
order-aligned blocks where isolation fails, which is against the goal of
migrating away as much of the pageblock as possible.)
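The revised termination rule can be modelled in a few lines of user-space
C.  This is a hypothetical sketch, not the kernel's __compact_finished():
the sketch_* names, the 512-page pageblock size, and the treatment of the
migrate scanner as a PFN counter advancing one page at a time are all
assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pageblock size in pages (e.g. 2 MiB / 4 KiB); an assumption. */
#define SKETCH_PAGEBLOCK_NR_PAGES 512UL

enum sketch_mode { SKETCH_ASYNC, SKETCH_SYNC };
enum sketch_result { SKETCH_CONTINUE, SKETCH_SUCCESS };

/*
 * Hypothetical model of the termination check.  Once a suitable high-order
 * page exists, async compaction stops immediately, while sync compaction
 * keeps going until the migrate scanner reaches a pageblock boundary.
 */
static enum sketch_result sketch_compact_finished(bool page_ready,
						  enum sketch_mode mode,
						  unsigned long migrate_pfn)
{
	if (!page_ready)
		return SKETCH_CONTINUE;
	if (mode == SKETCH_ASYNC)
		return SKETCH_SUCCESS;
	/* Sync: finish the current pageblock before declaring success. */
	if (migrate_pfn % SKETCH_PAGEBLOCK_NR_PAGES == 0)
		return SKETCH_SUCCESS;
	return SKETCH_CONTINUE;
}

/*
 * Advance a simulated migrate scanner one page at a time from start_pfn,
 * with the high-order page becoming available at ready_pfn.  Returns the
 * PFN at which compaction terminates.
 */
static unsigned long sketch_run(unsigned long start_pfn,
				unsigned long ready_pfn,
				enum sketch_mode mode)
{
	unsigned long pfn = start_pfn;

	assert(start_pfn <= ready_pfn);
	for (;;) {
		pfn++;	/* "migrate" one page */
		if (sketch_compact_finished(pfn >= ready_pfn, mode, pfn) ==
		    SKETCH_SUCCESS)
			return pfn;
	}
}
```

In this model, starting at PFN 0 with the page becoming available at PFN
100, the async run stops at PFN 100 while the sync run continues to PFN
512, the end of the pageblock.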

As a result of this patch, long-term memory fragmentation should be
reduced.

In testing based on 4.9 kernel with stress-highalloc from mmtests
configured for order-4 GFP_KERNEL allocations, this patch has reduced
the number of unmovable allocations falling back to movable pageblocks
by 20%.  The number

Link: http://lkml.kernel.org/r/20170307131545.28577-9-vbabka@suse.cz
Signed-off-by: Vlastimil Babka 
Acked-by: Mel Gorman 
Acked-by: Johannes Weiner 
Cc: Joonsoo Kim 
Cc: David Rientjes 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, compaction: restrict async compaction to pageblocks of same migratetype

2017-05-09T00:15:10+00:00

The migrate scanner in async compaction is currently limited to
MIGRATE_MOVABLE pageblocks.  This is a heuristic intended to reduce
latency, based on the assumption that non-MOVABLE pageblocks are
unlikely to contain movable pages.

However, with the exception of THPs, most high-order allocations are
not movable.  Should the async compaction succeed, this increases the
chance that the non-MOVABLE allocations will fall back to a MOVABLE
pageblock, making the long-term fragmentation worse.

This patch attempts to help the situation by changing async direct
compaction so that the migrate scanner only scans the pageblocks of the
requested migratetype.  If it's a non-MOVABLE type and there are such
pageblocks that do contain movable pages, chances are that the
allocation can succeed within one of those pageblocks, removing the need
for a fallback.  If that fails, the subsequent sync attempt will ignore
this restriction.

In testing based on 4.9 kernel with stress-highalloc from mmtests
configured for order-4 GFP_KERNEL allocations, this patch has reduced
the number of unmovable allocations falling back to movable pageblocks
by 30%.  The number of movable allocations falling back is reduced by
12%.

Link: http://lkml.kernel.org/r/20170307131545.28577-8-vbabka@suse.cz
Signed-off-by: Vlastimil Babka 
Cc: Mel Gorman 
Cc: Johannes Weiner 
Cc: Joonsoo Kim 
Cc: David Rientjes 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, compaction: add migratetype to compact_control

2017-05-09T00:15:10+00:00

Preparation patch.  We are going to need migratetype at lower layers
than compact_zone() and compact_finished().

Link: http://lkml.kernel.org/r/20170307131545.28577-7-vbabka@suse.cz
Signed-off-by: Vlastimil Babka 
Acked-by: Mel Gorman 
Acked-by: Johannes Weiner 
Cc: Joonsoo Kim 
Cc: David Rientjes 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, compaction: change migrate_async_suitable() to suitable_migration_source()

2017-05-09T00:15:10+00:00

Preparation for making the decisions more complex and depending on
compact_control flags.  No functional change.

Link: http://lkml.kernel.org/r/20170307131545.28577-6-vbabka@suse.cz
Signed-off-by: Vlastimil Babka 
Acked-by: Mel Gorman 
Acked-by: Johannes Weiner 
Cc: Joonsoo Kim 
Cc: David Rientjes 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, compaction: remove redundant watermark check in compact_finished()

2017-05-09T00:15:09+00:00

When detecting whether compaction has succeeded in forming a high-order
page, __compact_finished() employs a watermark check, followed by its own
search for a suitable page in the freelists.  This is not ideal for two
reasons:

 - The watermark check also searches high-order freelists, but has
   less strict criteria with respect to fallback. It's therefore redundant
   and a waste of cycles. This was different in the past, when the
   high-order watermark check attempted to apply reserves to high-order
   pages.

 - The watermark check might actually fail due to a lack of order-0
   pages. Compaction can't help with that, so there's no point in
   continuing because of it. A suitable high-order page may still exist,
   in which case compaction can terminate.

This patch therefore removes the watermark check.  This should save some
cycles and terminate compaction sooner in some cases.
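Below is a sketch of the success test that remains once the watermark
check is gone.  It uses a hypothetical struct sketch_zone that records
free-block counts per order and migratetype.  The order and migratetype
limits are assumptions, and the sketch deliberately omits the fallback
handling the kernel also performs.  It only illustrates that the number
of free order-0 pages plays no part in the decision.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_ORDER 11		/* assumption: orders 0..10 */
#define SKETCH_NR_MIGRATETYPES 3	/* assumption: a small fixed set */

/* Hypothetical per-order, per-migratetype free-block counts for one zone. */
struct sketch_zone {
	unsigned long nr_free[SKETCH_MAX_ORDER][SKETCH_NR_MIGRATETYPES];
};

/*
 * Success test without a watermark check: compaction is done as soon as a
 * free block of at least the requested order sits on the freelist of the
 * requested migratetype.  Free order-0 pages are never consulted.
 */
static bool sketch_suitable_page_exists(const struct sketch_zone *zone,
					unsigned int order, int migratetype)
{
	assert(order < SKETCH_MAX_ORDER);
	assert(migratetype >= 0 && migratetype < SKETCH_NR_MIGRATETYPES);

	for (unsigned int o = order; o < SKETCH_MAX_ORDER; o++)
		if (zone->nr_free[o][migratetype] > 0)
			return true;
	return false;
}
```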

Link: http://lkml.kernel.org/r/20170307131545.28577-3-vbabka@suse.cz
Signed-off-by: Vlastimil Babka 
Acked-by: Mel Gorman 
Acked-by: Johannes Weiner 
Cc: Joonsoo Kim 
Cc: David Rientjes 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/compaction: ignore block suitable after check large free page

2017-05-03T22:52:09+00:00

While reviewing the code, I found that if the migrate target is a large
free page and we ignore block suitability, we may split the large free
target page into smaller blocks, which is not good for defragmentation.
So move the check that ignores block suitability after the check for a
large free page.

As Vlastimil pointed out in the RFC version, this patch is based only on
logical analysis.  It might be better for future-proofing the function,
but it most likely won't have any visible effect right now, because
direct compaction shouldn't have to be called if a >=pageblock_order
page is already available.

Link: http://lkml.kernel.org/r/1489490743-5364-1-git-send-email-xieyisheng1@huawei.com
Signed-off-by: Yisheng Xie 
Cc: Vlastimil Babka 
Cc: Michal Hocko 
Cc: Mel Gorman 
Cc: Joonsoo Kim 
Cc: David Rientjes 
Cc: Minchan Kim 
Cc: Hanjun Guo 
Cc: Xishi Qiu 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

sched/headers: Prepare to move signal wakeup & sigpending methods from into

2017-03-02T07:42:32+00:00

Fix up affected files that include this signal functionality via sched.h.

Acked-by: Linus Torvalds 
Cc: Mike Galbraith 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar

mm/migration: make isolate_movable_page() return int type

2017-02-25T01:46:55+00:00

Patch series "HWPOISON: soft offlining for non-lru movable page", v6.

After Minchan's commit bda807d44454 ("mm: migrate: support non-lru
movable page migration"), some types of non-lru pages, such as zsmalloc
and virtio-balloon pages, also support migration.

Therefore, we can:

1) soft offline non-lru movable pages: when corrected memory errors
   occur on a non-lru movable page, we can stop using it by migrating
   its data onto another page and disabling the original (maybe
   half-broken) one.

2) enable memory hotplug for non-lru movable pages, i.e. we may offline
   blocks that include such pages by using non-lru page migration.

This patchset is heavily dependent on non-lru movable page migration.

This patch (of 4):

Change the return type of isolate_movable_page() from bool to int.  It
will return 0 when the movable page is isolated successfully, and -EBUSY
when isolation fails.

There is no functional change in this patch; it prepares for a later
patch.

[xieyisheng1@huawei.com: v6]
  Link: http://lkml.kernel.org/r/1486108770-630-2-git-send-email-xieyisheng1@huawei.com
Link: http://lkml.kernel.org/r/1485867981-16037-2-git-send-email-ysxie@foxmail.com
Signed-off-by: Yisheng Xie 
Suggested-by: Michal Hocko 
Acked-by: Minchan Kim 
Cc: Andi Kleen 
Cc: Hanjun Guo 
Cc: Johannes Weiner 
Cc: Joonsoo Kim 
Cc: Mel Gorman 
Cc: Naoya Horiguchi 
Cc: Reza Arbab 
Cc: Taku Izumi 
Cc: Vitaly Kuznetsov 
Cc: Vlastimil Babka 
Cc: Xishi Qiu 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm,compaction: serialize waitqueue_active() checks

2017-02-23T00:41:29+00:00

Without a memory barrier, the following race can occur with a high-order
allocation:

wakeup_kcompactd(order == 1)  		     kcompactd()
  [L] waitqueue_active(kcompactd_wait)
						[S] prepare_to_wait_event(kcompactd_wait)
						[L] (kcompactd_max_order == 0)
  [S] kcompactd_max_order = order;		      schedule()

Here the waitqueue_active() check is speculatively reordered before the
store that sets the actual condition (max_order), so it doesn't see the
thread that is about to block, and we miss a wakeup.  There are a couple
of options to fix this, including calling wq_has_sleeper(), which adds a
full barrier, or unconditionally calling wake_up_interruptible() and
serializing on q->lock.  However, to make use of the control dependency,
we just need to add L->L guarantees.

While this bug is theoretical, there have been other offenders of the
lockless waitqueue_active() in the past; this pitfall is also documented
in the comment above waitqueue_active() itself.

Link: http://lkml.kernel.org/r/1483975528-24342-1-git-send-email-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso 
Cc: Vlastimil Babka 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, compaction: add vmstats for kcompactd work

2017-02-23T00:41:29+00:00

A "compact_daemon_wake" vmstat exists that represents the number of
times kcompactd has woken up.  This doesn't represent how much work it
actually did, though.

It's useful to understand how much compaction work is being done by
kcompactd versus other methods such as direct compaction and explicitly
triggered per-node (or system) compaction.

This adds two new vmstats: "compact_daemon_migrate_scanned" and
"compact_daemon_free_scanned" to represent the number of pages kcompactd
has scanned as part of its migration scanner and freeing scanner,
respectively.

These values are still accounted for in the general
"compact_migrate_scanned" and "compact_free_scanned" for compatibility.

It could be argued that explicitly triggered compaction could also be
tracked separately, and that could be added if others find it useful.

Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1612071749390.69852@chino.kir.corp.google.com
Signed-off-by: David Rientjes 
Acked-by: Vlastimil Babka 
Cc: Michal Hocko 
Cc: Joonsoo Kim 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds