linux-stable.git/mm/page_alloc.c, branch v4.1.2

mm: remove rest of ACCESS_ONCE() usages

2015-04-15T23:35:18+00:00

We converted some of the usages of ACCESS_ONCE to READ_ONCE in the mm/
tree since it doesn't work reliably on non-scalar types.

This patch removes the rest of the usages of ACCESS_ONCE, and use the new
READ_ONCE API for the read accesses.  This makes things cleaner, instead
of using separate/multiple sets of APIs.

Signed-off-by: Jason Low 
Acked-by: Michal Hocko 
Acked-by: Davidlohr Bueso 
Acked-by: Rik van Riel 
Reviewed-by: Christian Borntraeger 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/page_alloc.c: clean up comment

2015-04-14T23:49:04+00:00

Signed-off-by: Yaowei Bai 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: remove GFP_THISNODE

2015-04-14T23:49:03+00:00

NOTE: this is not about __GFP_THISNODE, this is only about GFP_THISNODE.

GFP_THISNODE is a secret combination of gfp bits that have different
behavior than expected.  It is a combination of __GFP_THISNODE,
__GFP_NORETRY, and __GFP_NOWARN and is special-cased in the page
allocator slowpath to fail without trying reclaim even though it may be
used in combination with __GFP_WAIT.

An example of the problem this creates: commit e97ca8e5b864 ("mm: fix
GFP_THISNODE callers and clarify") fixed up many users of GFP_THISNODE
that really just wanted __GFP_THISNODE.  The problem doesn't end there,
however, because even it was a no-op for alloc_misplaced_dst_page(),
which also sets __GFP_NORETRY and __GFP_NOWARN, and
migrate_misplaced_transhuge_page(), where __GFP_NORETRY and __GFP_NOWAIT
is set in GFP_TRANSHUGE.  Converting GFP_THISNODE to __GFP_THISNODE is a
no-op in these cases since the page allocator special-cases
__GFP_THISNODE && __GFP_NORETRY && __GFP_NOWARN.

It's time to just remove GFP_THISNODE entirely.  We leave __GFP_THISNODE
to restrict an allocation to a local node, but remove GFP_THISNODE and
its obscurity.  Instead, we require that a caller clear __GFP_WAIT if it
wants to avoid reclaim.

This allows the aforementioned functions to actually reclaim as they
should.  It also enables any future callers that want to do
__GFP_THISNODE but also __GFP_NORETRY && __GFP_NOWARN to reclaim.  The
rule is simple: if you don't want to reclaim, then don't set __GFP_WAIT.

Aside: ovs_flow_stats_update() really wants to avoid reclaim as well, so
it is unchanged.

Signed-off-by: David Rientjes 
Acked-by: Vlastimil Babka 
Cc: Christoph Lameter 
Acked-by: Pekka Enberg 
Cc: Joonsoo Kim 
Acked-by: Johannes Weiner 
Cc: Mel Gorman 
Cc: Pravin Shelar 
Cc: Jarno Rajahalme 
Cc: Li Zefan 
Cc: Greg Thelen 
Cc: Tejun Heo 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: completely remove dumping per-cpu lists from show_mem()

2015-04-14T23:49:01+00:00

It seems nobody needs this.

Signed-off-by: Konstantin Khlebnikov 
Cc: Michal Hocko 
Cc: David Rientjes 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: hide per-cpu lists in output of show_mem()

2015-04-14T23:49:01+00:00

This makes show_mem() much less verbose on huge machines.  Instead of huge
and almost useless dump of counters for each per-zone per-cpu lists this
patch prints the sum of these counters for each zone (free_pcp) and size
of per-cpu list for current cpu (local_pcp).

The filter flag SHOW_MEM_PERCPU_LISTS reverts to the old verbose mode.

[akpm@linux-foundation.org: update show_free_areas comment]
Signed-off-by: Konstantin Khlebnikov 
Acked-by: Michal Hocko 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/compaction: enhance compaction finish condition

2015-04-14T23:49:01+00:00

Compaction has anti fragmentation algorithm.  It is that freepage should
be more than pageblock order to finish the compaction if we don't find any
freepage in requested migratetype buddy list.  This is for mitigating
fragmentation, but, there is a lack of migratetype consideration and it is
too excessive compared to page allocator's anti fragmentation algorithm.

Not considering migratetype would cause premature finish of compaction.
For example, if allocation request is for unmovable migratetype, freepage
with CMA migratetype doesn't help that allocation and compaction should
not be stopped.  But, current logic regards this situation as compaction
is no longer needed, so finish the compaction.

Secondly, condition is too excessive compared to page allocator's logic.
We can steal freepage from other migratetype and change pageblock
migratetype on more relaxed conditions in page allocator.  This is
designed to prevent fragmentation and we can use it here.  Imposing hard
constraint only to the compaction doesn't help much in this case since
page allocator would cause fragmentation again.

To solve these problems, this patch borrows anti fragmentation logic from
page allocator.  It will reduce premature compaction finish in some cases
and reduce excessive compaction work.

stress-highalloc test in mmtests with non movable order 7 allocation shows
considerable increase of compaction success rate.

Compaction success rate (Compaction success * 100 / Compaction stalls, %)
31.82 : 42.20

I tested it on non-reboot 5 runs stress-highalloc benchmark and found that
there is no more degradation on allocation success rate than before.  That
roughly means that this patch doesn't result in more fragmentations.

Vlastimil suggests additional idea that we only test for fallbacks when
migration scanner has scanned a whole pageblock.  It looked good for
fragmentation because chance of stealing increase due to making more free
pages in certain pageblock.  So, I tested it, but, it results in decreased
compaction success rate, roughly 38.00.  I guess the reason that if system
is low memory condition, watermark check could be failed due to not enough
order 0 free page and so, sometimes, we can't reach a fallback check
although migrate_pfn is aligned to pageblock_nr_pages.  I can insert code
to cope with this situation but it makes code more complicated so I don't
include his idea at this patch.

[akpm@linux-foundation.org: fix CONFIG_CMA=n build]
Signed-off-by: Joonsoo Kim 
Acked-by: Vlastimil Babka 
Cc: Mel Gorman 
Cc: David Rientjes 
Cc: Rik van Riel 
Cc: Zhang Yanfei 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/page_alloc: factor out fallback freepage checking

2015-04-14T23:49:01+00:00

This is preparation step to use page allocator's anti fragmentation logic
in compaction.  This patch just separates fallback freepage checking part
from fallback freepage management part.  Therefore, there is no functional
change.

Signed-off-by: Joonsoo Kim 
Acked-by: Vlastimil Babka 
Cc: Mel Gorman 
Cc: David Rientjes 
Cc: Rik van Riel 
Cc: Zhang Yanfei 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/cma: change fallback behaviour for CMA freepage

2015-04-14T23:49:01+00:00

Freepage with MIGRATE_CMA can be used only for MIGRATE_MOVABLE and they
should not be expanded to other migratetype buddy list to protect them
from unmovable/reclaimable allocation.  Implementing these requirements in
__rmqueue_fallback(), that is, finding largest possible block of freepage
has bad effect that high order freepage with MIGRATE_CMA are broken
continually although there are suitable order CMA freepage.  Reason is
that they are not be expanded to other migratetype buddy list and next
__rmqueue_fallback() invocation try to finds another largest block of
freepage and break it again.  So, MIGRATE_CMA fallback should be handled
separately.  This patch introduces __rmqueue_cma_fallback(), that just
wrapper of __rmqueue_smallest() and call it before __rmqueue_fallback() if
migratetype == MIGRATE_MOVABLE.

This results in unintended behaviour change that MIGRATE_CMA freepage is
always used first rather than other migratetype as movable allocation's
fallback.  But, as already mentioned above, MIGRATE_CMA can be used only
for MIGRATE_MOVABLE, so it is better to use MIGRATE_CMA freepage first as
much as possible.  Otherwise, we needlessly take up precious freepages
with other migratetype and increase chance of fragmentation.

Signed-off-by: Joonsoo Kim 
Acked-by: Vlastimil Babka 
Cc: Mel Gorman 
Cc: David Rientjes 
Cc: Rik van Riel 
Cc: Zhang Yanfei 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, oom: do not fail __GFP_NOFAIL allocation if oom killer is disabled

2015-03-13T01:46:07+00:00

Tetsuo Handa has pointed out that __GFP_NOFAIL allocations might fail
after OOM killer is disabled if the allocation is performed by a kernel
thread.  This behavior was introduced from the very beginning by
7f33d49a2ed5 ("mm, PM/Freezer: Disable OOM killer when tasks are frozen").
 This means that the basic contract for the allocation request is broken
and the context requesting such an allocation might blow up unexpectedly.

There are basically two ways forward.

1) move oom_killer_disable after kernel threads are frozen.  This has a
   risk that the OOM victim wouldn't be able to finish because it would
   depend on an already frozen kernel thread.  This would be really tricky
   to debug.

2) do not fail GFP_NOFAIL allocation no matter what and risk a
   potential Freezable kernel threads will loop and fail the suspend.
   Incidental allocations after kernel threads are frozen will at least
   dump a warning - if we are lucky and the serial console is still active
   of course...

This patch implements the later option because it is safer.  We would see
warning rather than allocation failures for the kernel threads which would
blow up otherwise and have a higher chances to identify __GFP_NOFAIL users
from deeper pm code.

Signed-off-by: Michal Hocko 
Acked-by: David Rientjes 
Cc: Johannes Weiner 
Cc: Tetsuo Handa 
Cc: "Rafael J. Wysocki" 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: page_alloc: revert inadvertent !__GFP_FS retry behavior change

2015-02-28T17:57:51+00:00

Historically, !__GFP_FS allocations were not allowed to invoke the OOM
killer once reclaim had failed, but nevertheless kept looping in the
allocator.

Commit 9879de7373fc ("mm: page_alloc: embed OOM killing naturally into
allocation slowpath"), which should have been a simple cleanup patch,
accidentally changed the behavior to aborting the allocation at that
point.  This creates problems with filesystem callers (?) that currently
rely on the allocator waiting for other tasks to intervene.

Revert the behavior as it shouldn't have been changed as part of a
cleanup patch.

Fixes: 9879de7373fc ("mm: page_alloc: embed OOM killing naturally into allocation slowpath")
Signed-off-by: Johannes Weiner 
Acked-by: Michal Hocko 
Reported-by: Tetsuo Handa 
Cc: Theodore Ts'o 
Cc: Dave Chinner 
Acked-by: David Rientjes 
Cc: Oleg Nesterov 
Cc: Mel Gorman 
Cc: 	[3.19.x]
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds