linux.git/mm/memory-failure.c, branch v3.11

mm/memory-failure.c: fix memory leak in successful soft offlining

2013-07-03T23:07:31+00:00

After a successful page migration by soft offlining, the source page is
not properly freed and it's never reusable even if we unpoison it
afterward.

This is caused by the race between freeing page and setting PG_hwpoison.
In successful soft offlining, the source page is put (and the refcount
becomes 0) by putback_lru_page() in unmap_and_move(), where it's linked
to pagevec and actual freeing back to buddy is delayed.  So if
PG_hwpoison is set for the page before freeing, the freeing does not
functions as expected (in such case freeing aborts in
free_pages_prepare() check.)

This patch tries to make sure to free the source page before setting
PG_hwpoison on it.  To avoid reallocating, the page keeps
MIGRATE_ISOLATE until after setting PG_hwpoison.

This patch also removes obsolete comments about "keeping elevated
refcount" because what they say is not true.  Unlike memory_failure(),
soft_offline_page() uses no special page isolation code, and the
soft-offlined pages have no elevated.

Signed-off-by: Naoya Horiguchi 
Cc: Andi Kleen 
Cc: Mel Gorman 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

HWPOISON: check dirty flag to match against clean page

2013-04-29T22:54:28+00:00

Currently page_action() does not check dirty flag to determine whether
the error page is "clean mlocked/unevictable LRU" page.  This doesn't
cause any misjudgement because we do matching against "dirty
mlocked/unevictable LRU" just before the check.  But in order to make
code consistent and/or to avoid potential regression, we had better
check dirty flag explicitly.

Signed-off-by: Naoya Horiguchi 
Suggested-by: Chen Gong 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Wu Fengguang 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

HWPOISON: change order of error_states[]'s elements

2013-02-24T01:50:22+00:00

error_states[] has two separate states "unevictable LRU page" and
"mlocked LRU page", and the former one has the higher priority now.  But
because of that the latter one is rarely chosen because pages with
PageMlocked highly likely have PG_unevictable set.  On the other hand,
PG_unevictable without PageMlocked is common for ramfs or SHM_LOCKed
shared memory, so reversing the priority of these two states helps us
clearly distinguish them.

Signed-off-by: Naoya Horiguchi 
Cc: Andi Kleen 
Cc: Chen Gong 
Cc: Tony Luck 
Cc: Wu Fengguang 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

HWPOISON: fix misjudgement of page_action() for errors on mlocked pages

2013-02-24T01:50:22+00:00

memory_failure() can't handle memory errors on mlocked pages correctly,
because page_action() judges such errors as ones on "unknown pages"
instead of ones on "unevictable LRU page" or "mlocked LRU page".  In
order to determine page_state page_action() checks page flags at the
timing of the judgement, but such page flags are not the same with those
just after memory_failure() is called, because memory_failure() does
unmapping of the error pages before doing page_action().  This unmapping
changes the page state, especially page_remove_rmap() (called from
try_to_unmap_one()) clears PG_mlocked, so page_action() can't catch
mlocked pages after that.

With this patch, we store the page flag of the error page before doing
unmap, and (only) if the first check with page flags at the time decided
the error page is unknown, we do the second check with the stored page
flag.  This implementation doesn't change error handling for the page
types for which the first check can determine the page state correctly.

[akpm@linux-foundation.org: tweak comments]
Signed-off-by: Naoya Horiguchi 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Chen Gong 
Cc: Wu Fengguang 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: remove offlining arg to migrate_pages

2013-02-24T01:50:19+00:00

No functional change, but the only purpose of the offlining argument to
migrate_pages() etc, was to ensure that __unmap_and_move() could migrate a
KSM page for memory hotremove (which took ksm_thread_mutex) but not for
other callers.  Now all cases are safe, remove the arg.

Signed-off-by: Hugh Dickins 
Cc: Rik van Riel 
Cc: Petr Holasek 
Cc: Andrea Arcangeli 
Cc: Izik Eidus 
Cc: Gerald Schaefer 
Cc: KOSAKI Motohiro 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/memory-failure.c: fix wrong num_poisoned_pages in handling memory error on thp

2013-02-24T01:50:15+00:00

num_poisoned_pages counts up the number of pages isolated by memory
errors.  But for thp, only one subpage is isolated because memory error
handler splits it, so it's wrong to add (1 << compound_trans_order).

[akpm@linux-foundation.org: tweak comment]
Signed-off-by: Naoya Horiguchi 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Wu Fengguang 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/memory-failure.c: clean up soft_offline_page()

2013-02-24T01:50:15+00:00

Currently soft_offline_page() is hard to maintain because it has many
return points and goto statements.  All of this mess come from
get_any_page().

This function should only get page refcount as the name implies, but it
does some page isolating actions like SetPageHWPoison() and dequeuing
hugepage.  This patch corrects it and introduces some internal
subroutines to make soft offlining code more readable and maintainable.

Signed-off-by: Naoya Horiguchi 
Reviewed-by: Andi Kleen 
Cc: Tony Luck 
Cc: Wu Fengguang 
Cc: Xishi Qiu 
Cc: Jiang Liu 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

memory-failure: use num_poisoned_pages instead of mce_bad_pages

2013-02-24T01:50:15+00:00

Since MCE is an x86 concept, and this code is in mm/, it would be better
to use the name num_poisoned_pages instead of mce_bad_pages.

[akpm@linux-foundation.org: fix mm/sparse.c]
Signed-off-by: Xishi Qiu 
Signed-off-by: Jiang Liu 
Suggested-by: Borislav Petkov 
Reviewed-by: Wanpeng Li 
Cc: Andi Kleen 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

memory-failure: do code refactor of soft_offline_page()

2013-02-24T01:50:15+00:00

There are too many return points randomly intermingled with some "goto
done" return points.  So adjust the function structure, one for the
success path, the other for the failure path.  Use atomic_long_inc
instead of atomic_long_add.

Signed-off-by: Xishi Qiu 
Signed-off-by: Jiang Liu 
Suggested-by: Andrew Morton 
Cc: Borislav Petkov 
Cc: Wanpeng Li 
Cc: Andi Kleen 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

memory-failure: fix an error of mce_bad_pages statistics

2013-02-24T01:50:15+00:00

When doing

    $ echo paddr > /sys/devices/system/memory/soft_offline_page

to offline a *free* page, the value of mce_bad_pages will be added, and
the page is set HWPoison flag, but it is still managed by page buddy
alocator.

   $ cat /proc/meminfo | grep HardwareCorrupted

shows the value.

If we offline the same page, the value of mce_bad_pages will be added
*again*, this means the value is incorrect now.  Assume the page is
still free during this short time.

  soft_offline_page()
    get_any_page()
      "else if (is_free_buddy_page(p))" branch return 0
        "goto done";
           "atomic_long_add(1, &mce_bad_pages);"

This patch:

Move poisoned page check at the beginning of the function in order to
fix the error.

Signed-off-by: Xishi Qiu 
Signed-off-by: Jiang Liu 
Tested-by: Naoya Horiguchi 
Cc: Borislav Petkov 
Cc: Wanpeng Li 
Cc: Andi Kleen 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds