linux.git/mm/memory-failure.c, branch v5.14

mm/hwpoison: retry with shake_page() for unhandlable pages

2021-08-20T18:31:42+00:00

HWPoisonHandlable() sometimes returns false for typical user pages due
to races with ordinary memory events such as transfers between LRU lists.  This
causes failures in hwpoison handling.

There is retry code for such a case, but it does not work because the
retry loop reaches the retry limit too quickly, before the page settles
down to a handlable state.  Let get_any_page() call shake_page() to fix it.

[naoya.horiguchi@nec.com: get_any_page(): return -EIO when retry limit reached]
  Link: https://lkml.kernel.org/r/20210819001958.2365157-1-naoya.horiguchi@linux.dev

Link: https://lkml.kernel.org/r/20210817053703.2267588-1-naoya.horiguchi@linux.dev
Fixes: 25182f05ffed ("mm,hwpoison: fix race with hugetlb page allocation")
Signed-off-by: Naoya Horiguchi 
Reported-by: Tony Luck 
Reviewed-by: Yang Shi 
Cc: Oscar Salvador 
Cc: Muchun Song 
Cc: Mike Kravetz 
Cc: Michal Hocko 
Cc: 		[5.13+]
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: fix spelling mistakes

2021-07-01T18:06:02+00:00

Fix some spelling mistakes in comments:
each having differents usage ==> each has a different usage
statments ==> statements
adresses ==> addresses
aggresive ==> aggressive
datas ==> data
posion ==> poison
higer ==> higher
precisly ==> precisely
wont ==> won't
We moves tha ==> We move the
endianess ==> endianness

Link: https://lkml.kernel.org/r/20210519065853.7723-2-thunder.leizhen@huawei.com
Signed-off-by: Zhen Lei 
Reviewed-by: Souptick Joarder 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: hwpoison_user_mappings() try_to_unmap() with TTU_SYNC

2021-07-01T03:47:30+00:00

TTU_SYNC prevents an unlikely race, when try_to_unmap() returns shortly
before the page is accounted as unmapped.  It is unlikely to coincide with
hwpoisoning, but now that we have the flag, hwpoison_user_mappings() would
do well to use it.

Link: https://lkml.kernel.org/r/329c28ed-95df-9a2c-8893-b444d8a6d340@google.com
Signed-off-by: Hugh Dickins 
Acked-by: Kirill A. Shutemov 
Acked-by: Naoya Horiguchi 
Cc: Alistair Popple 
Cc: Jan Kara 
Cc: Jue Wang 
Cc: "Matthew Wilcox (Oracle)" 
Cc: Miaohe Lin 
Cc: Minchan Kim 
Cc: Oscar Salvador 
Cc: Peter Xu 
Cc: Ralph Campbell 
Cc: Shakeel Butt 
Cc: Wang Yugui 
Cc: Yang Shi 
Cc: Zi Yan 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: rmap: make try_to_unmap() void function

2021-07-01T03:47:30+00:00

Currently try_to_unmap() returns a bool value by checking page_mapcount().
However, this may return a false positive since page_mapcount() doesn't
check all subpages of a compound page.  total_mapcount() could be used
instead, but its cost is higher since it traverses all subpages.

In fact, most callers of try_to_unmap() don't care about the return
value at all.  So callers just need to check whether the page is still
mapped with page_mapped() when necessary.  And page_mapped() bails out
early when it finds a mapped subpage.
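
The difference between checking a single page and scanning all subpages
with an early bailout can be illustrated with a small userspace model.
Everything here is an assumption made for illustration: struct
model_compound, model_head_mapped() and model_any_mapped() are
hypothetical, and a compound page is reduced to an array of per-subpage
map counts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a compound page as an array of per-subpage counts. */
struct model_compound {
	size_t nr_subpages;
	const int *mapcount;	/* mapcount[i] > 0 means subpage i is mapped */
};

/* Single-page check: looks at subpage 0 only, so it can miss mappings. */
bool model_head_mapped(const struct model_compound *c)
{
	assert(c != NULL && c->nr_subpages > 0);
	return c->mapcount[0] > 0;
}

/*
 * Scan in the spirit of page_mapped(): stop at the first mapped subpage.
 * *visited (optional) reports how many subpages were examined.
 */
bool model_any_mapped(const struct model_compound *c, size_t *visited)
{
	size_t i;

	assert(c != NULL);
	if (visited)
		*visited = 0;
	for (i = 0; i < c->nr_subpages; i++) {
		if (visited)
			*visited = i + 1;
		if (c->mapcount[i] > 0)
			return true;	/* early bailout */
	}
	return false;
}
```

A caller that needs the answer after an unmap attempt would call the
scanning check explicitly instead of relying on a return value.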

Link: https://lkml.kernel.org/r/bb27e3fe-6036-b637-5086-272befbfe3da@google.com
Suggested-by: Hugh Dickins 
Signed-off-by: Yang Shi 
Acked-by: Minchan Kim 
Reviewed-by: Shakeel Butt 
Acked-by: Kirill A. Shutemov 
Signed-off-by: Hugh Dickins 
Acked-by: Naoya Horiguchi 
Cc: Alistair Popple 
Cc: Jan Kara 
Cc: Jue Wang 
Cc: "Matthew Wilcox (Oracle)" 
Cc: Miaohe Lin 
Cc: Oscar Salvador 
Cc: Peter Xu 
Cc: Ralph Campbell 
Cc: Wang Yugui 
Cc: Zi Yan 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/hwpoison: disable pcp for page_handle_poison()

2021-07-01T03:47:27+00:00

Recent changes by the patch "mm/page_alloc: allow high-order pages to be
stored on the per-cpu lists" make the kernel determine whether to use pcp
by pcp_allowed_order(), which breaks soft-offline for hugetlb pages.

Soft-offline dissolves a migration source page, then removes it from the
buddy free list, so it's assumed that any subpage of the soft-offlined
hugepage is recognized as a buddy page just after returning from
dissolve_free_huge_page().  pcp_allowed_order() returns true for hugetlb,
so this assumption no longer holds.

So disable pcp during dissolve_free_huge_page() and take_page_off_buddy()
to prevent soft-offlined hugepages from being linked to pcp lists.
Soft-offline should not be a common event, so the impact on performance
should be minimal.  And I think that the optimization of Mel's patch could
benefit hugetlb, so zone_pcp_disable() is called only in hwpoison
context.

Link: https://lkml.kernel.org/r/20210617092626.291006-1-nao.horiguchi@gmail.com
Signed-off-by: Naoya Horiguchi 
Acked-by: Mel Gorman 
Cc: Mike Kravetz 
Cc: David Hildenbrand 
Cc: Oscar Salvador 
Cc: Michal Hocko 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm,hwpoison: make get_hwpoison_page() call get_any_page()

2021-06-29T17:53:56+00:00

__get_hwpoison_page() could fail to grab a refcount due to some race
condition, so it's helpful if we can handle it by retrying.  We already
have retry logic, so make get_hwpoison_page() call get_any_page() when
called from memory_failure().

As a result, get_hwpoison_page() can return negative values (i.e. error
codes), so some callers are also changed to handle error cases.
soft_offline_page() does nothing for -EBUSY because that's enough and
users in userspace can easily handle it.  unpoison_memory() is also
unchanged because it's broken and needs thorough fixes (will be done
later).

Link: https://lkml.kernel.org/r/20210603233632.2964832-3-nao.horiguchi@gmail.com
Signed-off-by: Naoya Horiguchi 
Cc: Oscar Salvador 
Cc: Muchun Song 
Cc: Mike Kravetz 
Cc: Michal Hocko 
Cc: Tony Luck 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm,hwpoison: send SIGBUS with error virtual address

2021-06-29T17:53:55+00:00

Now an action-required MCE on an already hwpoisoned address reliably sends
a SIGBUS to the current process, but the SIGBUS doesn't convey the error
virtual address.  That's not optimal for hwpoison-aware applications.

To fix the issue, make memory_failure() call kill_accessing_process(),
which does a page table walk to find the error virtual address.  It could
find multiple virtual addresses for the same error page, and it seems hard
to tell which virtual address is the correct one.  But that's rare, and
sending an incorrect virtual address could be better than no address.  So
let's report the first virtual address found for now.
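
The "report the first match" policy can be sketched as a linear walk over
a table of mappings.  This is a simplified userspace model, not the
kernel's page table walker: struct model_mapping and
model_find_error_vaddr() are hypothetical names, and a flat array stands
in for the page tables of the current process.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one virtual-to-physical mapping. */
struct model_mapping {
	uintptr_t vaddr;	/* virtual address of the mapping */
	unsigned long pfn;	/* page frame number it points to */
};

/*
 * Walks the mappings in order and stores the first virtual address that
 * maps poisoned_pfn in *out.  Returns 1 when found, 0 otherwise.  Later
 * mappings of the same pfn are deliberately ignored.
 */
int model_find_error_vaddr(const struct model_mapping *maps, size_t n,
			   unsigned long poisoned_pfn, uintptr_t *out)
{
	size_t i;

	assert(out != NULL);
	for (i = 0; i < n; i++) {
		if (maps[i].pfn == poisoned_pfn) {
			*out = maps[i].vaddr;
			return 1;	/* stop at the first match */
		}
	}
	return 0;
}
```

When the same page is mapped at two addresses, only the lower-indexed
entry is reported, which mirrors the trade-off described above.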

[naoya.horiguchi@nec.com: fix walk_page_range() return]
  Link: https://lkml.kernel.org/r/20210603051055.GA244241@hori.linux.bs1.fc.nec.co.jp

Link: https://lkml.kernel.org/r/20210521030156.2612074-4-nao.horiguchi@gmail.com
Signed-off-by: Naoya Horiguchi 
Cc: Tony Luck 
Cc: Aili Yao 
Cc: Oscar Salvador 
Cc: David Hildenbrand 
Cc: Borislav Petkov 
Cc: Andy Lutomirski 
Cc: Jue Wang 
Cc: Borislav Petkov 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/hwpoison: do not lock page again when me_huge_page() successfully recovers

2021-06-25T02:40:54+00:00

Currently me_huge_page() temporarily unlocks the page to perform some
actions, then locks it again later.  My testcase (which calls hard-offline
on some tail page in a hugetlb, then accesses the address of the hugetlb
range) showed that the page allocation code detects this page lock on a
buddy page and prints out a "BUG: Bad page state" message.

check_new_page_bad() does not consider a page with __PG_HWPOISON as a bad
page, so this flag works as a kind of filter, but this filtering doesn't
work in this case because the "bad page" is not the actual hwpoisoned
page.  So stop locking the page again.  Actions to be taken depend on the
page type of the error, so page unlocking should be done in ->action()
callbacks.  So let's make that the rule and change all existing callbacks
accordingly.

Link: https://lkml.kernel.org/r/20210609072029.74645-1-nao.horiguchi@gmail.com
Fixes: 78bb920344b8 ("mm: hwpoison: dissolve in-use hugepage in unrecoverable memory error")
Signed-off-by: Naoya Horiguchi 
Cc: Oscar Salvador 
Cc: Michal Hocko 
Cc: Tony Luck 
Cc: "Aneesh Kumar K.V" 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm,hwpoison: return -EHWPOISON to denote that the page has already been poisoned

2021-06-25T02:40:54+00:00

When memory_failure() is called with MF_ACTION_REQUIRED on a page that
has already been hwpoisoned, memory_failure() could fail to send SIGBUS
to the affected process, which results in an infinite loop of MCEs.

Currently memory_failure() returns 0 if it's called for an already
hwpoisoned page, then the caller, kill_me_maybe(), could return without
sending SIGBUS to the current process.  An action-required MCE is raised
when the current process accesses the broken memory, so no SIGBUS
means that the current process continues to run and accesses the error
page again soon, running into an MCE loop.

This issue can arise for example in the following scenarios:

 - Two or more threads access the poisoned page concurrently. If
   local MCE is enabled, the MCE handler independently handles the MCE
   events. So there's a race among MCE events, and the second or later
   threads fall into the situation in question.

 - If there was a preceding memory error event and memory_failure() for
   the event failed to unmap the error page for some reason, the
   subsequent memory access to the error page triggers the MCE loop
   situation.

To fix the issue, make memory_failure() return an error code when the
error page has already been hwpoisoned.  This allows the memory error
handler to control how it sends signals to userspace.  And make sure
that any process touching a hwpoisoned page gets a SIGBUS even in the
"already hwpoisoned" path of memory_failure(), as is done in the page
fault path.

Link: https://lkml.kernel.org/r/20210521030156.2612074-3-nao.horiguchi@gmail.com
Signed-off-by: Aili Yao 
Signed-off-by: Naoya Horiguchi 
Reviewed-by: Oscar Salvador 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Borislav Petkov 
Cc: David Hildenbrand 
Cc: Jue Wang 
Cc: Tony Luck 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/memory-failure: use a mutex to avoid memory_failure() races

2021-06-25T02:40:54+00:00

Patch series "mm,hwpoison: fix sending SIGBUS for Action Required MCE", v5.

I wrote this patchset to materialize what I think is the currently
acceptable solution mentioned in the previous discussion [1].  I simply
borrowed Tony's mutex patch and Aili's return code patch, then I queued
another one to find the error virtual address in a best-effort manner.  I
know that this is not a perfect solution, but it should work for some
typical cases.

[1]: https://lore.kernel.org/linux-mm/20210331192540.2141052f@alex-virtual-machine/

This patch (of 2):

There can be races when multiple CPUs consume poison from the same page.
The first into memory_failure() atomically sets the HWPoison page flag
and begins hunting for tasks that map this page.  Eventually it
invalidates those mappings and may send a SIGBUS to the affected tasks.

But while all that work is going on, other CPUs see a "success" return
code from memory_failure() and so they believe the error has been
handled and continue executing.

Fix by wrapping most of the internal parts of memory_failure() in a
mutex.
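
The locking pattern can be sketched with a POSIX mutex in userspace.
This is a rough model, not the kernel implementation: struct model_page,
model_mf_mutex and model_memory_failure_locked() are hypothetical names,
and the slow unmap-and-signal work is reduced to setting a flag.  The
point it illustrates is that a second caller cannot observe the poison
flag until the first caller has finished its work.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of a page; not the kernel's struct page. */
struct model_page {
	bool hwpoison;			/* poison flag */
	bool mappings_invalidated;	/* stands in for the slow work */
};

/* One mutex serializes all callers, as in the fix described above. */
static pthread_mutex_t model_mf_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Returns true if this call performed the handling, false if the page
 * had already been handled by an earlier caller.
 */
bool model_memory_failure_locked(struct model_page *p)
{
	bool handled_here = false;

	assert(p != NULL);
	pthread_mutex_lock(&model_mf_mutex);
	if (!p->hwpoison) {
		p->hwpoison = true;
		/* Slow part: find tasks, invalidate mappings, signal. */
		p->mappings_invalidated = true;
		handled_here = true;
	}
	/*
	 * A later caller gets the lock only after this point, so seeing
	 * hwpoison set implies mappings_invalidated is set too.
	 */
	pthread_mutex_unlock(&model_mf_mutex);
	return handled_here;
}
```

Without the lock, a second caller could see the flag set while the first
caller was still in the slow part, which is the race described above.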

[akpm@linux-foundation.org: make mf_mutex local to memory_failure()]

Link: https://lkml.kernel.org/r/20210521030156.2612074-1-nao.horiguchi@gmail.com
Link: https://lkml.kernel.org/r/20210521030156.2612074-2-nao.horiguchi@gmail.com
Signed-off-by: Tony Luck 
Signed-off-by: Naoya Horiguchi 
Reviewed-by: Borislav Petkov 
Reviewed-by: Oscar Salvador 
Cc: Aili Yao 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: David Hildenbrand 
Cc: Jue Wang 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds