<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/mm/memory-failure.c, branch v5.10</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>hugetlbfs: fix anon huge page migration race</title>
<updated>2020-11-14T19:26:04+00:00</updated>
<author>
<name>Mike Kravetz</name>
<email>mike.kravetz@oracle.com</email>
</author>
<published>2020-11-14T06:52:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=336bf30eb76580b579dc711ded5d599d905c0217'/>
<id>336bf30eb76580b579dc711ded5d599d905c0217</id>
<content type='text'>
Qian Cai reported the following BUG in [1]

  LTP: starting move_pages12
  BUG: unable to handle page fault for address: ffffffffffffffe0
  ...
  RIP: 0010:anon_vma_interval_tree_iter_first+0xa2/0x170 avc_start_pgoff at mm/interval_tree.c:63
  Call Trace:
    rmap_walk_anon+0x141/0xa30 rmap_walk_anon at mm/rmap.c:1864
    try_to_unmap+0x209/0x2d0 try_to_unmap at mm/rmap.c:1763
    migrate_pages+0x1005/0x1fb0
    move_pages_and_store_status.isra.47+0xd7/0x1a0
    __x64_sys_move_pages+0xa5c/0x1100
    do_syscall_64+0x5f/0x310
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

Hugh Dickins diagnosed this as a migration bug caused by code introduced
to use i_mmap_rwsem for pmd sharing synchronization.  Specifically, the
routine unmap_and_move_huge_page() is always passing the TTU_RMAP_LOCKED
flag to try_to_unmap() while holding i_mmap_rwsem.  This is wrong for
anon pages as the anon_vma_lock should be held in this case.  Further
analysis suggested that i_mmap_rwsem was not required to he held at all
when calling try_to_unmap for anon pages as an anon page could never be
part of a shared pmd mapping.

Discussion also revealed that the hack in hugetlb_page_mapping_lock_write
to drop page lock and acquire i_mmap_rwsem is wrong.  There is no way to
keep mapping valid while dropping page lock.

This patch does the following:

 - Do not take i_mmap_rwsem and set TTU_RMAP_LOCKED for anon pages when
   calling try_to_unmap.

 - Remove the hacky code in hugetlb_page_mapping_lock_write. The routine
   will now simply do a 'trylock' while still holding the page lock. If
   the trylock fails, it will return NULL. This could impact the
   callers:

    - migration calling code will receive -EAGAIN and retry up to the
      hard coded limit (10).

    - memory error code will treat the page as BUSY. This will force
      killing (SIGKILL) instead of SIGBUS any mapping tasks.

   Do note that this change in behavior only happens when there is a
   race. None of the standard kernel testing suites actually hit this
   race, but it is possible.

[1] https://lore.kernel.org/lkml/20200708012044.GC992@lca.pw/
[2] https://lore.kernel.org/linux-mm/alpine.LSU.2.11.2010071833100.2214@eggly.anvils/

Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")
Reported-by: Qian Cai &lt;cai@lca.pw&gt;
Suggested-by: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: Naoya Horiguchi &lt;naoya.horiguchi@nec.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Link: https://lkml.kernel.org/r/20201105195058.78401-1-mike.kravetz@oracle.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Qian Cai reported the following BUG in [1]

  LTP: starting move_pages12
  BUG: unable to handle page fault for address: ffffffffffffffe0
  ...
  RIP: 0010:anon_vma_interval_tree_iter_first+0xa2/0x170 avc_start_pgoff at mm/interval_tree.c:63
  Call Trace:
    rmap_walk_anon+0x141/0xa30 rmap_walk_anon at mm/rmap.c:1864
    try_to_unmap+0x209/0x2d0 try_to_unmap at mm/rmap.c:1763
    migrate_pages+0x1005/0x1fb0
    move_pages_and_store_status.isra.47+0xd7/0x1a0
    __x64_sys_move_pages+0xa5c/0x1100
    do_syscall_64+0x5f/0x310
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

Hugh Dickins diagnosed this as a migration bug caused by code introduced
to use i_mmap_rwsem for pmd sharing synchronization.  Specifically, the
routine unmap_and_move_huge_page() is always passing the TTU_RMAP_LOCKED
flag to try_to_unmap() while holding i_mmap_rwsem.  This is wrong for
anon pages as the anon_vma_lock should be held in this case.  Further
analysis suggested that i_mmap_rwsem was not required to he held at all
when calling try_to_unmap for anon pages as an anon page could never be
part of a shared pmd mapping.

Discussion also revealed that the hack in hugetlb_page_mapping_lock_write
to drop page lock and acquire i_mmap_rwsem is wrong.  There is no way to
keep mapping valid while dropping page lock.

This patch does the following:

 - Do not take i_mmap_rwsem and set TTU_RMAP_LOCKED for anon pages when
   calling try_to_unmap.

 - Remove the hacky code in hugetlb_page_mapping_lock_write. The routine
   will now simply do a 'trylock' while still holding the page lock. If
   the trylock fails, it will return NULL. This could impact the
   callers:

    - migration calling code will receive -EAGAIN and retry up to the
      hard coded limit (10).

    - memory error code will treat the page as BUSY. This will force
      killing (SIGKILL) instead of SIGBUS any mapping tasks.

   Do note that this change in behavior only happens when there is a
   race. None of the standard kernel testing suites actually hit this
   race, but it is possible.

[1] https://lore.kernel.org/lkml/20200708012044.GC992@lca.pw/
[2] https://lore.kernel.org/linux-mm/alpine.LSU.2.11.2010071833100.2214@eggly.anvils/

Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")
Reported-by: Qian Cai &lt;cai@lca.pw&gt;
Suggested-by: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: Naoya Horiguchi &lt;naoya.horiguchi@nec.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Link: https://lkml.kernel.org/r/20201105195058.78401-1-mike.kravetz@oracle.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/memory-failure: remove a wrapper for alloc_migration_target()</title>
<updated>2020-10-18T16:27:09+00:00</updated>
<author>
<name>Joonsoo Kim</name>
<email>iamjoonsoo.kim@lge.com</email>
</author>
<published>2020-10-17T23:13:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=546087599986be4fe4e39a621cc0828e832caccb'/>
<id>546087599986be4fe4e39a621cc0828e832caccb</id>
<content type='text'>
There is a well-defined standard migration target callback.  Use it
directly.

Signed-off-by: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Roman Gushchin &lt;guro@fb.com&gt;
Link: http://lkml.kernel.org/r/1594622517-20681-9-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There is a well-defined standard migration target callback.  Use it
directly.

Signed-off-by: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Roman Gushchin &lt;guro@fb.com&gt;
Link: http://lkml.kernel.org/r/1594622517-20681-9-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm,hwpoison: try to narrow window race for free pages</title>
<updated>2020-10-16T18:11:17+00:00</updated>
<author>
<name>Oscar Salvador</name>
<email>osalvador@suse.de</email>
</author>
<published>2020-10-16T03:07:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b94e02822debdf0cc473556aad7dcc859f216653'/>
<id>b94e02822debdf0cc473556aad7dcc859f216653</id>
<content type='text'>
Aristeu Rozanski reported that a customer test case started to report
-EBUSY after the hwpoison rework patchset.

There is a race window between spotting a free page and taking it off its
buddy freelist, so it might be that by the time we try to take it off, the
page has been already allocated.

This patch tries to handle such race window by trying to handle the new
type of page again if the page was allocated under us.

Reported-by: Aristeu Rozanski &lt;aris@ruivo.org&gt;
Signed-off-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Tested-by: Aristeu Rozanski &lt;aris@ruivo.org&gt;
Acked-by: Naoya Horiguchi &lt;naoya.horiguchi@nec.com&gt;
Cc: "Aneesh Kumar K.V" &lt;aneesh.kumar@linux.ibm.com&gt;
Cc: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Dmitry Yakunin &lt;zeil@yandex-team.ru&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.com&gt;
Cc: Qian Cai &lt;cai@lca.pw&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Link: https://lkml.kernel.org/r/20200922135650.1634-15-osalvador@suse.de
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Aristeu Rozanski reported that a customer test case started to report
-EBUSY after the hwpoison rework patchset.

There is a race window between spotting a free page and taking it off its
buddy freelist, so it might be that by the time we try to take it off, the
page has been already allocated.

This patch tries to handle such race window by trying to handle the new
type of page again if the page was allocated under us.

Reported-by: Aristeu Rozanski &lt;aris@ruivo.org&gt;
Signed-off-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Tested-by: Aristeu Rozanski &lt;aris@ruivo.org&gt;
Acked-by: Naoya Horiguchi &lt;naoya.horiguchi@nec.com&gt;
Cc: "Aneesh Kumar K.V" &lt;aneesh.kumar@linux.ibm.com&gt;
Cc: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Dmitry Yakunin &lt;zeil@yandex-team.ru&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.com&gt;
Cc: Qian Cai &lt;cai@lca.pw&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Link: https://lkml.kernel.org/r/20200922135650.1634-15-osalvador@suse.de
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm,hwpoison: double-check page count in __get_any_page()</title>
<updated>2020-10-16T18:11:17+00:00</updated>
<author>
<name>Naoya Horiguchi</name>
<email>naoya.horiguchi@nec.com</email>
</author>
<published>2020-10-16T03:07:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1f2481ddbe444de5bed72f167d7180d1b2708e56'/>
<id>1f2481ddbe444de5bed72f167d7180d1b2708e56</id>
<content type='text'>
Soft offlining could fail with EIO due to the race condition with hugepage
migration.  This issuse became visible due to the change by previous patch
that makes soft offline handler take page refcount by its own.  We have no
way to directly pin zero refcount page, and the page considered as a zero
refcount page could be allocated just after the first check.

This patch adds the second check to find the race and gives us chance to
handle it more reliably.

Reported-by: Qian Cai &lt;cai@lca.pw&gt;
Signed-off-by: Naoya Horiguchi &lt;naoya.horiguchi@nec.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: "Aneesh Kumar K.V" &lt;aneesh.kumar@linux.ibm.com&gt;
Cc: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Aristeu Rozanski &lt;aris@ruivo.org&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Dmitry Yakunin &lt;zeil@yandex-team.ru&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.com&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Link: https://lkml.kernel.org/r/20200922135650.1634-14-osalvador@suse.de
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Soft offlining could fail with EIO due to the race condition with hugepage
migration.  This issuse became visible due to the change by previous patch
that makes soft offline handler take page refcount by its own.  We have no
way to directly pin zero refcount page, and the page considered as a zero
refcount page could be allocated just after the first check.

This patch adds the second check to find the race and gives us chance to
handle it more reliably.

Reported-by: Qian Cai &lt;cai@lca.pw&gt;
Signed-off-by: Naoya Horiguchi &lt;naoya.horiguchi@nec.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: "Aneesh Kumar K.V" &lt;aneesh.kumar@linux.ibm.com&gt;
Cc: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Aristeu Rozanski &lt;aris@ruivo.org&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Dmitry Yakunin &lt;zeil@yandex-team.ru&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.com&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Link: https://lkml.kernel.org/r/20200922135650.1634-14-osalvador@suse.de
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm,hwpoison: introduce MF_MSG_UNSPLIT_THP</title>
<updated>2020-10-16T18:11:17+00:00</updated>
<author>
<name>Naoya Horiguchi</name>
<email>naoya.horiguchi@nec.com</email>
</author>
<published>2020-10-16T03:07:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=5d1fd5dc877bc1c670e7b1c174aa659b76c07de1'/>
<id>5d1fd5dc877bc1c670e7b1c174aa659b76c07de1</id>
<content type='text'>
memory_failure() is supposed to call action_result() when it handles a
memory error event, but there's one missing case.  So let's add it.

I find that include/ras/ras_event.h has some other MF_MSG_* undefined, so
this patch also adds them.

Signed-off-by: Naoya Horiguchi &lt;naoya.horiguchi@nec.com&gt;
Signed-off-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: "Aneesh Kumar K.V" &lt;aneesh.kumar@linux.ibm.com&gt;
Cc: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Aristeu Rozanski &lt;aris@ruivo.org&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Dmitry Yakunin &lt;zeil@yandex-team.ru&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.com&gt;
Cc: Qian Cai &lt;cai@lca.pw&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Link: https://lkml.kernel.org/r/20200922135650.1634-13-osalvador@suse.de
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
memory_failure() is supposed to call action_result() when it handles a
memory error event, but there's one missing case.  So let's add it.

I find that include/ras/ras_event.h has some other MF_MSG_* undefined, so
this patch also adds them.

Signed-off-by: Naoya Horiguchi &lt;naoya.horiguchi@nec.com&gt;
Signed-off-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: "Aneesh Kumar K.V" &lt;aneesh.kumar@linux.ibm.com&gt;
Cc: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Aristeu Rozanski &lt;aris@ruivo.org&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Dmitry Yakunin &lt;zeil@yandex-team.ru&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.com&gt;
Cc: Qian Cai &lt;cai@lca.pw&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Link: https://lkml.kernel.org/r/20200922135650.1634-13-osalvador@suse.de
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm,hwpoison: return 0 if the page is already poisoned in soft-offline</title>
<updated>2020-10-16T18:11:16+00:00</updated>
<author>
<name>Oscar Salvador</name>
<email>osalvador@suse.de</email>
</author>
<published>2020-10-16T03:07:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=5a2ffca3c23333a41cf8604f62994cfd28e4267b'/>
<id>5a2ffca3c23333a41cf8604f62994cfd28e4267b</id>
<content type='text'>
Currently, there is an inconsistency when calling soft-offline from
different paths on a page that is already poisoned.

1) madvise:

        madvise_inject_error skips any poisoned page and continues
        the loop.
        If that was the only page to madvise, it returns 0.

2) /sys/devices/system/memory/:

        When calling soft_offline_page_store()-&gt;soft_offline_page(),
        we return -EBUSY in case the page is already poisoned.
        This is inconsistent with a) the above example and b)
        memory_failure, where we return 0 if the page was poisoned.

Fix this by dropping the PageHWPoison() check in madvise_inject_error, and
let soft_offline_page return 0 if it finds the page already poisoned.

Please, note that this represents a user-api change, since now the return
error when calling soft_offline_page_store()-&gt;soft_offline_page() will be
different.

Signed-off-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: Naoya Horiguchi &lt;naoya.horiguchi@nec.com&gt;
Cc: "Aneesh Kumar K.V" &lt;aneesh.kumar@linux.ibm.com&gt;
Cc: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Aristeu Rozanski &lt;aris@ruivo.org&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Dmitry Yakunin &lt;zeil@yandex-team.ru&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.com&gt;
Cc: Qian Cai &lt;cai@lca.pw&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Link: https://lkml.kernel.org/r/20200922135650.1634-12-osalvador@suse.de
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently, there is an inconsistency when calling soft-offline from
different paths on a page that is already poisoned.

1) madvise:

        madvise_inject_error skips any poisoned page and continues
        the loop.
        If that was the only page to madvise, it returns 0.

2) /sys/devices/system/memory/:

        When calling soft_offline_page_store()-&gt;soft_offline_page(),
        we return -EBUSY in case the page is already poisoned.
        This is inconsistent with a) the above example and b)
        memory_failure, where we return 0 if the page was poisoned.

Fix this by dropping the PageHWPoison() check in madvise_inject_error, and
let soft_offline_page return 0 if it finds the page already poisoned.

Please, note that this represents a user-api change, since now the return
error when calling soft_offline_page_store()-&gt;soft_offline_page() will be
different.

Signed-off-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: Naoya Horiguchi &lt;naoya.horiguchi@nec.com&gt;
Cc: "Aneesh Kumar K.V" &lt;aneesh.kumar@linux.ibm.com&gt;
Cc: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Aristeu Rozanski &lt;aris@ruivo.org&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Dmitry Yakunin &lt;zeil@yandex-team.ru&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.com&gt;
Cc: Qian Cai &lt;cai@lca.pw&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Link: https://lkml.kernel.org/r/20200922135650.1634-12-osalvador@suse.de
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm,hwpoison: refactor soft_offline_huge_page and __soft_offline_page</title>
<updated>2020-10-16T18:11:16+00:00</updated>
<author>
<name>Oscar Salvador</name>
<email>osalvador@suse.de</email>
</author>
<published>2020-10-16T03:07:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6b9a217eda4a13cc72914fdf7433712122ff595b'/>
<id>6b9a217eda4a13cc72914fdf7433712122ff595b</id>
<content type='text'>
Merging soft_offline_huge_page and __soft_offline_page let us get rid of
quite some duplicated code, and makes the code much easier to follow.

Now, __soft_offline_page will handle both normal and hugetlb pages.

Signed-off-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: Naoya Horiguchi &lt;naoya.horiguchi@nec.com&gt;
Cc: "Aneesh Kumar K.V" &lt;aneesh.kumar@linux.ibm.com&gt;
Cc: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Aristeu Rozanski &lt;aris@ruivo.org&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Dmitry Yakunin &lt;zeil@yandex-team.ru&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.com&gt;
Cc: Qian Cai &lt;cai@lca.pw&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Link: https://lkml.kernel.org/r/20200922135650.1634-11-osalvador@suse.de
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Merging soft_offline_huge_page and __soft_offline_page let us get rid of
quite some duplicated code, and makes the code much easier to follow.

Now, __soft_offline_page will handle both normal and hugetlb pages.

Signed-off-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: Naoya Horiguchi &lt;naoya.horiguchi@nec.com&gt;
Cc: "Aneesh Kumar K.V" &lt;aneesh.kumar@linux.ibm.com&gt;
Cc: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Aristeu Rozanski &lt;aris@ruivo.org&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Dmitry Yakunin &lt;zeil@yandex-team.ru&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.com&gt;
Cc: Qian Cai &lt;cai@lca.pw&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Link: https://lkml.kernel.org/r/20200922135650.1634-11-osalvador@suse.de
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm,hwpoison: rework soft offline for in-use pages</title>
<updated>2020-10-16T18:11:16+00:00</updated>
<author>
<name>Oscar Salvador</name>
<email>osalvador@suse.de</email>
</author>
<published>2020-10-16T03:07:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=79f5f8fab482dfff62948214468ac4ebbf0a016f'/>
<id>79f5f8fab482dfff62948214468ac4ebbf0a016f</id>
<content type='text'>
This patch changes the way we set and handle in-use poisoned pages.  Until
now, poisoned pages were released to the buddy allocator, trusting that
the checks that take place at allocation time would act as a safe net and
would skip that page.

This has proved to be wrong, as we got some pfn walkers out there, like
compaction, that all they care is the page to be in a buddy freelist.

Although this might not be the only user, having poisoned pages in the
buddy allocator seems a bad idea as we should only have free pages that
are ready and meant to be used as such.

Before explaining the taken approach, let us break down the kind of pages
we can soft offline.

- Anonymous THP (after the split, they end up being 4K pages)
- Hugetlb
- Order-0 pages (that can be either migrated or invalited)

* Normal pages (order-0 and anon-THP)

  - If they are clean and unmapped page cache pages, we invalidate
    then by means of invalidate_inode_page().
  - If they are mapped/dirty, we do the isolate-and-migrate dance.

Either way, do not call put_page directly from those paths.  Instead, we
keep the page and send it to page_handle_poison to perform the right
handling.

page_handle_poison sets the HWPoison flag and does the last put_page.

Down the chain, we placed a check for HWPoison page in
free_pages_prepare, that just skips any poisoned page, so those pages
do not end up in any pcplist/freelist.

After that, we set the refcount on the page to 1 and we increment
the poisoned pages counter.

If we see that the check in free_pages_prepare creates trouble, we can
always do what we do for free pages:

  - wait until the page hits buddy's freelists
  - take it off, and flag it

The downside of the above approach is that we could race with an
allocation, so by the time we  want to take the page off the buddy, the
page has been already allocated so we cannot soft offline it.
But the user could always retry it.

* Hugetlb pages

  - We isolate-and-migrate them

After the migration has been successful, we call dissolve_free_huge_page,
and we set HWPoison on the page if we succeed.
Hugetlb has a slightly different handling though.

While for non-hugetlb pages we cared about closing the race with an
allocation, doing so for hugetlb pages requires quite some additional
and intrusive code (we would need to hook in free_huge_page and some other
places).
So I decided to not make the code overly complicated and just fail
normally if the page we allocated in the meantime.

We can always build on top of this.

As a bonus, because of the way we handle now in-use pages, we no longer
need the put-as-isolation-migratetype dance, that was guarding for poisoned
pages to end up in pcplists.

Signed-off-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: Naoya Horiguchi &lt;naoya.horiguchi@nec.com&gt;
Cc: "Aneesh Kumar K.V" &lt;aneesh.kumar@linux.ibm.com&gt;
Cc: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Aristeu Rozanski &lt;aris@ruivo.org&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Dmitry Yakunin &lt;zeil@yandex-team.ru&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.com&gt;
Cc: Qian Cai &lt;cai@lca.pw&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Link: https://lkml.kernel.org/r/20200922135650.1634-10-osalvador@suse.de
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch changes the way we set and handle in-use poisoned pages.  Until
now, poisoned pages were released to the buddy allocator, trusting that
the checks that take place at allocation time would act as a safe net and
would skip that page.

This has proved to be wrong, as we got some pfn walkers out there, like
compaction, that all they care is the page to be in a buddy freelist.

Although this might not be the only user, having poisoned pages in the
buddy allocator seems a bad idea as we should only have free pages that
are ready and meant to be used as such.

Before explaining the taken approach, let us break down the kind of pages
we can soft offline.

- Anonymous THP (after the split, they end up being 4K pages)
- Hugetlb
- Order-0 pages (that can be either migrated or invalited)

* Normal pages (order-0 and anon-THP)

  - If they are clean and unmapped page cache pages, we invalidate
    then by means of invalidate_inode_page().
  - If they are mapped/dirty, we do the isolate-and-migrate dance.

Either way, do not call put_page directly from those paths.  Instead, we
keep the page and send it to page_handle_poison to perform the right
handling.

page_handle_poison sets the HWPoison flag and does the last put_page.

Down the chain, we placed a check for HWPoison page in
free_pages_prepare, that just skips any poisoned page, so those pages
do not end up in any pcplist/freelist.

After that, we set the refcount on the page to 1 and we increment
the poisoned pages counter.

If we see that the check in free_pages_prepare creates trouble, we can
always do what we do for free pages:

  - wait until the page hits buddy's freelists
  - take it off, and flag it

The downside of the above approach is that we could race with an
allocation, so by the time we  want to take the page off the buddy, the
page has been already allocated so we cannot soft offline it.
But the user could always retry it.

* Hugetlb pages

  - We isolate-and-migrate them

After the migration has been successful, we call dissolve_free_huge_page,
and we set HWPoison on the page if we succeed.
Hugetlb has a slightly different handling though.

While for non-hugetlb pages we cared about closing the race with an
allocation, doing so for hugetlb pages requires quite some additional
and intrusive code (we would need to hook in free_huge_page and some other
places).
So I decided to not make the code overly complicated and just fail
normally if the page we allocated in the meantime.

We can always build on top of this.

As a bonus, because of the way we handle now in-use pages, we no longer
need the put-as-isolation-migratetype dance, that was guarding for poisoned
pages to end up in pcplists.

Signed-off-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: Naoya Horiguchi &lt;naoya.horiguchi@nec.com&gt;
Cc: "Aneesh Kumar K.V" &lt;aneesh.kumar@linux.ibm.com&gt;
Cc: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Aristeu Rozanski &lt;aris@ruivo.org&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Dmitry Yakunin &lt;zeil@yandex-team.ru&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.com&gt;
Cc: Qian Cai &lt;cai@lca.pw&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Link: https://lkml.kernel.org/r/20200922135650.1634-10-osalvador@suse.de
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm,hwpoison: rework soft offline for free pages</title>
<updated>2020-10-16T18:11:16+00:00</updated>
<author>
<name>Oscar Salvador</name>
<email>osalvador@suse.de</email>
</author>
<published>2020-10-16T03:07:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=06be6ff3d2ec8be806b859fc054a1909b16d2473'/>
<id>06be6ff3d2ec8be806b859fc054a1909b16d2473</id>
<content type='text'>
When trying to soft-offline a free page, we need to first take it off the
buddy allocator.  Once we know is out of reach, we can safely flag it as
poisoned.

take_page_off_buddy will be used to take a page meant to be poisoned off
the buddy allocator.  take_page_off_buddy calls break_down_buddy_pages,
which splits a higher-order page in case our page belongs to one.

Once the page is under our control, we call page_handle_poison to set it
as poisoned and grab a refcount on it.

Signed-off-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: Naoya Horiguchi &lt;naoya.horiguchi@nec.com&gt;
Cc: "Aneesh Kumar K.V" &lt;aneesh.kumar@linux.ibm.com&gt;
Cc: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Aristeu Rozanski &lt;aris@ruivo.org&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Dmitry Yakunin &lt;zeil@yandex-team.ru&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.com&gt;
Cc: Qian Cai &lt;cai@lca.pw&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Link: https://lkml.kernel.org/r/20200922135650.1634-9-osalvador@suse.de
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When trying to soft-offline a free page, we need to first take it off the
buddy allocator.  Once we know is out of reach, we can safely flag it as
poisoned.

take_page_off_buddy will be used to take a page meant to be poisoned off
the buddy allocator.  take_page_off_buddy calls break_down_buddy_pages,
which splits a higher-order page in case our page belongs to one.

Once the page is under our control, we call page_handle_poison to set it
as poisoned and grab a refcount on it.

Signed-off-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: Naoya Horiguchi &lt;naoya.horiguchi@nec.com&gt;
Cc: "Aneesh Kumar K.V" &lt;aneesh.kumar@linux.ibm.com&gt;
Cc: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Aristeu Rozanski &lt;aris@ruivo.org&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Dmitry Yakunin &lt;zeil@yandex-team.ru&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.com&gt;
Cc: Qian Cai &lt;cai@lca.pw&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Link: https://lkml.kernel.org/r/20200922135650.1634-9-osalvador@suse.de
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm,hwpoison: unify THP handling for hard and soft offline</title>
<updated>2020-10-16T18:11:16+00:00</updated>
<author>
<name>Oscar Salvador</name>
<email>osalvador@suse.de</email>
</author>
<published>2020-10-16T03:07:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=694bf0b0cdf91be50e6f037ca93733ed83ca1187'/>
<id>694bf0b0cdf91be50e6f037ca93733ed83ca1187</id>
<content type='text'>
Place the THP's page handling in a helper and use it from both hard and
soft-offline machinery, so we get rid of some duplicated code.

Signed-off-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: Naoya Horiguchi &lt;naoya.horiguchi@nec.com&gt;
Cc: "Aneesh Kumar K.V" &lt;aneesh.kumar@linux.ibm.com&gt;
Cc: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Aristeu Rozanski &lt;aris@ruivo.org&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Dmitry Yakunin &lt;zeil@yandex-team.ru&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.com&gt;
Cc: Qian Cai &lt;cai@lca.pw&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Link: https://lkml.kernel.org/r/20200922135650.1634-8-osalvador@suse.de
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Place the THP's page handling in a helper and use it from both hard and
soft-offline machinery, so we get rid of some duplicated code.

Signed-off-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: Naoya Horiguchi &lt;naoya.horiguchi@nec.com&gt;
Cc: "Aneesh Kumar K.V" &lt;aneesh.kumar@linux.ibm.com&gt;
Cc: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Aristeu Rozanski &lt;aris@ruivo.org&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Dmitry Yakunin &lt;zeil@yandex-team.ru&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.com&gt;
Cc: Qian Cai &lt;cai@lca.pw&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Link: https://lkml.kernel.org/r/20200922135650.1634-8-osalvador@suse.de
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
