<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/mm/memory-failure.c, branch v6.11</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>mm/memory-failure: use raw_spinlock_t in struct memory_failure_cpu</title>
<updated>2024-08-16T05:16:14+00:00</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2024-08-06T16:41:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d75abd0d0bc29e6ebfebbf76d11b4067b35844af'/>
<id>d75abd0d0bc29e6ebfebbf76d11b4067b35844af</id>
<content type='text'>
The memory_failure_cpu structure is a per-cpu structure.  Access to its
content requires the use of get_cpu_var() to lock in the current CPU and
disable preemption.  The use of a regular spinlock_t for locking purpose
is fine for a non-RT kernel.

Since the integration of RT spinlock support into the v5.15 kernel, a
spinlock_t in a RT kernel becomes a sleeping lock and taking a sleeping
lock in a preemption disabled context is illegal resulting in the
following kind of warning.

  [12135.732244] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
  [12135.732248] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 270076, name: kworker/0:0
  [12135.732252] preempt_count: 1, expected: 0
  [12135.732255] RCU nest depth: 2, expected: 2
    :
  [12135.732420] Hardware name: Dell Inc. PowerEdge R640/0HG0J8, BIOS 2.10.2 02/24/2021
  [12135.732423] Workqueue: kacpi_notify acpi_os_execute_deferred
  [12135.732433] Call Trace:
  [12135.732436]  &lt;TASK&gt;
  [12135.732450]  dump_stack_lvl+0x57/0x81
  [12135.732461]  __might_resched.cold+0xf4/0x12f
  [12135.732479]  rt_spin_lock+0x4c/0x100
  [12135.732491]  memory_failure_queue+0x40/0xe0
  [12135.732503]  ghes_do_memory_failure+0x53/0x390
  [12135.732516]  ghes_do_proc.constprop.0+0x229/0x3e0
  [12135.732575]  ghes_proc+0xf9/0x1a0
  [12135.732591]  ghes_notify_hed+0x6a/0x150
  [12135.732602]  notifier_call_chain+0x43/0xb0
  [12135.732626]  blocking_notifier_call_chain+0x43/0x60
  [12135.732637]  acpi_ev_notify_dispatch+0x47/0x70
  [12135.732648]  acpi_os_execute_deferred+0x13/0x20
  [12135.732654]  process_one_work+0x41f/0x500
  [12135.732695]  worker_thread+0x192/0x360
  [12135.732715]  kthread+0x111/0x140
  [12135.732733]  ret_from_fork+0x29/0x50
  [12135.732779]  &lt;/TASK&gt;

Fix it by using a raw_spinlock_t for locking instead.

Also move the pr_err() out of the lock critical section and after
put_cpu_ptr() to avoid indeterminate latency and the possibility of sleep
with this call.

[longman@redhat.com: don't hold percpu ref across pr_err(), per Miaohe]
  Link: https://lkml.kernel.org/r/20240807181130.1122660-1-longman@redhat.com
Link: https://lkml.kernel.org/r/20240806164107.1044956-1-longman@redhat.com
Fixes: 0f383b6dc96e ("locking/spinlock: Provide RT variant")
Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Acked-by: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Cc: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Cc: Juri Lelli &lt;juri.lelli@redhat.com&gt;
Cc: Len Brown &lt;len.brown@intel.com&gt;
Cc: Naoya Horiguchi &lt;nao.horiguchi@gmail.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The memory_failure_cpu structure is a per-cpu structure.  Access to its
content requires the use of get_cpu_var() to lock in the current CPU and
disable preemption.  The use of a regular spinlock_t for locking purpose
is fine for a non-RT kernel.

Since the integration of RT spinlock support into the v5.15 kernel, a
spinlock_t in a RT kernel becomes a sleeping lock and taking a sleeping
lock in a preemption disabled context is illegal resulting in the
following kind of warning.

  [12135.732244] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
  [12135.732248] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 270076, name: kworker/0:0
  [12135.732252] preempt_count: 1, expected: 0
  [12135.732255] RCU nest depth: 2, expected: 2
    :
  [12135.732420] Hardware name: Dell Inc. PowerEdge R640/0HG0J8, BIOS 2.10.2 02/24/2021
  [12135.732423] Workqueue: kacpi_notify acpi_os_execute_deferred
  [12135.732433] Call Trace:
  [12135.732436]  &lt;TASK&gt;
  [12135.732450]  dump_stack_lvl+0x57/0x81
  [12135.732461]  __might_resched.cold+0xf4/0x12f
  [12135.732479]  rt_spin_lock+0x4c/0x100
  [12135.732491]  memory_failure_queue+0x40/0xe0
  [12135.732503]  ghes_do_memory_failure+0x53/0x390
  [12135.732516]  ghes_do_proc.constprop.0+0x229/0x3e0
  [12135.732575]  ghes_proc+0xf9/0x1a0
  [12135.732591]  ghes_notify_hed+0x6a/0x150
  [12135.732602]  notifier_call_chain+0x43/0xb0
  [12135.732626]  blocking_notifier_call_chain+0x43/0x60
  [12135.732637]  acpi_ev_notify_dispatch+0x47/0x70
  [12135.732648]  acpi_os_execute_deferred+0x13/0x20
  [12135.732654]  process_one_work+0x41f/0x500
  [12135.732695]  worker_thread+0x192/0x360
  [12135.732715]  kthread+0x111/0x140
  [12135.732733]  ret_from_fork+0x29/0x50
  [12135.732779]  &lt;/TASK&gt;

Fix it by using a raw_spinlock_t for locking instead.

Also move the pr_err() out of the lock critical section and after
put_cpu_ptr() to avoid indeterminate latency and the possibility of sleep
with this call.

[longman@redhat.com: don't hold percpu ref across pr_err(), per Miaohe]
  Link: https://lkml.kernel.org/r/20240807181130.1122660-1-longman@redhat.com
Link: https://lkml.kernel.org/r/20240806164107.1044956-1-longman@redhat.com
Fixes: 0f383b6dc96e ("locking/spinlock: Provide RT variant")
Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Acked-by: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Cc: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Cc: Juri Lelli &lt;juri.lelli@redhat.com&gt;
Cc: Len Brown &lt;len.brown@intel.com&gt;
Cc: Naoya Horiguchi &lt;nao.horiguchi@gmail.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/memory-failure: remove obsolete MF_MSG_DIFFERENT_COMPOUND</title>
<updated>2024-07-12T22:52:22+00:00</updated>
<author>
<name>Miaohe Lin</name>
<email>linmiaohe@huawei.com</email>
</author>
<published>2024-07-08T03:05:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8a78882dac1c8c464e047a29415edb50421651ce'/>
<id>8a78882dac1c8c464e047a29415edb50421651ce</id>
<content type='text'>
The page cannot become compound pages again just after a folio is split as
an extra refcnt is held.  So the MF_MSG_DIFFERENT_COMPOUND case is
obsolete and can be removed to get rid of this false assumption and code
burden.  But add one WARN_ON() here to keep the situation clear.

Link: https://lkml.kernel.org/r/20240708030544.196919-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Cc: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Cc: Naoya Horiguchi &lt;nao.horiguchi@gmail.com&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The page cannot become compound pages again just after a folio is split as
an extra refcnt is held.  So the MF_MSG_DIFFERENT_COMPOUND case is
obsolete and can be removed to get rid of this false assumption and code
burden.  But add one WARN_ON() here to keep the situation clear.

Link: https://lkml.kernel.org/r/20240708030544.196919-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Cc: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Cc: Naoya Horiguchi &lt;nao.horiguchi@gmail.com&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: provide mm_struct and address to huge_ptep_get()</title>
<updated>2024-07-12T22:52:15+00:00</updated>
<author>
<name>Christophe Leroy</name>
<email>christophe.leroy@csgroup.eu</email>
</author>
<published>2024-07-02T13:51:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e6c0c03245b14d6c205814aee67128257d0bea84'/>
<id>e6c0c03245b14d6c205814aee67128257d0bea84</id>
<content type='text'>
On powerpc 8xx huge_ptep_get() will need to know whether the given ptep is
a PTE entry or a PMD entry.  This cannot be known with the PMD entry
itself because there is no easy way to know it from the content of the
entry.

So huge_ptep_get() will need to know either the size of the page or get
the pmd.

In order to be consistent with huge_ptep_get_and_clear(), give mm and
address to huge_ptep_get().

Link: https://lkml.kernel.org/r/cc00c70dd384298796a4e1b25d6c4eb306d3af85.1719928057.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Reviewed-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Cc: Peter Xu &lt;peterx@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
On powerpc 8xx huge_ptep_get() will need to know whether the given ptep is
a PTE entry or a PMD entry.  This cannot be known with the PMD entry
itself because there is no easy way to know it from the content of the
entry.

So huge_ptep_get() will need to know either the size of the page or get
the pmd.

In order to be consistent with huge_ptep_get_and_clear(), give mm and
address to huge_ptep_get().

Link: https://lkml.kernel.org/r/cc00c70dd384298796a4e1b25d6c4eb306d3af85.1719928057.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Reviewed-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Cc: Peter Xu &lt;peterx@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/memory-failure: userspace controls soft-offlining pages</title>
<updated>2024-07-05T01:05:59+00:00</updated>
<author>
<name>Jiaqi Yan</name>
<email>jiaqiyan@google.com</email>
</author>
<published>2024-06-26T05:08:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=56374430c5dfcf6d4f1df79514f797b45fbd0485'/>
<id>56374430c5dfcf6d4f1df79514f797b45fbd0485</id>
<content type='text'>
Correctable memory errors are very common on servers with large amount of
memory, and are corrected by ECC.  Soft offline is kernel's additional
recovery handling for memory pages having (excessive) corrected memory
errors.  Impacted page is migrated to a healthy page if inuse; the
original page is discarded for any future use.

The actual policy on whether (and when) to soft offline should be
maintained by userspace, especially in case of an 1G HugeTLB page. 
Soft-offline dissolves the HugeTLB page, either in-use or free, into
chunks of 4K pages, reducing HugeTLB pool capacity by 1 hugepage.  If
userspace has not acknowledged such behavior, it may be surprised when
later failed to mmap hugepages due to lack of hugepages.  In case of a
transparent hugepage, it will be split into 4K pages as well; userspace
will stop enjoying the transparent performance.

In addition, discarding the entire 1G HugeTLB page only because of
corrected memory errors sounds very costly and kernel better not doing
under the hood.  But today there are at least 2 such cases doing so:
1. when GHES driver sees both GHES_SEV_CORRECTED and
   CPER_SEC_ERROR_THRESHOLD_EXCEEDED after parsing CPER.
2. RAS Correctable Errors Collector counts correctable errors per
   PFN and when the counter for a PFN reaches threshold
In both cases, userspace has no control of the soft offline performed
by kernel's memory failure recovery.

This commit gives userspace the control of softofflining any page: kernel
only soft offlines raw page / transparent hugepage / HugeTLB hugepage if
userspace has agreed to.  The interface to userspace is a new sysctl at
/proc/sys/vm/enable_soft_offline.  By default its value is set to 1 to
preserve existing behavior in kernel.  When set to 0, soft-offline (e.g. 
MADV_SOFT_OFFLINE) will fail with EOPNOTSUPP.

[jiaqiyan@google.com: v7]
  Link: https://lkml.kernel.org/r/20240628205958.2845610-3-jiaqiyan@google.com
Link: https://lkml.kernel.org/r/20240626050818.2277273-3-jiaqiyan@google.com
Signed-off-by: Jiaqi Yan &lt;jiaqiyan@google.com&gt;
Acked-by: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: Frank van der Linden &lt;fvdl@google.com&gt;
Cc: Jane Chu &lt;jane.chu@oracle.com&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Lance Yang &lt;ioworker0@gmail.com&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Naoya Horiguchi &lt;nao.horiguchi@gmail.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Cc: Shuah Khan &lt;shuah@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Correctable memory errors are very common on servers with large amount of
memory, and are corrected by ECC.  Soft offline is kernel's additional
recovery handling for memory pages having (excessive) corrected memory
errors.  Impacted page is migrated to a healthy page if inuse; the
original page is discarded for any future use.

The actual policy on whether (and when) to soft offline should be
maintained by userspace, especially in case of an 1G HugeTLB page. 
Soft-offline dissolves the HugeTLB page, either in-use or free, into
chunks of 4K pages, reducing HugeTLB pool capacity by 1 hugepage.  If
userspace has not acknowledged such behavior, it may be surprised when
later failed to mmap hugepages due to lack of hugepages.  In case of a
transparent hugepage, it will be split into 4K pages as well; userspace
will stop enjoying the transparent performance.

In addition, discarding the entire 1G HugeTLB page only because of
corrected memory errors sounds very costly and kernel better not doing
under the hood.  But today there are at least 2 such cases doing so:
1. when GHES driver sees both GHES_SEV_CORRECTED and
   CPER_SEC_ERROR_THRESHOLD_EXCEEDED after parsing CPER.
2. RAS Correctable Errors Collector counts correctable errors per
   PFN and when the counter for a PFN reaches threshold
In both cases, userspace has no control of the soft offline performed
by kernel's memory failure recovery.

This commit gives userspace the control of softofflining any page: kernel
only soft offlines raw page / transparent hugepage / HugeTLB hugepage if
userspace has agreed to.  The interface to userspace is a new sysctl at
/proc/sys/vm/enable_soft_offline.  By default its value is set to 1 to
preserve existing behavior in kernel.  When set to 0, soft-offline (e.g. 
MADV_SOFT_OFFLINE) will fail with EOPNOTSUPP.

[jiaqiyan@google.com: v7]
  Link: https://lkml.kernel.org/r/20240628205958.2845610-3-jiaqiyan@google.com
Link: https://lkml.kernel.org/r/20240626050818.2277273-3-jiaqiyan@google.com
Signed-off-by: Jiaqi Yan &lt;jiaqiyan@google.com&gt;
Acked-by: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: Frank van der Linden &lt;fvdl@google.com&gt;
Cc: Jane Chu &lt;jane.chu@oracle.com&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Lance Yang &lt;ioworker0@gmail.com&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Naoya Horiguchi &lt;nao.horiguchi@gmail.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Cc: Shuah Khan &lt;shuah@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/memory-failure: refactor log format in soft offline code</title>
<updated>2024-07-05T01:05:59+00:00</updated>
<author>
<name>Jiaqi Yan</name>
<email>jiaqiyan@google.com</email>
</author>
<published>2024-06-26T05:08:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=865319f772e6d5b8c932ff0abd7a9e57fe1c04e0'/>
<id>865319f772e6d5b8c932ff0abd7a9e57fe1c04e0</id>
<content type='text'>
Patch series "Userspace controls soft-offline pages", v6.

Correctable memory errors are very common on servers with large amount of
memory, and are corrected by ECC, but with two pain points to users:

1. Correction usually happens on the fly and adds latency overhead
2. Not-fully-proved theory states excessive correctable memory
   errors can develop into uncorrectable memory error.

Soft offline is kernel's additional solution for memory pages having
(excessive) corrected memory errors.  Impacted page is migrated to healthy
page if it is in use, then the original page is discarded for any future
use.

The actual policy on whether (and when) to soft offline should be
maintained by userspace, especially in case of an 1G HugeTLB page. 
Soft-offline dissolves the HugeTLB page, either in-use or free, into
chunks of 4K pages, reducing HugeTLB pool capacity by 1 hugepage.  If
userspace has not acknowledged such behavior, it may be surprised when
later mmap hugepages MAP_FAILED due to lack of hugepages.  In case of a
transparent hugepage, it will be split into 4K pages as well; userspace
will stop enjoying the transparent performance.

In addition, discarding the entire 1G HugeTLB page only because of
corrected memory errors sounds very costly and kernel better not doing
under the hood.  But today there are at least 2 such cases:

1. GHES driver sees both GHES_SEV_CORRECTED and
   CPER_SEC_ERROR_THRESHOLD_EXCEEDED after parsing CPER.
2. RAS Correctable Errors Collector counts correctable errors per
   PFN and when the counter for a PFN reaches threshold

In both cases, userspace has no control of the soft offline performed by
kernel's memory failure recovery.

This patch series give userspace the control of softofflining any page:
kernel only soft offlines raw page / transparent hugepage / HugeTLB
hugepage if userspace has agreed to.  The interface to userspace is a new
sysctl called enable_soft_offline under /proc/sys/vm.  By default
enable_soft_line is 1 to preserve existing behavior in kernel.


This patch (of 4):

Logs from soft_offline_page and soft_offline_in_use_page have different
formats than majority of the memory failure code:

  "Memory failure: 0x${pfn}: ${lower_case_message}"

Convert them to the following format:

  "Soft offline: 0x${pfn}: ${lower_case_message}"

No functional change in this commit.

Link: https://lkml.kernel.org/r/20240626050818.2277273-1-jiaqiyan@google.com
Link: https://lkml.kernel.org/r/20240626050818.2277273-2-jiaqiyan@google.com
Signed-off-by: Jiaqi Yan &lt;jiaqiyan@google.com&gt;
Acked-by: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Reviewed-by: Lance Yang &lt;ioworker0@gmail.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Frank van der Linden &lt;fvdl@google.com&gt;
Cc: Jane Chu &lt;jane.chu@oracle.com&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Naoya Horiguchi &lt;nao.horiguchi@gmail.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Cc: Shuah Khan &lt;shuah@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Patch series "Userspace controls soft-offline pages", v6.

Correctable memory errors are very common on servers with large amount of
memory, and are corrected by ECC, but with two pain points to users:

1. Correction usually happens on the fly and adds latency overhead
2. Not-fully-proved theory states excessive correctable memory
   errors can develop into uncorrectable memory error.

Soft offline is kernel's additional solution for memory pages having
(excessive) corrected memory errors.  Impacted page is migrated to healthy
page if it is in use, then the original page is discarded for any future
use.

The actual policy on whether (and when) to soft offline should be
maintained by userspace, especially in case of an 1G HugeTLB page. 
Soft-offline dissolves the HugeTLB page, either in-use or free, into
chunks of 4K pages, reducing HugeTLB pool capacity by 1 hugepage.  If
userspace has not acknowledged such behavior, it may be surprised when
later mmap hugepages MAP_FAILED due to lack of hugepages.  In case of a
transparent hugepage, it will be split into 4K pages as well; userspace
will stop enjoying the transparent performance.

In addition, discarding the entire 1G HugeTLB page only because of
corrected memory errors sounds very costly and kernel better not doing
under the hood.  But today there are at least 2 such cases:

1. GHES driver sees both GHES_SEV_CORRECTED and
   CPER_SEC_ERROR_THRESHOLD_EXCEEDED after parsing CPER.
2. RAS Correctable Errors Collector counts correctable errors per
   PFN and when the counter for a PFN reaches threshold

In both cases, userspace has no control of the soft offline performed by
kernel's memory failure recovery.

This patch series give userspace the control of softofflining any page:
kernel only soft offlines raw page / transparent hugepage / HugeTLB
hugepage if userspace has agreed to.  The interface to userspace is a new
sysctl called enable_soft_offline under /proc/sys/vm.  By default
enable_soft_line is 1 to preserve existing behavior in kernel.


This patch (of 4):

Logs from soft_offline_page and soft_offline_in_use_page have different
formats than majority of the memory failure code:

  "Memory failure: 0x${pfn}: ${lower_case_message}"

Convert them to the following format:

  "Soft offline: 0x${pfn}: ${lower_case_message}"

No functional change in this commit.

Link: https://lkml.kernel.org/r/20240626050818.2277273-1-jiaqiyan@google.com
Link: https://lkml.kernel.org/r/20240626050818.2277273-2-jiaqiyan@google.com
Signed-off-by: Jiaqi Yan &lt;jiaqiyan@google.com&gt;
Acked-by: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Reviewed-by: Lance Yang &lt;ioworker0@gmail.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Frank van der Linden &lt;fvdl@google.com&gt;
Cc: Jane Chu &lt;jane.chu@oracle.com&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Naoya Horiguchi &lt;nao.horiguchi@gmail.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Cc: Shuah Khan &lt;shuah@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/memory-failure: refactor log format in unpoison_memory</title>
<updated>2024-07-04T02:30:19+00:00</updated>
<author>
<name>Jiaqi Yan</name>
<email>jiaqiyan@google.com</email>
</author>
<published>2024-06-19T06:33:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=5cea5666e4b556f4daa414a7379790ce8d225a48'/>
<id>5cea5666e4b556f4daa414a7379790ce8d225a48</id>
<content type='text'>
Logs from memory_failure and other memory-failure.c code follow the
format:

  "Memory failure: 0x{pfn}: ${lower_case_message}"

Convert the logs in unpoison_memory to follow similar format:

  "Unpoison: 0x${pfn}: ${lower_case_message}"

For example (from local test):
  [ 1331.938397] Unpoison: 0x144bc8: page was already unpoisoned

No functional change in this commit.

Link: https://lkml.kernel.org/r/20240619063355.171313-1-jiaqiyan@google.com
Signed-off-by: Jiaqi Yan &lt;jiaqiyan@google.com&gt;
Acked-by: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Cc: Jane Chu &lt;jane.chu@oracle.com&gt;
Cc: Lance Yang &lt;ioworker0@gmail.com&gt;
Cc: Naoya Horiguchi &lt;nao.horiguchi@gmail.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Logs from memory_failure and other memory-failure.c code follow the
format:

  "Memory failure: 0x{pfn}: ${lower_case_message}"

Convert the logs in unpoison_memory to follow similar format:

  "Unpoison: 0x${pfn}: ${lower_case_message}"

For example (from local test):
  [ 1331.938397] Unpoison: 0x144bc8: page was already unpoisoned

No functional change in this commit.

Link: https://lkml.kernel.org/r/20240619063355.171313-1-jiaqiyan@google.com
Signed-off-by: Jiaqi Yan &lt;jiaqiyan@google.com&gt;
Acked-by: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Cc: Jane Chu &lt;jane.chu@oracle.com&gt;
Cc: Lance Yang &lt;ioworker0@gmail.com&gt;
Cc: Naoya Horiguchi &lt;nao.horiguchi@gmail.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/memory-failure: correct comment in me_swapcache_dirty</title>
<updated>2024-07-04T02:30:12+00:00</updated>
<author>
<name>Miaohe Lin</name>
<email>linmiaohe@huawei.com</email>
</author>
<published>2024-06-12T07:18:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e5d896703d1318d539a972ff01dc887c124af427'/>
<id>e5d896703d1318d539a972ff01dc887c124af427</id>
<content type='text'>
Dirty swap cache page could live both in page table (not page cache) and
swap cache when freshly swapped in.  Correct comment.

Link: https://lkml.kernel.org/r/20240612071835.157004-14-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Cc: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: kernel test robot &lt;lkp@intel.com&gt;
Cc: Naoya Horiguchi &lt;nao.horiguchi@gmail.com&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Dirty swap cache page could live both in page table (not page cache) and
swap cache when freshly swapped in.  Correct comment.

Link: https://lkml.kernel.org/r/20240612071835.157004-14-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Cc: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: kernel test robot &lt;lkp@intel.com&gt;
Cc: Naoya Horiguchi &lt;nao.horiguchi@gmail.com&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/memory-failure: remove obsolete comment in kill_proc()</title>
<updated>2024-07-04T02:30:12+00:00</updated>
<author>
<name>Miaohe Lin</name>
<email>linmiaohe@huawei.com</email>
</author>
<published>2024-06-12T07:18:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d49f2366e98081471dbef5c1bc0da00d10fa0776'/>
<id>d49f2366e98081471dbef5c1bc0da00d10fa0776</id>
<content type='text'>
When user sets SIGBUS to SIG_IGN, it won't cause loop now.  For action
required mce error, SIGBUS cannot be blocked.  Also when a hwpoisoned page
is re-accessed, kill_accessing_process() will be called to kill the
process.

Link: https://lkml.kernel.org/r/20240612071835.157004-13-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Cc: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: kernel test robot &lt;lkp@intel.com&gt;
Cc: Naoya Horiguchi &lt;nao.horiguchi@gmail.com&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When user sets SIGBUS to SIG_IGN, it won't cause loop now.  For action
required mce error, SIGBUS cannot be blocked.  Also when a hwpoisoned page
is re-accessed, kill_accessing_process() will be called to kill the
process.

Link: https://lkml.kernel.org/r/20240612071835.157004-13-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Cc: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: kernel test robot &lt;lkp@intel.com&gt;
Cc: Naoya Horiguchi &lt;nao.horiguchi@gmail.com&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/memory-failure: fix comment of get_hwpoison_page()</title>
<updated>2024-07-04T02:30:11+00:00</updated>
<author>
<name>Miaohe Lin</name>
<email>linmiaohe@huawei.com</email>
</author>
<published>2024-06-12T07:18:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b71340ef56a4cd5a5ed7f61a5c268ee346ff0640'/>
<id>b71340ef56a4cd5a5ed7f61a5c268ee346ff0640</id>
<content type='text'>
When return value is 0, it could also means the page is free hugetlb page
or free buddy page.  Fix the corresponding comment.

Link: https://lkml.kernel.org/r/20240612071835.157004-12-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Cc: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: kernel test robot &lt;lkp@intel.com&gt;
Cc: Naoya Horiguchi &lt;nao.horiguchi@gmail.com&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When return value is 0, it could also means the page is free hugetlb page
or free buddy page.  Fix the corresponding comment.

Link: https://lkml.kernel.org/r/20240612071835.157004-12-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Cc: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: kernel test robot &lt;lkp@intel.com&gt;
Cc: Naoya Horiguchi &lt;nao.horiguchi@gmail.com&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/memory-failure: remove obsolete comment in unpoison_memory()</title>
<updated>2024-07-04T02:30:11+00:00</updated>
<author>
<name>Miaohe Lin</name>
<email>linmiaohe@huawei.com</email>
</author>
<published>2024-06-12T07:18:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=28eab7d4e7b4fefed7122d49eb8148c623540739'/>
<id>28eab7d4e7b4fefed7122d49eb8148c623540739</id>
<content type='text'>
Since commit 130d4df57390 ("mm/sl[au]b: rearrange struct slab fields to
allow larger rcu_head"), folio-&gt;_mapcount is not overloaded with SLAB. 
Update corresponding comment.

Link: https://lkml.kernel.org/r/20240612071835.157004-10-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Reviewed-by: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Cc: kernel test robot &lt;lkp@intel.com&gt;
Cc: Naoya Horiguchi &lt;nao.horiguchi@gmail.com&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since commit 130d4df57390 ("mm/sl[au]b: rearrange struct slab fields to
allow larger rcu_head"), folio-&gt;_mapcount is not overloaded with SLAB. 
Update corresponding comment.

Link: https://lkml.kernel.org/r/20240612071835.157004-10-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Reviewed-by: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Cc: kernel test robot &lt;lkp@intel.com&gt;
Cc: Naoya Horiguchi &lt;nao.horiguchi@gmail.com&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
