linux.git/mm/memory-failure.c, branch v6.13

mm/memory-failure: replace sprintf() with sysfs_emit()

2024-11-11T08:26:46+00:00

As Documentation/filesystems/sysfs.rst suggests, show() callbacks should
use only sysfs_emit() or sysfs_emit_at() when formatting the value to be
returned to user space.
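
For illustration only, here is a userspace sketch of why a bounded emit
helper is preferred over a bare sprintf() in a show() callback.  It is not
the kernel implementation: sketch_sysfs_emit(), sketch_show() and
SKETCH_PAGE_SIZE are hypothetical names, and the 4096-byte buffer size is
an assumption standing in for the page-sized sysfs buffer.

```c
#include <stdarg.h>
#include <stdio.h>

/* Assumption: the show() buffer is one page; 4096 is used for illustration. */
#define SKETCH_PAGE_SIZE 4096

/*
 * Hypothetical userspace stand-in for a bounded emit helper: format into
 * a page-sized buffer and return the number of bytes actually written
 * (excluding the trailing NUL), never more than the buffer can hold.
 */
static int sketch_sysfs_emit(char *buf, const char *fmt, ...)
{
	va_list args;
	int len;

	va_start(args, fmt);
	len = vsnprintf(buf, SKETCH_PAGE_SIZE, fmt, args);
	va_end(args);

	if (len < 0)
		return 0;			/* formatting error: nothing emitted */
	if (len >= SKETCH_PAGE_SIZE)
		len = SKETCH_PAGE_SIZE - 1;	/* output was truncated */
	return len;
}

/* Shape of a show() callback after the conversion; "value" is a placeholder. */
static int sketch_show(unsigned long value, char *buf)
{
	return sketch_sysfs_emit(buf, "%lu\n", value);
}
```

Unlike sprintf(), the helper cannot write past the end of the buffer, and
its return value is always the number of bytes that are really in it.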

Link: https://lkml.kernel.org/r/20241029101853.37890-1-zhangguopeng@kylinos.cn
Signed-off-by: zhangguopeng 
Acked-by: Miaohe Lin 
Cc: Naoya Horiguchi 
Signed-off-by: Andrew Morton

mm: mass constification of folio/page pointers

2024-11-07T22:38:07+00:00

Now that page_pgoff() takes const pointers, we can constify the pointer
arguments of many functions.

Link: https://lkml.kernel.org/r/20241005200121.3231142-5-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) 
Signed-off-by: Andrew Morton

mm: renovate page_address_in_vma()

2024-11-07T22:38:07+00:00

This function doesn't modify any of its arguments, so if we make a few
other functions take const pointers, we can make page_address_in_vma()
take const pointers too.  All of its callers have the containing folio
already, so pass that in as an argument instead of recalculating it.  Also
add kernel-doc.

Link: https://lkml.kernel.org/r/20241005200121.3231142-4-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) 
Signed-off-by: Andrew Morton

mm: convert page_to_pgoff() to page_pgoff()

2024-11-07T22:38:07+00:00

Patch series "page->index removals in mm", v2.

As part of shrinking struct page, we need to stop using page->index.  This
patchset gets rid of most of the remaining references to page->index in
mm, as well as increasing the number of functions which take a const
folio/page pointer.  It shrinks the text segment of mm by a few hundred
bytes in my test config, probably mostly from removing calls to
compound_head() in page_to_pgoff().


This patch (of 7):

Change the function signature to pass in the folio as all three callers
have it.  This removes a reference to page->index, which we're trying to
get rid of.  And add kernel-doc.
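
As a rough userspace model of the new calling convention (not the kernel
code), the sketch below assumes that a page's offset is the folio's index
plus the page's position inside the folio.  struct sketch_folio, struct
sketch_page and sketch_page_pgoff() are hypothetical, heavily simplified
stand-ins for the kernel types and helper.

```c
#include <stddef.h>

typedef unsigned long pgoff_t;

/* Hypothetical stand-in for struct page; carries no index of its own. */
struct sketch_page {
	int unused;
};

/* Hypothetical stand-in for struct folio. */
struct sketch_folio {
	pgoff_t index;			/* offset of the first page, in pages */
	struct sketch_page pages[8];	/* pages that make up the folio */
};

/*
 * Both arguments are const: the helper only reads them.  The caller
 * passes the folio it already holds, so nothing has to be looked up
 * from the page and no per-page index field is needed.
 */
static pgoff_t sketch_page_pgoff(const struct sketch_folio *folio,
				 const struct sketch_page *page)
{
	return folio->index + (pgoff_t)(page - folio->pages);
}
```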

Link: https://lkml.kernel.org/r/20241005200121.3231142-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20241005200121.3231142-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) 
Signed-off-by: Andrew Morton

mm: migrate: add isolate_folio_to_list()

2024-09-04T04:15:59+00:00

Add an isolate_folio_to_list() helper that tries to isolate HugeTLB,
non-LRU movable and LRU folios to a list.  It will soon be reused by
do_migrate_range() in memory hotplug.  Also drop mf_isolate_folio(), since
soft_offline_in_use_page() can use the new helper directly.

Link: https://lkml.kernel.org/r/20240827114728.3212578-5-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang 
Acked-by: David Hildenbrand 
Acked-by: Miaohe Lin 
Tested-by: Miaohe Lin 
Cc: Dan Carpenter 
Cc: Jonathan Cameron 
Cc: Naoya Horiguchi 
Cc: Oscar Salvador 
Signed-off-by: Andrew Morton

mm: memory-failure: add unmap_poisoned_folio()

2024-09-04T04:15:59+00:00

Add an unmap_poisoned_folio() helper, which will soon be reused by
do_migrate_range() in memory hotplug.

[akpm@linux-foundation.org: whitespace tweak, per Miaohe Lin]
  Link: https://lkml.kernel.org/r/1f80c7e3-c30d-1ac1-6a36-d1a5f5907f7c@huawei.com
Link: https://lkml.kernel.org/r/20240827114728.3212578-3-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang 
Acked-by: David Hildenbrand 
Acked-by: Miaohe Lin 
Cc: Dan Carpenter 
Cc: Jonathan Cameron 
Cc: Naoya Horiguchi 
Cc: Oscar Salvador 
Signed-off-by: Andrew Morton

mm/memory-failure: use raw_spinlock_t in struct memory_failure_cpu

2024-08-16T05:16:14+00:00

The memory_failure_cpu structure is a per-cpu structure.  Accessing its
content requires get_cpu_var() to pin the current CPU and disable
preemption.  Using a regular spinlock_t for locking is fine on a non-RT
kernel.

Since the integration of RT spinlock support into the v5.15 kernel, a
spinlock_t in an RT kernel is a sleeping lock.  Taking a sleeping lock in
a preemption-disabled context is illegal and results in the following
kind of warning.

  [12135.732244] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
  [12135.732248] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 270076, name: kworker/0:0
  [12135.732252] preempt_count: 1, expected: 0
  [12135.732255] RCU nest depth: 2, expected: 2
    :
  [12135.732420] Hardware name: Dell Inc. PowerEdge R640/0HG0J8, BIOS 2.10.2 02/24/2021
  [12135.732423] Workqueue: kacpi_notify acpi_os_execute_deferred
  [12135.732433] Call Trace:
  [12135.732436]  
  [12135.732450]  dump_stack_lvl+0x57/0x81
  [12135.732461]  __might_resched.cold+0xf4/0x12f
  [12135.732479]  rt_spin_lock+0x4c/0x100
  [12135.732491]  memory_failure_queue+0x40/0xe0
  [12135.732503]  ghes_do_memory_failure+0x53/0x390
  [12135.732516]  ghes_do_proc.constprop.0+0x229/0x3e0
  [12135.732575]  ghes_proc+0xf9/0x1a0
  [12135.732591]  ghes_notify_hed+0x6a/0x150
  [12135.732602]  notifier_call_chain+0x43/0xb0
  [12135.732626]  blocking_notifier_call_chain+0x43/0x60
  [12135.732637]  acpi_ev_notify_dispatch+0x47/0x70
  [12135.732648]  acpi_os_execute_deferred+0x13/0x20
  [12135.732654]  process_one_work+0x41f/0x500
  [12135.732695]  worker_thread+0x192/0x360
  [12135.732715]  kthread+0x111/0x140
  [12135.732733]  ret_from_fork+0x29/0x50
  [12135.732779]  

Fix it by using a raw_spinlock_t for locking instead.

Also move the pr_err() call out of the lock critical section, to after
put_cpu_ptr(), to avoid indeterminate latency and the possibility of
sleeping in this call.
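
The "record the outcome under the lock, report it after unlocking" pattern
can be sketched in userspace as follows.  This is an analogy, not the
kernel code: a C11 atomic_flag spin loop stands in for raw_spinlock_t, a
small array stands in for the per-cpu FIFO, and every sketch_* name, the
queue size and the message text are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define SKETCH_FIFO_SIZE 4	/* assumption: small size for illustration */

struct sketch_mf_queue {
	atomic_flag lock;	/* busy-waiting lock, never sleeps */
	unsigned long pfns[SKETCH_FIFO_SIZE];
	unsigned int count;
};

static void sketch_queue_init(struct sketch_mf_queue *q)
{
	atomic_flag_clear(&q->lock);
	q->count = 0;
}

static bool sketch_queue_pfn(struct sketch_mf_queue *q, unsigned long pfn)
{
	bool queued = false;

	/* Critical section: only short, bounded work happens here. */
	while (atomic_flag_test_and_set_explicit(&q->lock,
						 memory_order_acquire))
		;
	if (q->count < SKETCH_FIFO_SIZE) {
		q->pfns[q->count++] = pfn;
		queued = true;
	}
	atomic_flag_clear_explicit(&q->lock, memory_order_release);

	/* Slow, possibly blocking reporting runs only after the unlock. */
	if (!queued)
		fprintf(stderr, "queue full, dropping pfn %#lx\n", pfn);

	return queued;
}
```

Keeping the print outside the critical section bounds the time the lock is
held, which is the same reasoning the commit applies to pr_err().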

[longman@redhat.com: don't hold percpu ref across pr_err(), per Miaohe]
  Link: https://lkml.kernel.org/r/20240807181130.1122660-1-longman@redhat.com
Link: https://lkml.kernel.org/r/20240806164107.1044956-1-longman@redhat.com
Fixes: 0f383b6dc96e ("locking/spinlock: Provide RT variant")
Signed-off-by: Waiman Long 
Acked-by: Miaohe Lin 
Cc: "Huang, Ying" 
Cc: Juri Lelli 
Cc: Len Brown 
Cc: Naoya Horiguchi 
Cc: 
Signed-off-by: Andrew Morton

mm/memory-failure: remove obsolete MF_MSG_DIFFERENT_COMPOUND

2024-07-12T22:52:22+00:00

A page cannot become part of a compound page again right after a folio is
split, because an extra refcount is held.  So the
MF_MSG_DIFFERENT_COMPOUND case is obsolete and can be removed, getting rid
of this false assumption and the code burden.  Add a WARN_ON() to keep the
situation clear.

Link: https://lkml.kernel.org/r/20240708030544.196919-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin 
Cc: Borislav Petkov (AMD) 
Cc: Naoya Horiguchi 
Cc: Tony Luck 
Signed-off-by: Andrew Morton

mm: provide mm_struct and address to huge_ptep_get()

2024-07-12T22:52:15+00:00

On powerpc 8xx, huge_ptep_get() will need to know whether the given ptep
is a PTE entry or a PMD entry.  This cannot be determined from the entry
itself, because there is no easy way to tell from its content.

So huge_ptep_get() will need to know either the size of the page or get
the pmd.

In order to be consistent with huge_ptep_get_and_clear(), give mm and
address to huge_ptep_get().

Link: https://lkml.kernel.org/r/cc00c70dd384298796a4e1b25d6c4eb306d3af85.1719928057.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy 
Reviewed-by: Oscar Salvador 
Cc: Jason Gunthorpe 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Peter Xu 
Signed-off-by: Andrew Morton

mm/memory-failure: userspace controls soft-offlining pages

2024-07-05T01:05:59+00:00

Correctable memory errors are very common on servers with large amounts of
memory, and are corrected by ECC.  Soft offline is the kernel's additional
recovery handling for memory pages with (excessive) corrected memory
errors.  An impacted page is migrated to a healthy page if it is in use;
the original page is excluded from any future use.

The actual policy on whether (and when) to soft offline should be
maintained by userspace, especially in the case of a 1G HugeTLB page.
Soft offline dissolves the HugeTLB page, whether in use or free, into
chunks of 4K pages, reducing HugeTLB pool capacity by 1 hugepage.  If
userspace has not acknowledged such behavior, it may be surprised when it
later fails to mmap hugepages due to a lack of hugepages.  A transparent
hugepage will be split into 4K pages as well; userspace will lose the
performance benefit of the transparent hugepage.

In addition, discarding an entire 1G HugeTLB page only because of
corrected memory errors is very costly, and the kernel had better not do
so under the hood.  But today at least 2 cases do so:
1. The GHES driver sees both GHES_SEV_CORRECTED and
   CPER_SEC_ERROR_THRESHOLD_EXCEEDED after parsing CPER.
2. The RAS Correctable Errors Collector counts correctable errors per
   PFN, and the counter for a PFN reaches the threshold.
In both cases, userspace has no control over the soft offline performed
by the kernel's memory failure recovery.

This commit gives userspace control over soft offlining any page: the
kernel only soft offlines a raw page / transparent hugepage / HugeTLB
hugepage if userspace has agreed to it.  The interface to userspace is a
new sysctl at /proc/sys/vm/enable_soft_offline.  Its default value is 1,
which preserves the existing kernel behavior.  When set to 0, soft offline
(e.g. MADV_SOFT_OFFLINE) fails with EOPNOTSUPP.
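
A userspace agent might check the current setting along these lines.  This
is a minimal sketch: sketch_read_sysctl_int() is a hypothetical helper, and
the -1 return for a missing or unparsable file is a convention chosen for
this example, not something the kernel defines.

```c
#include <stdio.h>

/*
 * Read a single integer sysctl value from the given path.
 * Returns the value read, or -1 if the file cannot be opened or parsed
 * (for example on a kernel that does not provide the sysctl).
 */
static int sketch_read_sysctl_int(const char *path)
{
	FILE *f = fopen(path, "r");
	int value;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &value) != 1)
		value = -1;
	fclose(f);
	return value;
}
```

Calling sketch_read_sysctl_int("/proc/sys/vm/enable_soft_offline") would
then return 1 when soft offline is allowed and 0 when requests are
rejected with EOPNOTSUPP, following the semantics described above.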

[jiaqiyan@google.com: v7]
  Link: https://lkml.kernel.org/r/20240628205958.2845610-3-jiaqiyan@google.com
Link: https://lkml.kernel.org/r/20240626050818.2277273-3-jiaqiyan@google.com
Signed-off-by: Jiaqi Yan 
Acked-by: Miaohe Lin 
Acked-by: David Rientjes 
Cc: Frank van der Linden 
Cc: Jane Chu 
Cc: Jonathan Corbet 
Cc: Lance Yang 
Cc: Muchun Song 
Cc: Naoya Horiguchi 
Cc: Oscar Salvador 
Cc: Randy Dunlap 
Cc: Shuah Khan 
Signed-off-by: Andrew Morton