<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/mm/Kconfig, branch v6.17</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>mm: remove mm/io-mapping.c</title>
<updated>2025-08-02T19:06:10+00:00</updated>
<author>
<name>Lorenzo Stoakes</name>
<email>lorenzo.stoakes@oracle.com</email>
</author>
<published>2025-07-25T14:29:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9a4f90e246615d1f42a9b907deb9b4c0a418d996'/>
<id>9a4f90e246615d1f42a9b907deb9b4c0a418d996</id>
<content type='text'>
This is dead code, which was used from commit b739f125e4eb ("i915: use
io_mapping_map_user") but reverted a month later by commit 0e4fe0c9f2f9
("Revert "i915: use io_mapping_map_user"") back in 2021.

Since then nobody has used it, so remove it.

[akpm@linux-foundation.org: update Documentation/core-api/mm-api.rst, per Vlastimil]
Link: https://lkml.kernel.org/r/20250725142901.81502-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Liam Howlett &lt;liam.howlett@oracle.com&gt;
Cc: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Mike Rapoport &lt;rppt@kernel.org&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This is dead code, which was used from commit b739f125e4eb ("i915: use
io_mapping_map_user") but reverted a month later by commit 0e4fe0c9f2f9
("Revert "i915: use io_mapping_map_user"") back in 2021.

Since then nobody has used it, so remove it.

[akpm@linux-foundation.org: update Documentation/core-api/mm-api.rst, per Vlastimil]
Link: https://lkml.kernel.org/r/20250725142901.81502-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Liam Howlett &lt;liam.howlett@oracle.com&gt;
Cc: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Mike Rapoport &lt;rppt@kernel.org&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: remove devmap related functions and page table bits</title>
<updated>2025-07-10T05:42:18+00:00</updated>
<author>
<name>Alistair Popple</name>
<email>apopple@nvidia.com</email>
</author>
<published>2025-06-19T08:58:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d438d273417055241ebaaf1ba3be23459fc27cba'/>
<id>d438d273417055241ebaaf1ba3be23459fc27cba</id>
<content type='text'>
Now that DAX and all other reference counts to ZONE_DEVICE pages are
managed normally there is no need for the special devmap PTE/PMD/PUD page
table bits.  So drop all references to these, freeing up a software
defined page table bit on architectures supporting it.

Link: https://lkml.kernel.org/r/6389398c32cc9daa3dfcaa9f79c7972525d310ce.1750323463.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple &lt;apopple@nvidia.com&gt;
Acked-by: Will Deacon &lt;will@kernel.org&gt; # arm64
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Suggested-by: Chunyan Zhang &lt;zhang.lyra@gmail.com&gt;
Reviewed-by: Björn Töpel &lt;bjorn@rivosinc.com&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Cc: Balbir Singh &lt;balbirs@nvidia.com&gt;
Cc: Björn Töpel &lt;bjorn@kernel.org&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Deepak Gupta &lt;debug@rivosinc.com&gt;
Cc: Gerald Schaefer &lt;gerald.schaefer@linux.ibm.com&gt;
Cc: Inki Dae &lt;m.szyprowski@samsung.com&gt;
Cc: John Groves &lt;john@groves.net&gt;
Cc: John Hubbard &lt;jhubbard@nvidia.com&gt;
Cc: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Now that DAX and all other reference counts to ZONE_DEVICE pages are
managed normally there is no need for the special devmap PTE/PMD/PUD page
table bits.  So drop all references to these, freeing up a software
defined page table bit on architectures supporting it.

Link: https://lkml.kernel.org/r/6389398c32cc9daa3dfcaa9f79c7972525d310ce.1750323463.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple &lt;apopple@nvidia.com&gt;
Acked-by: Will Deacon &lt;will@kernel.org&gt; # arm64
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Suggested-by: Chunyan Zhang &lt;zhang.lyra@gmail.com&gt;
Reviewed-by: Björn Töpel &lt;bjorn@rivosinc.com&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Cc: Balbir Singh &lt;balbirs@nvidia.com&gt;
Cc: Björn Töpel &lt;bjorn@kernel.org&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Deepak Gupta &lt;debug@rivosinc.com&gt;
Cc: Gerald Schaefer &lt;gerald.schaefer@linux.ibm.com&gt;
Cc: Inki Dae &lt;m.szyprowski@samsung.com&gt;
Cc: John Groves &lt;john@groves.net&gt;
Cc: John Hubbard &lt;jhubbard@nvidia.com&gt;
Cc: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/percpu: conditionally define _shared_alloc_tag via CONFIG_ARCH_MODULE_NEEDS_WEAK_PER_CPU</title>
<updated>2025-07-10T05:42:15+00:00</updated>
<author>
<name>Hao Ge</name>
<email>gehao@kylinos.cn</email>
</author>
<published>2025-06-18T01:58:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=59b5ed409d03bc8b7bb153d78afcd7cea9d7bbfa'/>
<id>59b5ed409d03bc8b7bb153d78afcd7cea9d7bbfa</id>
<content type='text'>
Recently discovered this entry while checking kallsyms on ARM64:
ffff800083e509c0 D _shared_alloc_tag

If ARCH_NEEDS_WEAK_PER_CPU is not defined(it is only defined for s390 and
alpha architectures), there's no need to statically define the percpu
variable _shared_alloc_tag.

Therefore, we need to implement isolation for this purpose.

When building the core kernel code for s390 or alpha architectures,
ARCH_NEEDS_WEAK_PER_CPU remains undefined (as it is gated by #if
defined(MODULE)).  However, when building modules for these architectures,
the macro is explicitly defined.

Therefore, we remove all instances of ARCH_NEEDS_WEAK_PER_CPU from the
code and introduced CONFIG_ARCH_MODULE_NEEDS_WEAK_PER_CPU to replace the
relevant logic.  We can now conditionally define the perpcu variable
_shared_alloc_tag based on CONFIG_ARCH_MODULE_NEEDS_WEAK_PER_CPU.  This
allows architectures (such as s390/alpha) that require weak definitions
for percpu variables in modules to include the definition, while others
can omit it via compile-time exclusion.

Link: https://lkml.kernel.org/r/20250618015809.1235761-1-hao.ge@linux.dev
Signed-off-by: Hao Ge &lt;gehao@kylinos.cn&gt;
Suggested-by: Suren Baghdasaryan &lt;surenb@google.com&gt;
Acked-by: Alexander Gordeev &lt;agordeev@linux.ibm.com&gt;	[s390]
Acked-by: Mike Rapoport (Microsoft) &lt;rppt@kernel.org&gt;
Cc: Chistoph Lameter &lt;cl@linux.com&gt;
Cc: Christian Borntraeger &lt;borntraeger@linux.ibm.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Dennis Zhou &lt;dennis@kernel.org&gt;
Cc: Heiko Carstens &lt;hca@linux.ibm.com&gt;
Cc: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
Cc: Liam Howlett &lt;liam.howlett@oracle.com&gt;
Cc: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: Matt Turner &lt;mattst88@gmail.com&gt;
Cc: Richard Henderson &lt;richard.henderson@linaro.org&gt;
Cc: Sven Schnelle &lt;svens@linux.ibm.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Recently discovered this entry while checking kallsyms on ARM64:
ffff800083e509c0 D _shared_alloc_tag

If ARCH_NEEDS_WEAK_PER_CPU is not defined(it is only defined for s390 and
alpha architectures), there's no need to statically define the percpu
variable _shared_alloc_tag.

Therefore, we need to implement isolation for this purpose.

When building the core kernel code for s390 or alpha architectures,
ARCH_NEEDS_WEAK_PER_CPU remains undefined (as it is gated by #if
defined(MODULE)).  However, when building modules for these architectures,
the macro is explicitly defined.

Therefore, we remove all instances of ARCH_NEEDS_WEAK_PER_CPU from the
code and introduced CONFIG_ARCH_MODULE_NEEDS_WEAK_PER_CPU to replace the
relevant logic.  We can now conditionally define the perpcu variable
_shared_alloc_tag based on CONFIG_ARCH_MODULE_NEEDS_WEAK_PER_CPU.  This
allows architectures (such as s390/alpha) that require weak definitions
for percpu variables in modules to include the definition, while others
can omit it via compile-time exclusion.

Link: https://lkml.kernel.org/r/20250618015809.1235761-1-hao.ge@linux.dev
Signed-off-by: Hao Ge &lt;gehao@kylinos.cn&gt;
Suggested-by: Suren Baghdasaryan &lt;surenb@google.com&gt;
Acked-by: Alexander Gordeev &lt;agordeev@linux.ibm.com&gt;	[s390]
Acked-by: Mike Rapoport (Microsoft) &lt;rppt@kernel.org&gt;
Cc: Chistoph Lameter &lt;cl@linux.com&gt;
Cc: Christian Borntraeger &lt;borntraeger@linux.ibm.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Dennis Zhou &lt;dennis@kernel.org&gt;
Cc: Heiko Carstens &lt;hca@linux.ibm.com&gt;
Cc: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
Cc: Liam Howlett &lt;liam.howlett@oracle.com&gt;
Cc: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: Matt Turner &lt;mattst88@gmail.com&gt;
Cc: Richard Henderson &lt;richard.henderson@linaro.org&gt;
Cc: Sven Schnelle &lt;svens@linux.ibm.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: rename CONFIG_PAGE_BLOCK_ORDER to CONFIG_PAGE_BLOCK_MAX_ORDER</title>
<updated>2025-07-10T05:41:56+00:00</updated>
<author>
<name>Zi Yan</name>
<email>ziy@nvidia.com</email>
</author>
<published>2025-06-04T21:14:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3800d55250976b7a4bd42c255b267bc242669709'/>
<id>3800d55250976b7a4bd42c255b267bc242669709</id>
<content type='text'>
The config is in fact an additional upper limit of pageblock_order, so
rename it to avoid confusion.

Link: https://lkml.kernel.org/r/20250604211427.1590859-1-ziy@nvidia.com
Signed-off-by: Zi Yan &lt;ziy@nvidia.com&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Acked-by: Juan Yescas &lt;jyescas@google.com&gt;
Reviewed-by: Anshuman Khandual &lt;anshuman.khandual@arm.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: "Isaac J. Manjarres" &lt;isaacmanjarres@google.com&gt;
Cc: Kalesh Singh &lt;kaleshsingh@google.com&gt;
Cc: Liam Howlett &lt;liam.howlett@oracle.com&gt;
Cc: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: Masahiro Yamada &lt;masahiroy@kernel.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Mike Rapoport &lt;rppt@kernel.org&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: T.J. Mercier &lt;tjmercier@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The config is in fact an additional upper limit of pageblock_order, so
rename it to avoid confusion.

Link: https://lkml.kernel.org/r/20250604211427.1590859-1-ziy@nvidia.com
Signed-off-by: Zi Yan &lt;ziy@nvidia.com&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Acked-by: Juan Yescas &lt;jyescas@google.com&gt;
Reviewed-by: Anshuman Khandual &lt;anshuman.khandual@arm.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: "Isaac J. Manjarres" &lt;isaacmanjarres@google.com&gt;
Cc: Kalesh Singh &lt;kaleshsingh@google.com&gt;
Cc: Liam Howlett &lt;liam.howlett@oracle.com&gt;
Cc: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: Masahiro Yamada &lt;masahiroy@kernel.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Mike Rapoport &lt;rppt@kernel.org&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: T.J. Mercier &lt;tjmercier@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: Kconfig: use verb *use* in plural form in description</title>
<updated>2025-07-10T05:41:54+00:00</updated>
<author>
<name>Paul Menzel</name>
<email>pmenzel@molgen.mpg.de</email>
</author>
<published>2025-06-03T06:13:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=bafa31a1ceab70c70df40ff09b88f359cd0117c4'/>
<id>bafa31a1ceab70c70df40ff09b88f359cd0117c4</id>
<content type='text'>
*workloads* is plural requiring the verb *use* in plural form.

Link: https://lkml.kernel.org/r/20250603061303.479551-2-pmenzel@molgen.mpg.de
Fixes: e13e7922d034 ("mm: add CONFIG_PAGE_BLOCK_ORDER to select page block order")
Signed-off-by: Paul Menzel &lt;pmenzel@molgen.mpg.de&gt;
Acked-by: Mike Rapoport (Microsoft) &lt;rppt@kernel.org&gt;
Reviewed-by: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Reviewed-by: Zi Yan &lt;ziy@nvidia.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
*workloads* is plural requiring the verb *use* in plural form.

Link: https://lkml.kernel.org/r/20250603061303.479551-2-pmenzel@molgen.mpg.de
Fixes: e13e7922d034 ("mm: add CONFIG_PAGE_BLOCK_ORDER to select page block order")
Signed-off-by: Paul Menzel &lt;pmenzel@molgen.mpg.de&gt;
Acked-by: Mike Rapoport (Microsoft) &lt;rppt@kernel.org&gt;
Reviewed-by: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Reviewed-by: Zi Yan &lt;ziy@nvidia.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'loongarch-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson</title>
<updated>2025-06-07T16:56:18+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-06-07T16:56:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b7191581a973ab2fca45d2ca64416065f1660ae0'/>
<id>b7191581a973ab2fca45d2ca64416065f1660ae0</id>
<content type='text'>
Pull LoongArch updates from Huacai Chen:

 - Adjust the 'make install' operation

 - Support SCHED_MC (Multi-core scheduler)

 - Enable ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS

 - Enable HAVE_ARCH_STACKLEAK

 - Increase max supported CPUs up to 2048

 - Introduce the numa_memblks conversion

 - Add PWM controller nodes in dts

 - Some bug fixes and other small changes

* tag 'loongarch-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  platform/loongarch: laptop: Unregister generic_sub_drivers on exit
  platform/loongarch: laptop: Add backlight power control support
  platform/loongarch: laptop: Get brightness setting from EC on probe
  LoongArch: dts: Add PWM support to Loongson-2K2000
  LoongArch: dts: Add PWM support to Loongson-2K1000
  LoongArch: dts: Add PWM support to Loongson-2K0500
  LoongArch: vDSO: Correctly use asm parameters in syscall wrappers
  LoongArch: Fix panic caused by NULL-PMD in huge_pte_offset()
  LoongArch: Preserve firmware configuration when desired
  LoongArch: Avoid using $r0/$r1 as "mask" for csrxchg
  LoongArch: Introduce the numa_memblks conversion
  LoongArch: Increase max supported CPUs up to 2048
  LoongArch: Enable HAVE_ARCH_STACKLEAK
  LoongArch: Enable ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
  LoongArch: Add SCHED_MC (Multi-core scheduler) support
  LoongArch: Add some annotations in archhelp
  LoongArch: Using generic scripts/install.sh in `make install`
  LoongArch: Add a default install.sh
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull LoongArch updates from Huacai Chen:

 - Adjust the 'make install' operation

 - Support SCHED_MC (Multi-core scheduler)

 - Enable ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS

 - Enable HAVE_ARCH_STACKLEAK

 - Increase max supported CPUs up to 2048

 - Introduce the numa_memblks conversion

 - Add PWM controller nodes in dts

 - Some bug fixes and other small changes

* tag 'loongarch-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  platform/loongarch: laptop: Unregister generic_sub_drivers on exit
  platform/loongarch: laptop: Add backlight power control support
  platform/loongarch: laptop: Get brightness setting from EC on probe
  LoongArch: dts: Add PWM support to Loongson-2K2000
  LoongArch: dts: Add PWM support to Loongson-2K1000
  LoongArch: dts: Add PWM support to Loongson-2K0500
  LoongArch: vDSO: Correctly use asm parameters in syscall wrappers
  LoongArch: Fix panic caused by NULL-PMD in huge_pte_offset()
  LoongArch: Preserve firmware configuration when desired
  LoongArch: Avoid using $r0/$r1 as "mask" for csrxchg
  LoongArch: Introduce the numa_memblks conversion
  LoongArch: Increase max supported CPUs up to 2048
  LoongArch: Enable HAVE_ARCH_STACKLEAK
  LoongArch: Enable ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
  LoongArch: Add SCHED_MC (Multi-core scheduler) support
  LoongArch: Add some annotations in archhelp
  LoongArch: Using generic scripts/install.sh in `make install`
  LoongArch: Add a default install.sh
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: add CONFIG_PAGE_BLOCK_ORDER to select page block order</title>
<updated>2025-06-01T05:46:13+00:00</updated>
<author>
<name>Juan Yescas</name>
<email>jyescas@google.com</email>
</author>
<published>2025-05-21T21:57:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e13e7922d03439e374c263049af5f740ceae6346'/>
<id>e13e7922d03439e374c263049af5f740ceae6346</id>
<content type='text'>
Problem: On large page size configurations (16KiB, 64KiB), the CMA
alignment requirement (CMA_MIN_ALIGNMENT_BYTES) increases considerably,
and this causes the CMA reservations to be larger than necessary.  This
means that system will have less available MIGRATE_UNMOVABLE and
MIGRATE_RECLAIMABLE page blocks since MIGRATE_CMA can't fallback to them.

The CMA_MIN_ALIGNMENT_BYTES increases because it depends on MAX_PAGE_ORDER
which depends on ARCH_FORCE_MAX_ORDER.  The value of ARCH_FORCE_MAX_ORDER
increases on 16k and 64k kernels.

For example, in ARM, the CMA alignment requirement when:

- CONFIG_ARCH_FORCE_MAX_ORDER default value is used
- CONFIG_TRANSPARENT_HUGEPAGE is set:

PAGE_SIZE | MAX_PAGE_ORDER | pageblock_order | CMA_MIN_ALIGNMENT_BYTES
-----------------------------------------------------------------------
   4KiB   |      10        |       9         |  4KiB * (2 ^  9) =   2MiB
  16Kib   |      11        |      11         | 16KiB * (2 ^ 11) =  32MiB
  64KiB   |      13        |      13         | 64KiB * (2 ^ 13) = 512MiB

There are some extreme cases for the CMA alignment requirement when:

- CONFIG_ARCH_FORCE_MAX_ORDER maximum value is set
- CONFIG_TRANSPARENT_HUGEPAGE is NOT set:
- CONFIG_HUGETLB_PAGE is NOT set

PAGE_SIZE | MAX_PAGE_ORDER | pageblock_order |  CMA_MIN_ALIGNMENT_BYTES
------------------------------------------------------------------------
   4KiB   |      15        |      15         |  4KiB * (2 ^ 15) = 128MiB
  16Kib   |      13        |      13         | 16KiB * (2 ^ 13) = 128MiB
  64KiB   |      13        |      13         | 64KiB * (2 ^ 13) = 512MiB

This affects the CMA reservations for the drivers. If a driver in a
4KiB kernel needs 4MiB of CMA memory, in a 16KiB kernel, the minimal
reservation has to be 32MiB due to the alignment requirements:

reserved-memory {
    ...
    cma_test_reserve: cma_test_reserve {
        compatible = "shared-dma-pool";
        size = &lt;0x0 0x400000&gt;; /* 4 MiB */
        ...
    };
};

reserved-memory {
    ...
    cma_test_reserve: cma_test_reserve {
        compatible = "shared-dma-pool";
        size = &lt;0x0 0x2000000&gt;; /* 32 MiB */
        ...
    };
};

Solution: Add a new config CONFIG_PAGE_BLOCK_ORDER that allows to set the
page block order in all the architectures.  The maximum page block order
will be given by ARCH_FORCE_MAX_ORDER.

By default, CONFIG_PAGE_BLOCK_ORDER will have the same value that
ARCH_FORCE_MAX_ORDER.  This will make sure that current kernel
configurations won't be affected by this change.  It is a opt-in change.

This patch will allow to have the same CMA alignment requirements for
large page sizes (16KiB, 64KiB) as that in 4kb kernels by setting a lower
pageblock_order.

Tests:

- Verified that HugeTLB pages work when pageblock_order is 1, 7, 10 on
  4k and 16k kernels.

- Verified that Transparent Huge Pages work when pageblock_order is 1,
  7, 10 on 4k and 16k kernels.

- Verified that dma-buf heaps allocations work when pageblock_order is
  1, 7, 10 on 4k and 16k kernels.

Benchmarks:

The benchmarks compare 16kb kernels with pageblock_order 10 and 7.  The
reason for the pageblock_order 7 is because this value makes the min CMA
alignment requirement the same as that in 4kb kernels (2MB).

- Perform 100K dma-buf heaps (/dev/dma_heap/system) allocations of
  SZ_8M, SZ_4M, SZ_2M, SZ_1M, SZ_64, SZ_8, SZ_4.  Use simpleperf
  (https://developer.android.com/ndk/guides/simpleperf) to measure the #
  of instructions and page-faults on 16k kernels.  The benchmark was
  executed 10 times.  The averages are below:

           # instructions         |     #page-faults
    order 10     |  order 7       | order 10 | order 7
--------------------------------------------------------
 13,891,765,770	 | 11,425,777,314 |    220   |   217
 14,456,293,487	 | 12,660,819,302 |    224   |   219
 13,924,261,018	 | 13,243,970,736 |    217   |   221
 13,910,886,504	 | 13,845,519,630 |    217   |   221
 14,388,071,190	 | 13,498,583,098 |    223   |   224
 13,656,442,167	 | 12,915,831,681 |    216   |   218
 13,300,268,343	 | 12,930,484,776 |    222   |   218
 13,625,470,223	 | 14,234,092,777 |    219   |   218
 13,508,964,965	 | 13,432,689,094 |    225   |   219
 13,368,950,667	 | 13,683,587,37  |    219   |   225
-------------------------------------------------------------------
 13,803,137,433  | 13,131,974,268 |    220   |   220    Averages

There were 4.85% #instructions when order was 7, in comparison with order
10.

     13,803,137,433 - 13,131,974,268 = -671,163,166 (-4.86%)

The number of page faults in order 7 and 10 were the same.

These results didn't show any significant regression when the
pageblock_order is set to 7 on 16kb kernels.

- Run speedometer 3.1 (https://browserbench.org/Speedometer3.1/) 5 times
  on the 16k kernels with pageblock_order 7 and 10.

order 10 | order 7  | order 7 - order 10 | (order 7 - order 10) %
-------------------------------------------------------------------
  15.8	 |  16.4    |         0.6        |     3.80%
  16.4	 |  16.2    |        -0.2        |    -1.22%
  16.6	 |  16.3    |        -0.3        |    -1.81%
  16.8	 |  16.3    |        -0.5        |    -2.98%
  16.6	 |  16.8    |         0.2        |     1.20%
-------------------------------------------------------------------
  16.44     16.4            -0.04	          -0.24%   Averages

The results didn't show any significant regression when the
pageblock_order is set to 7 on 16kb kernels.

Link: https://lkml.kernel.org/r/20250521215807.1860663-1-jyescas@google.com
Signed-off-by: Juan Yescas &lt;jyescas@google.com&gt;
Acked-by: Zi Yan &lt;ziy@nvidia.com&gt;
Reviewed-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Liam R. Howlett &lt;Liam.Howlett@oracle.com&gt;
Cc: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Mike Rapoport &lt;rppt@kernel.org&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem: On large page size configurations (16KiB, 64KiB), the CMA
alignment requirement (CMA_MIN_ALIGNMENT_BYTES) increases considerably,
and this causes the CMA reservations to be larger than necessary.  This
means that system will have less available MIGRATE_UNMOVABLE and
MIGRATE_RECLAIMABLE page blocks since MIGRATE_CMA can't fallback to them.

The CMA_MIN_ALIGNMENT_BYTES increases because it depends on MAX_PAGE_ORDER
which depends on ARCH_FORCE_MAX_ORDER.  The value of ARCH_FORCE_MAX_ORDER
increases on 16k and 64k kernels.

For example, in ARM, the CMA alignment requirement when:

- CONFIG_ARCH_FORCE_MAX_ORDER default value is used
- CONFIG_TRANSPARENT_HUGEPAGE is set:

PAGE_SIZE | MAX_PAGE_ORDER | pageblock_order | CMA_MIN_ALIGNMENT_BYTES
-----------------------------------------------------------------------
   4KiB   |      10        |       9         |  4KiB * (2 ^  9) =   2MiB
  16Kib   |      11        |      11         | 16KiB * (2 ^ 11) =  32MiB
  64KiB   |      13        |      13         | 64KiB * (2 ^ 13) = 512MiB

There are some extreme cases for the CMA alignment requirement when:

- CONFIG_ARCH_FORCE_MAX_ORDER maximum value is set
- CONFIG_TRANSPARENT_HUGEPAGE is NOT set:
- CONFIG_HUGETLB_PAGE is NOT set

PAGE_SIZE | MAX_PAGE_ORDER | pageblock_order |  CMA_MIN_ALIGNMENT_BYTES
------------------------------------------------------------------------
   4KiB   |      15        |      15         |  4KiB * (2 ^ 15) = 128MiB
  16Kib   |      13        |      13         | 16KiB * (2 ^ 13) = 128MiB
  64KiB   |      13        |      13         | 64KiB * (2 ^ 13) = 512MiB

This affects the CMA reservations for the drivers. If a driver in a
4KiB kernel needs 4MiB of CMA memory, in a 16KiB kernel, the minimal
reservation has to be 32MiB due to the alignment requirements:

reserved-memory {
    ...
    cma_test_reserve: cma_test_reserve {
        compatible = "shared-dma-pool";
        size = &lt;0x0 0x400000&gt;; /* 4 MiB */
        ...
    };
};

reserved-memory {
    ...
    cma_test_reserve: cma_test_reserve {
        compatible = "shared-dma-pool";
        size = &lt;0x0 0x2000000&gt;; /* 32 MiB */
        ...
    };
};

Solution: Add a new config CONFIG_PAGE_BLOCK_ORDER that allows to set the
page block order in all the architectures.  The maximum page block order
will be given by ARCH_FORCE_MAX_ORDER.

By default, CONFIG_PAGE_BLOCK_ORDER will have the same value that
ARCH_FORCE_MAX_ORDER.  This will make sure that current kernel
configurations won't be affected by this change.  It is a opt-in change.

This patch will allow to have the same CMA alignment requirements for
large page sizes (16KiB, 64KiB) as that in 4kb kernels by setting a lower
pageblock_order.

Tests:

- Verified that HugeTLB pages work when pageblock_order is 1, 7, 10 on
  4k and 16k kernels.

- Verified that Transparent Huge Pages work when pageblock_order is 1,
  7, 10 on 4k and 16k kernels.

- Verified that dma-buf heaps allocations work when pageblock_order is
  1, 7, 10 on 4k and 16k kernels.

Benchmarks:

The benchmarks compare 16kb kernels with pageblock_order 10 and 7.  The
reason for the pageblock_order 7 is because this value makes the min CMA
alignment requirement the same as that in 4kb kernels (2MB).

- Perform 100K dma-buf heaps (/dev/dma_heap/system) allocations of
  SZ_8M, SZ_4M, SZ_2M, SZ_1M, SZ_64, SZ_8, SZ_4.  Use simpleperf
  (https://developer.android.com/ndk/guides/simpleperf) to measure the #
  of instructions and page-faults on 16k kernels.  The benchmark was
  executed 10 times.  The averages are below:

           # instructions         |     #page-faults
    order 10     |  order 7       | order 10 | order 7
--------------------------------------------------------
 13,891,765,770	 | 11,425,777,314 |    220   |   217
 14,456,293,487	 | 12,660,819,302 |    224   |   219
 13,924,261,018	 | 13,243,970,736 |    217   |   221
 13,910,886,504	 | 13,845,519,630 |    217   |   221
 14,388,071,190	 | 13,498,583,098 |    223   |   224
 13,656,442,167	 | 12,915,831,681 |    216   |   218
 13,300,268,343	 | 12,930,484,776 |    222   |   218
 13,625,470,223	 | 14,234,092,777 |    219   |   218
 13,508,964,965	 | 13,432,689,094 |    225   |   219
 13,368,950,667	 | 13,683,587,37  |    219   |   225
-------------------------------------------------------------------
 13,803,137,433  | 13,131,974,268 |    220   |   220    Averages

There were 4.85% #instructions when order was 7, in comparison with order
10.

     13,803,137,433 - 13,131,974,268 = -671,163,166 (-4.86%)

The number of page faults in order 7 and 10 were the same.

These results didn't show any significant regression when the
pageblock_order is set to 7 on 16kb kernels.

- Run speedometer 3.1 (https://browserbench.org/Speedometer3.1/) 5 times
  on the 16k kernels with pageblock_order 7 and 10.

order 10 | order 7  | order 7 - order 10 | (order 7 - order 10) %
-------------------------------------------------------------------
  15.8	 |  16.4    |         0.6        |     3.80%
  16.4	 |  16.2    |        -0.2        |    -1.22%
  16.6	 |  16.3    |        -0.3        |    -1.81%
  16.8	 |  16.3    |        -0.5        |    -2.98%
  16.6	 |  16.8    |         0.2        |     1.20%
-------------------------------------------------------------------
  16.44     16.4            -0.04	          -0.24%   Averages

The results didn't show any significant regression when the
pageblock_order is set to 7 on 16kb kernels.

Link: https://lkml.kernel.org/r/20250521215807.1860663-1-jyescas@google.com
Signed-off-by: Juan Yescas &lt;jyescas@google.com&gt;
Acked-by: Zi Yan &lt;ziy@nvidia.com&gt;
Reviewed-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Liam R. Howlett &lt;Liam.Howlett@oracle.com&gt;
Cc: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Mike Rapoport &lt;rppt@kernel.org&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>LoongArch: Introduce the numa_memblks conversion</title>
<updated>2025-05-30T13:45:43+00:00</updated>
<author>
<name>Huacai Chen</name>
<email>chenhuacai@loongson.cn</email>
</author>
<published>2025-05-30T13:45:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a24f2fb70cb62180486ad4d74f809ff35ddd1cf9'/>
<id>a24f2fb70cb62180486ad4d74f809ff35ddd1cf9</id>
<content type='text'>
Commit 87482708210ff3333a ("mm: introduce numa_memblks") has moved
numa_memblks from x86 to the generic code, but LoongArch was left out
of this conversion.

This patch introduces the generic numa_memblks for LoongArch.

In detail:
1. Enable NUMA_MEMBLKS (but disable NUMA_EMU) in Kconfig;
2. Use generic definition for numa_memblk and numa_meminfo;
3. Use generic implementation for numa_add_memblk() and its friends;
4. Use generic implementation for numa_set_distance() and its friends;
5. Use generic implementation for memory_add_physaddr_to_nid() and its
   friends.

Note: Disable NUMA_EMU because it needs more efforts and no obvious
demand now.

Tested-by: Binbin Zhou &lt;zhoubinbin@loongson.cn&gt;
Signed-off-by: Yuquan Wang &lt;wangyuquan1236@phytium.com.cn&gt;
Signed-off-by: Huacai Chen &lt;chenhuacai@loongson.cn&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 87482708210ff3333a ("mm: introduce numa_memblks") has moved
numa_memblks from x86 to the generic code, but LoongArch was left out
of this conversion.

This patch introduces the generic numa_memblks for LoongArch.

In detail:
1. Enable NUMA_MEMBLKS (but disable NUMA_EMU) in Kconfig;
2. Use generic definition for numa_memblk and numa_meminfo;
3. Use generic implementation for numa_add_memblk() and its friends;
4. Use generic implementation for numa_set_distance() and its friends;
5. Use generic implementation for memory_add_physaddr_to_nid() and its
   friends.

Note: Disable NUMA_EMU because it needs more efforts and no obvious
demand now.

Tested-by: Binbin Zhou &lt;zhoubinbin@loongson.cn&gt;
Signed-off-by: Yuquan Wang &lt;wangyuquan1236@phytium.com.cn&gt;
Signed-off-by: Huacai Chen &lt;chenhuacai@loongson.cn&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: khugepaged: decouple SHMEM and file folios' collapse</title>
<updated>2025-05-22T21:55:38+00:00</updated>
<author>
<name>Baolin Wang</name>
<email>baolin.wang@linux.alibaba.com</email>
</author>
<published>2025-05-13T06:56:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=cc79061b8fc119807111b615aa791562374b15b2'/>
<id>cc79061b8fc119807111b615aa791562374b15b2</id>
<content type='text'>
Originally, the file pages collapse was intended for tmpfs/shmem to merge
into THP in the background.  However, now not only tmpfs/shmem can support
large folios, but some other file systems (such as XFS, erofs ...) also
support large folios.  Therefore, it is time to decouple the support of
file folios collapse from SHMEM.

Link: https://lkml.kernel.org/r/ce5c2314e0368cf34bda26f9bacf01c982d4da17.1747119309.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Acked-by: Zi Yan &lt;ziy@nvidia.com&gt;
Cc: Dev Jain &lt;dev.jain@arm.com&gt;
Cc: Liam Howlett &lt;liam.howlett@oracle.com&gt;
Cc: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: Mariano Pache &lt;npache@redhat.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Mike Rapoport &lt;rppt@kernel.org&gt;
Cc: Ryan Roberts &lt;ryan.roberts@arm.com&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Originally, the file pages collapse was intended for tmpfs/shmem to merge
into THP in the background.  However, now not only tmpfs/shmem can support
large folios, but some other file systems (such as XFS, erofs ...) also
support large folios.  Therefore, it is time to decouple the support of
file folios collapse from SHMEM.

Link: https://lkml.kernel.org/r/ce5c2314e0368cf34bda26f9bacf01c982d4da17.1747119309.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Acked-by: Zi Yan &lt;ziy@nvidia.com&gt;
Cc: Dev Jain &lt;dev.jain@arm.com&gt;
Cc: Liam Howlett &lt;liam.howlett@oracle.com&gt;
Cc: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: Mariano Pache &lt;npache@redhat.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Mike Rapoport &lt;rppt@kernel.org&gt;
Cc: Ryan Roberts &lt;ryan.roberts@arm.com&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>memblock: add support for scratch memory</title>
<updated>2025-05-13T06:50:39+00:00</updated>
<author>
<name>Alexander Graf</name>
<email>graf@amazon.com</email>
</author>
<published>2025-05-09T07:46:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d59f43b5748092557d34244e29a618221a250501'/>
<id>d59f43b5748092557d34244e29a618221a250501</id>
<content type='text'>
With KHO (Kexec HandOver), we need a way to ensure that the new kernel
does not allocate memory on top of any memory regions that the previous
kernel was handing over.  But to know where those are, we need to include
them in the memblock.reserved array which may not be big enough to hold
all ranges that need to be persisted across kexec.  To resize the array,
we need to allocate memory.  That brings us into a catch 22 situation.

The solution to that is limit memblock allocations to the scratch regions:
safe regions to operate in the case when there is memory that should
remain intact across kexec.

KHO provides several "scratch regions" as part of its metadata.  These
scratch regions are contiguous memory blocks that known not to contain any
memory that should be persisted across kexec.  These regions should be
large enough to accommodate all memblock allocations done by the kexeced
kernel.

We introduce a new memblock_set_scratch_only() function that allows KHO to
indicate that any memblock allocation must happen from the scratch
regions.

Later, we may want to perform another KHO kexec.  For that, we reuse the
same scratch regions.  To ensure that no eventually handed over data gets
allocated inside a scratch region, we flip the semantics of the scratch
region with memblock_clear_scratch_only(): After that call, no allocations
may happen from scratch memblock regions.  We will lift that restriction
in the next patch.

Link: https://lkml.kernel.org/r/20250509074635.3187114-3-changyuanl@google.com
Signed-off-by: Alexander Graf &lt;graf@amazon.com&gt;
Co-developed-by: Mike Rapoport (Microsoft) &lt;rppt@kernel.org&gt;
Signed-off-by: Mike Rapoport (Microsoft) &lt;rppt@kernel.org&gt;
Signed-off-by: Changyuan Lyu &lt;changyuanl@google.com&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Anthony Yznaga &lt;anthony.yznaga@oracle.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Ashish Kalra &lt;ashish.kalra@amd.com&gt;
Cc: Ben Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Borislav Betkov &lt;bp@alien8.de&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: David Woodhouse &lt;dwmw2@infradead.org&gt;
Cc: Eric Biederman &lt;ebiederm@xmission.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: James Gowans &lt;jgowans@amazon.com&gt;
Cc: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Krzysztof Kozlowski &lt;krzk@kernel.org&gt;
Cc: Marc Rutland &lt;mark.rutland@arm.com&gt;
Cc: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Cc: Pasha Tatashin &lt;pasha.tatashin@soleen.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Pratyush Yadav &lt;ptyadav@amazon.de&gt;
Cc: Rob Herring &lt;robh@kernel.org&gt;
Cc: Saravana Kannan &lt;saravanak@google.com&gt;
Cc: Stanislav Kinsburskii &lt;skinsburskii@linux.microsoft.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Thomas Gleinxer &lt;tglx@linutronix.de&gt;
Cc: Thomas Lendacky &lt;thomas.lendacky@amd.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
With KHO (Kexec HandOver), we need a way to ensure that the new kernel
does not allocate memory on top of any memory regions that the previous
kernel was handing over.  But to know where those are, we need to include
them in the memblock.reserved array which may not be big enough to hold
all ranges that need to be persisted across kexec.  To resize the array,
we need to allocate memory.  That brings us into a catch 22 situation.

The solution to that is limit memblock allocations to the scratch regions:
safe regions to operate in the case when there is memory that should
remain intact across kexec.

KHO provides several "scratch regions" as part of its metadata.  These
scratch regions are contiguous memory blocks that known not to contain any
memory that should be persisted across kexec.  These regions should be
large enough to accommodate all memblock allocations done by the kexeced
kernel.

We introduce a new memblock_set_scratch_only() function that allows KHO to
indicate that any memblock allocation must happen from the scratch
regions.

Later, we may want to perform another KHO kexec.  For that, we reuse the
same scratch regions.  To ensure that no eventually handed over data gets
allocated inside a scratch region, we flip the semantics of the scratch
region with memblock_clear_scratch_only(): After that call, no allocations
may happen from scratch memblock regions.  We will lift that restriction
in the next patch.

Link: https://lkml.kernel.org/r/20250509074635.3187114-3-changyuanl@google.com
Signed-off-by: Alexander Graf &lt;graf@amazon.com&gt;
Co-developed-by: Mike Rapoport (Microsoft) &lt;rppt@kernel.org&gt;
Signed-off-by: Mike Rapoport (Microsoft) &lt;rppt@kernel.org&gt;
Signed-off-by: Changyuan Lyu &lt;changyuanl@google.com&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Anthony Yznaga &lt;anthony.yznaga@oracle.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Ashish Kalra &lt;ashish.kalra@amd.com&gt;
Cc: Ben Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Borislav Betkov &lt;bp@alien8.de&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: David Woodhouse &lt;dwmw2@infradead.org&gt;
Cc: Eric Biederman &lt;ebiederm@xmission.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: James Gowans &lt;jgowans@amazon.com&gt;
Cc: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Krzysztof Kozlowski &lt;krzk@kernel.org&gt;
Cc: Marc Rutland &lt;mark.rutland@arm.com&gt;
Cc: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Cc: Pasha Tatashin &lt;pasha.tatashin@soleen.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Pratyush Yadav &lt;ptyadav@amazon.de&gt;
Cc: Rob Herring &lt;robh@kernel.org&gt;
Cc: Saravana Kannan &lt;saravanak@google.com&gt;
Cc: Stanislav Kinsburskii &lt;skinsburskii@linux.microsoft.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Thomas Gleinxer &lt;tglx@linutronix.de&gt;
Cc: Thomas Lendacky &lt;thomas.lendacky@amd.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
