<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/mm/Makefile, branch v6.5</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>mm: page_alloc: split out DEBUG_PAGEALLOC</title>
<updated>2023-06-09T23:25:23+00:00</updated>
<author>
<name>Kefeng Wang</name>
<email>wangkefeng.wang@huawei.com</email>
</author>
<published>2023-05-16T06:38:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=884c175f12ce1fabff18ac113349628149fc6cf2'/>
<id>884c175f12ce1fabff18ac113349628149fc6cf2</id>
<content type='text'>
Move DEBUG_PAGEALLOC related functions into a single file to reduce a bit
of page_alloc.c.

Link: https://lkml.kernel.org/r/20230516063821.121844-9-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang &lt;wangkefeng.wang@huawei.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Cc: Iurii Zaikin &lt;yzaikin@google.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Len Brown &lt;len.brown@intel.com&gt;
Cc: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
Cc: Mike Rapoport (IBM) &lt;rppt@kernel.org&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Pavel Machek &lt;pavel@ucw.cz&gt;
Cc: Rafael J. Wysocki &lt;rafael@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Move DEBUG_PAGEALLOC related functions into a single file to reduce a bit
of page_alloc.c.

Link: https://lkml.kernel.org/r/20230516063821.121844-9-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang &lt;wangkefeng.wang@huawei.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Cc: Iurii Zaikin &lt;yzaikin@google.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Len Brown &lt;len.brown@intel.com&gt;
Cc: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
Cc: Mike Rapoport (IBM) &lt;rppt@kernel.org&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Pavel Machek &lt;pavel@ucw.cz&gt;
Cc: Rafael J. Wysocki &lt;rafael@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: page_alloc: split out FAIL_PAGE_ALLOC</title>
<updated>2023-06-09T23:25:23+00:00</updated>
<author>
<name>Kefeng Wang</name>
<email>wangkefeng.wang@huawei.com</email>
</author>
<published>2023-05-16T06:38:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0866e82e40fba45dae07e6e8385929b574201752'/>
<id>0866e82e40fba45dae07e6e8385929b574201752</id>
<content type='text'>
... to a single file to reduce a bit of page_alloc.c.

Link: https://lkml.kernel.org/r/20230516063821.121844-8-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang &lt;wangkefeng.wang@huawei.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Cc: Iurii Zaikin &lt;yzaikin@google.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Len Brown &lt;len.brown@intel.com&gt;
Cc: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
Cc: Mike Rapoport (IBM) &lt;rppt@kernel.org&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Pavel Machek &lt;pavel@ucw.cz&gt;
Cc: Rafael J. Wysocki &lt;rafael@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
... to a single file to reduce a bit of page_alloc.c.

Link: https://lkml.kernel.org/r/20230516063821.121844-8-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang &lt;wangkefeng.wang@huawei.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Cc: Iurii Zaikin &lt;yzaikin@google.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Len Brown &lt;len.brown@intel.com&gt;
Cc: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
Cc: Mike Rapoport (IBM) &lt;rppt@kernel.org&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Pavel Machek &lt;pavel@ucw.cz&gt;
Cc: Rafael J. Wysocki &lt;rafael@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: page_alloc: collect mem statistic into show_mem.c</title>
<updated>2023-06-09T23:25:22+00:00</updated>
<author>
<name>Kefeng Wang</name>
<email>wangkefeng.wang@huawei.com</email>
</author>
<published>2023-05-16T06:38:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e9aae1709264b2353d67d4846bd617c445a49fe0'/>
<id>e9aae1709264b2353d67d4846bd617c445a49fe0</id>
<content type='text'>
Let's move show_mem.c from lib to mm, as it belongs memory subsystem, also
split some memory statistic related functions from page_alloc.c to
show_mem.c, and we cleanup some unneeded include.

There is no functional change.

Link: https://lkml.kernel.org/r/20230516063821.121844-5-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang &lt;wangkefeng.wang@huawei.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Cc: Iurii Zaikin &lt;yzaikin@google.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Len Brown &lt;len.brown@intel.com&gt;
Cc: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
Cc: Mike Rapoport (IBM) &lt;rppt@kernel.org&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Pavel Machek &lt;pavel@ucw.cz&gt;
Cc: Rafael J. Wysocki &lt;rafael@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Let's move show_mem.c from lib to mm, as it belongs memory subsystem, also
split some memory statistic related functions from page_alloc.c to
show_mem.c, and we cleanup some unneeded include.

There is no functional change.

Link: https://lkml.kernel.org/r/20230516063821.121844-5-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang &lt;wangkefeng.wang@huawei.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Cc: Iurii Zaikin &lt;yzaikin@google.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Len Brown &lt;len.brown@intel.com&gt;
Cc: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
Cc: Mike Rapoport (IBM) &lt;rppt@kernel.org&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Pavel Machek &lt;pavel@ucw.cz&gt;
Cc: Rafael J. Wysocki &lt;rafael@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm</title>
<updated>2023-04-28T02:42:02+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-04-28T02:42:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7fa8a8ee9400fe8ec188426e40e481717bc5e924'/>
<id>7fa8a8ee9400fe8ec188426e40e481717bc5e924</id>
<content type='text'>
Pull MM updates from Andrew Morton:

 - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of
   switching from a user process to a kernel thread.

 - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj
   Raghav.

 - zsmalloc performance improvements from Sergey Senozhatsky.

 - Yue Zhao has found and fixed some data race issues around the
   alteration of memcg userspace tunables.

 - VFS rationalizations from Christoph Hellwig:
     - removal of most of the callers of write_one_page()
     - make __filemap_get_folio()'s return value more useful

 - Luis Chamberlain has changed tmpfs so it no longer requires swap
   backing. Use `mount -o noswap'.

 - Qi Zheng has made the slab shrinkers operate locklessly, providing
   some scalability benefits.

 - Keith Busch has improved dmapool's performance, making part of its
   operations O(1) rather than O(n).

 - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd,
   permitting userspace to wr-protect anon memory unpopulated ptes.

 - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive
   rather than exclusive, and has fixed a bunch of errors which were
   caused by its unintuitive meaning.

 - Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature,
   which causes minor faults to install a write-protected pte.

 - Vlastimil Babka has done some maintenance work on vma_merge():
   cleanups to the kernel code and improvements to our userspace test
   harness.

 - Cleanups to do_fault_around() by Lorenzo Stoakes.

 - Mike Rapoport has moved a lot of initialization code out of various
   mm/ files and into mm/mm_init.c.

 - Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for
   DRM, but DRM doesn't use it any more.

 - Lorenzo has also coverted read_kcore() and vread() to use iterators
   and has thereby removed the use of bounce buffers in some cases.

 - Lorenzo has also contributed further cleanups of vma_merge().

 - Chaitanya Prakash provides some fixes to the mmap selftesting code.

 - Matthew Wilcox changes xfs and afs so they no longer take sleeping
   locks in -&gt;map_page(), a step towards RCUification of pagefaults.

 - Suren Baghdasaryan has improved mmap_lock scalability by switching to
   per-VMA locking.

 - Frederic Weisbecker has reworked the percpu cache draining so that it
   no longer causes latency glitches on cpu isolated workloads.

 - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig
   logic.

 - Liu Shixin has changed zswap's initialization so we no longer waste a
   chunk of memory if zswap is not being used.

 - Yosry Ahmed has improved the performance of memcg statistics
   flushing.

 - David Stevens has fixed several issues involving khugepaged,
   userfaultfd and shmem.

 - Christoph Hellwig has provided some cleanup work to zram's IO-related
   code paths.

 - David Hildenbrand has fixed up some issues in the selftest code's
   testing of our pte state changing.

 - Pankaj Raghav has made page_endio() unneeded and has removed it.

 - Peter Xu contributed some rationalizations of the userfaultfd
   selftests.

 - Yosry Ahmed has fixed an issue around memcg's page recalim
   accounting.

 - Chaitanya Prakash has fixed some arm-related issues in the
   selftests/mm code.

 - Longlong Xia has improved the way in which KSM handles hwpoisoned
   pages.

 - Peter Xu fixes a few issues with uffd-wp at fork() time.

 - Stefan Roesch has changed KSM so that it may now be used on a
   per-process and per-cgroup basis.

* tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits)
  mm,unmap: avoid flushing TLB in batch if PTE is inaccessible
  shmem: restrict noswap option to initial user namespace
  mm/khugepaged: fix conflicting mods to collapse_file()
  sparse: remove unnecessary 0 values from rc
  mm: move 'mmap_min_addr' logic from callers into vm_unmapped_area()
  hugetlb: pte_alloc_huge() to replace huge pte_alloc_map()
  maple_tree: fix allocation in mas_sparse_area()
  mm: do not increment pgfault stats when page fault handler retries
  zsmalloc: allow only one active pool compaction context
  selftests/mm: add new selftests for KSM
  mm: add new KSM process and sysfs knobs
  mm: add new api to enable ksm per process
  mm: shrinkers: fix debugfs file permissions
  mm: don't check VMA write permissions if the PTE/PMD indicates write permissions
  migrate_pages_batch: fix statistics for longterm pin retry
  userfaultfd: use helper function range_in_vma()
  lib/show_mem.c: use for_each_populated_zone() simplify code
  mm: correct arg in reclaim_pages()/reclaim_clean_pages_from_list()
  fs/buffer: convert create_page_buffers to folio_create_buffers
  fs/buffer: add folio_create_empty_buffers helper
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull MM updates from Andrew Morton:

 - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of
   switching from a user process to a kernel thread.

 - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj
   Raghav.

 - zsmalloc performance improvements from Sergey Senozhatsky.

 - Yue Zhao has found and fixed some data race issues around the
   alteration of memcg userspace tunables.

 - VFS rationalizations from Christoph Hellwig:
     - removal of most of the callers of write_one_page()
     - make __filemap_get_folio()'s return value more useful

 - Luis Chamberlain has changed tmpfs so it no longer requires swap
   backing. Use `mount -o noswap'.

 - Qi Zheng has made the slab shrinkers operate locklessly, providing
   some scalability benefits.

 - Keith Busch has improved dmapool's performance, making part of its
   operations O(1) rather than O(n).

 - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd,
   permitting userspace to wr-protect anon memory unpopulated ptes.

 - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive
   rather than exclusive, and has fixed a bunch of errors which were
   caused by its unintuitive meaning.

 - Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature,
   which causes minor faults to install a write-protected pte.

 - Vlastimil Babka has done some maintenance work on vma_merge():
   cleanups to the kernel code and improvements to our userspace test
   harness.

 - Cleanups to do_fault_around() by Lorenzo Stoakes.

 - Mike Rapoport has moved a lot of initialization code out of various
   mm/ files and into mm/mm_init.c.

 - Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for
   DRM, but DRM doesn't use it any more.

 - Lorenzo has also coverted read_kcore() and vread() to use iterators
   and has thereby removed the use of bounce buffers in some cases.

 - Lorenzo has also contributed further cleanups of vma_merge().

 - Chaitanya Prakash provides some fixes to the mmap selftesting code.

 - Matthew Wilcox changes xfs and afs so they no longer take sleeping
   locks in -&gt;map_page(), a step towards RCUification of pagefaults.

 - Suren Baghdasaryan has improved mmap_lock scalability by switching to
   per-VMA locking.

 - Frederic Weisbecker has reworked the percpu cache draining so that it
   no longer causes latency glitches on cpu isolated workloads.

 - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig
   logic.

 - Liu Shixin has changed zswap's initialization so we no longer waste a
   chunk of memory if zswap is not being used.

 - Yosry Ahmed has improved the performance of memcg statistics
   flushing.

 - David Stevens has fixed several issues involving khugepaged,
   userfaultfd and shmem.

 - Christoph Hellwig has provided some cleanup work to zram's IO-related
   code paths.

 - David Hildenbrand has fixed up some issues in the selftest code's
   testing of our pte state changing.

 - Pankaj Raghav has made page_endio() unneeded and has removed it.

 - Peter Xu contributed some rationalizations of the userfaultfd
   selftests.

 - Yosry Ahmed has fixed an issue around memcg's page recalim
   accounting.

 - Chaitanya Prakash has fixed some arm-related issues in the
   selftests/mm code.

 - Longlong Xia has improved the way in which KSM handles hwpoisoned
   pages.

 - Peter Xu fixes a few issues with uffd-wp at fork() time.

 - Stefan Roesch has changed KSM so that it may now be used on a
   per-process and per-cgroup basis.

* tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits)
  mm,unmap: avoid flushing TLB in batch if PTE is inaccessible
  shmem: restrict noswap option to initial user namespace
  mm/khugepaged: fix conflicting mods to collapse_file()
  sparse: remove unnecessary 0 values from rc
  mm: move 'mmap_min_addr' logic from callers into vm_unmapped_area()
  hugetlb: pte_alloc_huge() to replace huge pte_alloc_map()
  maple_tree: fix allocation in mas_sparse_area()
  mm: do not increment pgfault stats when page fault handler retries
  zsmalloc: allow only one active pool compaction context
  selftests/mm: add new selftests for KSM
  mm: add new KSM process and sysfs knobs
  mm: add new api to enable ksm per process
  mm: shrinkers: fix debugfs file permissions
  mm: don't check VMA write permissions if the PTE/PMD indicates write permissions
  migrate_pages_batch: fix statistics for longterm pin retry
  userfaultfd: use helper function range_in_vma()
  lib/show_mem.c: use for_each_populated_zone() simplify code
  mm: correct arg in reclaim_pages()/reclaim_clean_pages_from_list()
  fs/buffer: convert create_page_buffers to folio_create_buffers
  fs/buffer: add folio_create_empty_buffers helper
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>dmapool: add alloc/free performance test</title>
<updated>2023-04-06T02:42:38+00:00</updated>
<author>
<name>Keith Busch</name>
<email>kbusch@kernel.org</email>
</author>
<published>2023-01-26T21:51:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=def8574308edbc3bca821fb965e429a2fe5f4971'/>
<id>def8574308edbc3bca821fb965e429a2fe5f4971</id>
<content type='text'>
Patch series "dmapool enhancements", v4.

Time spent in dma_pool alloc/free increases linearly with the number of
pages backing the pool.  We can reduce this to constant time with minor
changes to how free pages are tracked.


This patch (of 12):

Provide a module that allocates and frees many blocks of various sizes and
report how long it takes.  This is intended to provide a consistent way to
measure how changes to the dma_pool_alloc/free routines affect timing.

Link: https://lkml.kernel.org/r/20230126215125.4069751-1-kbusch@meta.com
Link: https://lkml.kernel.org/r/20230126215125.4069751-2-kbusch@meta.com
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Tony Battersby &lt;tonyb@cybernetics.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Patch series "dmapool enhancements", v4.

Time spent in dma_pool alloc/free increases linearly with the number of
pages backing the pool.  We can reduce this to constant time with minor
changes to how free pages are tracked.


This patch (of 12):

Provide a module that allocates and frees many blocks of various sizes and
report how long it takes.  This is intended to provide a consistent way to
measure how changes to the dma_pool_alloc/free routines affect timing.

Link: https://lkml.kernel.org/r/20230126215125.4069751-1-kbusch@meta.com
Link: https://lkml.kernel.org/r/20230126215125.4069751-2-kbusch@meta.com
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Tony Battersby &lt;tonyb@cybernetics.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/slob: remove CONFIG_SLOB</title>
<updated>2023-03-29T08:31:40+00:00</updated>
<author>
<name>Vlastimil Babka</name>
<email>vbabka@suse.cz</email>
</author>
<published>2023-02-27T16:46:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c9929f0e344a28b3a57b3b84fc237a7dd4134ef8'/>
<id>c9929f0e344a28b3a57b3b84fc237a7dd4134ef8</id>
<content type='text'>
Remove SLOB from Kconfig and Makefile. Everything under #ifdef
CONFIG_SLOB, and mm/slob.c is now dead code.

Signed-off-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Acked-by: Hyeonggon Yoo &lt;42.hyeyoo@gmail.com&gt;
Acked-by: Lorenzo Stoakes &lt;lstoakes@gmail.com&gt;
Acked-by: Mike Rapoport (IBM) &lt;rppt@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Remove SLOB from Kconfig and Makefile. Everything under #ifdef
CONFIG_SLOB, and mm/slob.c is now dead code.

Signed-off-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Acked-by: Hyeonggon Yoo &lt;42.hyeyoo@gmail.com&gt;
Acked-by: Lorenzo Stoakes &lt;lstoakes@gmail.com&gt;
Acked-by: Mike Rapoport (IBM) &lt;rppt@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol</title>
<updated>2022-10-03T21:03:36+00:00</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2022-09-26T13:57:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e55b9f96860f6c6026cff97966a740576285e07b'/>
<id>e55b9f96860f6c6026cff97966a740576285e07b</id>
<content type='text'>
Since 2d1c498072de ("mm: memcontrol: make swap tracking an integral part
of memory control"), CONFIG_MEMCG_SWAP hasn't been a user-visible config
option anymore, it just means CONFIG_MEMCG &amp;&amp; CONFIG_SWAP.

Update the sites accordingly and drop the symbol.

[ While touching the docs, remove two references to CONFIG_MEMCG_KMEM,
  which hasn't been a user-visible symbol for over half a decade. ]

Link: https://lkml.kernel.org/r/20220926135704.400818-5-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since 2d1c498072de ("mm: memcontrol: make swap tracking an integral part
of memory control"), CONFIG_MEMCG_SWAP hasn't been a user-visible config
option anymore, it just means CONFIG_MEMCG &amp;&amp; CONFIG_SWAP.

Update the sites accordingly and drop the symbol.

[ While touching the docs, remove two references to CONFIG_MEMCG_KMEM,
  which hasn't been a user-visible symbol for over half a decade. ]

Link: https://lkml.kernel.org/r/20220926135704.400818-5-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>kmsan: add KMSAN runtime core</title>
<updated>2022-10-03T21:03:19+00:00</updated>
<author>
<name>Alexander Potapenko</name>
<email>glider@google.com</email>
</author>
<published>2022-09-15T15:03:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f80be4571b19b9fd8dd1528cd2a2f123aff51f70'/>
<id>f80be4571b19b9fd8dd1528cd2a2f123aff51f70</id>
<content type='text'>
For each memory location KernelMemorySanitizer maintains two types of
metadata:

1. The so-called shadow of that location - а byte:byte mapping describing
   whether or not individual bits of memory are initialized (shadow is 0)
   or not (shadow is 1).
2. The origins of that location - а 4-byte:4-byte mapping containing
   4-byte IDs of the stack traces where uninitialized values were
   created.

Each struct page now contains pointers to two struct pages holding KMSAN
metadata (shadow and origins) for the original struct page.  Utility
routines in mm/kmsan/core.c and mm/kmsan/shadow.c handle the metadata
creation, addressing, copying and checking.  mm/kmsan/report.c performs
error reporting in the cases an uninitialized value is used in a way that
leads to undefined behavior.

KMSAN compiler instrumentation is responsible for tracking the metadata
along with the kernel memory.  mm/kmsan/instrumentation.c provides the
implementation for instrumentation hooks that are called from files
compiled with -fsanitize=kernel-memory.

To aid parameter passing (also done at instrumentation level), each
task_struct now contains a struct kmsan_task_state used to track the
metadata of function parameters and return values for that task.

Finally, this patch provides CONFIG_KMSAN that enables KMSAN, and declares
CFLAGS_KMSAN, which are applied to files compiled with KMSAN.  The
KMSAN_SANITIZE:=n Makefile directive can be used to completely disable
KMSAN instrumentation for certain files.

Similarly, KMSAN_ENABLE_CHECKS:=n disables KMSAN checks and makes newly
created stack memory initialized.

Users can also use functions from include/linux/kmsan-checks.h to mark
certain memory regions as uninitialized or initialized (this is called
"poisoning" and "unpoisoning") or check that a particular region is
initialized.

Link: https://lkml.kernel.org/r/20220915150417.722975-12-glider@google.com
Signed-off-by: Alexander Potapenko &lt;glider@google.com&gt;
Acked-by: Marco Elver &lt;elver@google.com&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Andrey Konovalov &lt;andreyknvl@gmail.com&gt;
Cc: Andrey Konovalov &lt;andreyknvl@google.com&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: Eric Biggers &lt;ebiggers@google.com&gt;
Cc: Eric Biggers &lt;ebiggers@kernel.org&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Cc: Ilya Leoshkevich &lt;iii@linux.ibm.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Michael S. Tsirkin &lt;mst@redhat.com&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Petr Mladek &lt;pmladek@suse.com&gt;
Cc: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Cc: Vegard Nossum &lt;vegard.nossum@oracle.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
For each memory location KernelMemorySanitizer maintains two types of
metadata:

1. The so-called shadow of that location - а byte:byte mapping describing
   whether or not individual bits of memory are initialized (shadow is 0)
   or not (shadow is 1).
2. The origins of that location - а 4-byte:4-byte mapping containing
   4-byte IDs of the stack traces where uninitialized values were
   created.

Each struct page now contains pointers to two struct pages holding KMSAN
metadata (shadow and origins) for the original struct page.  Utility
routines in mm/kmsan/core.c and mm/kmsan/shadow.c handle the metadata
creation, addressing, copying and checking.  mm/kmsan/report.c performs
error reporting in the cases an uninitialized value is used in a way that
leads to undefined behavior.

KMSAN compiler instrumentation is responsible for tracking the metadata
along with the kernel memory.  mm/kmsan/instrumentation.c provides the
implementation for instrumentation hooks that are called from files
compiled with -fsanitize=kernel-memory.

To aid parameter passing (also done at instrumentation level), each
task_struct now contains a struct kmsan_task_state used to track the
metadata of function parameters and return values for that task.

Finally, this patch provides CONFIG_KMSAN that enables KMSAN, and declares
CFLAGS_KMSAN, which are applied to files compiled with KMSAN.  The
KMSAN_SANITIZE:=n Makefile directive can be used to completely disable
KMSAN instrumentation for certain files.

Similarly, KMSAN_ENABLE_CHECKS:=n disables KMSAN checks and makes newly
created stack memory initialized.

Users can also use functions from include/linux/kmsan-checks.h to mark
certain memory regions as uninitialized or initialized (this is called
"poisoning" and "unpoisoning") or check that a particular region is
initialized.

Link: https://lkml.kernel.org/r/20220915150417.722975-12-glider@google.com
Signed-off-by: Alexander Potapenko &lt;glider@google.com&gt;
Acked-by: Marco Elver &lt;elver@google.com&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Andrey Konovalov &lt;andreyknvl@gmail.com&gt;
Cc: Andrey Konovalov &lt;andreyknvl@google.com&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: Eric Biggers &lt;ebiggers@google.com&gt;
Cc: Eric Biggers &lt;ebiggers@kernel.org&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Cc: Ilya Leoshkevich &lt;iii@linux.ibm.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Michael S. Tsirkin &lt;mst@redhat.com&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Petr Mladek &lt;pmladek@suse.com&gt;
Cc: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Cc: Vegard Nossum &lt;vegard.nossum@oracle.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: remove vmacache</title>
<updated>2022-09-27T02:46:18+00:00</updated>
<author>
<name>Liam R. Howlett</name>
<email>Liam.Howlett@Oracle.com</email>
</author>
<published>2022-09-06T19:48:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7964cf8caa4dfa42c4149f3833d3878713cda3dc'/>
<id>7964cf8caa4dfa42c4149f3833d3878713cda3dc</id>
<content type='text'>
By using the maple tree and the maple tree state, the vmacache is no
longer beneficial and is complicating the VMA code.  Remove the vmacache
to reduce the work in keeping it up to date and code complexity.

Link: https://lkml.kernel.org/r/20220906194824.2110408-26-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett &lt;Liam.Howlett@Oracle.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Tested-by: Yu Zhao &lt;yuzhao@google.com&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: "Matthew Wilcox (Oracle)" &lt;willy@infradead.org&gt;
Cc: SeongJae Park &lt;sj@kernel.org&gt;
Cc: Sven Schnelle &lt;svens@linux.ibm.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
By using the maple tree and the maple tree state, the vmacache is no
longer beneficial and is complicating the VMA code.  Remove the vmacache
to reduce the work in keeping it up to date and code complexity.

Link: https://lkml.kernel.org/r/20220906194824.2110408-26-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett &lt;Liam.Howlett@Oracle.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Tested-by: Yu Zhao &lt;yuzhao@google.com&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: "Matthew Wilcox (Oracle)" &lt;willy@infradead.org&gt;
Cc: SeongJae Park &lt;sj@kernel.org&gt;
Cc: Sven Schnelle &lt;svens@linux.ibm.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/demotion: add support for explicit memory tiers</title>
<updated>2022-09-27T02:46:11+00:00</updated>
<author>
<name>Aneesh Kumar K.V</name>
<email>aneesh.kumar@linux.ibm.com</email>
</author>
<published>2022-08-18T13:10:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=992bf77591cb7e696fcc59aa7e64d1200b673513'/>
<id>992bf77591cb7e696fcc59aa7e64d1200b673513</id>
<content type='text'>
Patch series "mm/demotion: Memory tiers and demotion", v15.

The current kernel has the basic memory tiering support: Inactive pages on
a higher tier NUMA node can be migrated (demoted) to a lower tier NUMA
node to make room for new allocations on the higher tier NUMA node. 
Frequently accessed pages on a lower tier NUMA node can be migrated
(promoted) to a higher tier NUMA node to improve the performance.

In the current kernel, memory tiers are defined implicitly via a demotion
path relationship between NUMA nodes, which is created during the kernel
initialization and updated when a NUMA node is hot-added or hot-removed. 
The current implementation puts all nodes with CPU into the highest tier,
and builds the tier hierarchy tier-by-tier by establishing the per-node
demotion targets based on the distances between nodes.

This current memory tier kernel implementation needs to be improved for
several important use cases:

* The current tier initialization code always initializes each
  memory-only NUMA node into a lower tier.  But a memory-only NUMA node
  may have a high performance memory device (e.g.  a DRAM-backed
  memory-only node on a virtual machine) and that should be put into a
  higher tier.

* The current tier hierarchy always puts CPU nodes into the top tier. 
  But on a system with HBM (e.g.  GPU memory) devices, these memory-only
  HBM NUMA nodes should be in the top tier, and DRAM nodes with CPUs are
  better to be placed into the next lower tier.

* Also because the current tier hierarchy always puts CPU nodes into the
  top tier, when a CPU is hot-added (or hot-removed) and triggers a memory
  node from CPU-less into a CPU node (or vice versa), the memory tier
  hierarchy gets changed, even though no memory node is added or removed. 
  This can make the tier hierarchy unstable and make it difficult to
  support tier-based memory accounting.

* A higher tier node can only be demoted to nodes with shortest distance
  on the next lower tier as defined by the demotion path, not any other
  node from any lower tier.  This strict, demotion order does not work in
  all use cases (e.g.  some use cases may want to allow cross-socket
  demotion to another node in the same demotion tier as a fallback when
  the preferred demotion node is out of space), and has resulted in the
  feature request for an interface to override the system-wide, per-node
  demotion order from the userspace.  This demotion order is also
  inconsistent with the page allocation fallback order when all the nodes
  in a higher tier are out of space: The page allocation can fall back to
  any node from any lower tier, whereas the demotion order doesn't allow
  that.

This patch series make the creation of memory tiers explicit under the
control of device driver.

Memory Tier Initialization
==========================

Linux kernel presents memory devices as NUMA nodes and each memory device
is of a specific type.  The memory type of a device is represented by its
abstract distance.  A memory tier corresponds to a range of abstract
distance.  This allows for classifying memory devices with a specific
performance range into a memory tier.

By default, all memory nodes are assigned to the default tier with
abstract distance 512.

A device driver can move its memory nodes from the default tier.  For
example, PMEM can move its memory nodes below the default tier, whereas
GPU can move its memory nodes above the default tier.

The kernel initialization code makes the decision on which exact tier a
memory node should be assigned to based on the requests from the device
drivers as well as the memory device hardware information provided by the
firmware.

Hot-adding/removing CPUs doesn't affect memory tier hierarchy.


This patch (of 10):

In the current kernel, memory tiers are defined implicitly via a demotion
path relationship between NUMA nodes, which is created during the kernel
initialization and updated when a NUMA node is hot-added or hot-removed. 
The current implementation puts all nodes with CPU into the highest tier,
and builds the tier hierarchy by establishing the per-node demotion
targets based on the distances between nodes.

This current memory tier kernel implementation needs to be improved for
several important use cases,

The current tier initialization code always initializes each memory-only
NUMA node into a lower tier.  But a memory-only NUMA node may have a high
performance memory device (e.g.  a DRAM-backed memory-only node on a
virtual machine) that should be put into a higher tier.

The current tier hierarchy always puts CPU nodes into the top tier.  But
on a system with HBM or GPU devices, the memory-only NUMA nodes mapping
these devices should be in the top tier, and DRAM nodes with CPUs are
better to be placed into the next lower tier.

With current kernel higher tier node can only be demoted to nodes with
shortest distance on the next lower tier as defined by the demotion path,
not any other node from any lower tier.  This strict, demotion order does
not work in all use cases (e.g.  some use cases may want to allow
cross-socket demotion to another node in the same demotion tier as a
fallback when the preferred demotion node is out of space), This demotion
order is also inconsistent with the page allocation fallback order when
all the nodes in a higher tier are out of space: The page allocation can
fall back to any node from any lower tier, whereas the demotion order
doesn't allow that.

This patch series address the above by defining memory tiers explicitly.

Linux kernel presents memory devices as NUMA nodes and each memory device
is of a specific type.  The memory type of a device is represented by its
abstract distance.  A memory tier corresponds to a range of abstract
distance.  This allows for classifying memory devices with a specific
performance range into a memory tier.

This patch configures the range/chunk size to be 128.  The default DRAM
abstract distance is 512.  We can have 4 memory tiers below the default
DRAM with abstract distance range 0 - 127, 127 - 255, 256- 383, 384 - 511.
Faster memory devices can be placed in these faster(higher) memory tiers.
Slower memory devices like persistent memory will have abstract distance
higher than the default DRAM level.

[akpm@linux-foundation.org: fix comment, per Aneesh]
Link: https://lkml.kernel.org/r/20220818131042.113280-1-aneesh.kumar@linux.ibm.com
Link: https://lkml.kernel.org/r/20220818131042.113280-2-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V &lt;aneesh.kumar@linux.ibm.com&gt;
Reviewed-by: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Acked-by: Wei Xu &lt;weixugc@google.com&gt;
Cc: Alistair Popple &lt;apopple@nvidia.com&gt;
Cc: Bharata B Rao &lt;bharata@amd.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Hesham Almatary &lt;hesham.almatary@huawei.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Tim Chen &lt;tim.c.chen@intel.com&gt;
Cc: Yang Shi &lt;shy828301@gmail.com&gt;
Cc: Jagdish Gediya &lt;jvgediya.oss@gmail.com&gt;
Cc: SeongJae Park &lt;sj@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Patch series "mm/demotion: Memory tiers and demotion", v15.

The current kernel has the basic memory tiering support: Inactive pages on
a higher tier NUMA node can be migrated (demoted) to a lower tier NUMA
node to make room for new allocations on the higher tier NUMA node. 
Frequently accessed pages on a lower tier NUMA node can be migrated
(promoted) to a higher tier NUMA node to improve the performance.

In the current kernel, memory tiers are defined implicitly via a demotion
path relationship between NUMA nodes, which is created during the kernel
initialization and updated when a NUMA node is hot-added or hot-removed. 
The current implementation puts all nodes with CPU into the highest tier,
and builds the tier hierarchy tier-by-tier by establishing the per-node
demotion targets based on the distances between nodes.

This current memory tier kernel implementation needs to be improved for
several important use cases:

* The current tier initialization code always initializes each
  memory-only NUMA node into a lower tier.  But a memory-only NUMA node
  may have a high performance memory device (e.g.  a DRAM-backed
  memory-only node on a virtual machine) and that should be put into a
  higher tier.

* The current tier hierarchy always puts CPU nodes into the top tier. 
  But on a system with HBM (e.g.  GPU memory) devices, these memory-only
  HBM NUMA nodes should be in the top tier, and DRAM nodes with CPUs are
  better to be placed into the next lower tier.

* Also because the current tier hierarchy always puts CPU nodes into the
  top tier, when a CPU is hot-added (or hot-removed) and triggers a memory
  node from CPU-less into a CPU node (or vice versa), the memory tier
  hierarchy gets changed, even though no memory node is added or removed. 
  This can make the tier hierarchy unstable and make it difficult to
  support tier-based memory accounting.

* A higher tier node can only be demoted to nodes with shortest distance
  on the next lower tier as defined by the demotion path, not any other
  node from any lower tier.  This strict, demotion order does not work in
  all use cases (e.g.  some use cases may want to allow cross-socket
  demotion to another node in the same demotion tier as a fallback when
  the preferred demotion node is out of space), and has resulted in the
  feature request for an interface to override the system-wide, per-node
  demotion order from the userspace.  This demotion order is also
  inconsistent with the page allocation fallback order when all the nodes
  in a higher tier are out of space: The page allocation can fall back to
  any node from any lower tier, whereas the demotion order doesn't allow
  that.

This patch series make the creation of memory tiers explicit under the
control of device driver.

Memory Tier Initialization
==========================

Linux kernel presents memory devices as NUMA nodes and each memory device
is of a specific type.  The memory type of a device is represented by its
abstract distance.  A memory tier corresponds to a range of abstract
distance.  This allows for classifying memory devices with a specific
performance range into a memory tier.

By default, all memory nodes are assigned to the default tier with
abstract distance 512.

A device driver can move its memory nodes from the default tier.  For
example, PMEM can move its memory nodes below the default tier, whereas
GPU can move its memory nodes above the default tier.

The kernel initialization code makes the decision on which exact tier a
memory node should be assigned to based on the requests from the device
drivers as well as the memory device hardware information provided by the
firmware.

Hot-adding/removing CPUs doesn't affect memory tier hierarchy.


This patch (of 10):

In the current kernel, memory tiers are defined implicitly via a demotion
path relationship between NUMA nodes, which is created during the kernel
initialization and updated when a NUMA node is hot-added or hot-removed. 
The current implementation puts all nodes with CPU into the highest tier,
and builds the tier hierarchy by establishing the per-node demotion
targets based on the distances between nodes.

This current memory tier kernel implementation needs to be improved for
several important use cases,

The current tier initialization code always initializes each memory-only
NUMA node into a lower tier.  But a memory-only NUMA node may have a high
performance memory device (e.g.  a DRAM-backed memory-only node on a
virtual machine) that should be put into a higher tier.

The current tier hierarchy always puts CPU nodes into the top tier.  But
on a system with HBM or GPU devices, the memory-only NUMA nodes mapping
these devices should be in the top tier, and DRAM nodes with CPUs are
better to be placed into the next lower tier.

With current kernel higher tier node can only be demoted to nodes with
shortest distance on the next lower tier as defined by the demotion path,
not any other node from any lower tier.  This strict, demotion order does
not work in all use cases (e.g.  some use cases may want to allow
cross-socket demotion to another node in the same demotion tier as a
fallback when the preferred demotion node is out of space), This demotion
order is also inconsistent with the page allocation fallback order when
all the nodes in a higher tier are out of space: The page allocation can
fall back to any node from any lower tier, whereas the demotion order
doesn't allow that.

This patch series address the above by defining memory tiers explicitly.

Linux kernel presents memory devices as NUMA nodes and each memory device
is of a specific type.  The memory type of a device is represented by its
abstract distance.  A memory tier corresponds to a range of abstract
distance.  This allows for classifying memory devices with a specific
performance range into a memory tier.

This patch configures the range/chunk size to be 128.  The default DRAM
abstract distance is 512.  We can have 4 memory tiers below the default
DRAM with abstract distance range 0 - 127, 127 - 255, 256- 383, 384 - 511.
Faster memory devices can be placed in these faster(higher) memory tiers.
Slower memory devices like persistent memory will have abstract distance
higher than the default DRAM level.

[akpm@linux-foundation.org: fix comment, per Aneesh]
Link: https://lkml.kernel.org/r/20220818131042.113280-1-aneesh.kumar@linux.ibm.com
Link: https://lkml.kernel.org/r/20220818131042.113280-2-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V &lt;aneesh.kumar@linux.ibm.com&gt;
Reviewed-by: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Acked-by: Wei Xu &lt;weixugc@google.com&gt;
Cc: Alistair Popple &lt;apopple@nvidia.com&gt;
Cc: Bharata B Rao &lt;bharata@amd.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Hesham Almatary &lt;hesham.almatary@huawei.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Tim Chen &lt;tim.c.chen@intel.com&gt;
Cc: Yang Shi &lt;shy828301@gmail.com&gt;
Cc: Jagdish Gediya &lt;jvgediya.oss@gmail.com&gt;
Cc: SeongJae Park &lt;sj@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
