linux.git/Documentation/admin-guide/mm, branch v6.15

Docs/admin-guide/mm/damon/usage: update for {core,ops}_filters directories

2025-03-18T05:06:50+00:00

Document the {core,ops}_filters directories in the usage document.

Link: https://lkml.kernel.org/r/20250305222733.59089-9-sj@kernel.org
Signed-off-by: SeongJae Park 
Cc: Jonathan Corbet 
Signed-off-by: Andrew Morton

fs/proc/task_mmu: remove per-page mapcount dependency for PM_MMAP_EXCLUSIVE (CONFIG_NO_PAGE_MAPCOUNT)

2025-03-18T05:06:47+00:00

Let's implement an alternative when per-page mapcounts in large folios are
no longer maintained -- soon with CONFIG_NO_PAGE_MAPCOUNT.

PM_MMAP_EXCLUSIVE will now not be set if folio_likely_mapped_shared() is
true -- when the folio is considered "mapped shared", including when it
once was "mapped shared" but no longer is, as documented.

This might result in an under-indication of "exclusively mapped", which
is considered better than over-indicating it: under-estimating the USS
(Unique Set Size) is better than over-estimating it.
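
As a rough illustration of why under-indication leads to a smaller USS, a
userspace tool might estimate USS by counting pagemap entries that are both
present and exclusively mapped. The sketch below assumes the documented
pagemap bit positions (63 for "present", 56 for "exclusively mapped"); the
macro and function names are hypothetical and not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Documented /proc/<pid>/pagemap bits; the macro names are hypothetical. */
#define SKETCH_PM_PRESENT        (1ULL << 63)  /* page present in RAM */
#define SKETCH_PM_MMAP_EXCLUSIVE (1ULL << 56)  /* page exclusively mapped */

/*
 * Count the entries that are present and flagged as exclusively mapped.
 * Multiplying the result by the page size gives a USS estimate.
 */
static size_t count_exclusive_pages(const uint64_t *entries, size_t n)
{
    const uint64_t want = SKETCH_PM_PRESENT | SKETCH_PM_MMAP_EXCLUSIVE;
    size_t i, count = 0;

    for (i = 0; i < n; i++) {
        if ((entries[i] & want) == want)
            count++;
    }
    return count;
}
```

Every page whose exclusive bit is left clear drops out of this count, so the
estimate can only shrink, never grow.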

As an alternative, we could simply remove that flag with
CONFIG_NO_PAGE_MAPCOUNT completely, but there might be value to it.  So,
let's keep it like that and document the behavior.

Link: https://lkml.kernel.org/r/20250303163014.1128035-18-david@redhat.com
Signed-off-by: David Hildenbrand 
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Dave Hansen 
Cc: Ingo Molnar 
Cc: Jann Horn 
Cc: Johannes Weiner 
Cc: Jonathan Corbet 
Cc: Kirill A. Shutemov 
Cc: Lance Yang 
Cc: Liam Howlett 
Cc: Lorenzo Stoakes 
Cc: Matthew Wilcox (Oracle)
Cc: Michal Koutný
Cc: Muchun Song 
Cc: Tejun Heo
Cc: Thomas Gleixner 
Cc: Vlastimil Babka 
Cc: Zefan Li 
Signed-off-by: Andrew Morton

fs/proc/page: remove per-page mapcount dependency for /proc/kpagecount (CONFIG_NO_PAGE_MAPCOUNT)

2025-03-18T05:06:47+00:00

Let's implement an alternative when per-page mapcounts in large folios are
no longer maintained -- soon with CONFIG_NO_PAGE_MAPCOUNT.

For large folios, we'll return the per-page average mapcount within the
folio, whereby we round to the closest integer when calculating the
average: however, we'll always return at least 1 if the folio is mapped.

So assuming a folio with 512 pages, the average would be:
* 0 if no pages are mapped
* 1 if there are 1 .. 767 per-page mappings
* 2 if there are 768 .. 1279 per-page mappings
...
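
The rounding rule above can be sketched in plain C. This is a simplified
illustration, not the kernel helper: it treats all mappings as per-page
mappings and takes the folio size as a parameter. The function name is
hypothetical.

```c
#include <assert.h>

/*
 * Average per-page mapcount, rounded to the closest integer, but never
 * below 1 once the folio has at least one mapping.
 */
static unsigned int sketch_average_mapcount(unsigned int mappings,
                                            unsigned int nr_pages)
{
    unsigned int avg;

    assert(nr_pages > 0);
    if (mappings == 0)
        return 0;

    /* Adding half the divisor first turns truncation into rounding. */
    avg = (mappings + nr_pages / 2) / nr_pages;

    /* A mapped folio always reports at least 1. */
    return avg ? avg : 1;
}
```

With nr_pages = 512, the result changes from 1 to 2 at 768 mappings and from
2 to 3 at 1280 mappings.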

For hugetlb folios and for large folios that are fully mapped into all
address spaces, there is no change.

We'll make use of this helper in other contexts next.

As an alternative, we could simply return 0 for non-hugetlb large folios,
or disable this legacy interface with CONFIG_NO_PAGE_MAPCOUNT.

But the information exposed by this interface can still be valuable, and
frequently we deal with fully-mapped large folios where the average
corresponds to the actual page mapcount.  So we'll leave it like this for
now and document the new behavior.

Note: this interface is likely not very relevant for performance.  If ever
required, we could try doing a rather expensive rmap walk to collect
precisely how often this folio page is mapped.

Link: https://lkml.kernel.org/r/20250303163014.1128035-17-david@redhat.com
Signed-off-by: David Hildenbrand 
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Dave Hansen 
Cc: Ingo Molnar 
Cc: Jann Horn 
Cc: Johannes Weiner 
Cc: Jonathan Corbet 
Cc: Kirill A. Shutemov 
Cc: Lance Yang 
Cc: Liam Howlett 
Cc: Lorenzo Stoakes 
Cc: Matthew Wilcox (Oracle)
Cc: Michal Koutný
Cc: Muchun Song 
Cc: Tejun Heo
Cc: Thomas Gleixner 
Cc: Vlastimil Babka 
Cc: Zefan Li 
Signed-off-by: Andrew Morton

mm: hugetlb: add hugetlb_alloc_threads cmdline option

2025-03-17T07:05:36+00:00

Add a command line option that enables control of how many threads should
be used to allocate huge pages.

[akpm@linux-foundation.org: tidy up a comment]
Link: https://lkml.kernel.org/r/20250227-hugepage-parameter-v2-2-7db8c6dc0453@cyberus-technology.de
Signed-off-by: Thomas Prescher 
Cc: Jonathan Corbet 
Cc: Muchun Song 
Signed-off-by: Andrew Morton

Docs/admin-guide/mm/damon/usage: add intervals_goal directory on the hierarchy

2025-03-17T07:05:34+00:00

Document DAMON sysfs interface usage for DAMON sampling and aggregation
intervals auto-tuning.

Link: https://lkml.kernel.org/r/20250303221726.484227-9-sj@kernel.org
Signed-off-by: SeongJae Park 
Cc: Jonathan Corbet 
Signed-off-by: Andrew Morton

fs/proc/task_mmu: add guard region bit to pagemap

2025-03-17T05:06:41+00:00

Patch series "fs/proc/task_mmu: add guard region bit to pagemap".

Currently there is no means of determining whether a given page in a
mapping range is designated a guard region (as installed via madvise()
using the MADV_GUARD_INSTALL flag).

This is generally not an issue, but in some instances users may wish to
determine whether this is the case.

This series adds this ability via /proc/$pid/pagemap, updates the
documentation and adds a self test to assert that this functions
correctly.


This patch (of 2):

Currently there is no means by which users can determine whether a given
page in memory is in fact a guard region, that is, one that has had the
MADV_GUARD_INSTALL madvise() flag applied to it.

This is intentional, as to provide this information in VMA metadata would
contradict the intent of the feature (providing a means to change fault
behaviour at a page table level rather than a VMA level), and would
require VMA metadata operations to scan page tables, which is
unacceptable.

In many cases, users have no need to reflect and determine what regions
have been designated guard regions, as it is the user who has established
them in the first place.

But in some instances, such as monitoring software, or software that
relies upon being able to ascertain the nature of mappings within a remote
process for instance, it becomes useful to be able to determine which
pages have the guard region marker applied.

This patch makes use of an unused pagemap bit (58) to provide this
information.
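
A minimal userspace sketch of how a monitoring tool might test this bit
follows. It assumes the usual pagemap layout of one 64-bit entry per virtual
page, indexed by virtual page number. The macro and function names are
hypothetical, and reading another process's pagemap may need extra
privileges.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Bit 58 as described above; the macro name is hypothetical. */
#define SKETCH_PM_GUARD_REGION (1ULL << 58)

/* Return 1 if the pagemap entry carries the guard region bit, else 0. */
static int pagemap_entry_is_guard(uint64_t entry)
{
    return (entry & SKETCH_PM_GUARD_REGION) != 0;
}

/*
 * Look up the entry for vaddr in the given pagemap file, for example
 * "/proc/self/pagemap". Returns 1 for a guard region, 0 otherwise,
 * and -1 on error.
 */
static int addr_is_guard(const char *pagemap_path, uintptr_t vaddr)
{
    uint64_t entry;
    long page_size = sysconf(_SC_PAGESIZE);
    off_t off;
    ssize_t got;
    int fd;

    if (page_size <= 0)
        return -1;

    /* One 8-byte entry per virtual page. */
    off = (off_t)(vaddr / (uintptr_t)page_size) * (off_t)sizeof(entry);

    fd = open(pagemap_path, O_RDONLY);
    if (fd < 0)
        return -1;

    got = pread(fd, &entry, sizeof(entry), off);
    close(fd);
    if (got != (ssize_t)sizeof(entry))
        return -1;

    return pagemap_entry_is_guard(entry);
}
```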

This patch updates the documentation at the same time as making the change
such that the implementation of the feature and the documentation of it
are tied together.

Link: https://lkml.kernel.org/r/cover.1740139449.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/521d99c08b975fb06a1e7201e971cc24d68196d1.1740139449.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes 
Acked-by: David Hildenbrand 
Cc: Jann Horn 
Cc: Jonathan Corbet 
Cc: Kalesh Singh 
Cc: Liam Howlett 
Cc: Matthew Wilcox (Oracle)
Cc: "Paul E . McKenney" 
Cc: Shuah Khan 
Cc: Suren Baghdasaryan 
Cc: Vlastimil Babka 
Signed-off-by: Andrew Morton

mm, cma: support multiple contiguous ranges, if requested

2025-03-17T05:06:25+00:00

Currently, CMA manages one range of physically contiguous memory.
Creation of larger CMA areas with hugetlb_cma may run into gaps in
physical memory, so that they are not able to allocate that contiguous
physical range from memblock when creating the CMA area.

This can happen, for example, on an AMD system with > 1TB of memory, where
there will be a gap just below the 1TB (40bit DMA) line.  If you have set
aside most of memory for potential hugetlb CMA allocation,
cma_declare_contiguous_nid will fail.

hugetlb_cma doesn't need the entire area to be one physically contiguous
range.  It just cares about being able to get physically contiguous chunks
of a certain size (e.g.  1G), and it is fine to have the CMA area backed
by multiple physical ranges, as long as it gets 1G contiguous allocations.

Multi-range support is implemented by introducing an array of ranges,
instead of just one big one.  Each range has its own bitmap.  Effectively,
the allocate and release operations work as before, just per-range.  So,
instead of going through one large bitmap, they now go through a number of
smaller ones.

The maximum number of supported ranges is 8, as defined in CMA_MAX_RANGES.

Since some current users of CMA expect a CMA area to just use one
physically contiguous range, only allow for multiple ranges if a new
interface, cma_declare_contiguous_nid_multi, is used.  The other
interfaces will work like before, creating only CMA areas with 1 range.

cma_declare_contiguous_nid_multi works as follows, mimicking the
default "bottom-up, above 4G" reservation approach:

0) Try cma_declare_contiguous_nid, which will use only one
   region. If this succeeds, return. This makes sure that for
   all the cases that currently work, the behavior remains
   unchanged even if the caller switches from
   cma_declare_contiguous_nid to cma_declare_contiguous_nid_multi.
1) Select the largest free memblock ranges above 4G, with
   a maximum number of CMA_MAX_RANGES.
2) If we cannot find at most CMA_MAX_RANGES ranges that add
   up to the total size requested, return -ENOMEM.
3) Sort the selected ranges by base address.
4) Reserve them bottom-up until we get what we wanted.
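
Steps 1 to 4 above can be sketched in userspace C as follows. This is an
illustration only, not the kernel code: it skips step 0, the "above 4G"
filter and all alignment handling, and it works on a plain array instead of
memblock. All names are hypothetical; SKETCH_MAX_RANGES mirrors the value of
CMA_MAX_RANGES given above.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAX_RANGES 8  /* mirrors CMA_MAX_RANGES from the text */

struct mem_range {
    uint64_t base;
    uint64_t size;
};

/* Largest range first. */
static int cmp_size_desc(const void *a, const void *b)
{
    const struct mem_range *x = a, *y = b;

    if (x->size != y->size)
        return x->size < y->size ? 1 : -1;
    return 0;
}

/* Lowest base address first. */
static int cmp_base_asc(const void *a, const void *b)
{
    const struct mem_range *x = a, *y = b;

    if (x->base != y->base)
        return x->base < y->base ? -1 : 1;
    return 0;
}

/*
 * Pick ranges covering "total" bytes. Reorders free_ranges in place.
 * Returns the number of entries written to out, or -ENOMEM.
 */
static int sketch_pick_ranges(struct mem_range *free_ranges, size_t nr_free,
                              uint64_t total,
                              struct mem_range out[SKETCH_MAX_RANGES])
{
    uint64_t sum = 0, left = total;
    size_t nr, i;
    int used = 0;

    /* 1) Keep only the largest ranges, at most SKETCH_MAX_RANGES. */
    qsort(free_ranges, nr_free, sizeof(*free_ranges), cmp_size_desc);
    nr = nr_free < SKETCH_MAX_RANGES ? nr_free : SKETCH_MAX_RANGES;

    /* 2) Fail if they cannot cover the requested size. */
    for (i = 0; i < nr; i++)
        sum += free_ranges[i].size;
    if (sum < total)
        return -ENOMEM;

    /* 3) Sort the selected ranges by base address. */
    qsort(free_ranges, nr, sizeof(*free_ranges), cmp_base_asc);

    /* 4) Reserve bottom-up until the request is satisfied. */
    for (i = 0; i < nr && left > 0; i++) {
        uint64_t take = free_ranges[i].size < left ?
                        free_ranges[i].size : left;

        out[used].base = free_ranges[i].base;
        out[used].size = take;
        used++;
        left -= take;
    }
    return used;
}
```

Sorting twice keeps the two concerns separate: the first sort decides which
ranges are candidates, the second decides the order in which they are
consumed.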

Link: https://lkml.kernel.org/r/20250228182928.2645936-3-fvdl@google.com
Signed-off-by: Frank van der Linden 
Cc: Arnd Bergmann 
Cc: Alexander Gordeev 
Cc: Andy Lutomirski 
Cc: Dan Carpenter 
Cc: Dave Hansen 
Cc: David Hildenbrand 
Cc: Heiko Carstens 
Cc: Joao Martins 
Cc: Johannes Weiner 
Cc: Madhavan Srinivasan 
Cc: Michael Ellerman 
Cc: Muchun Song 
Cc: Oscar Salvador 
Cc: Peter Zijlstra 
Cc: Roman Gushchin (Cruise) 
Cc: Usama Arif 
Cc: Vasily Gorbik 
Cc: Yu Zhao 
Cc: Zi Yan 
Signed-off-by: Andrew Morton

Docs/damon: move DAMOS filter type names and meaning to design doc

2025-03-17T05:06:24+00:00

The DAMON sysfs usage doc briefly describes the DAMOS filter type names and
their meanings.  The design doc provides both the brief meanings and the
detailed descriptions.  This duplication is unnecessary and makes it unclear
where to document new DAMOS filter types and features.  Move the details
from the usage doc to the design doc.

Link: https://lkml.kernel.org/r/20250218223708.53437-4-sj@kernel.org
Signed-off-by: SeongJae Park 
Cc: Jonathan Corbet 
Signed-off-by: Andrew Morton

Docs/admin-guide/mm/damon/usage: document hugepage_size filter type

2025-03-17T05:06:13+00:00

This includes both the 'hugepage_size' filter type and the min/max files
used to decide range of sizes to filter on.

Link: https://lkml.kernel.org/r/20250211124437.278873-5-usamaarif642@gmail.com
Signed-off-by: Usama Arif 
Reviewed-by: SeongJae Park 
Cc: David Hildenbrand 
Cc: Johannes Weiner 
Signed-off-by: Andrew Morton

mm: zbud: remove zbud

2025-03-17T05:06:01+00:00

The zbud compressed pages allocator is rarely used; most users use
zsmalloc.  zbud consumes much more memory (only stores 1 or 2 compressed
pages per physical page).  The only advantage of zbud is a marginal
performance improvement that by no means justifies the memory overhead.

Historically, zsmalloc had significantly worse latency than zbud and
z3fold but offered better memory savings.  This is no longer the case as
shown by a simple recent analysis [1].  In a kernel build test on tmpfs in
a limited cgroup, zbud took 2-3% less time than zsmalloc, but at the cost of
using ~32% more memory (1.5G vs 1.13G).  The tradeoff does not make sense
for zbud in any practical scenario.

The only alleged advantage of zbud is not having the dependency on
CONFIG_MMU, but CONFIG_SWAP already depends on CONFIG_MMU anyway, and zbud
is only used by zswap.

Remove zbud after z3fold's removal, leaving zsmalloc as the one and only
zpool allocator.  Leave the removal of the zpool API (and its associated
config options) to a followup cleanup after no more allocators show up.

Deprecating zbud for a few cycles before removing it was initially
proposed [2], like z3fold was marked as deprecated for 2 cycles [3].
However, Johannes rightfully pointed out that 2 cycles is too short
for most downstream consumers, and z3fold was deprecated first only as a
courtesy anyway.

[1]https://lore.kernel.org/lkml/CAJD7tkbRF6od-2x_L8-A1QL3=2Ww13sCj4S3i4bNndqF+3+_Vg@mail.gmail.com/
[2]https://lore.kernel.org/lkml/Z5gdnSX5Lv-nfjQL@google.com/
[3]https://lore.kernel.org/lkml/20240904233343.933462-1-yosryahmed@google.com/

Link: https://lkml.kernel.org/r/20250129180633.3501650-3-yosry.ahmed@linux.dev
Signed-off-by: Yosry Ahmed 
Reviewed-by: Shakeel Butt 
Acked-by: Johannes Weiner 
Acked-by: Nhat Pham 
Cc: Alexander Gordeev 
Cc: Chengming Zhou 
Cc: Christian Borntraeger 
Cc: Dan Streetman 
Cc: Heiko Carstens 
Cc: Huacai Chen 
Cc: Miaohe Lin 
Cc: Seth Jennings 
Cc: Sven Schnelle 
Cc: Vasily Gorbik 
Cc: Vitaly Wool 
Cc: Vlastimil Babka 
Cc: WANG Xuerui 
Signed-off-by: Andrew Morton