<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/mm/migrate.c, branch v5.19</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>mm: Clear page-&gt;private when splitting or migrating a page</title>
<updated>2022-06-23T16:21:44+00:00</updated>
<author>
<name>Matthew Wilcox (Oracle)</name>
<email>willy@infradead.org</email>
</author>
<published>2022-06-19T14:37:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b653db77350c7307a513b81856fe53e94cf42446'/>
<id>b653db77350c7307a513b81856fe53e94cf42446</id>
<content type='text'>
In our efforts to remove uses of PG_private, we have found folios with
the private flag clear and folio-&gt;private not-NULL.  That is the root
cause behind 642d51fb0775 ("ceph: check folio PG_private bit instead
of folio-&gt;private").  It can also affect a few other filesystems that
haven't yet reported a problem.

compaction_alloc() can return a page with uninitialised page-&gt;private,
and rather than checking all the callers of migrate_pages(), just zero
page-&gt;private after calling get_new_page().  Similarly, the tail pages
from split_huge_page() may also have an uninitialised page-&gt;private.

Reported-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Tested-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Signed-off-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In our efforts to remove uses of PG_private, we have found folios with
the private flag clear and folio-&gt;private not-NULL.  That is the root
cause behind 642d51fb0775 ("ceph: check folio PG_private bit instead
of folio-&gt;private").  It can also affect a few other filesystems that
haven't yet reported a problem.

compaction_alloc() can return a page with uninitialised page-&gt;private,
and rather than checking all the callers of migrate_pages(), just zero
page-&gt;private after calling get_new_page().  Similarly, the tail pages
from split_huge_page() may also have an uninitialised page-&gt;private.

Reported-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Tested-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Signed-off-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'mm-stable-2022-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm</title>
<updated>2022-05-26T19:32:41+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-05-26T19:32:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=98931dd95fd489fcbfa97da563505a6f071d7c77'/>
<id>98931dd95fd489fcbfa97da563505a6f071d7c77</id>
<content type='text'>
Pull MM updates from Andrew Morton:
 "Almost all of MM here. A few things are still getting finished off,
  reviewed, etc.

   - Yang Shi has improved the behaviour of khugepaged collapsing of
     readonly file-backed transparent hugepages.

   - Johannes Weiner has arranged for zswap memory use to be tracked and
     managed on a per-cgroup basis.

   - Munchun Song adds a /proc knob ("hugetlb_optimize_vmemmap") for
     runtime enablement of the recent huge page vmemmap optimization
     feature.

   - Baolin Wang contributes a series to fix some issues around hugetlb
     pagetable invalidation.

   - Zhenwei Pi has fixed some interactions between hwpoisoned pages and
     virtualization.

   - Tong Tiangen has enabled the use of the presently x86-only
     page_table_check debugging feature on arm64 and riscv.

   - David Vernet has done some fixup work on the memcg selftests.

   - Peter Xu has taught userfaultfd to handle write protection faults
     against shmem- and hugetlbfs-backed files.

   - More DAMON development from SeongJae Park - adding online tuning of
     the feature and support for monitoring of fixed virtual address
     ranges. Also easier discovery of which monitoring operations are
     available.

   - Nadav Amit has done some optimization of TLB flushing during
     mprotect().

   - Neil Brown continues to labor away at improving our swap-over-NFS
     support.

   - David Hildenbrand has some fixes to anon page COWing versus
     get_user_pages().

   - Peng Liu fixed some errors in the core hugetlb code.

   - Joao Martins has reduced the amount of memory consumed by
     device-dax's compound devmaps.

   - Some cleanups of the arch-specific pagemap code from Anshuman
     Khandual.

   - Muchun Song has found and fixed some errors in the TLB flushing of
     transparent hugepages.

   - Roman Gushchin has done more work on the memcg selftests.

  ... and, of course, many smaller fixes and cleanups. Notably, the
  customary million cleanup serieses from Miaohe Lin"

* tag 'mm-stable-2022-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (381 commits)
  mm: kfence: use PAGE_ALIGNED helper
  selftests: vm: add the "settings" file with timeout variable
  selftests: vm: add "test_hmm.sh" to TEST_FILES
  selftests: vm: check numa_available() before operating "merge_across_nodes" in ksm_tests
  selftests: vm: add migration to the .gitignore
  selftests/vm/pkeys: fix typo in comment
  ksm: fix typo in comment
  selftests: vm: add process_mrelease tests
  Revert "mm/vmscan: never demote for memcg reclaim"
  mm/kfence: print disabling or re-enabling message
  include/trace/events/percpu.h: cleanup for "percpu: improve percpu_alloc_percpu event trace"
  include/trace/events/mmflags.h: cleanup for "tracing: incorrect gfp_t conversion"
  mm: fix a potential infinite loop in start_isolate_page_range()
  MAINTAINERS: add Muchun as co-maintainer for HugeTLB
  zram: fix Kconfig dependency warning
  mm/shmem: fix shmem folio swapoff hang
  cgroup: fix an error handling path in alloc_pagecache_max_30M()
  mm: damon: use HPAGE_PMD_SIZE
  tracing: incorrect isolate_mote_t cast in mm_vmscan_lru_isolate
  nodemask.h: fix compilation error with GCC12
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull MM updates from Andrew Morton:
 "Almost all of MM here. A few things are still getting finished off,
  reviewed, etc.

   - Yang Shi has improved the behaviour of khugepaged collapsing of
     readonly file-backed transparent hugepages.

   - Johannes Weiner has arranged for zswap memory use to be tracked and
     managed on a per-cgroup basis.

   - Munchun Song adds a /proc knob ("hugetlb_optimize_vmemmap") for
     runtime enablement of the recent huge page vmemmap optimization
     feature.

   - Baolin Wang contributes a series to fix some issues around hugetlb
     pagetable invalidation.

   - Zhenwei Pi has fixed some interactions between hwpoisoned pages and
     virtualization.

   - Tong Tiangen has enabled the use of the presently x86-only
     page_table_check debugging feature on arm64 and riscv.

   - David Vernet has done some fixup work on the memcg selftests.

   - Peter Xu has taught userfaultfd to handle write protection faults
     against shmem- and hugetlbfs-backed files.

   - More DAMON development from SeongJae Park - adding online tuning of
     the feature and support for monitoring of fixed virtual address
     ranges. Also easier discovery of which monitoring operations are
     available.

   - Nadav Amit has done some optimization of TLB flushing during
     mprotect().

   - Neil Brown continues to labor away at improving our swap-over-NFS
     support.

   - David Hildenbrand has some fixes to anon page COWing versus
     get_user_pages().

   - Peng Liu fixed some errors in the core hugetlb code.

   - Joao Martins has reduced the amount of memory consumed by
     device-dax's compound devmaps.

   - Some cleanups of the arch-specific pagemap code from Anshuman
     Khandual.

   - Muchun Song has found and fixed some errors in the TLB flushing of
     transparent hugepages.

   - Roman Gushchin has done more work on the memcg selftests.

  ... and, of course, many smaller fixes and cleanups. Notably, the
  customary million cleanup serieses from Miaohe Lin"

* tag 'mm-stable-2022-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (381 commits)
  mm: kfence: use PAGE_ALIGNED helper
  selftests: vm: add the "settings" file with timeout variable
  selftests: vm: add "test_hmm.sh" to TEST_FILES
  selftests: vm: check numa_available() before operating "merge_across_nodes" in ksm_tests
  selftests: vm: add migration to the .gitignore
  selftests/vm/pkeys: fix typo in comment
  ksm: fix typo in comment
  selftests: vm: add process_mrelease tests
  Revert "mm/vmscan: never demote for memcg reclaim"
  mm/kfence: print disabling or re-enabling message
  include/trace/events/percpu.h: cleanup for "percpu: improve percpu_alloc_percpu event trace"
  include/trace/events/mmflags.h: cleanup for "tracing: incorrect gfp_t conversion"
  mm: fix a potential infinite loop in start_isolate_page_range()
  MAINTAINERS: add Muchun as co-maintainer for HugeTLB
  zram: fix Kconfig dependency warning
  mm/shmem: fix shmem folio swapoff hang
  cgroup: fix an error handling path in alloc_pagecache_max_30M()
  mm: damon: use HPAGE_PMD_SIZE
  tracing: incorrect isolate_mote_t cast in mm_vmscan_lru_isolate
  nodemask.h: fix compilation error with GCC12
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/migrate: convert move_to_new_page() into move_to_new_folio()</title>
<updated>2022-05-13T14:20:17+00:00</updated>
<author>
<name>Matthew Wilcox (Oracle)</name>
<email>willy@infradead.org</email>
</author>
<published>2022-05-13T03:23:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e7e3ffeb274f1ff5bc68bb9135128e1ba14a7d53'/>
<id>e7e3ffeb274f1ff5bc68bb9135128e1ba14a7d53</id>
<content type='text'>
Pass in the folios that we already have in each caller.  Saves a
lot of calls to compound_head().

Link: https://lkml.kernel.org/r/20220504182857.4013401-27-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pass in the folios that we already have in each caller.  Saves a
lot of calls to compound_head().

Link: https://lkml.kernel.org/r/20220504182857.4013401-27-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: convert sysfs input to bool using kstrtobool()</title>
<updated>2022-05-13T14:20:13+00:00</updated>
<author>
<name>Jagdish Gediya</name>
<email>jvgediya@linux.ibm.com</email>
</author>
<published>2022-05-13T03:22:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=717aeab42943efa7cfa876b3b687c6ff36eae867'/>
<id>717aeab42943efa7cfa876b3b687c6ff36eae867</id>
<content type='text'>
Sysfs input conversion to corrosponding bool value e.g.  "false" or "0" to
false, "true" or "1" to true are currently handled through strncmp at
multiple places.  Use kstrtobool() to convert sysfs input to bool value.

[akpm@linux-foundation.org: propagate kstrtobool() return value, per Andy]
Link: https://lkml.kernel.org/r/20220426180203.70782-2-jvgediya@linux.ibm.com
Signed-off-by: Jagdish Gediya &lt;jvgediya@linux.ibm.com&gt;
Reviewed-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Reviewed-by: Andy Shevchenko &lt;andy.shevchenko@gmail.com&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Cc: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Cc: Petr Mladek &lt;pmladek@suse.com&gt;
Cc: Richard Fitzgerald &lt;rf@opensource.cirrus.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Sysfs input conversion to corrosponding bool value e.g.  "false" or "0" to
false, "true" or "1" to true are currently handled through strncmp at
multiple places.  Use kstrtobool() to convert sysfs input to bool value.

[akpm@linux-foundation.org: propagate kstrtobool() return value, per Andy]
Link: https://lkml.kernel.org/r/20220426180203.70782-2-jvgediya@linux.ibm.com
Signed-off-by: Jagdish Gediya &lt;jvgediya@linux.ibm.com&gt;
Reviewed-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Reviewed-by: Andy Shevchenko &lt;andy.shevchenko@gmail.com&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Cc: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Cc: Petr Mladek &lt;pmladek@suse.com&gt;
Cc: Richard Fitzgerald &lt;rf@opensource.cirrus.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fs: Change try_to_free_buffers() to take a folio</title>
<updated>2022-05-10T03:12:34+00:00</updated>
<author>
<name>Matthew Wilcox (Oracle)</name>
<email>willy@infradead.org</email>
</author>
<published>2022-05-01T05:08:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=68189fef88c7d02eb92e038be3d6428ebd0d2945'/>
<id>68189fef88c7d02eb92e038be3d6428ebd0d2945</id>
<content type='text'>
All but two of the callers already have a folio; pass a folio into
try_to_free_buffers().  This removes the last user of cancel_dirty_page()
so remove that wrapper function too.

Signed-off-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
All but two of the callers already have a folio; pass a folio into
try_to_free_buffers().  This removes the last user of cancel_dirty_page()
so remove that wrapper function too.

Signed-off-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: remember exclusively mapped anonymous pages with PG_anon_exclusive</title>
<updated>2022-05-10T01:20:44+00:00</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2022-05-10T01:20:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6c287605fd56466e645693eff3ae7c08fba56e0a'/>
<id>6c287605fd56466e645693eff3ae7c08fba56e0a</id>
<content type='text'>
Let's mark exclusively mapped anonymous pages with PG_anon_exclusive as
exclusive, and use that information to make GUP pins reliable and stay
consistent with the page mapped into the page table even if the page table
entry gets write-protected.

With that information at hand, we can extend our COW logic to always reuse
anonymous pages that are exclusive.  For anonymous pages that might be
shared, the existing logic applies.

As already documented, PG_anon_exclusive is usually only expressive in
combination with a page table entry.  Especially PTE vs.  PMD-mapped
anonymous pages require more thought, some examples: due to mremap() we
can easily have a single compound page PTE-mapped into multiple page
tables exclusively in a single process -- multiple page table locks apply.
Further, due to MADV_WIPEONFORK we might not necessarily write-protect
all PTEs, and only some subpages might be pinned.  Long story short: once
PTE-mapped, we have to track information about exclusivity per sub-page,
but until then, we can just track it for the compound page in the head
page and not having to update a whole bunch of subpages all of the time
for a simple PMD mapping of a THP.

For simplicity, this commit mostly talks about "anonymous pages", while
it's for THP actually "the part of an anonymous folio referenced via a
page table entry".

To not spill PG_anon_exclusive code all over the mm code-base, we let the
anon rmap code to handle all PG_anon_exclusive logic it can easily handle.

If a writable, present page table entry points at an anonymous (sub)page,
that (sub)page must be PG_anon_exclusive.  If GUP wants to take a reliably
pin (FOLL_PIN) on an anonymous page references via a present page table
entry, it must only pin if PG_anon_exclusive is set for the mapped
(sub)page.

This commit doesn't adjust GUP, so this is only implicitly handled for
FOLL_WRITE, follow-up commits will teach GUP to also respect it for
FOLL_PIN without FOLL_WRITE, to make all GUP pins of anonymous pages fully
reliable.

Whenever an anonymous page is to be shared (fork(), KSM), or when
temporarily unmapping an anonymous page (swap, migration), the relevant
PG_anon_exclusive bit has to be cleared to mark the anonymous page
possibly shared.  Clearing will fail if there are GUP pins on the page:

* For fork(), this means having to copy the page and not being able to
  share it.  fork() protects against concurrent GUP using the PT lock and
  the src_mm-&gt;write_protect_seq.

* For KSM, this means sharing will fail.  For swap this means, unmapping
  will fail, For migration this means, migration will fail early.  All
  three cases protect against concurrent GUP using the PT lock and a
  proper clear/invalidate+flush of the relevant page table entry.

This fixes memory corruptions reported for FOLL_PIN | FOLL_WRITE, when a
pinned page gets mapped R/O and the successive write fault ends up
replacing the page instead of reusing it.  It improves the situation for
O_DIRECT/vmsplice/...  that still use FOLL_GET instead of FOLL_PIN, if
fork() is *not* involved, however swapout and fork() are still
problematic.  Properly using FOLL_PIN instead of FOLL_GET for these GUP
users will fix the issue for them.

I. Details about basic handling

I.1. Fresh anonymous pages

page_add_new_anon_rmap() and hugepage_add_new_anon_rmap() will mark the
given page exclusive via __page_set_anon_rmap(exclusive=1).  As that is
the mechanism fresh anonymous pages come into life (besides migration code
where we copy the page-&gt;mapping), all fresh anonymous pages will start out
as exclusive.

I.2. COW reuse handling of anonymous pages

When a COW handler stumbles over a (sub)page that's marked exclusive, it
simply reuses it.  Otherwise, the handler tries harder under page lock to
detect if the (sub)page is exclusive and can be reused.  If exclusive,
page_move_anon_rmap() will mark the given (sub)page exclusive.

Note that hugetlb code does not yet check for PageAnonExclusive(), as it
still uses the old COW logic that is prone to the COW security issue
because hugetlb code cannot really tolerate unnecessary/wrong COW as huge
pages are a scarce resource.

I.3. Migration handling

try_to_migrate() has to try marking an exclusive anonymous page shared via
page_try_share_anon_rmap().  If it fails because there are GUP pins on the
page, unmap fails.  migrate_vma_collect_pmd() and
__split_huge_pmd_locked() are handled similarly.

Writable migration entries implicitly point at shared anonymous pages. 
For readable migration entries that information is stored via a new
"readable-exclusive" migration entry, specific to anonymous pages.

When restoring a migration entry in remove_migration_pte(), information
about exlusivity is detected via the migration entry type, and
RMAP_EXCLUSIVE is set accordingly for
page_add_anon_rmap()/hugepage_add_anon_rmap() to restore that information.

I.4. Swapout handling

try_to_unmap() has to try marking the mapped page possibly shared via
page_try_share_anon_rmap().  If it fails because there are GUP pins on the
page, unmap fails.  For now, information about exclusivity is lost.  In
the future, we might want to remember that information in the swap entry
in some cases, however, it requires more thought, care, and a way to store
that information in swap entries.

I.5. Swapin handling

do_swap_page() will never stumble over exclusive anonymous pages in the
swap cache, as try_to_migrate() prohibits that.  do_swap_page() always has
to detect manually if an anonymous page is exclusive and has to set
RMAP_EXCLUSIVE for page_add_anon_rmap() accordingly.

I.6. THP handling

__split_huge_pmd_locked() has to move the information about exclusivity
from the PMD to the PTEs.

a) In case we have a readable-exclusive PMD migration entry, simply
   insert readable-exclusive PTE migration entries.

b) In case we have a present PMD entry and we don't want to freeze
   ("convert to migration entries"), simply forward PG_anon_exclusive to
   all sub-pages, no need to temporarily clear the bit.

c) In case we have a present PMD entry and want to freeze, handle it
   similar to try_to_migrate(): try marking the page shared first.  In
   case we fail, we ignore the "freeze" instruction and simply split
   ordinarily.  try_to_migrate() will properly fail because the THP is
   still mapped via PTEs.

When splitting a compound anonymous folio (THP), the information about
exclusivity is implicitly handled via the migration entries: no need to
replicate PG_anon_exclusive manually.

I.7.  fork() handling fork() handling is relatively easy, because
PG_anon_exclusive is only expressive for some page table entry types.

a) Present anonymous pages

page_try_dup_anon_rmap() will mark the given subpage shared -- which will
fail if the page is pinned.  If it failed, we have to copy (or PTE-map a
PMD to handle it on the PTE level).

Note that device exclusive entries are just a pointer at a PageAnon()
page.  fork() will first convert a device exclusive entry to a present
page table and handle it just like present anonymous pages.

b) Device private entry

Device private entries point at PageAnon() pages that cannot be mapped
directly and, therefore, cannot get pinned.

page_try_dup_anon_rmap() will mark the given subpage shared, which cannot
fail because they cannot get pinned.

c) HW poison entries

PG_anon_exclusive will remain untouched and is stale -- the page table
entry is just a placeholder after all.

d) Migration entries

Writable and readable-exclusive entries are converted to readable entries:
possibly shared.

I.8. mprotect() handling

mprotect() only has to properly handle the new readable-exclusive
migration entry:

When write-protecting a migration entry that points at an anonymous page,
remember the information about exclusivity via the "readable-exclusive"
migration entry type.

II. Migration and GUP-fast

Whenever replacing a present page table entry that maps an exclusive
anonymous page by a migration entry, we have to mark the page possibly
shared and synchronize against GUP-fast by a proper clear/invalidate+flush
to make the following scenario impossible:

1. try_to_migrate() places a migration entry after checking for GUP pins
   and marks the page possibly shared.

2. GUP-fast pins the page due to lack of synchronization

3. fork() converts the "writable/readable-exclusive" migration entry into a
   readable migration entry

4. Migration fails due to the GUP pin (failing to freeze the refcount)

5. Migration entries are restored. PG_anon_exclusive is lost

-&gt; We have a pinned page that is not marked exclusive anymore.

Note that we move information about exclusivity from the page to the
migration entry as it otherwise highly overcomplicates fork() and
PTE-mapping a THP.

III. Swapout and GUP-fast

Whenever replacing a present page table entry that maps an exclusive
anonymous page by a swap entry, we have to mark the page possibly shared
and synchronize against GUP-fast by a proper clear/invalidate+flush to
make the following scenario impossible:

1. try_to_unmap() places a swap entry after checking for GUP pins and
   clears exclusivity information on the page.

2. GUP-fast pins the page due to lack of synchronization.

-&gt; We have a pinned page that is not marked exclusive anymore.

If we'd ever store information about exclusivity in the swap entry,
similar to migration handling, the same considerations as in II would
apply.  This is future work.

Link: https://lkml.kernel.org/r/20220428083441.37290-13-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Don Dutile &lt;ddutile@redhat.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Cc: John Hubbard &lt;jhubbard@nvidia.com&gt;
Cc: Khalid Aziz &lt;khalid.aziz@oracle.com&gt;
Cc: "Kirill A. Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Liang Zhang &lt;zhangliang5@huawei.com&gt;
Cc: "Matthew Wilcox (Oracle)" &lt;willy@infradead.org&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Cc: Nadav Amit &lt;namit@vmware.com&gt;
Cc: Oded Gabbay &lt;oded.gabbay@gmail.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Pedro Demarchi Gomes &lt;pedrodemargomes@gmail.com&gt;
Cc: Peter Xu &lt;peterx@redhat.com&gt;
Cc: Rik van Riel &lt;riel@surriel.com&gt;
Cc: Roman Gushchin &lt;guro@fb.com&gt;
Cc: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Yang Shi &lt;shy828301@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Let's mark exclusively mapped anonymous pages with PG_anon_exclusive as
exclusive, and use that information to make GUP pins reliable and stay
consistent with the page mapped into the page table even if the page table
entry gets write-protected.

With that information at hand, we can extend our COW logic to always reuse
anonymous pages that are exclusive.  For anonymous pages that might be
shared, the existing logic applies.

As already documented, PG_anon_exclusive is usually only expressive in
combination with a page table entry.  Especially PTE vs.  PMD-mapped
anonymous pages require more thought, some examples: due to mremap() we
can easily have a single compound page PTE-mapped into multiple page
tables exclusively in a single process -- multiple page table locks apply.
Further, due to MADV_WIPEONFORK we might not necessarily write-protect
all PTEs, and only some subpages might be pinned.  Long story short: once
PTE-mapped, we have to track information about exclusivity per sub-page,
but until then, we can just track it for the compound page in the head
page and not having to update a whole bunch of subpages all of the time
for a simple PMD mapping of a THP.

For simplicity, this commit mostly talks about "anonymous pages", while
it's for THP actually "the part of an anonymous folio referenced via a
page table entry".

To not spill PG_anon_exclusive code all over the mm code-base, we let the
anon rmap code to handle all PG_anon_exclusive logic it can easily handle.

If a writable, present page table entry points at an anonymous (sub)page,
that (sub)page must be PG_anon_exclusive.  If GUP wants to take a reliably
pin (FOLL_PIN) on an anonymous page references via a present page table
entry, it must only pin if PG_anon_exclusive is set for the mapped
(sub)page.

This commit doesn't adjust GUP, so this is only implicitly handled for
FOLL_WRITE, follow-up commits will teach GUP to also respect it for
FOLL_PIN without FOLL_WRITE, to make all GUP pins of anonymous pages fully
reliable.

Whenever an anonymous page is to be shared (fork(), KSM), or when
temporarily unmapping an anonymous page (swap, migration), the relevant
PG_anon_exclusive bit has to be cleared to mark the anonymous page
possibly shared.  Clearing will fail if there are GUP pins on the page:

* For fork(), this means having to copy the page and not being able to
  share it.  fork() protects against concurrent GUP using the PT lock and
  the src_mm-&gt;write_protect_seq.

* For KSM, this means sharing will fail.  For swap this means, unmapping
  will fail, For migration this means, migration will fail early.  All
  three cases protect against concurrent GUP using the PT lock and a
  proper clear/invalidate+flush of the relevant page table entry.

This fixes memory corruptions reported for FOLL_PIN | FOLL_WRITE, when a
pinned page gets mapped R/O and the successive write fault ends up
replacing the page instead of reusing it.  It improves the situation for
O_DIRECT/vmsplice/...  that still use FOLL_GET instead of FOLL_PIN, if
fork() is *not* involved, however swapout and fork() are still
problematic.  Properly using FOLL_PIN instead of FOLL_GET for these GUP
users will fix the issue for them.

I. Details about basic handling

I.1. Fresh anonymous pages

page_add_new_anon_rmap() and hugepage_add_new_anon_rmap() will mark the
given page exclusive via __page_set_anon_rmap(exclusive=1).  As that is
the mechanism fresh anonymous pages come into life (besides migration code
where we copy the page-&gt;mapping), all fresh anonymous pages will start out
as exclusive.

I.2. COW reuse handling of anonymous pages

When a COW handler stumbles over a (sub)page that's marked exclusive, it
simply reuses it.  Otherwise, the handler tries harder under page lock to
detect if the (sub)page is exclusive and can be reused.  If exclusive,
page_move_anon_rmap() will mark the given (sub)page exclusive.

Note that hugetlb code does not yet check for PageAnonExclusive(), as it
still uses the old COW logic that is prone to the COW security issue
because hugetlb code cannot really tolerate unnecessary/wrong COW as huge
pages are a scarce resource.

I.3. Migration handling

try_to_migrate() has to try marking an exclusive anonymous page shared via
page_try_share_anon_rmap().  If it fails because there are GUP pins on the
page, unmap fails.  migrate_vma_collect_pmd() and
__split_huge_pmd_locked() are handled similarly.

Writable migration entries implicitly point at shared anonymous pages. 
For readable migration entries that information is stored via a new
"readable-exclusive" migration entry, specific to anonymous pages.

When restoring a migration entry in remove_migration_pte(), information
about exlusivity is detected via the migration entry type, and
RMAP_EXCLUSIVE is set accordingly for
page_add_anon_rmap()/hugepage_add_anon_rmap() to restore that information.

I.4. Swapout handling

try_to_unmap() has to try marking the mapped page possibly shared via
page_try_share_anon_rmap().  If it fails because there are GUP pins on the
page, unmap fails.  For now, information about exclusivity is lost.  In
the future, we might want to remember that information in the swap entry
in some cases, however, it requires more thought, care, and a way to store
that information in swap entries.

I.5. Swapin handling

do_swap_page() will never stumble over exclusive anonymous pages in the
swap cache, as try_to_migrate() prohibits that.  do_swap_page() always has
to detect manually if an anonymous page is exclusive and has to set
RMAP_EXCLUSIVE for page_add_anon_rmap() accordingly.

I.6. THP handling

__split_huge_pmd_locked() has to move the information about exclusivity
from the PMD to the PTEs.

a) In case we have a readable-exclusive PMD migration entry, simply
   insert readable-exclusive PTE migration entries.

b) In case we have a present PMD entry and we don't want to freeze
   ("convert to migration entries"), simply forward PG_anon_exclusive to
   all sub-pages, no need to temporarily clear the bit.

c) In case we have a present PMD entry and want to freeze, handle it
   similar to try_to_migrate(): try marking the page shared first.  In
   case we fail, we ignore the "freeze" instruction and simply split
   ordinarily.  try_to_migrate() will properly fail because the THP is
   still mapped via PTEs.

When splitting a compound anonymous folio (THP), the information about
exclusivity is implicitly handled via the migration entries: no need to
replicate PG_anon_exclusive manually.

I.7.  fork() handling fork() handling is relatively easy, because
PG_anon_exclusive is only expressive for some page table entry types.

a) Present anonymous pages

page_try_dup_anon_rmap() will mark the given subpage shared -- which will
fail if the page is pinned.  If it failed, we have to copy (or PTE-map a
PMD to handle it on the PTE level).

Note that device exclusive entries are just a pointer at a PageAnon()
page.  fork() will first convert a device exclusive entry to a present
page table and handle it just like present anonymous pages.

b) Device private entry

Device private entries point at PageAnon() pages that cannot be mapped
directly and, therefore, cannot get pinned.

page_try_dup_anon_rmap() will mark the given subpage shared, which cannot
fail because they cannot get pinned.

c) HW poison entries

PG_anon_exclusive will remain untouched and is stale -- the page table
entry is just a placeholder after all.

d) Migration entries

Writable and readable-exclusive entries are converted to readable entries:
possibly shared.

I.8. mprotect() handling

mprotect() only has to properly handle the new readable-exclusive
migration entry:

When write-protecting a migration entry that points at an anonymous page,
remember the information about exclusivity via the "readable-exclusive"
migration entry type.

II. Migration and GUP-fast

Whenever replacing a present page table entry that maps an exclusive
anonymous page by a migration entry, we have to mark the page possibly
shared and synchronize against GUP-fast by a proper clear/invalidate+flush
to make the following scenario impossible:

1. try_to_migrate() places a migration entry after checking for GUP pins
   and marks the page possibly shared.

2. GUP-fast pins the page due to lack of synchronization

3. fork() converts the "writable/readable-exclusive" migration entry into a
   readable migration entry

4. Migration fails due to the GUP pin (failing to freeze the refcount)

5. Migration entries are restored. PG_anon_exclusive is lost

-&gt; We have a pinned page that is not marked exclusive anymore.

Note that we move information about exclusivity from the page to the
migration entry as it otherwise highly overcomplicates fork() and
PTE-mapping a THP.

III. Swapout and GUP-fast

Whenever replacing a present page table entry that maps an exclusive
anonymous page by a swap entry, we have to mark the page possibly shared
and synchronize against GUP-fast by a proper clear/invalidate+flush to
make the following scenario impossible:

1. try_to_unmap() places a swap entry after checking for GUP pins and
   clears exclusivity information on the page.

2. GUP-fast pins the page due to lack of synchronization.

-&gt; We have a pinned page that is not marked exclusive anymore.

If we'd ever store information about exclusivity in the swap entry,
similar to migration handling, the same considerations as in II would
apply.  This is future work.

Link: https://lkml.kernel.org/r/20220428083441.37290-13-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Don Dutile &lt;ddutile@redhat.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Cc: John Hubbard &lt;jhubbard@nvidia.com&gt;
Cc: Khalid Aziz &lt;khalid.aziz@oracle.com&gt;
Cc: "Kirill A. Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Liang Zhang &lt;zhangliang5@huawei.com&gt;
Cc: "Matthew Wilcox (Oracle)" &lt;willy@infradead.org&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Cc: Nadav Amit &lt;namit@vmware.com&gt;
Cc: Oded Gabbay &lt;oded.gabbay@gmail.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Pedro Demarchi Gomes &lt;pedrodemargomes@gmail.com&gt;
Cc: Peter Xu &lt;peterx@redhat.com&gt;
Cc: Rik van Riel &lt;riel@surriel.com&gt;
Cc: Roman Gushchin &lt;guro@fb.com&gt;
Cc: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Yang Shi &lt;shy828301@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/rmap: pass rmap flags to hugepage_add_anon_rmap()</title>
<updated>2022-05-10T01:20:43+00:00</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2022-05-10T01:20:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=28c5209dfd5f86f4398ce01bfac8508b2c4d4050'/>
<id>28c5209dfd5f86f4398ce01bfac8508b2c4d4050</id>
<content type='text'>
Let's prepare for passing RMAP_EXCLUSIVE, similarly as we do for
page_add_anon_rmap() now.  RMAP_COMPOUND is implicit for hugetlb pages and
ignored.

Link: https://lkml.kernel.org/r/20220428083441.37290-8-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Don Dutile &lt;ddutile@redhat.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Cc: John Hubbard &lt;jhubbard@nvidia.com&gt;
Cc: Khalid Aziz &lt;khalid.aziz@oracle.com&gt;
Cc: "Kirill A. Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Liang Zhang &lt;zhangliang5@huawei.com&gt;
Cc: "Matthew Wilcox (Oracle)" &lt;willy@infradead.org&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Cc: Nadav Amit &lt;namit@vmware.com&gt;
Cc: Oded Gabbay &lt;oded.gabbay@gmail.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Pedro Demarchi Gomes &lt;pedrodemargomes@gmail.com&gt;
Cc: Peter Xu &lt;peterx@redhat.com&gt;
Cc: Rik van Riel &lt;riel@surriel.com&gt;
Cc: Roman Gushchin &lt;guro@fb.com&gt;
Cc: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Yang Shi &lt;shy828301@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Let's prepare for passing RMAP_EXCLUSIVE, similarly as we do for
page_add_anon_rmap() now.  RMAP_COMPOUND is implicit for hugetlb pages and
ignored.

Link: https://lkml.kernel.org/r/20220428083441.37290-8-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Don Dutile &lt;ddutile@redhat.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Cc: John Hubbard &lt;jhubbard@nvidia.com&gt;
Cc: Khalid Aziz &lt;khalid.aziz@oracle.com&gt;
Cc: "Kirill A. Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Liang Zhang &lt;zhangliang5@huawei.com&gt;
Cc: "Matthew Wilcox (Oracle)" &lt;willy@infradead.org&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Cc: Nadav Amit &lt;namit@vmware.com&gt;
Cc: Oded Gabbay &lt;oded.gabbay@gmail.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Pedro Demarchi Gomes &lt;pedrodemargomes@gmail.com&gt;
Cc: Peter Xu &lt;peterx@redhat.com&gt;
Cc: Rik van Riel &lt;riel@surriel.com&gt;
Cc: Roman Gushchin &lt;guro@fb.com&gt;
Cc: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Yang Shi &lt;shy828301@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/rmap: remove do_page_add_anon_rmap()</title>
<updated>2022-05-10T01:20:43+00:00</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2022-05-10T01:20:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f1e2db12e45baaa2d366f87c885968096c2ff5aa'/>
<id>f1e2db12e45baaa2d366f87c885968096c2ff5aa</id>
<content type='text'>
... and instead convert page_add_anon_rmap() to accept flags.

Passing flags instead of bools is usually nicer either way, and we want to
more often also pass RMAP_EXCLUSIVE in follow up patches when detecting
that an anonymous page is exclusive: for example, when restoring an
anonymous page from a writable migration entry.

This is a preparation for marking an anonymous page inside
page_add_anon_rmap() as exclusive when RMAP_EXCLUSIVE is passed.

Link: https://lkml.kernel.org/r/20220428083441.37290-7-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Don Dutile &lt;ddutile@redhat.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Cc: John Hubbard &lt;jhubbard@nvidia.com&gt;
Cc: Khalid Aziz &lt;khalid.aziz@oracle.com&gt;
Cc: "Kirill A. Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Liang Zhang &lt;zhangliang5@huawei.com&gt;
Cc: "Matthew Wilcox (Oracle)" &lt;willy@infradead.org&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Cc: Nadav Amit &lt;namit@vmware.com&gt;
Cc: Oded Gabbay &lt;oded.gabbay@gmail.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Pedro Demarchi Gomes &lt;pedrodemargomes@gmail.com&gt;
Cc: Peter Xu &lt;peterx@redhat.com&gt;
Cc: Rik van Riel &lt;riel@surriel.com&gt;
Cc: Roman Gushchin &lt;guro@fb.com&gt;
Cc: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Yang Shi &lt;shy828301@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
... and instead convert page_add_anon_rmap() to accept flags.

Passing flags instead of bools is usually nicer either way, and we want to
more often also pass RMAP_EXCLUSIVE in follow up patches when detecting
that an anonymous page is exclusive: for example, when restoring an
anonymous page from a writable migration entry.

This is a preparation for marking an anonymous page inside
page_add_anon_rmap() as exclusive when RMAP_EXCLUSIVE is passed.

Link: https://lkml.kernel.org/r/20220428083441.37290-7-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Don Dutile &lt;ddutile@redhat.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Cc: John Hubbard &lt;jhubbard@nvidia.com&gt;
Cc: Khalid Aziz &lt;khalid.aziz@oracle.com&gt;
Cc: "Kirill A. Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Liang Zhang &lt;zhangliang5@huawei.com&gt;
Cc: "Matthew Wilcox (Oracle)" &lt;willy@infradead.org&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Cc: Nadav Amit &lt;namit@vmware.com&gt;
Cc: Oded Gabbay &lt;oded.gabbay@gmail.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Pedro Demarchi Gomes &lt;pedrodemargomes@gmail.com&gt;
Cc: Peter Xu &lt;peterx@redhat.com&gt;
Cc: Rik van Riel &lt;riel@surriel.com&gt;
Cc: Roman Gushchin &lt;guro@fb.com&gt;
Cc: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Yang Shi &lt;shy828301@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/rmap: split page_dup_rmap() into page_dup_file_rmap() and page_try_dup_anon_rmap()</title>
<updated>2022-05-10T01:20:43+00:00</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2022-05-10T01:20:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=fb3d824d1a46c5bb0584ea88f32dc2495544aebf'/>
<id>fb3d824d1a46c5bb0584ea88f32dc2495544aebf</id>
<content type='text'>
...  and move the special check for pinned pages into
page_try_dup_anon_rmap() to prepare for tracking exclusive anonymous pages
via a new pageflag, clearing it only after making sure that there are no
GUP pins on the anonymous page.

We really only care about pins on anonymous pages, because they are prone
to getting replaced in the COW handler once mapped R/O.  For !anon pages
in cow-mappings (!VM_SHARED &amp;&amp; VM_MAYWRITE) we shouldn't really care about
that, at least not that I could come up with an example.

Let's drop the is_cow_mapping() check from page_needs_cow_for_dma(), as we
know we're dealing with anonymous pages.  Also, drop the handling of
pinned pages from copy_huge_pud() and add a comment if ever supporting
anonymous pages on the PUD level.

This is a preparation for tracking exclusivity of anonymous pages in the
rmap code, and disallowing marking a page shared (-&gt; failing to duplicate)
if there are GUP pins on a page.

Link: https://lkml.kernel.org/r/20220428083441.37290-5-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Don Dutile &lt;ddutile@redhat.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Cc: John Hubbard &lt;jhubbard@nvidia.com&gt;
Cc: Khalid Aziz &lt;khalid.aziz@oracle.com&gt;
Cc: "Kirill A. Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Liang Zhang &lt;zhangliang5@huawei.com&gt;
Cc: "Matthew Wilcox (Oracle)" &lt;willy@infradead.org&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Cc: Nadav Amit &lt;namit@vmware.com&gt;
Cc: Oded Gabbay &lt;oded.gabbay@gmail.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Pedro Demarchi Gomes &lt;pedrodemargomes@gmail.com&gt;
Cc: Peter Xu &lt;peterx@redhat.com&gt;
Cc: Rik van Riel &lt;riel@surriel.com&gt;
Cc: Roman Gushchin &lt;guro@fb.com&gt;
Cc: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Yang Shi &lt;shy828301@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
...  and move the special check for pinned pages into
page_try_dup_anon_rmap() to prepare for tracking exclusive anonymous pages
via a new pageflag, clearing it only after making sure that there are no
GUP pins on the anonymous page.

We really only care about pins on anonymous pages, because they are prone
to getting replaced in the COW handler once mapped R/O.  For !anon pages
in cow-mappings (!VM_SHARED &amp;&amp; VM_MAYWRITE) we shouldn't really care about
that, at least not that I could come up with an example.

Let's drop the is_cow_mapping() check from page_needs_cow_for_dma(), as we
know we're dealing with anonymous pages.  Also, drop the handling of
pinned pages from copy_huge_pud() and add a comment if ever supporting
anonymous pages on the PUD level.

This is a preparation for tracking exclusivity of anonymous pages in the
rmap code, and disallowing marking a page shared (-&gt; failing to duplicate)
if there are GUP pins on a page.

Link: https://lkml.kernel.org/r/20220428083441.37290-5-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Don Dutile &lt;ddutile@redhat.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Cc: John Hubbard &lt;jhubbard@nvidia.com&gt;
Cc: Khalid Aziz &lt;khalid.aziz@oracle.com&gt;
Cc: "Kirill A. Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Liang Zhang &lt;zhangliang5@huawei.com&gt;
Cc: "Matthew Wilcox (Oracle)" &lt;willy@infradead.org&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Cc: Nadav Amit &lt;namit@vmware.com&gt;
Cc: Oded Gabbay &lt;oded.gabbay@gmail.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Pedro Demarchi Gomes &lt;pedrodemargomes@gmail.com&gt;
Cc: Peter Xu &lt;peterx@redhat.com&gt;
Cc: Rik van Riel &lt;riel@surriel.com&gt;
Cc: Roman Gushchin &lt;guro@fb.com&gt;
Cc: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Yang Shi &lt;shy828301@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: untangle config dependencies for demote-on-reclaim</title>
<updated>2022-04-29T06:16:09+00:00</updated>
<author>
<name>Oscar Salvador</name>
<email>osalvador@suse.de</email>
</author>
<published>2022-04-29T06:16:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7d6e2d96384556a4f30547803be1f606eb805a62'/>
<id>7d6e2d96384556a4f30547803be1f606eb805a62</id>
<content type='text'>
At the time demote-on-reclaim was introduced, it was tied to
CONFIG_HOTPLUG_CPU + CONFIG_MIGRATE, but that is not really accurate.

The only two things we need to depend on are CONFIG_NUMA + CONFIG_MIGRATE,
so clean this up.  Furthermore, we only register the hotplug memory
notifier when the system has CONFIG_MEMORY_HOTPLUG.

Link: https://lkml.kernel.org/r/20220322224016.4574-1-osalvador@suse.de
Signed-off-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Suggested-by: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Reviewed-by: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Abhishek Goel &lt;huntbag@linux.vnet.ibm.com&gt;
Cc: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
At the time demote-on-reclaim was introduced, it was tied to
CONFIG_HOTPLUG_CPU + CONFIG_MIGRATE, but that is not really accurate.

The only two things we need to depend on are CONFIG_NUMA + CONFIG_MIGRATE,
so clean this up.  Furthermore, we only register the hotplug memory
notifier when the system has CONFIG_MEMORY_HOTPLUG.

Link: https://lkml.kernel.org/r/20220322224016.4574-1-osalvador@suse.de
Signed-off-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Suggested-by: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Reviewed-by: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Abhishek Goel &lt;huntbag@linux.vnet.ibm.com&gt;
Cc: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
