linux.git/mm/migrate.c, branch v6.1

mm: migrate: fix return value if all subpages of THPs are migrated successfully

2022-10-28T20:37:22+00:00

During THP migration, if THPs are not migrated but they are split and all
subpages are migrated successfully, migrate_pages() will still return the
number of THP pages that were not migrated.  This will confuse the callers
of migrate_pages().  For example, longterm pinning will fail even though
all pages are migrated successfully.

Thus we should return 0 to indicate that all pages are migrated in this
case.
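
The intended return value can be illustrated with a small userspace model.
This is only a sketch under stated assumptions, not the kernel code:
struct model_entry, model_migrate_pages() and their fields are hypothetical
names, and the model assumes a failed THP is always split.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry on the migration list. */
struct model_entry {
	bool is_thp;          /* entry is a transparent huge page */
	bool migrates_whole;  /* entry migrates without being split */
	int nr_subpages_fail; /* THP only: subpages still failing after split */
};

/* Returns 0 when every page ends up migrated, else the failed entries. */
static int model_migrate_pages(const struct model_entry *e, size_t n)
{
	int nr_failed = 0; /* entries that failed as a whole */
	int nr_left = 0;   /* pages that really stayed behind */

	for (size_t i = 0; i < n; i++) {
		if (e[i].migrates_whole)
			continue;
		nr_failed++;
		if (e[i].is_thp) {
			assert(e[i].nr_subpages_fail >= 0);
			nr_left += e[i].nr_subpages_fail;
		} else {
			nr_left++;
		}
	}

	/* Everything moved, possibly as split subpages: report success. */
	if (nr_left == 0)
		return 0;
	return nr_failed;
}
```

In this model, a THP that fails as a whole but whose subpages all migrate
yields 0, so a caller such as longterm pinning would not see a failure.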

Link: https://lkml.kernel.org/r/de386aa864be9158d2f3b344091419ea7c38b2f7.1666599848.git.baolin.wang@linux.alibaba.com
Fixes: b5bade978e9b ("mm: migrate: fix the return value of migrate_pages()")
Signed-off-by: Baolin Wang 
Reviewed-by: Alistair Popple 
Reviewed-by: Yang Shi 
Cc: David Hildenbrand 
Cc: "Huang, Ying" 
Cc: Zi Yan 
Signed-off-by: Andrew Morton

mm/memory.c: fix race when faulting a device private page

2022-10-13T01:51:49+00:00

Patch series "Fix several device private page reference counting issues",
v2

This series aims to fix a number of page reference counting issues in
drivers dealing with device private ZONE_DEVICE pages.  These result in
use-after-free type bugs, either from accessing a struct page which no
longer exists because it has been removed or accessing fields within the
struct page which are no longer valid because the page has been freed.

During normal usage it is unlikely these will cause any problems.  However
without these fixes it is possible to crash the kernel from userspace. 
These crashes can be triggered either by unloading the kernel module or
unbinding the device from the driver prior to a userspace task exiting. 
In modules such as Nouveau it is also possible to trigger some of these
issues by explicitly closing the device file-descriptor prior to the task
exiting and then accessing device private memory.

This involves some minor changes to both PowerPC and AMD GPU code. 
Unfortunately I lack hardware to test either of those so any help there
would be appreciated.  The changes mimic what is done for both Nouveau
and hmm-tests, though, so I doubt they will cause problems.


This patch (of 8):

When the CPU tries to access a device private page the migrate_to_ram()
callback associated with the pgmap for the page is called.  However no
reference is taken on the faulting page.  Therefore a concurrent migration
of the device private page can free the page and possibly the underlying
pgmap.  This results in a race which can crash the kernel due to the
migrate_to_ram() function pointer becoming invalid.  It also means drivers
can't reliably read the zone_device_data field because the page may have
been freed with memunmap_pages().

Close the race by getting a reference on the page while holding the ptl to
ensure it has not been freed.  Unfortunately the elevated reference count
will cause the migration required to handle the fault to fail.  To avoid
this failure pass the faulting page into the migrate_vma functions so that
if an elevated reference count is found it can be checked to see if it's
expected or not.
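
The "expected or not" check can be sketched as follows.  This is a
simplified userspace model, not the kernel implementation: struct
model_page and model_check_page() are hypothetical names, and the rule
that migration itself holds exactly one reference is an assumption of the
sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal stand-in for the counters on a page. */
struct model_page {
	int refcount; /* total references held on the page */
	int mapcount; /* references that come from page table mappings */
};

/*
 * Returns true if the page has no unexpected references.  One reference
 * is assumed to belong to the migration code; one more is tolerated when
 * this is the page whose CPU fault triggered the migration.
 */
static bool model_check_page(const struct model_page *page,
			     const struct model_page *fault_page)
{
	int extra;

	assert(page != NULL);
	extra = 1 + (page == fault_page);
	return (page->refcount - extra) <= page->mapcount;
}
```

With a refcount of 3 and a mapcount of 1, the check passes only when the
page is passed in as the faulting page; otherwise the additional reference
looks like an unknown pin and the model refuses to migrate.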

[mpe@ellerman.id.au: fix build]
  Link: https://lkml.kernel.org/r/87fsgbf3gh.fsf@mpe.ellerman.id.au
Link: https://lkml.kernel.org/r/cover.60659b549d8509ddecafad4f498ee7f03bb23c69.1664366292.git-series.apopple@nvidia.com
Link: https://lkml.kernel.org/r/d3e813178a59e565e8d78d9b9a4e2562f6494f90.1664366292.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple 
Acked-by: Felix Kuehling 
Cc: Jason Gunthorpe 
Cc: John Hubbard 
Cc: Ralph Campbell 
Cc: Michael Ellerman 
Cc: Lyude Paul 
Cc: Alex Deucher 
Cc: Alex Sierra 
Cc: Ben Skeggs 
Cc: Christian König 
Cc: Dan Williams 
Cc: David Hildenbrand 
Cc: "Huang, Ying" 
Cc: Matthew Wilcox 
Cc: Yang Shi 
Cc: Zi Yan 
Signed-off-by: Andrew Morton

mm: convert page_get_anon_vma() to folio_get_anon_vma()

2022-10-03T21:02:54+00:00

With all callers now passing in a folio, rename the function and convert
all callers.  Removes a couple of calls to compound_head() and a reference
to page->mapping.

Link: https://lkml.kernel.org/r/20220902194653.1739778-55-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) 
Signed-off-by: Andrew Morton

migrate: convert unmap_and_move_huge_page() to use folios

2022-10-03T21:02:54+00:00

Saves several calls to compound_head() and removes a couple of uses of
page->lru.

Link: https://lkml.kernel.org/r/20220902194653.1739778-52-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) 
Signed-off-by: Andrew Morton

migrate: convert __unmap_and_move() to use folios

2022-10-03T21:02:53+00:00

Removes a lot of calls to compound_head().  Also remove a VM_BUG_ON that
can never trigger as the PageAnon bit is the bottom bit of page->mapping.
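
As an illustration of the bottom-bit encoding, here is a userspace sketch.
The MODEL_ names and helpers are hypothetical, and the assumption that the
removed check sat on a path where the mapping was already known to be NULL
is mine, not stated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed tag value: the lowest bit of the mapping word. */
#define MODEL_MAPPING_ANON ((uintptr_t)0x1)

/* Store an aligned pointer together with the anon tag in one word. */
static uintptr_t model_make_anon_mapping(void *anon_vma)
{
	assert(((uintptr_t)anon_vma & MODEL_MAPPING_ANON) == 0);
	return (uintptr_t)anon_vma | MODEL_MAPPING_ANON;
}

/* The anon test only looks at the bottom bit of the word. */
static bool model_mapping_is_anon(uintptr_t mapping)
{
	return (mapping & MODEL_MAPPING_ANON) != 0;
}

/* Strip the tag to recover the original pointer. */
static void *model_mapping_pointer(uintptr_t mapping)
{
	return (void *)(mapping & ~MODEL_MAPPING_ANON);
}
```

Because the tag lives in the mapping word itself, a zero (NULL) mapping can
never report the anon bit as set, so an assertion against that combination
would be dead code.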

Link: https://lkml.kernel.org/r/20220902194653.1739778-51-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) 
Signed-off-by: Andrew Morton

mm: fix the handling Non-LRU pages returned by follow_page

2022-09-27T02:46:28+00:00

When handling Non-LRU pages returned by follow_page(), the code jumps out
directly without calling put_page() to drop the reference that
follow_page() took via get_page() because of the 'FOLL_GET' flag.  Fix the
zone device page check by handling the page reference count correctly
before returning.
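
The reference count pattern can be sketched in userspace as follows.  All
names here (struct model_page, model_follow_page_get(), model_put_page(),
model_visit()) are hypothetical stand-ins and do not mirror the real
follow_page() signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page with just the fields the sketch needs. */
struct model_page {
	int refcount;
	bool is_zone_device;
};

/* Models a lookup with FOLL_GET: a found page gains one reference. */
static struct model_page *model_follow_page_get(struct model_page *p)
{
	if (p)
		p->refcount++;
	return p;
}

static void model_put_page(struct model_page *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}

/* Returns true if the page was processed, false if it was skipped. */
static bool model_visit(struct model_page *candidate)
{
	struct model_page *page = model_follow_page_get(candidate);

	if (!page)
		return false;
	if (page->is_zone_device) {
		/* Skipped page: drop the lookup reference first. */
		model_put_page(page);
		return false;
	}
	/* ... work on the page would happen here ... */
	model_put_page(page);
	return true;
}
```

Both exits leave the reference count where it started; omitting the
put on the skip path is the kind of leak the fix addresses.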

And as David reviewed, "device pages are never PageKsm pages".  Drop this
zone device page check for break_ksm().

Since a zone device page can't be a transparent huge page, drop the
redundant zone device page check for split_huge_pages_pid().  (by Miaohe)

Link: https://lkml.kernel.org/r/20220823135841.934465-3-haiyue.wang@intel.com
Fixes: 3218f8712d6b ("mm: handling Non-LRU pages returned by vm_normal_pages")
Signed-off-by: Haiyue Wang 
Reviewed-by: "Huang, Ying" 
Reviewed-by: Felix Kuehling 
Reviewed-by: Alistair Popple 
Reviewed-by: Miaohe Lin 
Acked-by: David Hildenbrand 
Cc: Alex Sierra 
Cc: Gerald Schaefer 
Cc: Mike Kravetz 
Cc: Muchun Song 
Signed-off-by: Andrew Morton

mm/demotion: update node_is_toptier to work with memory tiers

2022-09-27T02:46:12+00:00

With memory tier support we can have memory only NUMA nodes in the top
tier from which we want to avoid promotion tracking NUMA faults.  Update
node_is_toptier to work with memory tiers.  All NUMA nodes are by default
top tier nodes.  With lower (slower) memory tiers added we consider all
memory tiers above a memory tier having CPU NUMA nodes as a top memory
tier.
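
The rule can be sketched with a small userspace model.  The names and the
'distance' field (smaller meaning faster) are assumptions made for the
illustration, not the kernel's data structures; the inclusive comparison
follows the "inclusive upper bound" note below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical memory tier: smaller distance means faster memory. */
struct model_tier {
	int distance; /* assumed non-negative */
	bool has_cpu; /* tier contains at least one NUMA node with CPUs */
};

/* Largest distance among tiers with CPUs: the inclusive top-tier bound. */
static int model_top_tier_distance(const struct model_tier *t, size_t n)
{
	int top = -1;

	for (size_t i = 0; i < n; i++)
		if (t[i].has_cpu && t[i].distance > top)
			top = t[i].distance;
	assert(top >= 0); /* the sketch assumes some tier has CPUs */
	return top;
}

/* A tier is top tier if it is at or above the slowest CPU tier. */
static bool model_tier_is_toptier(const struct model_tier *tier, int top)
{
	return tier->distance <= top;
}
```

With a single default tier every node is top tier; once a slower,
memory-only tier is added below the CPU tier, only that new tier falls
outside the bound.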

[sj@kernel.org: include missed header file, memory-tiers.h]
  Link: https://lkml.kernel.org/r/20220820190720.248704-1-sj@kernel.org
[akpm@linux-foundation.org: mm/memory.c needs linux/memory-tiers.h]
[aneesh.kumar@linux.ibm.com: make toptier_distance inclusive upper bound of toptiers]
  Link: https://lkml.kernel.org/r/20220830081457.118960-1-aneesh.kumar@linux.ibm.com
Link: https://lkml.kernel.org/r/20220818131042.113280-10-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V 
Reviewed-by: "Huang, Ying" 
Acked-by: Wei Xu 
Cc: Alistair Popple 
Cc: Bharata B Rao 
Cc: Dan Williams 
Cc: Dave Hansen 
Cc: Davidlohr Bueso 
Cc: Hesham Almatary 
Cc: Jagdish Gediya 
Cc: Johannes Weiner 
Cc: Jonathan Cameron 
Cc: Michal Hocko 
Cc: Tim Chen 
Cc: Yang Shi 
Cc: SeongJae Park 
Signed-off-by: Andrew Morton

mm/demotion: build demotion targets based on explicit memory tiers

2022-09-27T02:46:12+00:00

This patch switches the demotion target building logic to use memory tiers
instead of NUMA distance.  All N_MEMORY NUMA nodes will be placed in the
default memory tier and additional memory tiers will be added by drivers
like dax kmem.

This patch builds the demotion target for a NUMA node by looking at all
memory tiers below the tier to which the NUMA node belongs.  The closest
node in the immediately following memory tier is used as a demotion
target.
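
The selection rule can be sketched as below.  This is a userspace model
under assumptions: tier_of[] (0 being the fastest tier), the flat distance
matrix and model_demotion_target() are hypothetical, and the sketch picks
a single closest node.

```c
#include <assert.h>
#include <limits.h>

/*
 * Returns the demotion target for 'node', or -1 if no lower tier exists.
 * tier_of[i] is the tier rank of node i; dist is an nr_nodes * nr_nodes
 * matrix stored row by row.
 */
static int model_demotion_target(int node, const int *tier_of,
				 const int *dist, int nr_nodes)
{
	int next_tier = INT_MAX;
	int best = -1;

	assert(node >= 0 && node < nr_nodes);

	/* Find the tier immediately below the node's own tier. */
	for (int i = 0; i < nr_nodes; i++)
		if (tier_of[i] > tier_of[node] && tier_of[i] < next_tier)
			next_tier = tier_of[i];
	if (next_tier == INT_MAX)
		return -1; /* already in the lowest tier */

	/* Among the nodes of that tier, pick the closest one. */
	for (int i = 0; i < nr_nodes; i++) {
		if (tier_of[i] != next_tier)
			continue;
		if (best < 0 ||
		    dist[node * nr_nodes + i] < dist[node * nr_nodes + best])
			best = i;
	}
	return best;
}
```

Nodes in the lowest tier get no target in this model, which matches the
idea that demotion only moves pages toward slower tiers.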

Since we are now only building demotion targets for N_MEMORY NUMA nodes,
the CPU hotplug calls are removed in this patch.

Link: https://lkml.kernel.org/r/20220818131042.113280-6-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V 
Reviewed-by: "Huang, Ying" 
Acked-by: Wei Xu 
Cc: Alistair Popple 
Cc: Bharata B Rao 
Cc: Dan Williams 
Cc: Dave Hansen 
Cc: Davidlohr Bueso 
Cc: Hesham Almatary 
Cc: Jagdish Gediya 
Cc: Johannes Weiner 
Cc: Jonathan Cameron 
Cc: Michal Hocko 
Cc: Tim Chen 
Cc: Yang Shi 
Cc: SeongJae Park 
Signed-off-by: Andrew Morton

mm/demotion: move memory demotion related code

2022-09-27T02:46:11+00:00

This moves memory demotion related code to mm/memory-tiers.c.  No
functional change in this patch.

Link: https://lkml.kernel.org/r/20220818131042.113280-3-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V 
Reviewed-by: "Huang, Ying" 
Acked-by: Wei Xu 
Cc: Alistair Popple 
Cc: Bharata B Rao 
Cc: Dan Williams 
Cc: Dave Hansen 
Cc: Davidlohr Bueso 
Cc: Hesham Almatary 
Cc: Jagdish Gediya 
Cc: Johannes Weiner 
Cc: Jonathan Cameron 
Cc: Michal Hocko 
Cc: Tim Chen 
Cc: Yang Shi 
Cc: SeongJae Park 
Signed-off-by: Andrew Morton

mm: migrate: do not retry 10 times for the subpages of fail-to-migrate THP

2022-09-27T02:46:07+00:00

If a THP fails to migrate with -ENOSYS or -ENOMEM, the THP is split and
migration of its subpages is attempted again.  We should not account the
retry counter in the second loop, since we already accounted
'nr_thp_failed' in the first loop.

Moreover, we do not need to retry 10 times on -EAGAIN for the subpages of
a fail-to-migrate THP in the second loop, since we already regard the THP
as a migration failure.  This also saves some migration time (in the worst
case, 512 * 10 attempts), according to the previous discussion [1].

[1] https://lore.kernel.org/linux-mm/87r13a7n04.fsf@yhuang6-desk2.ccr.corp.intel.com/
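
The worst-case saving can be checked with a small model of the second
loop.  This is a sketch under assumptions: model_attempts() and its
parameters are hypothetical, and the real loop tracks more state than a
single pending counter.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_RETRY 10 /* the retry limit named above */

/*
 * Counts migration attempts made on the subpages of one split THP.
 * always_eagain models the worst case where every attempt fails with
 * -EAGAIN; retry_subpages selects the old (retrying) behaviour.
 */
static int model_attempts(int nr_subpages, bool always_eagain,
			  bool retry_subpages)
{
	int max_pass = retry_subpages ? MODEL_MAX_RETRY : 1;
	int pending = nr_subpages;
	int attempts = 0;

	assert(nr_subpages >= 0);
	for (int pass = 0; pass < max_pass && pending > 0; pass++) {
		attempts += pending;
		if (!always_eagain)
			pending = 0; /* everything migrated on this pass */
	}
	return attempts;
}
```

For 512 subpages that always return -EAGAIN, the model makes 5120 attempts
with retries and 512 without, which is the 512 * 10 worst case mentioned
above.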

Link: https://lkml.kernel.org/r/20220817081408.513338-9-ying.huang@intel.com
Tested-by: "Huang, Ying" 
Signed-off-by: Baolin Wang 
Signed-off-by: "Huang, Ying" 
Cc: Oscar Salvador 
Cc: Zi Yan 
Cc: Yang Shi 
Signed-off-by: Andrew Morton