linux.git/mm/pgtable-generic.c, branch v6.5

mm: ptep_get() conversion

2023-06-19T23:19:25+00:00

Convert all instances of direct pte_t* dereferencing to instead use the
ptep_get() helper.  This means that by default, the accesses change from a
C dereference to a READ_ONCE().  This is technically the correct thing to
do since where pgtables are modified by HW (for access/dirty) they are
volatile and therefore we should always ensure READ_ONCE() semantics.

But more importantly, by always using the helper, it can be overridden by
the architecture to fully encapsulate the contents of the pte.  Arch code
is deliberately not converted, as the arch code knows best.  It is
intended that arch code (arm64) will override the default with its own
implementation that can (e.g.) hide certain bits from the core code, or
determine young/dirty status by mixing in state from another source.
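
A minimal user-space sketch of the idea, not the kernel code.  The pte_t
layout, the READ_ONCE() approximation, the PTE_ARCH_PRIVATE bit and
arch_ptep_get() are all assumptions made for illustration; only the
ptep_get() name comes from this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's pte_t (assumption: one 64-bit word). */
typedef struct { uint64_t pte; } pte_t;

/* User-space approximation of READ_ONCE(): force a single volatile load. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical bit that an architecture wants to hide from core code. */
#define PTE_ARCH_PRIVATE (UINT64_C(1) << 52)

/* Default helper: read the entry exactly once and return it unchanged. */
static inline pte_t ptep_get(pte_t *ptep)
{
	pte_t pte = { READ_ONCE(ptep->pte) };

	return pte;
}

/* Hypothetical arch override: read once, then mask the private bit. */
static inline pte_t arch_ptep_get(pte_t *ptep)
{
	pte_t pte = { READ_ONCE(ptep->pte) };

	pte.pte &= ~PTE_ARCH_PRIVATE;
	return pte;
}
```

Because core code only ever sees the value returned by the helper, the
override can change what is visible without touching any caller.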

Conversion was done using Coccinelle:

----

// $ make coccicheck \
//          COCCI=ptepget.cocci \
//          SPFLAGS="--include-headers" \
//          MODE=patch

virtual patch

@ depends on patch @
pte_t *v;
@@

- *v
+ ptep_get(v)

----

Then reviewed and hand-edited to avoid multiple unnecessary calls to
ptep_get(), instead opting to store the result of a single call in a
variable, where it is correct to do so.  This aims to negate any cost of
READ_ONCE() and will benefit arch-overrides that may be more complex.
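
The hand-editing described above can be sketched as follows.  This is a
user-space model under stated assumptions: the bit layout, the call
counter and both present_and_dirty*() functions are hypothetical and exist
only to show the difference between the mechanical and the hand-edited
form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint64_t pte; } pte_t;

#define PTE_PRESENT (UINT64_C(1) << 0)	/* hypothetical bit layout */
#define PTE_DIRTY   (UINT64_C(1) << 1)

static unsigned int ptep_get_calls;	/* instrumentation for this sketch only */

static pte_t ptep_get(pte_t *ptep)
{
	pte_t pte = { *(const volatile uint64_t *)&ptep->pte };

	ptep_get_calls++;
	return pte;
}

static bool pte_present(pte_t pte) { return pte.pte & PTE_PRESENT; }
static bool pte_dirty(pte_t pte)   { return pte.pte & PTE_DIRTY; }

/* Mechanical conversion: every former "*ptep" becomes its own read. */
static bool present_and_dirty_naive(pte_t *ptep)
{
	return pte_present(ptep_get(ptep)) && pte_dirty(ptep_get(ptep));
}

/* Hand-edited form: read once, then test the local copy. */
static bool present_and_dirty(pte_t *ptep)
{
	pte_t pte = ptep_get(ptep);

	return pte_present(pte) && pte_dirty(pte);
}
```

Besides saving a read, the second form tests both bits against the same
snapshot of the entry.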

Included is a fix for an issue in an earlier version of this patch that
was pointed out by kernel test robot.  The issue arose because config
MMU=n elides definition of the ptep helper functions, including
ptep_get().  HUGETLB_PAGE=n configs still define a simple
huge_ptep_clear_flush() for linking purposes, which dereferences the ptep.
So when both configs are disabled, this caused a build error because
ptep_get() is not defined.  Fix by continuing to do a direct dereference
when MMU=n.  This is safe because for this config the arch code cannot be
trying to virtualize the ptes because none of the ptep helpers are
defined.
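
A sketch of the shape of that fix, assuming a simplified pte_t and a
hypothetical model_read_pte() wrapper standing in for the stub that
dereferences the ptep.  Build with -DCONFIG_MMU to exercise the helper
path; without it, ptep_get() is not defined at all, as in the failing
configuration.

```c
#include <assert.h>

typedef struct { unsigned long pte; } pte_t;

#ifdef CONFIG_MMU
/* Only MMU configurations provide the helper. */
static inline pte_t ptep_get(pte_t *ptep)
{
	pte_t pte = { *(const volatile unsigned long *)&ptep->pte };

	return pte;
}
#endif

/* Hypothetical stand-in for a stub that must build in both configurations. */
static inline pte_t model_read_pte(pte_t *ptep)
{
#ifdef CONFIG_MMU
	return ptep_get(ptep);
#else
	return *ptep;	/* no helpers defined, so nothing can virtualize the pte */
#endif
}
```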

Link: https://lkml.kernel.org/r/20230612151545.3317766-4-ryan.roberts@arm.com
Reported-by: kernel test robot 
Link: https://lore.kernel.org/oe-kbuild-all/202305120142.yXsNEo6H-lkp@intel.com/
Signed-off-by: Ryan Roberts 
Cc: Adrian Hunter 
Cc: Alexander Potapenko 
Cc: Alexander Shishkin 
Cc: Alex Williamson 
Cc: Al Viro 
Cc: Andrey Konovalov 
Cc: Andrey Ryabinin 
Cc: Christian Brauner 
Cc: Christoph Hellwig 
Cc: Daniel Vetter 
Cc: Dave Airlie 
Cc: Dimitri Sivanich 
Cc: Dmitry Vyukov 
Cc: Ian Rogers 
Cc: Jason Gunthorpe 
Cc: Jérôme Glisse 
Cc: Jiri Olsa 
Cc: Johannes Weiner 
Cc: Kirill A. Shutemov 
Cc: Lorenzo Stoakes 
Cc: Mark Rutland 
Cc: Matthew Wilcox 
Cc: Miaohe Lin 
Cc: Michal Hocko 
Cc: Mike Kravetz 
Cc: Mike Rapoport (IBM) 
Cc: Muchun Song 
Cc: Namhyung Kim 
Cc: Naoya Horiguchi 
Cc: Oleksandr Tyshchenko 
Cc: Pavel Tatashin 
Cc: Roman Gushchin 
Cc: SeongJae Park 
Cc: Shakeel Butt 
Cc: Uladzislau Rezki (Sony) 
Cc: Vincenzo Frascino 
Cc: Yu Zhao 
Signed-off-by: Andrew Morton

mm/pgtable: allow pte_offset_map[_lock]() to fail

2023-06-19T23:19:12+00:00

Make pte_offset_map() a wrapper for __pte_offset_map() (optionally outputs
pmdval), pte_offset_map_lock() a sparse __cond_lock wrapper for
__pte_offset_map_lock(): those __funcs added in mm/pgtable-generic.c.

__pte_offset_map() does pmdval validation (including pmd_clear_bad() when
pmd_bad()), returning NULL if pmdval is not for a page table.
__pte_offset_map_lock() verifies that pmdval is unchanged after getting the
lock, trying again if it changed.
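
The validate, lock, recheck and retry sequence can be modelled in user
space as below.  This is a single-threaded sketch, not the code added to
mm/pgtable-generic.c: the model_* functions, the PMD_TABLE bit, the toy
spinlock_t and the racing_changes test hook are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint64_t val; } pmd_t;
typedef struct { uint64_t pte; } pte_t;
typedef struct { bool held; } spinlock_t;	/* toy lock, no real exclusion */

#define PMD_TABLE (UINT64_C(1) << 0)	/* hypothetical "is a page table" bit */

static pte_t page_table[4];
static spinlock_t table_lock;
static int racing_changes;		/* test hook: simulated concurrent updates */
static unsigned int lock_attempts;

/* Snapshot the pmd, report it via pmdvalp, fail if it is not a page table. */
static pte_t *model_pte_offset_map(pmd_t *pmd, pmd_t *pmdvalp)
{
	pmd_t pmdval = { *(const volatile uint64_t *)&pmd->val };

	if (pmdvalp)
		*pmdvalp = pmdval;
	if (!(pmdval.val & PMD_TABLE))
		return NULL;
	return &page_table[0];
}

/* Map, take the lock, then confirm the pmd did not change in between. */
static pte_t *model_pte_offset_map_lock(pmd_t *pmd, spinlock_t **ptlp)
{
	pmd_t pmdval;
	pte_t *pte;

	for (;;) {
		pte = model_pte_offset_map(pmd, &pmdval);
		if (!pte)
			return NULL;
		table_lock.held = true;		/* stands in for spin_lock() */
		lock_attempts++;
		if (racing_changes > 0) {	/* simulate a racing pmd update */
			racing_changes--;
			pmd->val += 2;		/* changes other bits, keeps PMD_TABLE */
		}
		if (pmd->val == pmdval.val) {
			*ptlp = &table_lock;
			return pte;
		}
		table_lock.held = false;	/* unlock and try again */
	}
}
```

Callers must be prepared for a NULL return, which is the point of allowing
these functions to fail.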

No #ifdef CONFIG_TRANSPARENT_HUGEPAGE around them: that could be done to
cover the imminent case, but we expect to generalize it later, and it
makes a mess of where to do the pmd_bad() clearing.

Add pte_offset_map_nolock(): outputs ptl like pte_offset_map_lock(),
without actually taking the lock.  This will be preferred to open uses of
pte_lockptr(), because (when split ptlock is in page table's struct page)
it points to the right lock for the returned pte pointer, even if *pmd
gets changed racily afterwards.

Update corresponding Documentation.

Do not add the anticipated rcu_read_lock() and rcu_read_unlock()s yet:
they have to wait until all architectures are balancing pte_offset_map()s
with pte_unmap()s (as in the arch series posted earlier).  But comment
where they will go, so that it's easy to add them for experiments.  And
only when those are in place can transient racy failure cases be enabled. 
Add more safety for the PAE mismatched pmd_low pmd_high case at that time.

Link: https://lkml.kernel.org/r/2929bfd-9893-a374-e463-4c3127ff9b9d@google.com
Signed-off-by: Hugh Dickins 
Cc: Alistair Popple 
Cc: Anshuman Khandual 
Cc: Axel Rasmussen 
Cc: Christophe Leroy 
Cc: Christoph Hellwig 
Cc: David Hildenbrand 
Cc: "Huang, Ying" 
Cc: Ira Weiny 
Cc: Jason Gunthorpe 
Cc: Kirill A. Shutemov 
Cc: Lorenzo Stoakes 
Cc: Matthew Wilcox 
Cc: Mel Gorman 
Cc: Miaohe Lin 
Cc: Mike Kravetz 
Cc: Mike Rapoport (IBM) 
Cc: Minchan Kim 
Cc: Naoya Horiguchi 
Cc: Pavel Tatashin 
Cc: Peter Xu 
Cc: Peter Zijlstra 
Cc: Qi Zheng 
Cc: Ralph Campbell 
Cc: Ryan Roberts 
Cc: SeongJae Park 
Cc: Song Liu 
Cc: Steven Price 
Cc: Suren Baghdasaryan 
Cc: Thomas Hellström 
Cc: Will Deacon 
Cc: Yang Shi 
Cc: Yu Zhao 
Cc: Zack Rusin 
Signed-off-by: Andrew Morton

mm: add PTE pointer parameter to flush_tlb_fix_spurious_fault()

2023-03-28T23:20:12+00:00

s390 can do more fine-grained handling of spurious TLB protection faults,
when there also is the PTE pointer available.

Therefore, pass on the PTE pointer to flush_tlb_fix_spurious_fault() as an
additional parameter.

This will add no functional change to other architectures, but those with
private flush_tlb_fix_spurious_fault() implementations need to be made
aware of the new parameter.
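
A user-space sketch of why other architectures see no functional change:
a generic fallback can accept the new argument and ignore it.  The macro
body is modelled on the generic definition, and the types and the
flush_tlb_page() stub here are simplified assumptions.

```c
#include <assert.h>

struct vm_area_struct { int dummy; };	/* simplified stand-in */
typedef struct { unsigned long pte; } pte_t;

static unsigned int page_flushes;	/* instrumentation for this sketch only */

/* Stub: count the flush instead of touching any TLB. */
static void flush_tlb_page(struct vm_area_struct *vma, unsigned long address)
{
	(void)vma;
	(void)address;
	page_flushes++;
}

/*
 * Fallback for architectures without their own implementation: the PTE
 * pointer is accepted so that all callers share one signature, then dropped.
 */
#ifndef flush_tlb_fix_spurious_fault
#define flush_tlb_fix_spurious_fault(vma, address, ptep) \
	flush_tlb_page(vma, address)
#endif
```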

Link: https://lkml.kernel.org/r/20230306161548.661740-1-gerald.schaefer@linux.ibm.com
Signed-off-by: Gerald Schaefer 
Reviewed-by: Alexander Gordeev 
Acked-by: Catalin Marinas 	[arm64]
Acked-by: Michael Ellerman 		[powerpc]
Acked-by: David Hildenbrand 
Cc: Anshuman Khandual 
Cc: Borislav Petkov (AMD) 
Cc: Christophe Leroy 
Cc: Dave Hansen 
Cc: Ingo Molnar 
Cc: Matthew Wilcox (Oracle) 
Cc: Nicholas Piggin 
Cc: Thomas Bogendoerfer 
Cc: Thomas Gleixner 
Cc: Will Deacon 
Signed-off-by: Andrew Morton

mm: avoid unnecessary flush on change_huge_pmd()

2022-05-13T14:20:05+00:00

Calls to change_protection_range() on THP can trigger, at least on x86,
two TLB flushes for one page: one immediately, when pmdp_invalidate() is
called by change_huge_pmd(), and then another one later (that can be
batched) when change_protection_range() finishes.

The first TLB flush only serves to prevent the dirty bit (and, less
importantly, the access bit) from changing while the PTE is modified.
On x86 it is not needed, because the CPUs set the dirty bit atomically
with an additional check that the PTE is (still) present.  One caveat is
Intel's Knights Landing, which has a bug and does not do so.

Leverage this behavior to eliminate the unnecessary TLB flush in
change_huge_pmd().  Introduce a new arch-specific pmdp_invalidate_ad()
that only protects the access and dirty bits from further changes.

Link: https://lkml.kernel.org/r/20220401180821.1986781-4-namit@vmware.com
Signed-off-by: Nadav Amit 
Cc: Andrea Arcangeli 
Cc: Andrew Cooper 
Cc: Andy Lutomirski 
Cc: Dave Hansen 
Cc: Peter Xu 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Will Deacon 
Cc: Yu Zhao 
Cc: Nick Piggin 
Signed-off-by: Andrew Morton

mm: move tlb_flush_pending inline helpers to mm_inline.h

2022-01-15T14:30:27+00:00

linux/mm_types.h should only contain structure definitions, to make it
cheap to include elsewhere.  The atomic_t helper function definitions
are particularly large, so it's better to move the helpers using those
into the existing linux/mm_inline.h and only include that where needed.

As a follow-up, we may want to go through all the indirect includes in
mm_types.h and reduce them as much as possible.

Link: https://lkml.kernel.org/r/20211207125710.2503446-2-arnd@kernel.org
Signed-off-by: Arnd Bergmann 
Cc: Al Viro 
Cc: Stephen Rothwell 
Cc: Suren Baghdasaryan 
Cc: Colin Cross 
Cc: Kees Cook 
Cc: Peter Xu 
Cc: Peter Zijlstra (Intel) 
Cc: Yu Zhao 
Cc: Vlastimil Babka 
Cc: Matthew Wilcox (Oracle) 
Cc: Eric Biederman 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/thp: fix __split_huge_pmd_locked() on shmem migration entry

2021-06-16T16:24:42+00:00

Patch series "mm/thp: fix THP splitting unmap BUGs and related", v10.

Here is a v2 batch of long-standing THP bug fixes that I had not got
around to sending before, but prompted now by Wang Yugui's report
https://lore.kernel.org/linux-mm/20210412180659.B9E3.409509F4@e16-tech.com/

Wang Yugui has tested a rollup of these fixes applied to 5.10.39, and
they have done no harm, but have *not* fixed that issue: something more
is needed and I have no idea of what.

This patch (of 7):

Stressing huge tmpfs page migration racing hole punch often crashed on
the VM_BUG_ON(!pmd_present) in pmdp_huge_clear_flush(), with DEBUG_VM=y
kernel; or shortly afterwards, on a bad dereference in
__split_huge_pmd_locked() when DEBUG_VM=n.  They forgot to allow for pmd
migration entries in the non-anonymous case.

Full disclosure: those particular experiments were on a kernel with more
relaxed mmap_lock and i_mmap_rwsem locking, and were not repeated on the
vanilla kernel: it is conceivable that stricter locking happens to avoid
those cases, or makes them less likely; but __split_huge_pmd_locked()
already allowed for pmd migration entries when handling anonymous THPs,
so this commit brings the shmem and file THP handling into line.

And while there: use old_pmd rather than _pmd, as in the following
blocks; and make it clearer to the eye that the !vma_is_anonymous()
block is self-contained, making an early return after accounting for
unmapping.

Link: https://lkml.kernel.org/r/af88612-1473-2eaa-903-8d1a448b26@google.com
Link: https://lkml.kernel.org/r/dd221a99-efb3-cd1d-6256-7e646af29314@google.com
Fixes: e71769ae5260 ("mm: enable thp migration for shmem thp")
Signed-off-by: Hugh Dickins 
Cc: Kirill A. Shutemov 
Cc: Yang Shi 
Cc: Wang Yugui 
Cc: "Matthew Wilcox (Oracle)" 
Cc: Naoya Horiguchi 
Cc: Alistair Popple 
Cc: Ralph Campbell 
Cc: Zi Yan 
Cc: Miaohe Lin 
Cc: Minchan Kim 
Cc: Jue Wang 
Cc: Peter Xu 
Cc: Jan Kara 
Cc: Shakeel Butt 
Cc: Oscar Salvador 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/pgtable-generic.c: optimize the VM_BUG_ON condition in pmdp_huge_clear_flush()

2021-02-24T21:38:30+00:00

A developer will have trouble figuring out why the BUG actually triggered
when there is a complex expression in the VM_BUG_ON, because the line
number reported by VM_BUG_ON is the only way to identify which condition
triggered the BUG.  Improve this by splitting such a complex expression
into two simple conditions.
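
A user-space sketch of the effect.  MODEL_VM_BUG_ON() is a hypothetical
stand-in that records the line number instead of crashing, and the two
boolean parameters are placeholders for the real sub-conditions.

```c
#include <assert.h>
#include <stdbool.h>

static int last_bug_line;	/* 0 means no check fired */

/* Stand-in for VM_BUG_ON(): remember the first line whose check fired. */
#define MODEL_VM_BUG_ON(cond) \
	do { if ((cond) && !last_bug_line) last_bug_line = __LINE__; } while (0)

/* One complex expression: both failure causes report the same line. */
static int check_combined(bool unaligned, bool not_huge)
{
	last_bug_line = 0;
	MODEL_VM_BUG_ON(unaligned || not_huge);
	return last_bug_line;
}

/* Two simple conditions: each failure cause reports its own line. */
static int check_split(bool unaligned, bool not_huge)
{
	last_bug_line = 0;
	MODEL_VM_BUG_ON(unaligned);
	MODEL_VM_BUG_ON(not_huge);
	return last_bug_line;
}
```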

Link: https://lkml.kernel.org/r/20210203084137.25522-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin 
Suggested-by: Andrew Morton 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/pgtable-generic.c: simplify the VM_BUG_ON condition in pmdp_huge_clear_flush()

2021-02-24T21:38:30+00:00

The condition (A && !C && !D) || !A is equivalent to !A || (A && !C && !D)
and can be further simplified to !A || (!C && !D).
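
The equivalence can be checked exhaustively over the eight possible
inputs.  The function names below are hypothetical; A, C and D are the
placeholders used in the message above.

```c
#include <assert.h>
#include <stdbool.h>

/* The condition as originally written. */
static bool cond_original(bool a, bool c, bool d)
{
	return (a && !c && !d) || !a;
}

/* The simplified condition. */
static bool cond_simplified(bool a, bool c, bool d)
{
	return !a || (!c && !d);
}

/* Walk all combinations of A, C and D and count disagreements. */
static int count_mismatches(void)
{
	int mismatches = 0;

	for (int bits = 0; bits < 8; bits++) {
		bool a = bits & 1, c = bits & 2, d = bits & 4;

		if (cond_original(a, c, d) != cond_simplified(a, c, d))
			mismatches++;
	}
	return mismatches;
}
```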

Link: https://lkml.kernel.org/r/20210201114319.34720-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin 
Reviewed-by: Andrew Morton 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: introduce include/linux/pgtable.h

2020-06-09T16:39:13+00:00

include/linux/pgtable.h is going to be the home of generic page table
manipulation functions.

Start with moving asm-generic/pgtable.h to include/linux/pgtable.h and
make the latter include asm/pgtable.h.

Signed-off-by: Mike Rapoport 
Signed-off-by: Andrew Morton 
Cc: Arnd Bergmann 
Cc: Borislav Petkov 
Cc: Brian Cain 
Cc: Catalin Marinas 
Cc: Chris Zankel 
Cc: "David S. Miller" 
Cc: Geert Uytterhoeven 
Cc: Greentime Hu 
Cc: Greg Ungerer 
Cc: Guan Xuetao 
Cc: Guo Ren 
Cc: Heiko Carstens 
Cc: Helge Deller 
Cc: Ingo Molnar 
Cc: Ley Foon Tan 
Cc: Mark Salter 
Cc: Matthew Wilcox 
Cc: Matt Turner 
Cc: Max Filippov 
Cc: Michael Ellerman 
Cc: Michal Simek 
Cc: Nick Hu 
Cc: Paul Walmsley 
Cc: Richard Weinberger 
Cc: Rich Felker 
Cc: Russell King 
Cc: Stafford Horne 
Cc: Thomas Bogendoerfer 
Cc: Thomas Gleixner 
Cc: Tony Luck 
Cc: Vincent Chen 
Cc: Vineet Gupta 
Cc: Will Deacon 
Cc: Yoshinori Sato 
Link: http://lkml.kernel.org/r/20200514170327.31389-3-rppt@kernel.org
Signed-off-by: Linus Torvalds

mm/thp: rename pmd_mknotpresent() as pmd_mkinvalid()

2020-06-04T03:09:49+00:00

pmd_present() is expected to test positive after pmd_mknotpresent() as
the PMD entry still points to a valid huge page in memory.  The name
pmd_mknotpresent(), however, suggests that the PMD entry is invalidated
from the MMU's perspective, while still holding on to the valid huge page
referred to by pmd_page(), and that the pmd_present() test is cleared as
well.  This creates the following counter-intuitive situation.

[pmd_present(pmd_mknotpresent(pmd)) = true]

This renames pmd_mknotpresent() as pmd_mkinvalid() reflecting the helper's
functionality more accurately while changing the above mentioned situation
as follows.  This does not create any functional change.

[pmd_present(pmd_mkinvalid(pmd)) = true]
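
A user-space sketch of how an entry can be invalid for the MMU and still
test as present.  The two bits and pmd_hw_valid() are hypothetical; real
architectures encode this differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint64_t val; } pmd_t;

#define PMD_HW_VALID (UINT64_C(1) << 0)	/* hypothetical: the MMU honours the entry */
#define PMD_SW_HUGE  (UINT64_C(1) << 1)	/* hypothetical: software huge-page marker */

/* What the hardware walker would see. */
static bool pmd_hw_valid(pmd_t pmd)
{
	return pmd.val & PMD_HW_VALID;
}

/* What core code asks: does this entry still refer to a huge page? */
static bool pmd_present(pmd_t pmd)
{
	return pmd.val & (PMD_HW_VALID | PMD_SW_HUGE);
}

/* Hide the entry from the MMU while keeping the page reference. */
static pmd_t pmd_mkinvalid(pmd_t pmd)
{
	pmd.val &= ~PMD_HW_VALID;
	return pmd;
}
```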

This is not applicable to platforms that define their own pmdp_invalidate()
via __HAVE_ARCH_PMDP_INVALIDATE.  The suggestion for renaming came during a
previous discussion here:

https://patchwork.kernel.org/patch/11019637/

[anshuman.khandual@arm.com: change pmd_mknotvalid() to pmd_mkinvalid() per Will]
  Link: http://lkml.kernel.org/r/1587520326-10099-3-git-send-email-anshuman.khandual@arm.com
Suggested-by: Catalin Marinas 
Signed-off-by: Anshuman Khandual 
Signed-off-by: Andrew Morton 
Acked-by: Will Deacon 
Cc: Vineet Gupta 
Cc: Russell King 
Cc: Catalin Marinas 
Cc: Thomas Bogendoerfer 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: "H. Peter Anvin" 
Cc: Steven Rostedt 
Cc: Dave Hansen 
Cc: Andy Lutomirski 
Cc: Peter Zijlstra 
Cc: Benjamin Herrenschmidt 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Link: http://lkml.kernel.org/r/1584680057-13753-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds