linux.git/include/linux/mm.h, branch v5.7-rc2

mm/special: create generic fallbacks for pte_special() and pte_mkspecial()

2020-04-10T22:36:21+00:00

Currently there are many platforms that don't enable ARCH_HAS_PTE_SPECIAL
but are required to define quite similar fallback stubs for special page
table entry helpers such as pte_special() and pte_mkspecial(), as they
get built into generic MM without a config check.  This creates two
generic fallback stub definitions for these helpers, eliminating much
code duplication.
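
As a rough illustration of the fallback idea, the sketch below compiles in
userspace.  The pte_t type here is a hypothetical stand-in, and the stub
bodies (never special, mkspecial is a no-op) are an assumption about what
such fallbacks look like, not a copy of the kernel header.

```c
#include <assert.h>

/* Hypothetical stand-in for an architecture's pte_t; real layouts differ. */
typedef struct { unsigned long val; } pte_t;

/*
 * Leave CONFIG_ARCH_HAS_PTE_SPECIAL undefined to model a platform without
 * special PTE support.  Such a platform gets the generic stubs below.
 */
#ifndef CONFIG_ARCH_HAS_PTE_SPECIAL
/* Assumed fallback: no PTE is ever reported as special. */
static inline int pte_special(pte_t pte)
{
	(void)pte;
	return 0;
}

/* Assumed fallback: marking a PTE special returns it unchanged. */
static inline pte_t pte_mkspecial(pte_t pte)
{
	return pte;
}
#endif

/* Generic code can call both helpers without any config check of its own. */
static inline void example_usage(void)
{
	pte_t pte = { 0x1234UL };
	pte_t out = pte_mkspecial(pte);

	assert(out.val == pte.val);
	assert(!pte_special(out));
}
```

With one shared definition, a platform that lacks the feature no longer
needs to carry its own copy of these two stubs.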

The mips platform is a special case where pte_special() and pte_mkspecial()
are visible more widely than ARCH_HAS_PTE_SPECIAL enablement requires.
This restricts the visibility of those symbols in order to avoid the
redefinitions that the new generic stubs would otherwise expose, and the
subsequent build failure.  The arm platform's set_pte_at() definition
needs to be moved into a C file just to prevent a build failure.

[anshuman.khandual@arm.com: use defined(CONFIG_ARCH_HAS_PTE_SPECIAL) in mips per Thomas]
  Link: http://lkml.kernel.org/r/1583851924-21603-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual 
Signed-off-by: Andrew Morton 
Acked-by: Guo Ren 			[csky]
Acked-by: Geert Uytterhoeven 	[m68k]
Acked-by: Stafford Horne 		[openrisc]
Acked-by: Helge Deller 			[parisc]
Cc: Richard Henderson 
Cc: Ivan Kokshaysky 
Cc: Matt Turner 
Cc: Russell King 
Cc: Brian Cain 
Cc: Tony Luck 
Cc: Fenghua Yu 
Cc: Sam Creasey 
Cc: Michal Simek 
Cc: Ralf Baechle 
Cc: Paul Burton 
Cc: Nick Hu 
Cc: Greentime Hu 
Cc: Vincent Chen 
Cc: Ley Foon Tan 
Cc: Jonas Bonn 
Cc: Stefan Kristiansson 
Cc: "James E.J. Bottomley" 
Cc: "David S. Miller" 
Cc: Jeff Dike 
Cc: Richard Weinberger 
Cc: Anton Ivanov 
Cc: Guan Xuetao 
Cc: Chris Zankel 
Cc: Max Filippov 
Cc: Thomas Bogendoerfer 
Link: http://lkml.kernel.org/r/1583802551-15406-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds

mm/vma: introduce VM_ACCESS_FLAGS

2020-04-10T22:36:21+00:00

There are many places where all basic VMA access flags (read, write,
exec) are initialized or checked against as a group.  One such example
is during page fault.  The existing vma_is_accessible() wrapper already
treats VMA accessibility as a group of access permissions.

Hence let's just create VM_ACCESS_FLAGS (VM_READ|VM_WRITE|VM_EXEC) which
will not only reduce code duplication but also extend the VMA
accessibility concept in general.
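
A minimal userspace sketch of the group mask follows.  The struct is a
stand-in for the kernel's VMA structure, and the body of
vma_is_accessible() is an assumption about how such a check can be written
on top of the mask.

```c
#include <assert.h>

/* Basic access bits; values shown for illustration. */
#define VM_READ		0x1UL
#define VM_WRITE	0x2UL
#define VM_EXEC		0x4UL

/* The group mask described in the commit message. */
#define VM_ACCESS_FLAGS	(VM_READ | VM_WRITE | VM_EXEC)

/* Hypothetical, minimal stand-in for the kernel's VMA structure. */
struct vma_sketch {
	unsigned long vm_flags;
};

/* Assumed check: a VMA is accessible if any basic access bit is set. */
static inline int vma_is_accessible(const struct vma_sketch *vma)
{
	return (vma->vm_flags & VM_ACCESS_FLAGS) != 0;
}

static inline void example_usage(void)
{
	struct vma_sketch readable = { VM_READ };
	struct vma_sketch no_access = { 0 };	/* e.g. a PROT_NONE mapping */

	assert(vma_is_accessible(&readable));
	assert(!vma_is_accessible(&no_access));
}
```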

Signed-off-by: Anshuman Khandual 
Signed-off-by: Andrew Morton 
Reviewed-by: Vlastimil Babka 
Cc: Russell King 
Cc: Catalin Marinas 
Cc: Mark Salter 
Cc: Nick Hu 
Cc: Ley Foon Tan 
Cc: Michael Ellerman 
Cc: Heiko Carstens 
Cc: Yoshinori Sato 
Cc: Guan Xuetao 
Cc: Dave Hansen 
Cc: Thomas Gleixner 
Cc: Rob Springer 
Cc: Greg Kroah-Hartman 
Cc: Geert Uytterhoeven 
Link: http://lkml.kernel.org/r/1583391014-8170-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds

mm/vma: define a default value for VM_DATA_DEFAULT_FLAGS

2020-04-10T22:36:21+00:00

There are many platforms with the exact same value for VM_DATA_DEFAULT_FLAGS.
This creates a default value for VM_DATA_DEFAULT_FLAGS in line with the
existing VM_STACK_DEFAULT_FLAGS.  While here, also define some more
macros with standard VMA access flag combinations that are used
frequently across many platforms.  Apart from simplification, this
reduces code duplication as well.
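
The override pattern can be sketched as below.  The names
VM_DATA_FLAGS_EXEC and VM_DATA_FLAGS_NON_EXEC, and the choice of which
combination serves as the default, are assumptions made for illustration;
only VM_DATA_DEFAULT_FLAGS is named in the text above.

```c
#include <assert.h>

#define VM_READ		0x01UL
#define VM_WRITE	0x02UL
#define VM_EXEC		0x04UL
#define VM_MAYREAD	0x10UL
#define VM_MAYWRITE	0x20UL
#define VM_MAYEXEC	0x40UL

/* Assumed names for two frequently used flag combinations. */
#define VM_DATA_FLAGS_NON_EXEC	(VM_READ | VM_WRITE | \
				 VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC)
#define VM_DATA_FLAGS_EXEC	(VM_READ | VM_WRITE | VM_EXEC | \
				 VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC)

/*
 * A platform that needs a different value defines VM_DATA_DEFAULT_FLAGS
 * before this point.  Every other platform picks up the default here.
 */
#ifndef VM_DATA_DEFAULT_FLAGS
#define VM_DATA_DEFAULT_FLAGS	VM_DATA_FLAGS_EXEC
#endif

static inline unsigned long data_default_flags(void)
{
	return VM_DATA_DEFAULT_FLAGS;
}

static inline void example_usage(void)
{
	/* No override was defined above, so the assumed default applies. */
	assert(data_default_flags() == VM_DATA_FLAGS_EXEC);
}
```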

Signed-off-by: Anshuman Khandual 
Signed-off-by: Andrew Morton 
Reviewed-by: Vlastimil Babka 
Acked-by: Geert Uytterhoeven 
Cc: Richard Henderson 
Cc: Vineet Gupta 
Cc: Russell King 
Cc: Catalin Marinas 
Cc: Mark Salter 
Cc: Guo Ren 
Cc: Yoshinori Sato 
Cc: Brian Cain 
Cc: Tony Luck 
Cc: Michal Simek 
Cc: Ralf Baechle 
Cc: Paul Burton 
Cc: Nick Hu 
Cc: Ley Foon Tan 
Cc: Jonas Bonn 
Cc: "James E.J. Bottomley" 
Cc: Michael Ellerman 
Cc: Paul Walmsley 
Cc: Heiko Carstens 
Cc: Rich Felker 
Cc: "David S. Miller" 
Cc: Guan Xuetao 
Cc: Thomas Gleixner 
Cc: Jeff Dike 
Cc: Chris Zankel 
Link: http://lkml.kernel.org/r/1583391014-8170-2-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds

mm/memory.c: add vm_insert_pages()

2020-04-10T22:36:21+00:00

Add the ability to insert multiple pages at once to a user VM with fewer
PTE spinlock operations.

The intention of this patch-set is to reduce atomic ops for tcp zerocopy
receives, which normally hit the same spinlock multiple times
consecutively.
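
The saving can be pictured with a toy model that only counts lock
acquisitions.  Everything below (the pt_sketch structure, the *_sketch
helpers, the slot count) is hypothetical and does not mirror the kernel's
page table code; it only shows why batching under one lock reduces lock
round trips.

```c
#include <assert.h>
#include <stddef.h>

#define PT_SLOTS 16

/* Hypothetical page table guarded by a single lock. */
struct pt_sketch {
	unsigned long lock_acquisitions;
	unsigned long slots[PT_SLOTS];
	size_t used;
};

static inline void pt_lock(struct pt_sketch *pt)
{
	pt->lock_acquisitions++;	/* count each acquisition */
}

static inline void pt_unlock(struct pt_sketch *pt)
{
	(void)pt;
}

/* One page per call: one lock round trip per page. */
static inline int insert_page_sketch(struct pt_sketch *pt, unsigned long page)
{
	int ret = -1;

	pt_lock(pt);
	if (pt->used < PT_SLOTS) {
		pt->slots[pt->used++] = page;
		ret = 0;
	}
	pt_unlock(pt);
	return ret;
}

/* Batched: one lock round trip; returns how many pages were NOT inserted. */
static inline size_t insert_pages_sketch(struct pt_sketch *pt,
					 const unsigned long *pages, size_t n)
{
	size_t i = 0;

	pt_lock(pt);
	while (i < n && pt->used < PT_SLOTS)
		pt->slots[pt->used++] = pages[i++];
	pt_unlock(pt);
	return n - i;
}

static inline void example_usage(void)
{
	const unsigned long pages[4] = { 10, 11, 12, 13 };
	struct pt_sketch one_by_one = { 0 };
	struct pt_sketch batched = { 0 };
	size_t i, left;

	for (i = 0; i < 4; i++)
		insert_page_sketch(&one_by_one, pages[i]);
	assert(one_by_one.used == 4 && one_by_one.lock_acquisitions == 4);

	left = insert_pages_sketch(&batched, pages, 4);
	assert(left == 0 && batched.lock_acquisitions == 1);
}
```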

[akpm@linux-foundation.org: pte_alloc() no longer takes the `addr' argument]
[arjunroy@google.com: add missing page_count() check to vm_insert_pages()]
  Link: http://lkml.kernel.org/r/20200214005929.104481-1-arjunroy.kdev@gmail.com
[arjunroy@google.com: vm_insert_pages() checks if pte_index defined]
  Link: http://lkml.kernel.org/r/20200228054714.204424-2-arjunroy.kdev@gmail.com
Signed-off-by: Arjun Roy 
Signed-off-by: Eric Dumazet 
Signed-off-by: Soheil Hassas Yeganeh 
Signed-off-by: Andrew Morton 
Cc: David Miller 
Cc: Matthew Wilcox 
Cc: Jason Gunthorpe 
Cc: Stephen Rothwell 
Link: http://lkml.kernel.org/r/20200128025958.43490-2-arjunroy.kdev@gmail.com
Signed-off-by: Linus Torvalds

userfaultfd: wp: apply _PAGE_UFFD_WP bit

2020-04-07T17:43:39+00:00

Firstly, introduce two new flags MM_CP_UFFD_WP[_RESOLVE] for
change_protection() when used with uffd-wp and make sure the two new flags
are mutually exclusive.  Then,

  - For MM_CP_UFFD_WP: apply the _PAGE_UFFD_WP bit and remove _PAGE_RW
    when a range of memory is write protected by uffd

  - For MM_CP_UFFD_WP_RESOLVE: remove the _PAGE_UFFD_WP bit and recover
    _PAGE_RW when write protection is resolved from userspace

And use this new interface in mwriteprotect_range() to replace the old
MM_CP_DIRTY_ACCT.
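
The two bullets above can be sketched on a plain integer standing in for a
PTE.  The bit positions are illustrative only, and change_pte_sketch() is
a hypothetical helper.  The sketch follows the bullets literally; whether
_PAGE_RW can really be restored may depend on other state such as COW.

```c
#include <assert.h>

/* Illustrative bit positions; real values depend on the architecture. */
#define _PAGE_RW		(1UL << 1)
#define _PAGE_UFFD_WP		(1UL << 10)

/* Illustrative flag values for the change_protection() flags argument. */
#define MM_CP_UFFD_WP		(1UL << 2)
#define MM_CP_UFFD_WP_RESOLVE	(1UL << 3)

/* Hypothetical helper: apply the uffd-wp part of a protection change. */
static inline unsigned long change_pte_sketch(unsigned long pte,
					      unsigned long cp_flags)
{
	int uffd_wp = (cp_flags & MM_CP_UFFD_WP) != 0;
	int uffd_wp_resolve = (cp_flags & MM_CP_UFFD_WP_RESOLVE) != 0;

	/* The two flags must never be passed together. */
	assert(!(uffd_wp && uffd_wp_resolve));

	if (uffd_wp) {
		/* Write protect for uffd: set the marker, drop write access. */
		pte |= _PAGE_UFFD_WP;
		pte &= ~_PAGE_RW;
	} else if (uffd_wp_resolve) {
		/* Resolve: clear the marker, restore write access. */
		pte &= ~_PAGE_UFFD_WP;
		pte |= _PAGE_RW;
	}
	return pte;
}
```

Keeping a dedicated marker bit is what lets later code tell a uffd-wp
protected entry apart from one that merely lacks write permission.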

Do this change for both PTEs and huge PMDs.  Then we can start to identify
which PTE/PMD is write protected by general mechanisms (e.g., COW or soft
dirty tracking), and which is write protected for userfaultfd-wp.

Since we should keep the _PAGE_UFFD_WP when doing pte_modify(), add it
into _PAGE_CHG_MASK as well.  Meanwhile, since we have this new bit, we
can be even more strict when detecting uffd-wp page faults in either
do_wp_page() or wp_huge_pmd().

With _PAGE_UFFD_WP in place, a special case is when a page is protected
both by the general COW logic and by userfault-wp.  Here the
userfault-wp will have higher priority and will be handled first.  Only
after the uffd-wp bit is cleared on the PTE/PMD will we continue to handle
the general COW.  These are the steps that such a page goes through:

  1. CPU accesses a write protected shared page (so protected by both
     general COW and uffd-wp), and is blocked by uffd-wp first because
     do_wp_page() handles uffd-wp first, so it has higher priority
     than general COW.

  2. Uffd service thread receives the request and does UFFDIO_WRITEPROTECT
     to remove the uffd-wp bit from the PTE/PMD.  However here we
     still keep the write bit cleared.  Notify the blocked CPU.

  3. The blocked CPU resumes the page fault process with a fault
     retry.  During the retry it'll notice the uffd-wp bit is no longer
     set but the page is still write protected by general COW, so
     it'll go through the COW path in the fault handler, copy the page,
     apply the write bit where necessary, and retry again.

  4. The CPU will be able to access this page with write bit set.
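
The four steps can be walked through with a toy state machine.  The bit
names, the enum and both *_sketch functions are hypothetical, and the page
copy itself is elided; the sketch only shows the ordering of the uffd-wp
check before the COW check.

```c
#include <assert.h>

/* Illustrative PTE bits for this sketch only. */
#define PTE_WRITE	(1UL << 0)
#define PTE_UFFD_WP	(1UL << 1)

enum fault_result {
	FAULT_BLOCKED_ON_UFFD,	/* handed to the uffd service thread */
	FAULT_DID_COW,		/* page copied, write bit applied */
	FAULT_NOT_NEEDED,	/* write access already allowed */
};

/* Hypothetical write-fault handler: uffd-wp is checked before COW. */
static inline enum fault_result write_fault_sketch(unsigned long *pte)
{
	if (*pte & PTE_UFFD_WP)
		return FAULT_BLOCKED_ON_UFFD;		/* step 1 */
	if (!(*pte & PTE_WRITE)) {
		*pte |= PTE_WRITE;			/* step 3, copy elided */
		return FAULT_DID_COW;
	}
	return FAULT_NOT_NEEDED;			/* step 4 */
}

/* Step 2: userspace resolves uffd-wp; the write bit stays cleared. */
static inline void uffd_resolve_sketch(unsigned long *pte)
{
	*pte &= ~PTE_UFFD_WP;
}

static inline void example_usage(void)
{
	unsigned long pte = PTE_UFFD_WP;	/* uffd-wp and COW protected */
	enum fault_result r;

	r = write_fault_sketch(&pte);
	assert(r == FAULT_BLOCKED_ON_UFFD);

	uffd_resolve_sketch(&pte);
	assert(!(pte & PTE_WRITE));

	r = write_fault_sketch(&pte);
	assert(r == FAULT_DID_COW);

	r = write_fault_sketch(&pte);
	assert(r == FAULT_NOT_NEEDED);
}
```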

Suggested-by: Andrea Arcangeli 
Signed-off-by: Peter Xu 
Signed-off-by: Andrew Morton 
Cc: Brian Geffon 
Cc: Pavel Emelyanov 
Cc: Mike Kravetz 
Cc: David Hildenbrand 
Cc: Martin Cracauer 
Cc: Mel Gorman 
Cc: Bobby Powers 
Cc: Mike Rapoport 
Cc: "Kirill A . Shutemov" 
Cc: Maya Gokhale 
Cc: Johannes Weiner 
Cc: Marty McFadden 
Cc: Denis Plotnikov 
Cc: Hugh Dickins 
Cc: "Dr . David Alan Gilbert" 
Cc: Jerome Glisse 
Cc: Rik van Riel 
Cc: Shaohua Li 
Link: http://lkml.kernel.org/r/20200220163112.11409-8-peterx@redhat.com
Signed-off-by: Linus Torvalds

mm: merge parameters for change_protection()

2020-04-07T17:43:39+00:00

change_protection() was used by either the NUMA or mprotect() code, and
there's one parameter for each of the callers (dirty_accountable and
prot_numa).  Further, these parameters are passed along the calls:

  - change_protection_range()
  - change_p4d_range()
  - change_pud_range()
  - change_pmd_range()
  - ...

Now we introduce a flag for change_protection() and all these helpers to
replace these parameters.  Then we can avoid passing multiple parameters
multiple times along the way.
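
A small sketch of the idea: one flags word travels down the call chain and
only the leaf decodes it.  MM_CP_DIRTY_ACCT appears elsewhere in this log;
MM_CP_PROT_NUMA, the bit positions and all *_sketch names are assumptions
made for illustration.

```c
#include <assert.h>

/* Illustrative flag bits replacing the two former int parameters. */
#define MM_CP_DIRTY_ACCT	(1UL << 0)	/* was: dirty_accountable */
#define MM_CP_PROT_NUMA		(1UL << 1)	/* was: prot_numa (assumed name) */

struct cp_decoded {
	int dirty_accountable;
	int prot_numa;
};

/* Leaf level: the only place that needs to decode the flags. */
static inline struct cp_decoded change_pte_range_sketch(unsigned long cp_flags)
{
	struct cp_decoded d;

	d.dirty_accountable = (cp_flags & MM_CP_DIRTY_ACCT) != 0;
	d.prot_numa = (cp_flags & MM_CP_PROT_NUMA) != 0;
	return d;
}

/* Intermediate levels just forward the single flags word. */
static inline struct cp_decoded change_pmd_range_sketch(unsigned long cp_flags)
{
	return change_pte_range_sketch(cp_flags);
}

static inline struct cp_decoded change_protection_sketch(unsigned long cp_flags)
{
	return change_pmd_range_sketch(cp_flags);
}

static inline void example_usage(void)
{
	struct cp_decoded d = change_protection_sketch(MM_CP_PROT_NUMA);

	assert(d.prot_numa && !d.dirty_accountable);
}
```

Adding another behavior then means adding one more bit, without touching
the signature of any function in the chain.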

More importantly, it'll greatly simplify the work if we want to introduce
any new parameters to change_protection().  In the follow up patches, a
new parameter for userfaultfd write protection will be introduced.

No functional change at all.

Signed-off-by: Peter Xu 
Signed-off-by: Andrew Morton 
Reviewed-by: Jerome Glisse 
Cc: Andrea Arcangeli 
Cc: Bobby Powers 
Cc: Brian Geffon 
Cc: David Hildenbrand 
Cc: Denis Plotnikov 
Cc: "Dr . David Alan Gilbert" 
Cc: Hugh Dickins 
Cc: Johannes Weiner 
Cc: "Kirill A . Shutemov" 
Cc: Martin Cracauer 
Cc: Marty McFadden 
Cc: Maya Gokhale 
Cc: Mel Gorman 
Cc: Mike Kravetz 
Cc: Mike Rapoport 
Cc: Pavel Emelyanov 
Cc: Rik van Riel 
Cc: Shaohua Li 
Link: http://lkml.kernel.org/r/20200220163112.11409-7-peterx@redhat.com
Signed-off-by: Linus Torvalds

mm/vma: make vma_is_accessible() available for general use

2020-04-07T17:43:37+00:00

Let's move the vma_is_accessible() helper to include/linux/mm.h, which makes
it available for general use.  While here, this replaces all remaining
open-coded VMA access checks with vma_is_accessible().

Signed-off-by: Anshuman Khandual 
Signed-off-by: Andrew Morton 
Acked-by: Geert Uytterhoeven 
Acked-by: Guo Ren 
Acked-by: Vlastimil Babka 
Cc: Guo Ren 
Cc: Geert Uytterhoeven 
Cc: Ralf Baechle 
Cc: Paul Burton 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: Dave Hansen 
Cc: Andy Lutomirski 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Steven Rostedt 
Cc: Mel Gorman 
Cc: Alexander Viro 
Cc: "Aneesh Kumar K.V" 
Cc: Arnaldo Carvalho de Melo 
Cc: Arnd Bergmann 
Cc: Nick Piggin 
Cc: Paul Mackerras 
Cc: Will Deacon 
Link: http://lkml.kernel.org/r/1582520593-30704-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds

Merge tag 'drm-next-2020-04-03-1' of git://anongit.freedesktop.org/drm/drm

2020-04-04T18:58:55+00:00

Pull drm hugepage support from Dave Airlie:
 "This adds support for hugepages to TTM and has been tested with the
  vmwgfx drivers, though I expect other drivers to start using it"

* tag 'drm-next-2020-04-03-1' of git://anongit.freedesktop.org/drm/drm:
  drm/vmwgfx: Hook up the helpers to align buffer objects
  drm/vmwgfx: Introduce a huge page aligning TTM range manager
  drm: Add a drm_get_unmapped_area() helper
  drm/vmwgfx: Support huge page faults
  drm/ttm, drm/vmwgfx: Support huge TTM pagefaults
  mm: Add vmf_insert_pfn_xxx_prot() for huge page-table entries
  mm: Split huge pages on write-notify or COW
  mm: Introduce vma_is_special_huge
  fs: Constify vma argument to vma_is_dax

mmap: remove inline of vm_unmapped_area

2020-04-02T16:35:30+00:00

Patch series "mm: mmap: add mmap trace point", v3.

Create mmap trace file and add trace point of vm_unmapped_area().

This patch (of 2):

In preparation for the next patch, remove the inline of vm_unmapped_area()
and move the code to mmap.c.  There is no logical change.

Also remove unmapped_area[_topdown] from mm.h, since no code calls
them.

Signed-off-by: Jaewon Kim 
Signed-off-by: Andrew Morton 
Reviewed-by: Vlastimil Babka 
Cc: Matthew Wilcox (Oracle) 
Cc: Michel Lespinasse 
Cc: Borislav Petkov 
Link: http://lkml.kernel.org/r/20200320055823.27089-2-jaewon31.kim@samsung.com
Signed-off-by: Linus Torvalds

mm: allow VM_FAULT_RETRY for multiple times

2020-04-02T16:35:30+00:00

The idea comes from a discussion between Linus and Andrea [1].

Before this patch we only allow a page fault to retry once.  We achieved
this by clearing the FAULT_FLAG_ALLOW_RETRY flag when doing
handle_mm_fault() the second time.  This was mainly used to avoid
unexpected starvation of the system by looping forever to handle the
page fault on a single page.  However that should hardly happen, since
every code path that returns VM_FAULT_RETRY first waits for a
condition (during which time we should possibly yield the cpu) to happen
before VM_FAULT_RETRY is really returned.

This patch removes the restriction by keeping the FAULT_FLAG_ALLOW_RETRY
flag when we receive VM_FAULT_RETRY.  It means that the page fault handler
can now retry the page fault multiple times if necessary without the
need to generate another page fault event.  Meanwhile we still keep the
FAULT_FLAG_TRIED flag so the page fault handler can still identify whether
a page fault is the first attempt or not.

Then we'll have these combinations of fault flags (only considering
ALLOW_RETRY flag and TRIED flag):

  - ALLOW_RETRY and !TRIED:  this means the page fault is allowed to
                             retry, and this is the first try

  - ALLOW_RETRY and TRIED:   this means the page fault is allowed to
                             retry, and this is not the first try

  - !ALLOW_RETRY and !TRIED: this means the page fault is not allowed
                             to retry at all

  - !ALLOW_RETRY and TRIED:  this is forbidden and should never be used

In existing code we have multiple places that have taken special care of
the first condition above by checking against (fault_flags &
FAULT_FLAG_ALLOW_RETRY).  This patch introduces a simple helper to detect
the first attempt of a page fault by checking against both (fault_flags &
FAULT_FLAG_ALLOW_RETRY) and !(fault_flags & FAULT_FLAG_TRIED), because now
even the 2nd try will have ALLOW_RETRY set, and then uses that helper in
all existing special paths.  One example is __lock_page_or_retry(): now
we'll drop the mmap_sem only in the first attempt of a page fault and
keep it in follow-up retries, so the old locking behavior is retained.
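
The flag combinations and the first-attempt check can be sketched as
follows.  The numeric flag values are illustrative, and all three function
names are assumptions; the text above describes the helper's logic but
does not name it.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values; only the distinctness of the bits matters here. */
#define FAULT_FLAG_ALLOW_RETRY	0x04U
#define FAULT_FLAG_TRIED	0x20U

/* Assumed helper name: true only for ALLOW_RETRY and !TRIED. */
static inline bool fault_flag_allow_retry_first(unsigned int flags)
{
	return (flags & FAULT_FLAG_ALLOW_RETRY) &&
	       !(flags & FAULT_FLAG_TRIED);
}

/* !ALLOW_RETRY together with TRIED is the forbidden combination. */
static inline bool fault_flags_valid(unsigned int flags)
{
	return !((flags & FAULT_FLAG_TRIED) &&
		 !(flags & FAULT_FLAG_ALLOW_RETRY));
}

/*
 * Flags for the next attempt after a retry was requested: ALLOW_RETRY is
 * kept (it used to be cleared) and TRIED is added.
 */
static inline unsigned int fault_flags_after_retry(unsigned int flags)
{
	assert(flags & FAULT_FLAG_ALLOW_RETRY);
	return flags | FAULT_FLAG_TRIED;
}
```

A caller that loops on retry would start with ALLOW_RETRY alone, so only
the first pass satisfies the first-attempt check, while every later pass
still has ALLOW_RETRY set and may retry again.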

This will be a nice enhancement for current code [2] and at the same time
supporting material for the future userfaultfd-writeprotect work, since in
that work there will always be an explicit userfault writeprotect retry
for protected pages, and if that cannot resolve the page fault (e.g., when
userfaultfd-writeprotect is used in conjunction with swapped pages) then
we'll possibly need a 3rd retry of the page fault.  It might also benefit
other potential users who have requirements similar to userfault
write-protection.

GUP code is not touched yet and will be covered in a follow-up patch.

Please read the thread below for more information.

[1] https://lore.kernel.org/lkml/20171102193644.GB22686@redhat.com/
[2] https://lore.kernel.org/lkml/20181230154648.GB9832@redhat.com/

Suggested-by: Linus Torvalds 
Suggested-by: Andrea Arcangeli 
Signed-off-by: Peter Xu 
Signed-off-by: Andrew Morton 
Tested-by: Brian Geffon 
Cc: Bobby Powers 
Cc: David Hildenbrand 
Cc: Denis Plotnikov 
Cc: "Dr . David Alan Gilbert" 
Cc: Hugh Dickins 
Cc: Jerome Glisse 
Cc: Johannes Weiner 
Cc: "Kirill A . Shutemov" 
Cc: Martin Cracauer 
Cc: Marty McFadden 
Cc: Matthew Wilcox 
Cc: Maya Gokhale 
Cc: Mel Gorman 
Cc: Mike Kravetz 
Cc: Mike Rapoport 
Cc: Pavel Emelyanov 
Link: http://lkml.kernel.org/r/20200220160246.9790-1-peterx@redhat.com
Signed-off-by: Linus Torvalds