<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/mm/mincore.c, branch v5.8</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>mmap locking API: use coccinelle to convert mmap_sem rwsem call sites</title>
<updated>2020-06-09T16:39:14+00:00</updated>
<author>
<name>Michel Lespinasse</name>
<email>walken@google.com</email>
</author>
<published>2020-06-09T04:33:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d8ed45c5dcd455fc5848d47f86883a1b872ac0d0'/>
<id>d8ed45c5dcd455fc5848d47f86883a1b872ac0d0</id>
<content type='text'>
This change converts the existing mmap_sem rwsem calls to use the new mmap
locking API instead.

The change is generated using coccinelle with the following rule:

// spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

@@
expression mm;
@@
(
-init_rwsem
+mmap_init_lock
|
-down_write
+mmap_write_lock
|
-down_write_killable
+mmap_write_lock_killable
|
-down_write_trylock
+mmap_write_trylock
|
-up_write
+mmap_write_unlock
|
-downgrade_write
+mmap_write_downgrade
|
-down_read
+mmap_read_lock
|
-down_read_killable
+mmap_read_lock_killable
|
-down_read_trylock
+mmap_read_trylock
|
-up_read
+mmap_read_unlock
)
-(&amp;mm-&gt;mmap_sem)
+(mm)

Signed-off-by: Michel Lespinasse &lt;walken@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Daniel Jordan &lt;daniel.m.jordan@oracle.com&gt;
Reviewed-by: Laurent Dufour &lt;ldufour@linux.ibm.com&gt;
Reviewed-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Jason Gunthorpe &lt;jgg@ziepe.ca&gt;
Cc: Jerome Glisse &lt;jglisse@redhat.com&gt;
Cc: John Hubbard &lt;jhubbard@nvidia.com&gt;
Cc: Liam Howlett &lt;Liam.Howlett@oracle.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ying Han &lt;yinghan@google.com&gt;
Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This change converts the existing mmap_sem rwsem calls to use the new mmap
locking API instead.

The change is generated using coccinelle with the following rule:

// spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

@@
expression mm;
@@
(
-init_rwsem
+mmap_init_lock
|
-down_write
+mmap_write_lock
|
-down_write_killable
+mmap_write_lock_killable
|
-down_write_trylock
+mmap_write_trylock
|
-up_write
+mmap_write_unlock
|
-downgrade_write
+mmap_write_downgrade
|
-down_read
+mmap_read_lock
|
-down_read_killable
+mmap_read_lock_killable
|
-down_read_trylock
+mmap_read_trylock
|
-up_read
+mmap_read_unlock
)
-(&amp;mm-&gt;mmap_sem)
+(mm)

Signed-off-by: Michel Lespinasse &lt;walken@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Daniel Jordan &lt;daniel.m.jordan@oracle.com&gt;
Reviewed-by: Laurent Dufour &lt;ldufour@linux.ibm.com&gt;
Reviewed-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Jason Gunthorpe &lt;jgg@ziepe.ca&gt;
Cc: Jerome Glisse &lt;jglisse@redhat.com&gt;
Cc: John Hubbard &lt;jhubbard@nvidia.com&gt;
Cc: Liam Howlett &lt;Liam.Howlett@oracle.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ying Han &lt;yinghan@google.com&gt;
Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: reorder includes after introduction of linux/pgtable.h</title>
<updated>2020-06-09T16:39:13+00:00</updated>
<author>
<name>Mike Rapoport</name>
<email>rppt@linux.ibm.com</email>
</author>
<published>2020-06-09T04:32:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=65fddcfca8ad14778f71a57672fd01e8112d30fa'/>
<id>65fddcfca8ad14778f71a57672fd01e8112d30fa</id>
<content type='text'>
The replacement of &lt;asm/pgrable.h&gt; with &lt;linux/pgtable.h&gt; made the include
of the latter in the middle of asm includes.  Fix this up with the aid of
the below script and manual adjustments here and there.

	import sys
	import re

	if len(sys.argv) is not 3:
	    print "USAGE: %s &lt;file&gt; &lt;header&gt;" % (sys.argv[0])
	    sys.exit(1)

	hdr_to_move="#include &lt;linux/%s&gt;" % sys.argv[2]
	moved = False
	in_hdrs = False

	with open(sys.argv[1], "r") as f:
	    lines = f.readlines()
	    for _line in lines:
		line = _line.rstrip('
')
		if line == hdr_to_move:
		    continue
		if line.startswith("#include &lt;linux/"):
		    in_hdrs = True
		elif not moved and in_hdrs:
		    moved = True
		    print hdr_to_move
		print line

Signed-off-by: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Brian Cain &lt;bcain@codeaurora.org&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Chris Zankel &lt;chris@zankel.net&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Cc: Greentime Hu &lt;green.hu@gmail.com&gt;
Cc: Greg Ungerer &lt;gerg@linux-m68k.org&gt;
Cc: Guan Xuetao &lt;gxt@pku.edu.cn&gt;
Cc: Guo Ren &lt;guoren@kernel.org&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: Helge Deller &lt;deller@gmx.de&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Ley Foon Tan &lt;ley.foon.tan@intel.com&gt;
Cc: Mark Salter &lt;msalter@redhat.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Matt Turner &lt;mattst88@gmail.com&gt;
Cc: Max Filippov &lt;jcmvbkbc@gmail.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Michal Simek &lt;monstr@monstr.eu&gt;
Cc: Nick Hu &lt;nickhu@andestech.com&gt;
Cc: Paul Walmsley &lt;paul.walmsley@sifive.com&gt;
Cc: Richard Weinberger &lt;richard@nod.at&gt;
Cc: Rich Felker &lt;dalias@libc.org&gt;
Cc: Russell King &lt;linux@armlinux.org.uk&gt;
Cc: Stafford Horne &lt;shorne@gmail.com&gt;
Cc: Thomas Bogendoerfer &lt;tsbogend@alpha.franken.de&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: Vincent Chen &lt;deanbo422@gmail.com&gt;
Cc: Vineet Gupta &lt;vgupta@synopsys.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Yoshinori Sato &lt;ysato@users.sourceforge.jp&gt;
Link: http://lkml.kernel.org/r/20200514170327.31389-4-rppt@kernel.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The replacement of &lt;asm/pgrable.h&gt; with &lt;linux/pgtable.h&gt; made the include
of the latter in the middle of asm includes.  Fix this up with the aid of
the below script and manual adjustments here and there.

	import sys
	import re

	if len(sys.argv) is not 3:
	    print "USAGE: %s &lt;file&gt; &lt;header&gt;" % (sys.argv[0])
	    sys.exit(1)

	hdr_to_move="#include &lt;linux/%s&gt;" % sys.argv[2]
	moved = False
	in_hdrs = False

	with open(sys.argv[1], "r") as f:
	    lines = f.readlines()
	    for _line in lines:
		line = _line.rstrip('
')
		if line == hdr_to_move:
		    continue
		if line.startswith("#include &lt;linux/"):
		    in_hdrs = True
		elif not moved and in_hdrs:
		    moved = True
		    print hdr_to_move
		print line

Signed-off-by: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Brian Cain &lt;bcain@codeaurora.org&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Chris Zankel &lt;chris@zankel.net&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Cc: Greentime Hu &lt;green.hu@gmail.com&gt;
Cc: Greg Ungerer &lt;gerg@linux-m68k.org&gt;
Cc: Guan Xuetao &lt;gxt@pku.edu.cn&gt;
Cc: Guo Ren &lt;guoren@kernel.org&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: Helge Deller &lt;deller@gmx.de&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Ley Foon Tan &lt;ley.foon.tan@intel.com&gt;
Cc: Mark Salter &lt;msalter@redhat.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Matt Turner &lt;mattst88@gmail.com&gt;
Cc: Max Filippov &lt;jcmvbkbc@gmail.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Michal Simek &lt;monstr@monstr.eu&gt;
Cc: Nick Hu &lt;nickhu@andestech.com&gt;
Cc: Paul Walmsley &lt;paul.walmsley@sifive.com&gt;
Cc: Richard Weinberger &lt;richard@nod.at&gt;
Cc: Rich Felker &lt;dalias@libc.org&gt;
Cc: Russell King &lt;linux@armlinux.org.uk&gt;
Cc: Stafford Horne &lt;shorne@gmail.com&gt;
Cc: Thomas Bogendoerfer &lt;tsbogend@alpha.franken.de&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: Vincent Chen &lt;deanbo422@gmail.com&gt;
Cc: Vineet Gupta &lt;vgupta@synopsys.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Yoshinori Sato &lt;ysato@users.sourceforge.jp&gt;
Link: http://lkml.kernel.org/r/20200514170327.31389-4-rppt@kernel.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: introduce include/linux/pgtable.h</title>
<updated>2020-06-09T16:39:13+00:00</updated>
<author>
<name>Mike Rapoport</name>
<email>rppt@linux.ibm.com</email>
</author>
<published>2020-06-09T04:32:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ca5999fde0a1761665a38e4c9a72dbcd7d190a81'/>
<id>ca5999fde0a1761665a38e4c9a72dbcd7d190a81</id>
<content type='text'>
The include/linux/pgtable.h is going to be the home of generic page table
manipulation functions.

Start with moving asm-generic/pgtable.h to include/linux/pgtable.h and
make the latter include asm/pgtable.h.

Signed-off-by: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Brian Cain &lt;bcain@codeaurora.org&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Chris Zankel &lt;chris@zankel.net&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Cc: Greentime Hu &lt;green.hu@gmail.com&gt;
Cc: Greg Ungerer &lt;gerg@linux-m68k.org&gt;
Cc: Guan Xuetao &lt;gxt@pku.edu.cn&gt;
Cc: Guo Ren &lt;guoren@kernel.org&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: Helge Deller &lt;deller@gmx.de&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Ley Foon Tan &lt;ley.foon.tan@intel.com&gt;
Cc: Mark Salter &lt;msalter@redhat.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Matt Turner &lt;mattst88@gmail.com&gt;
Cc: Max Filippov &lt;jcmvbkbc@gmail.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Michal Simek &lt;monstr@monstr.eu&gt;
Cc: Nick Hu &lt;nickhu@andestech.com&gt;
Cc: Paul Walmsley &lt;paul.walmsley@sifive.com&gt;
Cc: Richard Weinberger &lt;richard@nod.at&gt;
Cc: Rich Felker &lt;dalias@libc.org&gt;
Cc: Russell King &lt;linux@armlinux.org.uk&gt;
Cc: Stafford Horne &lt;shorne@gmail.com&gt;
Cc: Thomas Bogendoerfer &lt;tsbogend@alpha.franken.de&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: Vincent Chen &lt;deanbo422@gmail.com&gt;
Cc: Vineet Gupta &lt;vgupta@synopsys.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Yoshinori Sato &lt;ysato@users.sourceforge.jp&gt;
Link: http://lkml.kernel.org/r/20200514170327.31389-3-rppt@kernel.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The include/linux/pgtable.h is going to be the home of generic page table
manipulation functions.

Start with moving asm-generic/pgtable.h to include/linux/pgtable.h and
make the latter include asm/pgtable.h.

Signed-off-by: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Brian Cain &lt;bcain@codeaurora.org&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Chris Zankel &lt;chris@zankel.net&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Cc: Greentime Hu &lt;green.hu@gmail.com&gt;
Cc: Greg Ungerer &lt;gerg@linux-m68k.org&gt;
Cc: Guan Xuetao &lt;gxt@pku.edu.cn&gt;
Cc: Guo Ren &lt;guoren@kernel.org&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: Helge Deller &lt;deller@gmx.de&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Ley Foon Tan &lt;ley.foon.tan@intel.com&gt;
Cc: Mark Salter &lt;msalter@redhat.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Matt Turner &lt;mattst88@gmail.com&gt;
Cc: Max Filippov &lt;jcmvbkbc@gmail.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Michal Simek &lt;monstr@monstr.eu&gt;
Cc: Nick Hu &lt;nickhu@andestech.com&gt;
Cc: Paul Walmsley &lt;paul.walmsley@sifive.com&gt;
Cc: Richard Weinberger &lt;richard@nod.at&gt;
Cc: Rich Felker &lt;dalias@libc.org&gt;
Cc: Russell King &lt;linux@armlinux.org.uk&gt;
Cc: Stafford Horne &lt;shorne@gmail.com&gt;
Cc: Thomas Bogendoerfer &lt;tsbogend@alpha.franken.de&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: Vincent Chen &lt;deanbo422@gmail.com&gt;
Cc: Vineet Gupta &lt;vgupta@synopsys.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Yoshinori Sato &lt;ysato@users.sourceforge.jp&gt;
Link: http://lkml.kernel.org/r/20200514170327.31389-3-rppt@kernel.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: pagewalk: add 'depth' parameter to pte_hole</title>
<updated>2020-02-04T03:05:25+00:00</updated>
<author>
<name>Steven Price</name>
<email>steven.price@arm.com</email>
</author>
<published>2020-02-04T01:36:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b7a16c7ad790d0ecb44dcb08a6a75d0d0455ab5f'/>
<id>b7a16c7ad790d0ecb44dcb08a6a75d0d0455ab5f</id>
<content type='text'>
The pte_hole() callback is called at multiple levels of the page tables.
Code dumping the kernel page tables needs to know what at what depth the
missing entry is.  Add this is an extra parameter to pte_hole().  When the
depth isn't know (e.g.  processing a vma) then -1 is passed.

The depth that is reported is the actual level where the entry is missing
(ignoring any folding that is in place), i.e.  any levels where
PTRS_PER_P?D is set to 1 are ignored.

Note that depth starts at 0 for a PGD so that PUD/PMD/PTE retain their
natural numbers as levels 2/3/4.

Link: http://lkml.kernel.org/r/20191218162402.45610-16-steven.price@arm.com
Signed-off-by: Steven Price &lt;steven.price@arm.com&gt;
Tested-by: Zong Li &lt;zong.li@sifive.com&gt;
Cc: Albert Ou &lt;aou@eecs.berkeley.edu&gt;
Cc: Alexandre Ghiti &lt;alex@ghiti.fr&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Christian Borntraeger &lt;borntraeger@de.ibm.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: David S. Miller &lt;davem@davemloft.net&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: James Hogan &lt;jhogan@kernel.org&gt;
Cc: James Morse &lt;james.morse@arm.com&gt;
Cc: Jerome Glisse &lt;jglisse@redhat.com&gt;
Cc: "Liang, Kan" &lt;kan.liang@linux.intel.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Paul Burton &lt;paul.burton@mips.com&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Paul Walmsley &lt;paul.walmsley@sifive.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ralf Baechle &lt;ralf@linux-mips.org&gt;
Cc: Russell King &lt;linux@armlinux.org.uk&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Cc: Vineet Gupta &lt;vgupta@synopsys.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The pte_hole() callback is called at multiple levels of the page tables.
Code dumping the kernel page tables needs to know what at what depth the
missing entry is.  Add this is an extra parameter to pte_hole().  When the
depth isn't know (e.g.  processing a vma) then -1 is passed.

The depth that is reported is the actual level where the entry is missing
(ignoring any folding that is in place), i.e.  any levels where
PTRS_PER_P?D is set to 1 are ignored.

Note that depth starts at 0 for a PGD so that PUD/PMD/PTE retain their
natural numbers as levels 2/3/4.

Link: http://lkml.kernel.org/r/20191218162402.45610-16-steven.price@arm.com
Signed-off-by: Steven Price &lt;steven.price@arm.com&gt;
Tested-by: Zong Li &lt;zong.li@sifive.com&gt;
Cc: Albert Ou &lt;aou@eecs.berkeley.edu&gt;
Cc: Alexandre Ghiti &lt;alex@ghiti.fr&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Christian Borntraeger &lt;borntraeger@de.ibm.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: David S. Miller &lt;davem@davemloft.net&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: James Hogan &lt;jhogan@kernel.org&gt;
Cc: James Morse &lt;james.morse@arm.com&gt;
Cc: Jerome Glisse &lt;jglisse@redhat.com&gt;
Cc: "Liang, Kan" &lt;kan.liang@linux.intel.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Paul Burton &lt;paul.burton@mips.com&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Paul Walmsley &lt;paul.walmsley@sifive.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ralf Baechle &lt;ralf@linux-mips.org&gt;
Cc: Russell King &lt;linux@armlinux.org.uk&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Cc: Vineet Gupta &lt;vgupta@synopsys.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: untag user pointers passed to memory syscalls</title>
<updated>2019-09-26T00:51:41+00:00</updated>
<author>
<name>Andrey Konovalov</name>
<email>andreyknvl@google.com</email>
</author>
<published>2019-09-25T23:48:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=057d3389108eda8a20c7f496f011846932680d88'/>
<id>057d3389108eda8a20c7f496f011846932680d88</id>
<content type='text'>
This patch is a part of a series that extends kernel ABI to allow to pass
tagged user pointers (with the top byte set to something else other than
0x00) as syscall arguments.

This patch allows tagged pointers to be passed to the following memory
syscalls: get_mempolicy, madvise, mbind, mincore, mlock, mlock2, mprotect,
mremap, msync, munlock, move_pages.

The mmap and mremap syscalls do not currently accept tagged addresses.
Architectures may interpret the tag as a background colour for the
corresponding vma.

Link: http://lkml.kernel.org/r/aaf0c0969d46b2feb9017f3e1b3ef3970b633d91.1563904656.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov &lt;andreyknvl@google.com&gt;
Reviewed-by: Khalid Aziz &lt;khalid.aziz@oracle.com&gt;
Reviewed-by: Vincenzo Frascino &lt;vincenzo.frascino@arm.com&gt;
Reviewed-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Eric Auger &lt;eric.auger@redhat.com&gt;
Cc: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Cc: Jens Wiklander &lt;jens.wiklander@linaro.org&gt;
Cc: Mauro Carvalho Chehab &lt;mchehab+samsung@kernel.org&gt;
Cc: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch is a part of a series that extends kernel ABI to allow to pass
tagged user pointers (with the top byte set to something else other than
0x00) as syscall arguments.

This patch allows tagged pointers to be passed to the following memory
syscalls: get_mempolicy, madvise, mbind, mincore, mlock, mlock2, mprotect,
mremap, msync, munlock, move_pages.

The mmap and mremap syscalls do not currently accept tagged addresses.
Architectures may interpret the tag as a background colour for the
corresponding vma.

Link: http://lkml.kernel.org/r/aaf0c0969d46b2feb9017f3e1b3ef3970b633d91.1563904656.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov &lt;andreyknvl@google.com&gt;
Reviewed-by: Khalid Aziz &lt;khalid.aziz@oracle.com&gt;
Reviewed-by: Vincenzo Frascino &lt;vincenzo.frascino@arm.com&gt;
Reviewed-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Eric Auger &lt;eric.auger@redhat.com&gt;
Cc: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Cc: Jens Wiklander &lt;jens.wiklander@linaro.org&gt;
Cc: Mauro Carvalho Chehab &lt;mchehab+samsung@kernel.org&gt;
Cc: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>pagewalk: separate function pointers from iterator data</title>
<updated>2019-09-07T07:28:04+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2019-08-28T14:19:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7b86ac3371b70c3fd8fd95501719beb1faab719f'/>
<id>7b86ac3371b70c3fd8fd95501719beb1faab719f</id>
<content type='text'>
The mm_walk structure currently mixed data and code.  Split out the
operations vectors into a new mm_walk_ops structure, and while we are
changing the API also declare the mm_walk structure inside the
walk_page_range and walk_page_vma functions.

Based on patch from Linus Torvalds.

Link: https://lore.kernel.org/r/20190828141955.22210-3-hch@lst.de
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Thomas Hellstrom &lt;thellstrom@vmware.com&gt;
Reviewed-by: Steven Price &lt;steven.price@arm.com&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@mellanox.com&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@mellanox.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The mm_walk structure currently mixed data and code.  Split out the
operations vectors into a new mm_walk_ops structure, and while we are
changing the API also declare the mm_walk structure inside the
walk_page_range and walk_page_vma functions.

Based on patch from Linus Torvalds.

Link: https://lore.kernel.org/r/20190828141955.22210-3-hch@lst.de
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Thomas Hellstrom &lt;thellstrom@vmware.com&gt;
Reviewed-by: Steven Price &lt;steven.price@arm.com&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@mellanox.com&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@mellanox.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: split out a new pagewalk.h header from mm.h</title>
<updated>2019-09-07T07:28:04+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2019-08-28T14:19:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a520110e4a15ceb385304d9cab22bb51438f6080'/>
<id>a520110e4a15ceb385304d9cab22bb51438f6080</id>
<content type='text'>
Add a new header for the two handful of users of the walk_page_range /
walk_page_vma interface instead of polluting all users of mm.h with it.

Link: https://lore.kernel.org/r/20190828141955.22210-2-hch@lst.de
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Thomas Hellstrom &lt;thellstrom@vmware.com&gt;
Reviewed-by: Steven Price &lt;steven.price@arm.com&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@mellanox.com&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@mellanox.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add a new header for the two handful of users of the walk_page_range /
walk_page_vma interface instead of polluting all users of mm.h with it.

Link: https://lore.kernel.org/r/20190828141955.22210-2-hch@lst.de
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Thomas Hellstrom &lt;thellstrom@vmware.com&gt;
Reviewed-by: Steven Price &lt;steven.price@arm.com&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@mellanox.com&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@mellanox.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/mincore.c: fix race between swapoff and mincore</title>
<updated>2019-07-12T18:05:43+00:00</updated>
<author>
<name>Huang Ying</name>
<email>ying.huang@intel.com</email>
</author>
<published>2019-07-12T03:55:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=aeb309b81c6bada783c3695528a3e10748e97285'/>
<id>aeb309b81c6bada783c3695528a3e10748e97285</id>
<content type='text'>
Via commit 4b3ef9daa4fc ("mm/swap: split swap cache into 64MB trunks"),
after swapoff, the address_space associated with the swap device will be
freed.  So swap_address_space() users which touch the address_space need
some kind of mechanism to prevent the address_space from being freed
during accessing.

When mincore processes an unmapped range for swapped shmem pages, it
doesn't hold the lock to prevent swap device from being swapped off.  So
the following race is possible:

CPU1					CPU2
do_mincore()				swapoff()
  walk_page_range()
    mincore_unmapped_range()
      __mincore_unmapped_range
        mincore_page
	  as = swap_address_space()
          ...				  exit_swap_address_space()
          ...				    kvfree(spaces)
	  find_get_page(as)

The address space may be accessed after being freed.

To fix the race, get_swap_device()/put_swap_device() is used to enclose
find_get_page() to check whether the swap entry is valid and prevent the
swap device from being swapoff during accessing.

Link: http://lkml.kernel.org/r/20190611020510.28251-1-ying.huang@intel.com
Fixes: 4b3ef9daa4fc ("mm/swap: split swap cache into 64MB trunks")
Signed-off-by: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Tim Chen &lt;tim.c.chen@linux.intel.com&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Jérôme Glisse &lt;jglisse@redhat.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Yang Shi &lt;yang.shi@linux.alibaba.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Dave Jiang &lt;dave.jiang@intel.com&gt;
Cc: Daniel Jordan &lt;daniel.m.jordan@oracle.com&gt;
Cc: Andrea Parri &lt;andrea.parri@amarulasolutions.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Via commit 4b3ef9daa4fc ("mm/swap: split swap cache into 64MB trunks"),
after swapoff, the address_space associated with the swap device will be
freed.  So swap_address_space() users which touch the address_space need
some kind of mechanism to prevent the address_space from being freed
during accessing.

When mincore processes an unmapped range for swapped shmem pages, it
doesn't hold the lock to prevent swap device from being swapped off.  So
the following race is possible:

CPU1					CPU2
do_mincore()				swapoff()
  walk_page_range()
    mincore_unmapped_range()
      __mincore_unmapped_range
        mincore_page
	  as = swap_address_space()
          ...				  exit_swap_address_space()
          ...				    kvfree(spaces)
	  find_get_page(as)

The address space may be accessed after being freed.

To fix the race, get_swap_device()/put_swap_device() is used to enclose
find_get_page() to check whether the swap entry is valid and prevent the
swap device from being swapoff during accessing.

Link: http://lkml.kernel.org/r/20190611020510.28251-1-ying.huang@intel.com
Fixes: 4b3ef9daa4fc ("mm/swap: split swap cache into 64MB trunks")
Signed-off-by: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Tim Chen &lt;tim.c.chen@linux.intel.com&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Jérôme Glisse &lt;jglisse@redhat.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Yang Shi &lt;yang.shi@linux.alibaba.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Dave Jiang &lt;dave.jiang@intel.com&gt;
Cc: Daniel Jordan &lt;daniel.m.jordan@oracle.com&gt;
Cc: Andrea Parri &lt;andrea.parri@amarulasolutions.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/mincore.c: make mincore() more conservative</title>
<updated>2019-05-15T02:52:48+00:00</updated>
<author>
<name>Jiri Kosina</name>
<email>jkosina@suse.cz</email>
</author>
<published>2019-05-14T22:41:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=134fca9063ad4851de767d1768180e5dede9a881'/>
<id>134fca9063ad4851de767d1768180e5dede9a881</id>
<content type='text'>
The semantics of what mincore() considers to be resident is not
completely clear, but Linux has always (since 2.3.52, which is when
mincore() was initially done) treated it as "page is available in page
cache".

That's potentially a problem, as that [in]directly exposes
meta-information about pagecache / memory mapping state even about
memory not strictly belonging to the process executing the syscall,
opening possibilities for sidechannel attacks.

Change the semantics of mincore() so that it only reveals pagecache
information for non-anonymous mappings that belog to files that the
calling process could (if it tried to) successfully open for writing;
otherwise we'd be including shared non-exclusive mappings, which

 - is the sidechannel

 - is not the usecase for mincore(), as that's primarily used for data,
   not (shared) text

[jkosina@suse.cz: v2]
  Link: http://lkml.kernel.org/r/20190312141708.6652-2-vbabka@suse.cz
[mhocko@suse.com: restructure can_do_mincore() conditions]
Link: http://lkml.kernel.org/r/nycvar.YFH.7.76.1903062342020.19912@cbobk.fhfr.pm
Signed-off-by: Jiri Kosina &lt;jkosina@suse.cz&gt;
Signed-off-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Acked-by: Josh Snyder &lt;joshs@netflix.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Originally-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Originally-by: Dominique Martinet &lt;asmadeus@codewreck.org&gt;
Cc: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Kevin Easton &lt;kevin@guarana.org&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Cyril Hrubis &lt;chrubis@suse.cz&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Kirill A. Shutemov &lt;kirill@shutemov.name&gt;
Cc: Daniel Gruss &lt;daniel@gruss.cc&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The semantics of what mincore() considers to be resident is not
completely clear, but Linux has always (since 2.3.52, which is when
mincore() was initially done) treated it as "page is available in page
cache".

That's potentially a problem, as that [in]directly exposes
meta-information about pagecache / memory mapping state even about
memory not strictly belonging to the process executing the syscall,
opening possibilities for sidechannel attacks.

Change the semantics of mincore() so that it only reveals pagecache
information for non-anonymous mappings that belog to files that the
calling process could (if it tried to) successfully open for writing;
otherwise we'd be including shared non-exclusive mappings, which

 - is the sidechannel

 - is not the usecase for mincore(), as that's primarily used for data,
   not (shared) text

[jkosina@suse.cz: v2]
  Link: http://lkml.kernel.org/r/20190312141708.6652-2-vbabka@suse.cz
[mhocko@suse.com: restructure can_do_mincore() conditions]
Link: http://lkml.kernel.org/r/nycvar.YFH.7.76.1903062342020.19912@cbobk.fhfr.pm
Signed-off-by: Jiri Kosina &lt;jkosina@suse.cz&gt;
Signed-off-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Acked-by: Josh Snyder &lt;joshs@netflix.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Originally-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Originally-by: Dominique Martinet &lt;asmadeus@codewreck.org&gt;
Cc: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Kevin Easton &lt;kevin@guarana.org&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Cyril Hrubis &lt;chrubis@suse.cz&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Kirill A. Shutemov &lt;kirill@shutemov.name&gt;
Cc: Daniel Gruss &lt;daniel@gruss.cc&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Revert "Change mincore() to count "mapped" pages rather than "cached" pages"</title>
<updated>2019-01-23T20:04:37+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2019-01-23T20:04:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=30bac164aca750892b93eef350439a0562a68647'/>
<id>30bac164aca750892b93eef350439a0562a68647</id>
<content type='text'>
This reverts commit 574823bfab82d9d8fa47f422778043fbb4b4f50e.

It turns out that my hope that we could just remove the code that
exposes the cache residency status from mincore() was too optimistic.

There are various random users that want it, and one example would be
the Netflix database cluster maintenance. To quote Josh Snyder:

 "For Netflix, losing accurate information from the mincore syscall
  would lengthen database cluster maintenance operations from days to
  months. We rely on cross-process mincore to migrate the contents of a
  page cache from machine to machine, and across reboots.

  To do this, I wrote and maintain happycache [1], a page cache
  dumper/loader tool. It is quite similar in architecture to pgfincore,
  except that it is agnostic to workload. The gist of happycache's
  operation is "produce a dump of residence status for each page, do
  some operation, then reload exactly the same pages which were present
  before." happycache is entirely dependent on accurate reporting of the
  in-core status of file-backed pages, as accessed by another process.

  We primarily use happycache with Cassandra, which (like Postgres +
  pgfincore) relies heavily on OS page cache to reduce disk accesses.
  Because our workloads never experience a cold page cache, we are able
  to provision hardware for a peak utilization level that is far lower
  than the hypothetical "every query is a cache miss" peak.

  A database warmed by happycache can be ready for service in seconds
  (bounded only by the performance of the drives and the I/O subsystem),
  with no period of in-service degradation. By contrast, putting a
  database in service without a page cache entails a potentially
  unbounded period of degradation (at Netflix, the time to populate a
  single node's cache via natural cache misses varies by workload from
  hours to weeks). If a single node upgrade were to take weeks, then
  upgrading an entire cluster would take months. Since we want to apply
  security upgrades (and other things) on a somewhat tighter schedule,
  we would have to develop more complex solutions to provide the same
  functionality already provided by mincore.

  At the bottom line, happycache is designed to benignly exploit the
  same information leak documented in the paper [2]. I think it makes
  perfect sense to remove cross-process mincore functionality from
  unprivileged users, but not to remove it entirely"

We do have an alternate approach that limits the cache residency
reporting only to processes that have write permissions to the file, so
we can fix the original information leak issue that way.  It involves
_adding_ code rather than removing it, which is sad, but hey, at least
we haven't found any users that would find the restrictions
unacceptable.

So revert the optimistic first approach to make room for that alternate
fix instead.

Reported-by: Josh Snyder &lt;joshs@netflix.com&gt;
Cc: Jiri Kosina &lt;jikos@kernel.org&gt;
Cc: Dominique Martinet &lt;asmadeus@codewreck.org&gt;
Cc: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Kevin Easton &lt;kevin@guarana.org&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Cyril Hrubis &lt;chrubis@suse.cz&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Kirill A. Shutemov &lt;kirill@shutemov.name&gt;
Cc: Daniel Gruss &lt;daniel@gruss.cc&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This reverts commit 574823bfab82d9d8fa47f422778043fbb4b4f50e.

It turns out that my hope that we could just remove the code that
exposes the cache residency status from mincore() was too optimistic.

There are various random users that want it, and one example would be
the Netflix database cluster maintenance. To quote Josh Snyder:

 "For Netflix, losing accurate information from the mincore syscall
  would lengthen database cluster maintenance operations from days to
  months. We rely on cross-process mincore to migrate the contents of a
  page cache from machine to machine, and across reboots.

  To do this, I wrote and maintain happycache [1], a page cache
  dumper/loader tool. It is quite similar in architecture to pgfincore,
  except that it is agnostic to workload. The gist of happycache's
  operation is "produce a dump of residence status for each page, do
  some operation, then reload exactly the same pages which were present
  before." happycache is entirely dependent on accurate reporting of the
  in-core status of file-backed pages, as accessed by another process.

  We primarily use happycache with Cassandra, which (like Postgres +
  pgfincore) relies heavily on OS page cache to reduce disk accesses.
  Because our workloads never experience a cold page cache, we are able
  to provision hardware for a peak utilization level that is far lower
  than the hypothetical "every query is a cache miss" peak.

  A database warmed by happycache can be ready for service in seconds
  (bounded only by the performance of the drives and the I/O subsystem),
  with no period of in-service degradation. By contrast, putting a
  database in service without a page cache entails a potentially
  unbounded period of degradation (at Netflix, the time to populate a
  single node's cache via natural cache misses varies by workload from
  hours to weeks). If a single node upgrade were to take weeks, then
  upgrading an entire cluster would take months. Since we want to apply
  security upgrades (and other things) on a somewhat tighter schedule,
  we would have to develop more complex solutions to provide the same
  functionality already provided by mincore.

  At the bottom line, happycache is designed to benignly exploit the
  same information leak documented in the paper [2]. I think it makes
  perfect sense to remove cross-process mincore functionality from
  unprivileged users, but not to remove it entirely"

We do have an alternate approach that limits the cache residency
reporting only to processes that have write permissions to the file, so
we can fix the original information leak issue that way.  It involves
_adding_ code rather than removing it, which is sad, but hey, at least
we haven't found any users that would find the restrictions
unacceptable.

So revert the optimistic first approach to make room for that alternate
fix instead.

Reported-by: Josh Snyder &lt;joshs@netflix.com&gt;
Cc: Jiri Kosina &lt;jikos@kernel.org&gt;
Cc: Dominique Martinet &lt;asmadeus@codewreck.org&gt;
Cc: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Kevin Easton &lt;kevin@guarana.org&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Cyril Hrubis &lt;chrubis@suse.cz&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Kirill A. Shutemov &lt;kirill@shutemov.name&gt;
Cc: Daniel Gruss &lt;daniel@gruss.cc&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
