<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/mm/memory.c, branch v5.5.7</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>mm/memory.c: add apply_to_existing_page_range() helper</title>
<updated>2019-12-18T04:59:59+00:00</updated>
<author>
<name>Daniel Axtens</name>
<email>dja@axtens.net</email>
</author>
<published>2019-12-18T04:51:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=be1db4753ee6a0db80a900df9dbbf6ad2acc4bd1'/>
<id>be1db4753ee6a0db80a900df9dbbf6ad2acc4bd1</id>
<content type='text'>
apply_to_page_range() takes an address range, and if any parts of it are
not covered by the existing page table hierarchy, it allocates memory to
fill them in.

In some use cases, this is not what we want - we want to be able to
operate exclusively on PTEs that are already in the tables.

Add apply_to_existing_page_range() for this.  Adjust the walker
functions for apply_to_page_range to take 'create', which switches them
between the old and new modes.

This will be used in KASAN vmalloc.

[akpm@linux-foundation.org: reduce code duplication]
[akpm@linux-foundation.org: s/apply_to_existing_pages/apply_to_existing_page_range/]
[akpm@linux-foundation.org: initialize __apply_to_page_range::err]
Link: http://lkml.kernel.org/r/20191205140407.1874-1-dja@axtens.net
Signed-off-by: Daniel Axtens &lt;dja@axtens.net&gt;
Cc: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: Uladzislau Rezki (Sony) &lt;urezki@gmail.com&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Cc: Daniel Axtens &lt;dja@axtens.net&gt;
Cc: Qian Cai &lt;cai@lca.pw&gt;
Cc: Andrey Ryabinin &lt;aryabinin@virtuozzo.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
apply_to_page_range() takes an address range, and if any parts of it are
not covered by the existing page table hierarchy, it allocates memory to
fill them in.

In some use cases, this is not what we want - we want to be able to
operate exclusively on PTEs that are already in the tables.

Add apply_to_existing_page_range() for this.  Adjust the walker
functions for apply_to_page_range to take 'create', which switches them
between the old and new modes.

This will be used in KASAN vmalloc.

[akpm@linux-foundation.org: reduce code duplication]
[akpm@linux-foundation.org: s/apply_to_existing_pages/apply_to_existing_page_range/]
[akpm@linux-foundation.org: initialize __apply_to_page_range::err]
Link: http://lkml.kernel.org/r/20191205140407.1874-1-dja@axtens.net
Signed-off-by: Daniel Axtens &lt;dja@axtens.net&gt;
Cc: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: Uladzislau Rezki (Sony) &lt;urezki@gmail.com&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Cc: Daniel Axtens &lt;dja@axtens.net&gt;
Cc: Qian Cai &lt;cai@lca.pw&gt;
Cc: Andrey Ryabinin &lt;aryabinin@virtuozzo.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: remove __ARCH_HAS_4LEVEL_HACK and include/asm-generic/4level-fixup.h</title>
<updated>2019-12-05T03:44:15+00:00</updated>
<author>
<name>Mike Rapoport</name>
<email>rppt@linux.ibm.com</email>
</author>
<published>2019-12-05T00:54:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f949286c668aed5aa24acdb5838be9cfd9513bd3'/>
<id>f949286c668aed5aa24acdb5838be9cfd9513bd3</id>
<content type='text'>
There are no architectures that use include/asm-generic/4level-fixup.h
therefore it can be removed along with __ARCH_HAS_4LEVEL_HACK define.

Link: http://lkml.kernel.org/r/1572938135-31886-14-git-send-email-rppt@kernel.org
Signed-off-by: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Cc: Anatoly Pugachev &lt;matorola@gmail.com&gt;
Cc: Anton Ivanov &lt;anton.ivanov@cambridgegreys.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Cc: Greentime Hu &lt;green.hu@gmail.com&gt;
Cc: Greg Ungerer &lt;gerg@linux-m68k.org&gt;
Cc: Helge Deller &lt;deller@gmx.de&gt;
Cc: "James E.J. Bottomley" &lt;James.Bottomley@HansenPartnership.com&gt;
Cc: Jeff Dike &lt;jdike@addtoit.com&gt;
Cc: "Kirill A. Shutemov" &lt;kirill@shutemov.name&gt;
Cc: Mark Salter &lt;msalter@redhat.com&gt;
Cc: Matt Turner &lt;mattst88@gmail.com&gt;
Cc: Michal Simek &lt;monstr@monstr.eu&gt;
Cc: Peter Rosin &lt;peda@axentia.se&gt;
Cc: Richard Weinberger &lt;richard@nod.at&gt;
Cc: Rolf Eike Beer &lt;eike-kernel@sf-tec.de&gt;
Cc: Russell King &lt;linux@armlinux.org.uk&gt;
Cc: Russell King &lt;rmk+kernel@armlinux.org.uk&gt;
Cc: Sam Creasey &lt;sammy@sammy.net&gt;
Cc: Vincent Chen &lt;deanbo422@gmail.com&gt;
Cc: Vineet Gupta &lt;Vineet.Gupta1@synopsys.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There are no architectures that use include/asm-generic/4level-fixup.h
therefore it can be removed along with __ARCH_HAS_4LEVEL_HACK define.

Link: http://lkml.kernel.org/r/1572938135-31886-14-git-send-email-rppt@kernel.org
Signed-off-by: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Cc: Anatoly Pugachev &lt;matorola@gmail.com&gt;
Cc: Anton Ivanov &lt;anton.ivanov@cambridgegreys.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Cc: Greentime Hu &lt;green.hu@gmail.com&gt;
Cc: Greg Ungerer &lt;gerg@linux-m68k.org&gt;
Cc: Helge Deller &lt;deller@gmx.de&gt;
Cc: "James E.J. Bottomley" &lt;James.Bottomley@HansenPartnership.com&gt;
Cc: Jeff Dike &lt;jdike@addtoit.com&gt;
Cc: "Kirill A. Shutemov" &lt;kirill@shutemov.name&gt;
Cc: Mark Salter &lt;msalter@redhat.com&gt;
Cc: Matt Turner &lt;mattst88@gmail.com&gt;
Cc: Michal Simek &lt;monstr@monstr.eu&gt;
Cc: Peter Rosin &lt;peda@axentia.se&gt;
Cc: Richard Weinberger &lt;richard@nod.at&gt;
Cc: Rolf Eike Beer &lt;eike-kernel@sf-tec.de&gt;
Cc: Russell King &lt;linux@armlinux.org.uk&gt;
Cc: Russell King &lt;rmk+kernel@armlinux.org.uk&gt;
Cc: Sam Creasey &lt;sammy@sammy.net&gt;
Cc: Vincent Chen &lt;deanbo422@gmail.com&gt;
Cc: Vineet Gupta &lt;Vineet.Gupta1@synopsys.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/memory.c: replace is_zero_pfn with is_huge_zero_pmd for thp</title>
<updated>2019-12-05T03:44:11+00:00</updated>
<author>
<name>Yu Zhao</name>
<email>yuzhao@google.com</email>
</author>
<published>2019-12-05T00:49:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=3cde287bb4769fe9dfc9c532ddc88d90e81bc4c5'/>
<id>3cde287bb4769fe9dfc9c532ddc88d90e81bc4c5</id>
<content type='text'>
For hugely mapped thp, we use is_huge_zero_pmd() to check if it's zero
page or not.

We do fill ptes with my_zero_pfn() when we split zero thp pmd, but this
is not what we have in vm_normal_page_pmd() -- pmd_trans_huge_lock()
makes sure of it.

This is a trivial fix for /proc/pid/numa_maps, and AFAIK nobody
complains about it.

Gerald Schaefer asked:
: Maybe the description could also mention the symptom of this bug?
: I would assume that it affects anon/dirty accounting in gather_pte_stats(),
: for huge mappings, if zero page mappings are not correctly recognized.

I came across this while I was looking at the code, so I'm not aware of
any symptom.

Link: http://lkml.kernel.org/r/20191108192629.201556-1-yuzhao@google.com
Signed-off-by: Yu Zhao &lt;yuzhao@google.com&gt;
Acked-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Ralph Campbell &lt;rcampbell@nvidia.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: "Aneesh Kumar K . V" &lt;aneesh.kumar@linux.ibm.com&gt;
Cc: Dave Airlie &lt;airlied@redhat.com&gt;
Cc: Thomas Hellstrom &lt;thellstrom@vmware.com&gt;
Cc: Souptick Joarder &lt;jrdr.linux@gmail.com&gt;
Cc: Gerald Schaefer &lt;gerald.schaefer@de.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
For hugely mapped thp, we use is_huge_zero_pmd() to check if it's zero
page or not.

We do fill ptes with my_zero_pfn() when we split zero thp pmd, but this
is not what we have in vm_normal_page_pmd() -- pmd_trans_huge_lock()
makes sure of it.

This is a trivial fix for /proc/pid/numa_maps, and AFAIK nobody
complains about it.

Gerald Schaefer asked:
: Maybe the description could also mention the symptom of this bug?
: I would assume that it affects anon/dirty accounting in gather_pte_stats(),
: for huge mappings, if zero page mappings are not correctly recognized.

I came across this while I was looking at the code, so I'm not aware of
any symptom.

Link: http://lkml.kernel.org/r/20191108192629.201556-1-yuzhao@google.com
Signed-off-by: Yu Zhao &lt;yuzhao@google.com&gt;
Acked-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Ralph Campbell &lt;rcampbell@nvidia.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: "Aneesh Kumar K . V" &lt;aneesh.kumar@linux.ibm.com&gt;
Cc: Dave Airlie &lt;airlied@redhat.com&gt;
Cc: Thomas Hellstrom &lt;thellstrom@vmware.com&gt;
Cc: Souptick Joarder &lt;jrdr.linux@gmail.com&gt;
Cc: Gerald Schaefer &lt;gerald.schaefer@de.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: fix typos in comments when calling __SetPageUptodate()</title>
<updated>2019-12-01T20:59:10+00:00</updated>
<author>
<name>Wei Yang</name>
<email>richardw.yang@linux.intel.com</email>
</author>
<published>2019-12-01T01:58:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f4f5329d453704e2214011ecf00db73cd3196d06'/>
<id>f4f5329d453704e2214011ecf00db73cd3196d06</id>
<content type='text'>
There are several places emphasise the effect of __SetPageUptodate(),
while the comment seems to have a typo in two places.

Link: http://lkml.kernel.org/r/20190926023705.7226-1-richardw.yang@linux.intel.com
Signed-off-by: Wei Yang &lt;richardw.yang@linux.intel.com&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There are several places emphasise the effect of __SetPageUptodate(),
while the comment seems to have a typo in two places.

Link: http://lkml.kernel.org/r/20190926023705.7226-1-richardw.yang@linux.intel.com
Signed-off-by: Wei Yang &lt;richardw.yang@linux.intel.com&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/memory.c: fix a huge pud insertion race during faulting</title>
<updated>2019-12-01T14:29:19+00:00</updated>
<author>
<name>Thomas Hellstrom</name>
<email>thellstrom@vmware.com</email>
</author>
<published>2019-12-01T01:51:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=625110b5e9dae9074d8a7e67dd07f821a053eed7'/>
<id>625110b5e9dae9074d8a7e67dd07f821a053eed7</id>
<content type='text'>
A huge pud page can theoretically be faulted in racing with pmd_alloc()
in __handle_mm_fault().  That will lead to pmd_alloc() returning an
invalid pmd pointer.

Fix this by adding a pud_trans_unstable() function similar to
pmd_trans_unstable() and check whether the pud is really stable before
using the pmd pointer.

Race:
  Thread 1:             Thread 2:                 Comment
  create_huge_pud()                               Fallback - not taken.
                        create_huge_pud()         Taken.
  pmd_alloc()                                     Returns an invalid pointer.

This will result in user-visible huge page data corruption.

Note that this was caught during a code audit rather than a real
experienced problem.  It looks to me like the only implementation that
currently creates huge pud pagetable entries is dev_dax_huge_fault()
which doesn't appear to care much about private (COW) mappings or
write-tracking which is, I believe, a prerequisite for create_huge_pud()
falling back on thread 1, but not in thread 2.

Link: http://lkml.kernel.org/r/20191115115808.21181-2-thomas_os@shipmail.org
Fixes: a00cc7d9dd93 ("mm, x86: add support for PUD-sized transparent hugepages")
Signed-off-by: Thomas Hellstrom &lt;thellstrom@vmware.com&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A huge pud page can theoretically be faulted in racing with pmd_alloc()
in __handle_mm_fault().  That will lead to pmd_alloc() returning an
invalid pmd pointer.

Fix this by adding a pud_trans_unstable() function similar to
pmd_trans_unstable() and check whether the pud is really stable before
using the pmd pointer.

Race:
  Thread 1:             Thread 2:                 Comment
  create_huge_pud()                               Fallback - not taken.
                        create_huge_pud()         Taken.
  pmd_alloc()                                     Returns an invalid pointer.

This will result in user-visible huge page data corruption.

Note that this was caught during a code audit rather than a real
experienced problem.  It looks to me like the only implementation that
currently creates huge pud pagetable entries is dev_dax_huge_fault()
which doesn't appear to care much about private (COW) mappings or
write-tracking which is, I believe, a prerequisite for create_huge_pud()
falling back on thread 1, but not in thread 2.

Link: http://lkml.kernel.org/r/20191115115808.21181-2-thomas_os@shipmail.org
Fixes: a00cc7d9dd93 ("mm, x86: add support for PUD-sized transparent hugepages")
Signed-off-by: Thomas Hellstrom &lt;thellstrom@vmware.com&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rss_stat: add support to detect RSS updates of external mm</title>
<updated>2019-12-01T14:29:18+00:00</updated>
<author>
<name>Joel Fernandes (Google)</name>
<email>joel@joelfernandes.org</email>
</author>
<published>2019-12-01T01:50:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e4dcad204d3a281be6f8573e0a82648a4ad84e69'/>
<id>e4dcad204d3a281be6f8573e0a82648a4ad84e69</id>
<content type='text'>
When a process updates the RSS of a different process, the rss_stat
tracepoint appears in the context of the process doing the update.  This
can confuse userspace that the RSS of process doing the update is
updated, while in reality a different process's RSS was updated.

This issue happens in reclaim paths such as with direct reclaim or
background reclaim.

This patch adds more information to the tracepoint about whether the mm
being updated belongs to the current process's context (curr field).  We
also include a hash of the mm pointer so that the process who the mm
belongs to can be uniquely identified (mm_id field).

Also vsprintf.c is refactored a bit to allow reuse of hashing code.

[akpm@linux-foundation.org: remove unused local `str']
[joelaf@google.com: inline call to ptr_to_hashval]
  Link: http://lore.kernel.org/r/20191113153816.14b95acd@gandalf.local.home
  Link: http://lkml.kernel.org/r/20191114164622.GC233237@google.com
Link: http://lkml.kernel.org/r/20191106024452.81923-1-joel@joelfernandes.org
Signed-off-by: Joel Fernandes (Google) &lt;joel@joelfernandes.org&gt;
Reported-by: Ioannis Ilkos &lt;ilkos@google.com&gt;
Acked-by: Petr Mladek &lt;pmladek@suse.com&gt;	[lib/vsprintf.c]
Cc: Tim Murray &lt;timmurray@google.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Carmen Jackson &lt;carmenjackson@google.com&gt;
Cc: Mayank Gupta &lt;mayankgupta@google.com&gt;
Cc: Daniel Colascione &lt;dancol@google.com&gt;
Cc: Steven Rostedt (VMware) &lt;rostedt@goodmis.org&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: "Aneesh Kumar K.V" &lt;aneesh.kumar@linux.ibm.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Jerome Glisse &lt;jglisse@redhat.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Ralph Campbell &lt;rcampbell@nvidia.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When a process updates the RSS of a different process, the rss_stat
tracepoint appears in the context of the process doing the update.  This
can confuse userspace that the RSS of process doing the update is
updated, while in reality a different process's RSS was updated.

This issue happens in reclaim paths such as with direct reclaim or
background reclaim.

This patch adds more information to the tracepoint about whether the mm
being updated belongs to the current process's context (curr field).  We
also include a hash of the mm pointer so that the process who the mm
belongs to can be uniquely identified (mm_id field).

Also vsprintf.c is refactored a bit to allow reuse of hashing code.

[akpm@linux-foundation.org: remove unused local `str']
[joelaf@google.com: inline call to ptr_to_hashval]
  Link: http://lore.kernel.org/r/20191113153816.14b95acd@gandalf.local.home
  Link: http://lkml.kernel.org/r/20191114164622.GC233237@google.com
Link: http://lkml.kernel.org/r/20191106024452.81923-1-joel@joelfernandes.org
Signed-off-by: Joel Fernandes (Google) &lt;joel@joelfernandes.org&gt;
Reported-by: Ioannis Ilkos &lt;ilkos@google.com&gt;
Acked-by: Petr Mladek &lt;pmladek@suse.com&gt;	[lib/vsprintf.c]
Cc: Tim Murray &lt;timmurray@google.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Carmen Jackson &lt;carmenjackson@google.com&gt;
Cc: Mayank Gupta &lt;mayankgupta@google.com&gt;
Cc: Daniel Colascione &lt;dancol@google.com&gt;
Cc: Steven Rostedt (VMware) &lt;rostedt@goodmis.org&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: "Aneesh Kumar K.V" &lt;aneesh.kumar@linux.ibm.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Jerome Glisse &lt;jglisse@redhat.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Ralph Campbell &lt;rcampbell@nvidia.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: emit tracepoint when RSS changes</title>
<updated>2019-12-01T14:29:18+00:00</updated>
<author>
<name>Joel Fernandes (Google)</name>
<email>joel@joelfernandes.org</email>
</author>
<published>2019-12-01T01:50:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b3d1411b6726ea6930222f8f12587d89762477c6'/>
<id>b3d1411b6726ea6930222f8f12587d89762477c6</id>
<content type='text'>
Useful to track how RSS is changing per TGID to detect spikes in RSS and
memory hogs.  Several Android teams have been using this patch in
various kernel trees for half a year now.  Many reported to me it is
really useful so I'm posting it upstream.

Initial patch developed by Tim Murray.  Changes I made from original
patch: o Prevent any additional space consumed by mm_struct.

Regarding the fact that the RSS may change too often thus flooding the
traces - note that, there is some "hysterisis" with this already.  That
is - We update the counter only if we receive 64 page faults due to
SPLIT_RSS_ACCOUNTING.  However, during zapping or copying of pte range,
the RSS is updated immediately which can become noisy/flooding.  In a
previous discussion, we agreed that BPF or ftrace can be used to rate
limit the signal if this becomes an issue.

Also note that I added wrappers to trace_rss_stat to prevent compiler
errors where linux/mm.h is included from tracing code, causing errors
such as:

    CC      kernel/trace/power-traces.o
  In file included from ./include/trace/define_trace.h:102,
                   from ./include/trace/events/kmem.h:342,
                   from ./include/linux/mm.h:31,
                   from ./include/linux/ring_buffer.h:5,
                   from ./include/linux/trace_events.h:6,
                   from ./include/trace/events/power.h:12,
                   from kernel/trace/power-traces.c:15:
  ./include/trace/trace_events.h:113:22: error: field `ent' has incomplete type
     struct trace_entry ent;    \

Link: http://lore.kernel.org/r/20190903200905.198642-1-joel@joelfernandes.org
Link: http://lkml.kernel.org/r/20191001172817.234886-1-joel@joelfernandes.org
Co-developed-by: Tim Murray &lt;timmurray@google.com&gt;
Signed-off-by: Tim Murray &lt;timmurray@google.com&gt;
Signed-off-by: Joel Fernandes (Google) &lt;joel@joelfernandes.org&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Carmen Jackson &lt;carmenjackson@google.com&gt;
Cc: Mayank Gupta &lt;mayankgupta@google.com&gt;
Cc: Daniel Colascione &lt;dancol@google.com&gt;
Cc: Steven Rostedt (VMware) &lt;rostedt@goodmis.org&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: "Aneesh Kumar K.V" &lt;aneesh.kumar@linux.ibm.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Jerome Glisse &lt;jglisse@redhat.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Ralph Campbell &lt;rcampbell@nvidia.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Useful to track how RSS is changing per TGID to detect spikes in RSS and
memory hogs.  Several Android teams have been using this patch in
various kernel trees for half a year now.  Many reported to me it is
really useful so I'm posting it upstream.

Initial patch developed by Tim Murray.  Changes I made from original
patch: o Prevent any additional space consumed by mm_struct.

Regarding the fact that the RSS may change too often thus flooding the
traces - note that, there is some "hysterisis" with this already.  That
is - We update the counter only if we receive 64 page faults due to
SPLIT_RSS_ACCOUNTING.  However, during zapping or copying of pte range,
the RSS is updated immediately which can become noisy/flooding.  In a
previous discussion, we agreed that BPF or ftrace can be used to rate
limit the signal if this becomes an issue.

Also note that I added wrappers to trace_rss_stat to prevent compiler
errors where linux/mm.h is included from tracing code, causing errors
such as:

    CC      kernel/trace/power-traces.o
  In file included from ./include/trace/define_trace.h:102,
                   from ./include/trace/events/kmem.h:342,
                   from ./include/linux/mm.h:31,
                   from ./include/linux/ring_buffer.h:5,
                   from ./include/linux/trace_events.h:6,
                   from ./include/trace/events/power.h:12,
                   from kernel/trace/power-traces.c:15:
  ./include/trace/trace_events.h:113:22: error: field `ent' has incomplete type
     struct trace_entry ent;    \

Link: http://lore.kernel.org/r/20190903200905.198642-1-joel@joelfernandes.org
Link: http://lkml.kernel.org/r/20191001172817.234886-1-joel@joelfernandes.org
Co-developed-by: Tim Murray &lt;timmurray@google.com&gt;
Signed-off-by: Tim Murray &lt;timmurray@google.com&gt;
Signed-off-by: Joel Fernandes (Google) &lt;joel@joelfernandes.org&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Carmen Jackson &lt;carmenjackson@google.com&gt;
Cc: Mayank Gupta &lt;mayankgupta@google.com&gt;
Cc: Daniel Colascione &lt;dancol@google.com&gt;
Cc: Steven Rostedt (VMware) &lt;rostedt@goodmis.org&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: "Aneesh Kumar K.V" &lt;aneesh.kumar@linux.ibm.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Jerome Glisse &lt;jglisse@redhat.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Ralph Campbell &lt;rcampbell@nvidia.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: drop mmap_sem before calling balance_dirty_pages() in write fault</title>
<updated>2019-12-01T14:29:18+00:00</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2019-12-01T01:50:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=89b15332af7c0312a41e50846819ca6613b58b4c'/>
<id>89b15332af7c0312a41e50846819ca6613b58b4c</id>
<content type='text'>
One of our services is observing hanging ps/top/etc under heavy write
IO, and the task states show this is an mmap_sem priority inversion:

A write fault is holding the mmap_sem in read-mode and waiting for
(heavily cgroup-limited) IO in balance_dirty_pages():

    balance_dirty_pages+0x724/0x905
    balance_dirty_pages_ratelimited+0x254/0x390
    fault_dirty_shared_page.isra.96+0x4a/0x90
    do_wp_page+0x33e/0x400
    __handle_mm_fault+0x6f0/0xfa0
    handle_mm_fault+0xe4/0x200
    __do_page_fault+0x22b/0x4a0
    page_fault+0x45/0x50

Somebody tries to change the address space, contending for the mmap_sem in
write-mode:

    call_rwsem_down_write_failed_killable+0x13/0x20
    do_mprotect_pkey+0xa8/0x330
    SyS_mprotect+0xf/0x20
    do_syscall_64+0x5b/0x100
    entry_SYSCALL_64_after_hwframe+0x3d/0xa2

The waiting writer locks out all subsequent readers to avoid lock
starvation, and several threads can be seen hanging like this:

    call_rwsem_down_read_failed+0x14/0x30
    proc_pid_cmdline_read+0xa0/0x480
    __vfs_read+0x23/0x140
    vfs_read+0x87/0x130
    SyS_read+0x42/0x90
    do_syscall_64+0x5b/0x100
    entry_SYSCALL_64_after_hwframe+0x3d/0xa2

To fix this, do what we do for cache read faults already: drop the
mmap_sem before calling into anything IO bound, in this case the
balance_dirty_pages() function, and return VM_FAULT_RETRY.

Link: http://lkml.kernel.org/r/20190924194238.GA29030@cmpxchg.org
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reviewed-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Josef Bacik &lt;josef@toxicpanda.com&gt;
Cc: Hillf Danton &lt;hdanton@sina.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
One of our services is observing hanging ps/top/etc under heavy write
IO, and the task states show this is an mmap_sem priority inversion:

A write fault is holding the mmap_sem in read-mode and waiting for
(heavily cgroup-limited) IO in balance_dirty_pages():

    balance_dirty_pages+0x724/0x905
    balance_dirty_pages_ratelimited+0x254/0x390
    fault_dirty_shared_page.isra.96+0x4a/0x90
    do_wp_page+0x33e/0x400
    __handle_mm_fault+0x6f0/0xfa0
    handle_mm_fault+0xe4/0x200
    __do_page_fault+0x22b/0x4a0
    page_fault+0x45/0x50

Somebody tries to change the address space, contending for the mmap_sem in
write-mode:

    call_rwsem_down_write_failed_killable+0x13/0x20
    do_mprotect_pkey+0xa8/0x330
    SyS_mprotect+0xf/0x20
    do_syscall_64+0x5b/0x100
    entry_SYSCALL_64_after_hwframe+0x3d/0xa2

The waiting writer locks out all subsequent readers to avoid lock
starvation, and several threads can be seen hanging like this:

    call_rwsem_down_read_failed+0x14/0x30
    proc_pid_cmdline_read+0xa0/0x480
    __vfs_read+0x23/0x140
    vfs_read+0x87/0x130
    SyS_read+0x42/0x90
    do_syscall_64+0x5b/0x100
    entry_SYSCALL_64_after_hwframe+0x3d/0xa2

To fix this, do what we do for cache read faults already: drop the
mmap_sem before calling into anything IO bound, in this case the
balance_dirty_pages() function, and return VM_FAULT_RETRY.

Link: http://lkml.kernel.org/r/20190924194238.GA29030@cmpxchg.org
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reviewed-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Josef Bacik &lt;josef@toxicpanda.com&gt;
Cc: Hillf Danton &lt;hdanton@sina.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: fix double page fault on arm64 if PTE_AF is cleared</title>
<updated>2019-10-18T10:11:29+00:00</updated>
<author>
<name>Jia He</name>
<email>justin.he@arm.com</email>
</author>
<published>2019-10-11T14:09:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=83d116c53058d505ddef051e90ab27f57015b025'/>
<id>83d116c53058d505ddef051e90ab27f57015b025</id>
<content type='text'>
When we tested pmdk unit test [1] vmmalloc_fork TEST3 on arm64 guest, there
will be a double page fault in __copy_from_user_inatomic of cow_user_page.

To reproduce the bug, the cmd is as follows after you deployed everything:
make -C src/test/vmmalloc_fork/ TEST_TIME=60m check

Below call trace is from arm64 do_page_fault for debugging purpose:
[  110.016195] Call trace:
[  110.016826]  do_page_fault+0x5a4/0x690
[  110.017812]  do_mem_abort+0x50/0xb0
[  110.018726]  el1_da+0x20/0xc4
[  110.019492]  __arch_copy_from_user+0x180/0x280
[  110.020646]  do_wp_page+0xb0/0x860
[  110.021517]  __handle_mm_fault+0x994/0x1338
[  110.022606]  handle_mm_fault+0xe8/0x180
[  110.023584]  do_page_fault+0x240/0x690
[  110.024535]  do_mem_abort+0x50/0xb0
[  110.025423]  el0_da+0x20/0x24

The pte info before __copy_from_user_inatomic is (PTE_AF is cleared):
[ffff9b007000] pgd=000000023d4f8003, pud=000000023da9b003,
               pmd=000000023d4b3003, pte=360000298607bd3

As told by Catalin: "On arm64 without hardware Access Flag, copying from
user will fail because the pte is old and cannot be marked young. So we
always end up with zeroed page after fork() + CoW for pfn mappings. we
don't always have a hardware-managed access flag on arm64."

This patch fixes it by calling pte_mkyoung. Also, the parameter is
changed because vmf should be passed to cow_user_page()

Add a WARN_ON_ONCE when __copy_from_user_inatomic() returns error
in case there can be some obscure use-case (by Kirill).

[1] https://github.com/pmem/pmdk/tree/master/src/test/vmmalloc_fork

Signed-off-by: Jia He &lt;justin.he@arm.com&gt;
Reported-by: Yibo Cai &lt;Yibo.Cai@arm.com&gt;
Reviewed-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When we tested pmdk unit test [1] vmmalloc_fork TEST3 on arm64 guest, there
will be a double page fault in __copy_from_user_inatomic of cow_user_page.

To reproduce the bug, the cmd is as follows after you deployed everything:
make -C src/test/vmmalloc_fork/ TEST_TIME=60m check

Below call trace is from arm64 do_page_fault for debugging purpose:
[  110.016195] Call trace:
[  110.016826]  do_page_fault+0x5a4/0x690
[  110.017812]  do_mem_abort+0x50/0xb0
[  110.018726]  el1_da+0x20/0xc4
[  110.019492]  __arch_copy_from_user+0x180/0x280
[  110.020646]  do_wp_page+0xb0/0x860
[  110.021517]  __handle_mm_fault+0x994/0x1338
[  110.022606]  handle_mm_fault+0xe8/0x180
[  110.023584]  do_page_fault+0x240/0x690
[  110.024535]  do_mem_abort+0x50/0xb0
[  110.025423]  el0_da+0x20/0x24

The pte info before __copy_from_user_inatomic is (PTE_AF is cleared):
[ffff9b007000] pgd=000000023d4f8003, pud=000000023da9b003,
               pmd=000000023d4b3003, pte=360000298607bd3

As told by Catalin: "On arm64 without hardware Access Flag, copying from
user will fail because the pte is old and cannot be marked young. So we
always end up with zeroed page after fork() + CoW for pfn mappings. we
don't always have a hardware-managed access flag on arm64."

This patch fixes it by calling pte_mkyoung. Also, the parameter is
changed because vmf should be passed to cow_user_page()

Add a WARN_ON_ONCE when __copy_from_user_inatomic() returns error
in case there can be some obscure use-case (by Kirill).

[1] https://github.com/pmem/pmdk/tree/master/src/test/vmmalloc_fork

Signed-off-by: Jia He &lt;justin.he@arm.com&gt;
Reported-by: Yibo Cai &lt;Yibo.Cai@arm.com&gt;
Reviewed-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: do not hash address in print_bad_pte()</title>
<updated>2019-09-24T22:54:09+00:00</updated>
<author>
<name>Kefeng Wang</name>
<email>wangkefeng.wang@huawei.com</email>
</author>
<published>2019-09-23T22:35:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=6aa9b8b2c6355fc3339d6819ec250eadfe79fdb5'/>
<id>6aa9b8b2c6355fc3339d6819ec250eadfe79fdb5</id>
<content type='text'>
Using %px to show the actual address in print_bad_pte()
to help us to debug issue.

Link: http://lkml.kernel.org/r/20190831011816.141002-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang &lt;wangkefeng.wang@huawei.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Using %px to show the actual address in print_bad_pte()
to help us to debug issue.

Link: http://lkml.kernel.org/r/20190831011816.141002-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang &lt;wangkefeng.wang@huawei.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
