<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/mm/util.c, branch v5.10</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>mm/util.c: update the kerneldoc for kstrdup_const()</title>
<updated>2020-10-16T18:11:17+00:00</updated>
<author>
<name>Bartosz Golaszewski</name>
<email>bgolaszewski@baylibre.com</email>
</author>
<published>2020-10-16T03:07:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=295a17302348bc38704ef0f11ca2ce4ce58db2e9'/>
<id>295a17302348bc38704ef0f11ca2ce4ce58db2e9</id>
<content type='text'>
Memory allocated with kstrdup_const() must not be passed to regular
krealloc() as it is not aware of the possibility of the chunk residing in
.rodata.  Since there are no potential users of krealloc_const() at the
moment, let's just update the doc to make it explicit.

Signed-off-by: Bartosz Golaszewski &lt;bgolaszewski@baylibre.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/20200817173927.23389-1-brgl@bgdev.pl
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Memory allocated with kstrdup_const() must not be passed to regular
krealloc() as it is not aware of the possibility of the chunk residing in
.rodata.  Since there are no potential users of krealloc_const() at the
moment, let's just update the doc to make it explicit.

Signed-off-by: Bartosz Golaszewski &lt;bgolaszewski@baylibre.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/20200817173927.23389-1-brgl@bgdev.pl
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>arm64: mte: Tags-aware aware memcmp_pages() implementation</title>
<updated>2020-09-04T11:46:07+00:00</updated>
<author>
<name>Catalin Marinas</name>
<email>catalin.marinas@arm.com</email>
</author>
<published>2019-11-27T09:53:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4d1a8a2dc0f4cdc2d74491a60709e698f85daf55'/>
<id>4d1a8a2dc0f4cdc2d74491a60709e698f85daf55</id>
<content type='text'>
When the Memory Tagging Extension is enabled, two pages are identical
only if both their data and tags are identical.

Make the generic memcmp_pages() a __weak function and add an
arm64-specific implementation which returns non-zero if any of the two
pages contain valid MTE tags (PG_mte_tagged set). There isn't much
benefit in comparing the tags of two pages since these are normally used
for heap allocations and likely to differ anyway.

Co-developed-by: Vincenzo Frascino &lt;vincenzo.frascino@arm.com&gt;
Signed-off-by: Vincenzo Frascino &lt;vincenzo.frascino@arm.com&gt;
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When the Memory Tagging Extension is enabled, two pages are identical
only if both their data and tags are identical.

Make the generic memcmp_pages() a __weak function and add an
arm64-specific implementation which returns non-zero if any of the two
pages contain valid MTE tags (PG_mte_tagged set). There isn't much
benefit in comparing the tags of two pages since these are normally used
for heap allocations and likely to differ anyway.

Co-developed-by: Vincenzo Frascino &lt;vincenzo.frascino@arm.com&gt;
Signed-off-by: Vincenzo Frascino &lt;vincenzo.frascino@arm.com&gt;
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: remove unnecessary wrapper function do_mmap_pgoff()</title>
<updated>2020-08-07T18:33:27+00:00</updated>
<author>
<name>Peter Collingbourne</name>
<email>pcc@google.com</email>
</author>
<published>2020-08-07T06:23:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=45e55300f11495ed58c53427da7f0d958800a30f'/>
<id>45e55300f11495ed58c53427da7f0d958800a30f</id>
<content type='text'>
The current split between do_mmap() and do_mmap_pgoff() was introduced in
commit 1fcfd8db7f82 ("mm, mpx: add "vm_flags_t vm_flags" arg to
do_mmap_pgoff()") to support MPX.

The wrapper function do_mmap_pgoff() always passed 0 as the value of the
vm_flags argument to do_mmap().  However, MPX support has subsequently
been removed from the kernel and there were no more direct callers of
do_mmap(); all calls were going via do_mmap_pgoff().

Simplify the code by removing do_mmap_pgoff() and changing all callers to
directly call do_mmap(), which now no longer takes a vm_flags argument.

Signed-off-by: Peter Collingbourne &lt;pcc@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: David Hildenbrand &lt;david@redhat.com&gt;
Link: http://lkml.kernel.org/r/20200727194109.1371462-1-pcc@google.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The current split between do_mmap() and do_mmap_pgoff() was introduced in
commit 1fcfd8db7f82 ("mm, mpx: add "vm_flags_t vm_flags" arg to
do_mmap_pgoff()") to support MPX.

The wrapper function do_mmap_pgoff() always passed 0 as the value of the
vm_flags argument to do_mmap().  However, MPX support has subsequently
been removed from the kernel and there were no more direct callers of
do_mmap(); all calls were going via do_mmap_pgoff().

Simplify the code by removing do_mmap_pgoff() and changing all callers to
directly call do_mmap(), which now no longer takes a vm_flags argument.

Signed-off-by: Peter Collingbourne &lt;pcc@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: David Hildenbrand &lt;david@redhat.com&gt;
Link: http://lkml.kernel.org/r/20200727194109.1371462-1-pcc@google.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: adjust vm_committed_as_batch according to vm overcommit policy</title>
<updated>2020-08-07T18:33:26+00:00</updated>
<author>
<name>Feng Tang</name>
<email>feng.tang@intel.com</email>
</author>
<published>2020-08-07T06:23:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=56f3547bfa4d361148aa748ccb86073bc57f5e6c'/>
<id>56f3547bfa4d361148aa748ccb86073bc57f5e6c</id>
<content type='text'>
When checking a performance change for will-it-scale scalability mmap test
[1], we found very high lock contention for spinlock of percpu counter
'vm_committed_as':

    94.14%     0.35%  [kernel.kallsyms]         [k] _raw_spin_lock_irqsave
    48.21% _raw_spin_lock_irqsave;percpu_counter_add_batch;__vm_enough_memory;mmap_region;do_mmap;
    45.91% _raw_spin_lock_irqsave;percpu_counter_add_batch;__do_munmap;

Actually this heavy lock contention is not always necessary.  The
'vm_committed_as' needs to be very precise when the strict
OVERCOMMIT_NEVER policy is set, which requires a rather small batch number
for the percpu counter.

So keep 'batch' number unchanged for strict OVERCOMMIT_NEVER policy, and
lift it to 64X for OVERCOMMIT_ALWAYS and OVERCOMMIT_GUESS policies.  Also
add a sysctl handler to adjust it when the policy is reconfigured.

Benchmark with the same testcase in [1] shows 53% improvement on a 8C/16T
desktop, and 2097%(20X) on a 4S/72C/144T server.  We tested with test
platforms in 0day (server, desktop and laptop), and 80%+ platforms shows
improvements with that test.  And whether it shows improvements depends on
if the test mmap size is bigger than the batch number computed.

And if the lift is 16X, 1/3 of the platforms will show improvements,
though it should help the mmap/unmap usage generally, as Michal Hocko
mentioned:

: I believe that there are non-synthetic worklaods which would benefit from
: a larger batch.  E.g.  large in memory databases which do large mmaps
: during startups from multiple threads.

[1] https://lore.kernel.org/lkml/20200305062138.GI5972@shao2-debian/

Signed-off-by: Feng Tang &lt;feng.tang@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Qian Cai &lt;cai@lca.pw&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Andi Kleen &lt;andi.kleen@intel.com&gt;
Cc: Tim Chen &lt;tim.c.chen@intel.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Huang Ying &lt;ying.huang@intel.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Dennis Zhou &lt;dennis@kernel.org&gt;
Cc: Haiyang Zhang &lt;haiyangz@microsoft.com&gt;
Cc: kernel test robot &lt;rong.a.chen@intel.com&gt;
Cc: "K. Y. Srinivasan" &lt;kys@microsoft.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Link: http://lkml.kernel.org/r/1589611660-89854-4-git-send-email-feng.tang@intel.com
Link: http://lkml.kernel.org/r/1592725000-73486-4-git-send-email-feng.tang@intel.com
Link: http://lkml.kernel.org/r/1594389708-60781-5-git-send-email-feng.tang@intel.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When checking a performance change for will-it-scale scalability mmap test
[1], we found very high lock contention for spinlock of percpu counter
'vm_committed_as':

    94.14%     0.35%  [kernel.kallsyms]         [k] _raw_spin_lock_irqsave
    48.21% _raw_spin_lock_irqsave;percpu_counter_add_batch;__vm_enough_memory;mmap_region;do_mmap;
    45.91% _raw_spin_lock_irqsave;percpu_counter_add_batch;__do_munmap;

Actually this heavy lock contention is not always necessary.  The
'vm_committed_as' needs to be very precise when the strict
OVERCOMMIT_NEVER policy is set, which requires a rather small batch number
for the percpu counter.

So keep 'batch' number unchanged for strict OVERCOMMIT_NEVER policy, and
lift it to 64X for OVERCOMMIT_ALWAYS and OVERCOMMIT_GUESS policies.  Also
add a sysctl handler to adjust it when the policy is reconfigured.

Benchmark with the same testcase in [1] shows 53% improvement on a 8C/16T
desktop, and 2097%(20X) on a 4S/72C/144T server.  We tested with test
platforms in 0day (server, desktop and laptop), and 80%+ platforms shows
improvements with that test.  And whether it shows improvements depends on
if the test mmap size is bigger than the batch number computed.

And if the lift is 16X, 1/3 of the platforms will show improvements,
though it should help the mmap/unmap usage generally, as Michal Hocko
mentioned:

: I believe that there are non-synthetic worklaods which would benefit from
: a larger batch.  E.g.  large in memory databases which do large mmaps
: during startups from multiple threads.

[1] https://lore.kernel.org/lkml/20200305062138.GI5972@shao2-debian/

Signed-off-by: Feng Tang &lt;feng.tang@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Qian Cai &lt;cai@lca.pw&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Andi Kleen &lt;andi.kleen@intel.com&gt;
Cc: Tim Chen &lt;tim.c.chen@intel.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Huang Ying &lt;ying.huang@intel.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Dennis Zhou &lt;dennis@kernel.org&gt;
Cc: Haiyang Zhang &lt;haiyangz@microsoft.com&gt;
Cc: kernel test robot &lt;rong.a.chen@intel.com&gt;
Cc: "K. Y. Srinivasan" &lt;kys@microsoft.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Link: http://lkml.kernel.org/r/1589611660-89854-4-git-send-email-feng.tang@intel.com
Link: http://lkml.kernel.org/r/1592725000-73486-4-git-send-email-feng.tang@intel.com
Link: http://lkml.kernel.org/r/1594389708-60781-5-git-send-email-feng.tang@intel.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/util.c: make vm_memory_committed() more accurate</title>
<updated>2020-08-07T18:33:26+00:00</updated>
<author>
<name>Feng Tang</name>
<email>feng.tang@intel.com</email>
</author>
<published>2020-08-07T06:23:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4e2ee51e82510813969eff9feff8e570a8a28c73'/>
<id>4e2ee51e82510813969eff9feff8e570a8a28c73</id>
<content type='text'>
percpu_counter_sum_positive() will provide more accurate info.

As with percpu_counter_read_positive(), in worst case the deviation could
be 'batch * nr_cpus', which is totalram_pages/256 for now, and will be
more when the batch gets enlarged.

Its time cost is about 800 nanoseconds on a 2C/4T platform and 2~3
microseconds on a 2S/36C/72T Skylake server in normal case, and in worst
case where vm_committed_as's spinlock is under severe contention, it costs
30~40 microseconds for the 2S/36C/72T Skylake sever, which should be fine
for its only two users: /proc/meminfo and HyperV balloon driver's status
trace per second.

Signed-off-by: Feng Tang &lt;feng.tang@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt; # for /proc/meminfo
Cc: "K. Y. Srinivasan" &lt;kys@microsoft.com&gt;
Cc: Haiyang Zhang &lt;haiyangz@microsoft.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Qian Cai &lt;cai@lca.pw&gt;
Cc: Andi Kleen &lt;andi.kleen@intel.com&gt;
Cc: Tim Chen &lt;tim.c.chen@intel.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Huang Ying &lt;ying.huang@intel.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Dennis Zhou &lt;dennis@kernel.org&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: kernel test robot &lt;rong.a.chen@intel.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Link: http://lkml.kernel.org/r/1592725000-73486-3-git-send-email-feng.tang@intel.com
Link: http://lkml.kernel.org/r/1594389708-60781-3-git-send-email-feng.tang@intel.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
percpu_counter_sum_positive() will provide more accurate info.

As with percpu_counter_read_positive(), in worst case the deviation could
be 'batch * nr_cpus', which is totalram_pages/256 for now, and will be
more when the batch gets enlarged.

Its time cost is about 800 nanoseconds on a 2C/4T platform and 2~3
microseconds on a 2S/36C/72T Skylake server in normal case, and in worst
case where vm_committed_as's spinlock is under severe contention, it costs
30~40 microseconds for the 2S/36C/72T Skylake sever, which should be fine
for its only two users: /proc/meminfo and HyperV balloon driver's status
trace per second.

Signed-off-by: Feng Tang &lt;feng.tang@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt; # for /proc/meminfo
Cc: "K. Y. Srinivasan" &lt;kys@microsoft.com&gt;
Cc: Haiyang Zhang &lt;haiyangz@microsoft.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Qian Cai &lt;cai@lca.pw&gt;
Cc: Andi Kleen &lt;andi.kleen@intel.com&gt;
Cc: Tim Chen &lt;tim.c.chen@intel.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Huang Ying &lt;ying.huang@intel.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Dennis Zhou &lt;dennis@kernel.org&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: kernel test robot &lt;rong.a.chen@intel.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Link: http://lkml.kernel.org/r/1592725000-73486-3-git-send-email-feng.tang@intel.com
Link: http://lkml.kernel.org/r/1594389708-60781-3-git-send-email-feng.tang@intel.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mmap locking API: convert mmap_sem comments</title>
<updated>2020-06-09T16:39:14+00:00</updated>
<author>
<name>Michel Lespinasse</name>
<email>walken@google.com</email>
</author>
<published>2020-06-09T04:33:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c1e8d7c6a7a682e1405e3e242d32fc377fd196ff'/>
<id>c1e8d7c6a7a682e1405e3e242d32fc377fd196ff</id>
<content type='text'>
Convert comments that reference mmap_sem to reference mmap_lock instead.

[akpm@linux-foundation.org: fix up linux-next leftovers]
[akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
[akpm@linux-foundation.org: more linux-next fixups, per Michel]

Signed-off-by: Michel Lespinasse &lt;walken@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Reviewed-by: Daniel Jordan &lt;daniel.m.jordan@oracle.com&gt;
Cc: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Jason Gunthorpe &lt;jgg@ziepe.ca&gt;
Cc: Jerome Glisse &lt;jglisse@redhat.com&gt;
Cc: John Hubbard &lt;jhubbard@nvidia.com&gt;
Cc: Laurent Dufour &lt;ldufour@linux.ibm.com&gt;
Cc: Liam Howlett &lt;Liam.Howlett@oracle.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ying Han &lt;yinghan@google.com&gt;
Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Convert comments that reference mmap_sem to reference mmap_lock instead.

[akpm@linux-foundation.org: fix up linux-next leftovers]
[akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
[akpm@linux-foundation.org: more linux-next fixups, per Michel]

Signed-off-by: Michel Lespinasse &lt;walken@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Reviewed-by: Daniel Jordan &lt;daniel.m.jordan@oracle.com&gt;
Cc: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Jason Gunthorpe &lt;jgg@ziepe.ca&gt;
Cc: Jerome Glisse &lt;jglisse@redhat.com&gt;
Cc: John Hubbard &lt;jhubbard@nvidia.com&gt;
Cc: Laurent Dufour &lt;ldufour@linux.ibm.com&gt;
Cc: Liam Howlett &lt;Liam.Howlett@oracle.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ying Han &lt;yinghan@google.com&gt;
Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mmap locking API: add mmap_assert_locked() and mmap_assert_write_locked()</title>
<updated>2020-06-09T16:39:14+00:00</updated>
<author>
<name>Michel Lespinasse</name>
<email>walken@google.com</email>
</author>
<published>2020-06-09T04:33:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=42fc541404f249778e752ab39c8bc25fcb2dbe1e'/>
<id>42fc541404f249778e752ab39c8bc25fcb2dbe1e</id>
<content type='text'>
Add new APIs to assert that mmap_sem is held.

Using this instead of rwsem_is_locked and lockdep_assert_held[_write]
makes the assertions more tolerant of future changes to the lock type.

Signed-off-by: Michel Lespinasse &lt;walken@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Reviewed-by: Daniel Jordan &lt;daniel.m.jordan@oracle.com&gt;
Cc: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Jason Gunthorpe &lt;jgg@ziepe.ca&gt;
Cc: Jerome Glisse &lt;jglisse@redhat.com&gt;
Cc: John Hubbard &lt;jhubbard@nvidia.com&gt;
Cc: Laurent Dufour &lt;ldufour@linux.ibm.com&gt;
Cc: Liam Howlett &lt;Liam.Howlett@oracle.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ying Han &lt;yinghan@google.com&gt;
Link: http://lkml.kernel.org/r/20200520052908.204642-10-walken@google.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add new APIs to assert that mmap_sem is held.

Using this instead of rwsem_is_locked and lockdep_assert_held[_write]
makes the assertions more tolerant of future changes to the lock type.

Signed-off-by: Michel Lespinasse &lt;walken@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Reviewed-by: Daniel Jordan &lt;daniel.m.jordan@oracle.com&gt;
Cc: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Jason Gunthorpe &lt;jgg@ziepe.ca&gt;
Cc: Jerome Glisse &lt;jglisse@redhat.com&gt;
Cc: John Hubbard &lt;jhubbard@nvidia.com&gt;
Cc: Laurent Dufour &lt;ldufour@linux.ibm.com&gt;
Cc: Liam Howlett &lt;Liam.Howlett@oracle.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ying Han &lt;yinghan@google.com&gt;
Link: http://lkml.kernel.org/r/20200520052908.204642-10-walken@google.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mmap locking API: use coccinelle to convert mmap_sem rwsem call sites</title>
<updated>2020-06-09T16:39:14+00:00</updated>
<author>
<name>Michel Lespinasse</name>
<email>walken@google.com</email>
</author>
<published>2020-06-09T04:33:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d8ed45c5dcd455fc5848d47f86883a1b872ac0d0'/>
<id>d8ed45c5dcd455fc5848d47f86883a1b872ac0d0</id>
<content type='text'>
This change converts the existing mmap_sem rwsem calls to use the new mmap
locking API instead.

The change is generated using coccinelle with the following rule:

// spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

@@
expression mm;
@@
(
-init_rwsem
+mmap_init_lock
|
-down_write
+mmap_write_lock
|
-down_write_killable
+mmap_write_lock_killable
|
-down_write_trylock
+mmap_write_trylock
|
-up_write
+mmap_write_unlock
|
-downgrade_write
+mmap_write_downgrade
|
-down_read
+mmap_read_lock
|
-down_read_killable
+mmap_read_lock_killable
|
-down_read_trylock
+mmap_read_trylock
|
-up_read
+mmap_read_unlock
)
-(&amp;mm-&gt;mmap_sem)
+(mm)

Signed-off-by: Michel Lespinasse &lt;walken@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Daniel Jordan &lt;daniel.m.jordan@oracle.com&gt;
Reviewed-by: Laurent Dufour &lt;ldufour@linux.ibm.com&gt;
Reviewed-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Jason Gunthorpe &lt;jgg@ziepe.ca&gt;
Cc: Jerome Glisse &lt;jglisse@redhat.com&gt;
Cc: John Hubbard &lt;jhubbard@nvidia.com&gt;
Cc: Liam Howlett &lt;Liam.Howlett@oracle.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ying Han &lt;yinghan@google.com&gt;
Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This change converts the existing mmap_sem rwsem calls to use the new mmap
locking API instead.

The change is generated using coccinelle with the following rule:

// spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

@@
expression mm;
@@
(
-init_rwsem
+mmap_init_lock
|
-down_write
+mmap_write_lock
|
-down_write_killable
+mmap_write_lock_killable
|
-down_write_trylock
+mmap_write_trylock
|
-up_write
+mmap_write_unlock
|
-downgrade_write
+mmap_write_downgrade
|
-down_read
+mmap_read_lock
|
-down_read_killable
+mmap_read_lock_killable
|
-down_read_trylock
+mmap_read_trylock
|
-up_read
+mmap_read_unlock
)
-(&amp;mm-&gt;mmap_sem)
+(mm)

Signed-off-by: Michel Lespinasse &lt;walken@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Daniel Jordan &lt;daniel.m.jordan@oracle.com&gt;
Reviewed-by: Laurent Dufour &lt;ldufour@linux.ibm.com&gt;
Reviewed-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Jason Gunthorpe &lt;jgg@ziepe.ca&gt;
Cc: Jerome Glisse &lt;jglisse@redhat.com&gt;
Cc: John Hubbard &lt;jhubbard@nvidia.com&gt;
Cc: Liam Howlett &lt;Liam.Howlett@oracle.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ying Han &lt;yinghan@google.com&gt;
Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: add kvfree_sensitive() for freeing sensitive data objects</title>
<updated>2020-06-05T02:06:22+00:00</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2020-06-04T23:48:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d4eaa2837851db2bfed572898bfc17f9a9f9151e'/>
<id>d4eaa2837851db2bfed572898bfc17f9a9f9151e</id>
<content type='text'>
For kvmalloc'ed data object that contains sensitive information like
cryptographic keys, we need to make sure that the buffer is always cleared
before freeing it.  Using memset() alone for buffer clearing may not
provide certainty as the compiler may compile it away.  To be sure, the
special memzero_explicit() has to be used.

This patch introduces a new kvfree_sensitive() for freeing those sensitive
data objects allocated by kvmalloc().  The relevant places where
kvfree_sensitive() can be used are modified to use it.

Fixes: 4f0882491a14 ("KEYS: Avoid false positive ENOMEM error on key read")
Suggested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Eric Biggers &lt;ebiggers@google.com&gt;
Acked-by: David Howells &lt;dhowells@redhat.com&gt;
Cc: Jarkko Sakkinen &lt;jarkko.sakkinen@linux.intel.com&gt;
Cc: James Morris &lt;jmorris@namei.org&gt;
Cc: "Serge E. Hallyn" &lt;serge@hallyn.com&gt;
Cc: Joe Perches &lt;joe@perches.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Uladzislau Rezki &lt;urezki@gmail.com&gt;
Link: http://lkml.kernel.org/r/20200407200318.11711-1-longman@redhat.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
For kvmalloc'ed data object that contains sensitive information like
cryptographic keys, we need to make sure that the buffer is always cleared
before freeing it.  Using memset() alone for buffer clearing may not
provide certainty as the compiler may compile it away.  To be sure, the
special memzero_explicit() has to be used.

This patch introduces a new kvfree_sensitive() for freeing those sensitive
data objects allocated by kvmalloc().  The relevant places where
kvfree_sensitive() can be used are modified to use it.

Fixes: 4f0882491a14 ("KEYS: Avoid false positive ENOMEM error on key read")
Suggested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Eric Biggers &lt;ebiggers@google.com&gt;
Acked-by: David Howells &lt;dhowells@redhat.com&gt;
Cc: Jarkko Sakkinen &lt;jarkko.sakkinen@linux.intel.com&gt;
Cc: James Morris &lt;jmorris@namei.org&gt;
Cc: "Serge E. Hallyn" &lt;serge@hallyn.com&gt;
Cc: Joe Perches &lt;joe@perches.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Uladzislau Rezki &lt;urezki@gmail.com&gt;
Link: http://lkml.kernel.org/r/20200407200318.11711-1-longman@redhat.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/util.c: remove the VM_WARN_ONCE for vm_committed_as underflow check</title>
<updated>2020-06-05T02:06:20+00:00</updated>
<author>
<name>Feng Tang</name>
<email>feng.tang@intel.com</email>
</author>
<published>2020-06-04T23:46:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c571686a92ffd30d9f6092ce3f697e125bf96fd5'/>
<id>c571686a92ffd30d9f6092ce3f697e125bf96fd5</id>
<content type='text'>
This check was added by commit 82f71ae4a2b8 ("mm: catch memory
commitment underflow") in 2014 to have a safety check for issues which
have been fixed.  And there has been few report caught by it, as
described in its commit log:

: This shouldn't happen any more - the previous two patches fixed
: the committed_as underflow issues.

But it was really found by Qian Cai when he used the LTP memory stress
suite to test a RFC patchset, which tries to improve scalability of
per-cpu counter 'vm_committed_as', by chosing a bigger 'batch' number for
loose overcommit policies (OVERCOMMIT_ALWAYS and OVERCOMMIT_GUESS), while
keeping current number for OVERCOMMIT_NEVER.

With that patchset, when system firstly uses a loose policy, the
'vm_committed_as' count could be a big negative value, as its big 'batch'
number allows a big deviation, then when the policy is changed to
OVERCOMMIT_NEVER, the 'batch' will be decreased to a much smaller value,
thus hits this WARN check.

To mitigate this, one proposed solution is to queue work on all online
CPUs to do a local sync for 'vm_committed_as' when changing policy to
OVERCOMMIT_NEVER, plus some global syncing to garante the case won't be
hit.

But this solution is costy and slow, given this check hasn't shown real
trouble or benefit, simply drop it from one hot path of MM.  And perf
stats does show some tiny saving for removing it.

Reported-by: Qian Cai &lt;cai@lca.pw&gt;
Signed-off-by: Feng Tang &lt;feng.tang@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Qian Cai &lt;cai@lca.pw&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Konstantin Khlebnikov &lt;koct9i@gmail.com&gt;
Cc: Andi Kleen &lt;andi.kleen@intel.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Link: http://lkml.kernel.org/r/20200603094804.GB89848@shbuild999.sh.intel.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This check was added by commit 82f71ae4a2b8 ("mm: catch memory
commitment underflow") in 2014 to have a safety check for issues which
have been fixed.  And there has been few report caught by it, as
described in its commit log:

: This shouldn't happen any more - the previous two patches fixed
: the committed_as underflow issues.

But it was really found by Qian Cai when he used the LTP memory stress
suite to test a RFC patchset, which tries to improve scalability of
per-cpu counter 'vm_committed_as', by chosing a bigger 'batch' number for
loose overcommit policies (OVERCOMMIT_ALWAYS and OVERCOMMIT_GUESS), while
keeping current number for OVERCOMMIT_NEVER.

With that patchset, when system firstly uses a loose policy, the
'vm_committed_as' count could be a big negative value, as its big 'batch'
number allows a big deviation, then when the policy is changed to
OVERCOMMIT_NEVER, the 'batch' will be decreased to a much smaller value,
thus hits this WARN check.

To mitigate this, one proposed solution is to queue work on all online
CPUs to do a local sync for 'vm_committed_as' when changing policy to
OVERCOMMIT_NEVER, plus some global syncing to garante the case won't be
hit.

But this solution is costy and slow, given this check hasn't shown real
trouble or benefit, simply drop it from one hot path of MM.  And perf
stats does show some tiny saving for removing it.

Reported-by: Qian Cai &lt;cai@lca.pw&gt;
Signed-off-by: Feng Tang &lt;feng.tang@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Qian Cai &lt;cai@lca.pw&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Konstantin Khlebnikov &lt;koct9i@gmail.com&gt;
Cc: Andi Kleen &lt;andi.kleen@intel.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Link: http://lkml.kernel.org/r/20200603094804.GB89848@shbuild999.sh.intel.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
