linux.git/mm/util.c, branch v5.10

mm/util.c: update the kerneldoc for kstrdup_const()

2020-10-16T18:11:17+00:00

Memory allocated with kstrdup_const() must not be passed to regular
krealloc() as it is not aware of the possibility of the chunk residing in
.rodata.  Since there are no potential users of krealloc_const() at the
moment, let's just update the doc to make it explicit.

Signed-off-by: Bartosz Golaszewski 
Signed-off-by: Andrew Morton 
Reviewed-by: Andrew Morton 
Link: http://lkml.kernel.org/r/20200817173927.23389-1-brgl@bgdev.pl
Signed-off-by: Linus Torvalds

arm64: mte: Tags-aware aware memcmp_pages() implementation

2020-09-04T11:46:07+00:00

When the Memory Tagging Extension is enabled, two pages are identical
only if both their data and tags are identical.

Make the generic memcmp_pages() a __weak function and add an
arm64-specific implementation which returns non-zero if any of the two
pages contain valid MTE tags (PG_mte_tagged set). There isn't much
benefit in comparing the tags of two pages since these are normally used
for heap allocations and likely to differ anyway.

Co-developed-by: Vincenzo Frascino 
Signed-off-by: Vincenzo Frascino 
Signed-off-by: Catalin Marinas 
Cc: Will Deacon

mm: remove unnecessary wrapper function do_mmap_pgoff()

2020-08-07T18:33:27+00:00

The current split between do_mmap() and do_mmap_pgoff() was introduced in
commit 1fcfd8db7f82 ("mm, mpx: add "vm_flags_t vm_flags" arg to
do_mmap_pgoff()") to support MPX.

The wrapper function do_mmap_pgoff() always passed 0 as the value of the
vm_flags argument to do_mmap().  However, MPX support has subsequently
been removed from the kernel and there were no more direct callers of
do_mmap(); all calls were going via do_mmap_pgoff().

Simplify the code by removing do_mmap_pgoff() and changing all callers to
directly call do_mmap(), which now no longer takes a vm_flags argument.

Signed-off-by: Peter Collingbourne 
Signed-off-by: Andrew Morton 
Reviewed-by: Andrew Morton 
Reviewed-by: David Hildenbrand 
Link: http://lkml.kernel.org/r/20200727194109.1371462-1-pcc@google.com
Signed-off-by: Linus Torvalds

mm: adjust vm_committed_as_batch according to vm overcommit policy

2020-08-07T18:33:26+00:00

When checking a performance change for will-it-scale scalability mmap test
[1], we found very high lock contention for spinlock of percpu counter
'vm_committed_as':

    94.14%     0.35%  [kernel.kallsyms]         [k] _raw_spin_lock_irqsave
    48.21% _raw_spin_lock_irqsave;percpu_counter_add_batch;__vm_enough_memory;mmap_region;do_mmap;
    45.91% _raw_spin_lock_irqsave;percpu_counter_add_batch;__do_munmap;

Actually this heavy lock contention is not always necessary.  The
'vm_committed_as' needs to be very precise when the strict
OVERCOMMIT_NEVER policy is set, which requires a rather small batch number
for the percpu counter.

So keep 'batch' number unchanged for strict OVERCOMMIT_NEVER policy, and
lift it to 64X for OVERCOMMIT_ALWAYS and OVERCOMMIT_GUESS policies.  Also
add a sysctl handler to adjust it when the policy is reconfigured.

Benchmark with the same testcase in [1] shows 53% improvement on a 8C/16T
desktop, and 2097%(20X) on a 4S/72C/144T server.  We tested with test
platforms in 0day (server, desktop and laptop), and 80%+ platforms shows
improvements with that test.  And whether it shows improvements depends on
if the test mmap size is bigger than the batch number computed.

And if the lift is 16X, 1/3 of the platforms will show improvements,
though it should help the mmap/unmap usage generally, as Michal Hocko
mentioned:

: I believe that there are non-synthetic worklaods which would benefit from
: a larger batch.  E.g.  large in memory databases which do large mmaps
: during startups from multiple threads.

[1] https://lore.kernel.org/lkml/20200305062138.GI5972@shao2-debian/

Signed-off-by: Feng Tang 
Signed-off-by: Andrew Morton 
Acked-by: Michal Hocko 
Cc: Matthew Wilcox (Oracle) 
Cc: Johannes Weiner 
Cc: Mel Gorman 
Cc: Qian Cai 
Cc: Kees Cook 
Cc: Andi Kleen 
Cc: Tim Chen 
Cc: Dave Hansen 
Cc: Huang Ying 
Cc: Christoph Lameter 
Cc: Dennis Zhou 
Cc: Haiyang Zhang 
Cc: kernel test robot 
Cc: "K. Y. Srinivasan" 
Cc: Tejun Heo 
Link: http://lkml.kernel.org/r/1589611660-89854-4-git-send-email-feng.tang@intel.com
Link: http://lkml.kernel.org/r/1592725000-73486-4-git-send-email-feng.tang@intel.com
Link: http://lkml.kernel.org/r/1594389708-60781-5-git-send-email-feng.tang@intel.com
Signed-off-by: Linus Torvalds

mm/util.c: make vm_memory_committed() more accurate

2020-08-07T18:33:26+00:00

percpu_counter_sum_positive() will provide more accurate info.

As with percpu_counter_read_positive(), in worst case the deviation could
be 'batch * nr_cpus', which is totalram_pages/256 for now, and will be
more when the batch gets enlarged.

Its time cost is about 800 nanoseconds on a 2C/4T platform and 2~3
microseconds on a 2S/36C/72T Skylake server in normal case, and in worst
case where vm_committed_as's spinlock is under severe contention, it costs
30~40 microseconds for the 2S/36C/72T Skylake sever, which should be fine
for its only two users: /proc/meminfo and HyperV balloon driver's status
trace per second.

Signed-off-by: Feng Tang 
Signed-off-by: Andrew Morton 
Acked-by: Michal Hocko  # for /proc/meminfo
Cc: "K. Y. Srinivasan" 
Cc: Haiyang Zhang 
Cc: Matthew Wilcox (Oracle) 
Cc: Johannes Weiner 
Cc: Mel Gorman 
Cc: Qian Cai 
Cc: Andi Kleen 
Cc: Tim Chen 
Cc: Dave Hansen 
Cc: Huang Ying 
Cc: Christoph Lameter 
Cc: Dennis Zhou 
Cc: Kees Cook 
Cc: kernel test robot 
Cc: Tejun Heo 
Link: http://lkml.kernel.org/r/1592725000-73486-3-git-send-email-feng.tang@intel.com
Link: http://lkml.kernel.org/r/1594389708-60781-3-git-send-email-feng.tang@intel.com
Signed-off-by: Linus Torvalds

mmap locking API: convert mmap_sem comments

2020-06-09T16:39:14+00:00

Convert comments that reference mmap_sem to reference mmap_lock instead.

[akpm@linux-foundation.org: fix up linux-next leftovers]
[akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
[akpm@linux-foundation.org: more linux-next fixups, per Michel]

Signed-off-by: Michel Lespinasse 
Signed-off-by: Andrew Morton 
Reviewed-by: Vlastimil Babka 
Reviewed-by: Daniel Jordan 
Cc: Davidlohr Bueso 
Cc: David Rientjes 
Cc: Hugh Dickins 
Cc: Jason Gunthorpe 
Cc: Jerome Glisse 
Cc: John Hubbard 
Cc: Laurent Dufour 
Cc: Liam Howlett 
Cc: Matthew Wilcox 
Cc: Peter Zijlstra 
Cc: Ying Han 
Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
Signed-off-by: Linus Torvalds

mmap locking API: add mmap_assert_locked() and mmap_assert_write_locked()

2020-06-09T16:39:14+00:00

Add new APIs to assert that mmap_sem is held.

Using this instead of rwsem_is_locked and lockdep_assert_held[_write]
makes the assertions more tolerant of future changes to the lock type.

Signed-off-by: Michel Lespinasse 
Signed-off-by: Andrew Morton 
Reviewed-by: Vlastimil Babka 
Reviewed-by: Daniel Jordan 
Cc: Davidlohr Bueso 
Cc: David Rientjes 
Cc: Hugh Dickins 
Cc: Jason Gunthorpe 
Cc: Jerome Glisse 
Cc: John Hubbard 
Cc: Laurent Dufour 
Cc: Liam Howlett 
Cc: Matthew Wilcox 
Cc: Peter Zijlstra 
Cc: Ying Han 
Link: http://lkml.kernel.org/r/20200520052908.204642-10-walken@google.com
Signed-off-by: Linus Torvalds

mmap locking API: use coccinelle to convert mmap_sem rwsem call sites

2020-06-09T16:39:14+00:00

This change converts the existing mmap_sem rwsem calls to use the new mmap
locking API instead.

The change is generated using coccinelle with the following rule:

// spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

@@
expression mm;
@@
(
-init_rwsem
+mmap_init_lock
|
-down_write
+mmap_write_lock
|
-down_write_killable
+mmap_write_lock_killable
|
-down_write_trylock
+mmap_write_trylock
|
-up_write
+mmap_write_unlock
|
-downgrade_write
+mmap_write_downgrade
|
-down_read
+mmap_read_lock
|
-down_read_killable
+mmap_read_lock_killable
|
-down_read_trylock
+mmap_read_trylock
|
-up_read
+mmap_read_unlock
)
-(&mm->mmap_sem)
+(mm)

Signed-off-by: Michel Lespinasse 
Signed-off-by: Andrew Morton 
Reviewed-by: Daniel Jordan 
Reviewed-by: Laurent Dufour 
Reviewed-by: Vlastimil Babka 
Cc: Davidlohr Bueso 
Cc: David Rientjes 
Cc: Hugh Dickins 
Cc: Jason Gunthorpe 
Cc: Jerome Glisse 
Cc: John Hubbard 
Cc: Liam Howlett 
Cc: Matthew Wilcox 
Cc: Peter Zijlstra 
Cc: Ying Han 
Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
Signed-off-by: Linus Torvalds

mm: add kvfree_sensitive() for freeing sensitive data objects

2020-06-05T02:06:22+00:00

For kvmalloc'ed data object that contains sensitive information like
cryptographic keys, we need to make sure that the buffer is always cleared
before freeing it.  Using memset() alone for buffer clearing may not
provide certainty as the compiler may compile it away.  To be sure, the
special memzero_explicit() has to be used.

This patch introduces a new kvfree_sensitive() for freeing those sensitive
data objects allocated by kvmalloc().  The relevant places where
kvfree_sensitive() can be used are modified to use it.

Fixes: 4f0882491a14 ("KEYS: Avoid false positive ENOMEM error on key read")
Suggested-by: Linus Torvalds 
Signed-off-by: Waiman Long 
Signed-off-by: Andrew Morton 
Reviewed-by: Eric Biggers 
Acked-by: David Howells 
Cc: Jarkko Sakkinen 
Cc: James Morris 
Cc: "Serge E. Hallyn" 
Cc: Joe Perches 
Cc: Matthew Wilcox 
Cc: David Rientjes 
Cc: Uladzislau Rezki 
Link: http://lkml.kernel.org/r/20200407200318.11711-1-longman@redhat.com
Signed-off-by: Linus Torvalds

mm/util.c: remove the VM_WARN_ONCE for vm_committed_as underflow check

2020-06-05T02:06:20+00:00

This check was added by commit 82f71ae4a2b8 ("mm: catch memory
commitment underflow") in 2014 to have a safety check for issues which
have been fixed.  And there has been few report caught by it, as
described in its commit log:

: This shouldn't happen any more - the previous two patches fixed
: the committed_as underflow issues.

But it was really found by Qian Cai when he used the LTP memory stress
suite to test a RFC patchset, which tries to improve scalability of
per-cpu counter 'vm_committed_as', by chosing a bigger 'batch' number for
loose overcommit policies (OVERCOMMIT_ALWAYS and OVERCOMMIT_GUESS), while
keeping current number for OVERCOMMIT_NEVER.

With that patchset, when system firstly uses a loose policy, the
'vm_committed_as' count could be a big negative value, as its big 'batch'
number allows a big deviation, then when the policy is changed to
OVERCOMMIT_NEVER, the 'batch' will be decreased to a much smaller value,
thus hits this WARN check.

To mitigate this, one proposed solution is to queue work on all online
CPUs to do a local sync for 'vm_committed_as' when changing policy to
OVERCOMMIT_NEVER, plus some global syncing to garante the case won't be
hit.

But this solution is costy and slow, given this check hasn't shown real
trouble or benefit, simply drop it from one hot path of MM.  And perf
stats does show some tiny saving for removing it.

Reported-by: Qian Cai 
Signed-off-by: Feng Tang 
Signed-off-by: Andrew Morton 
Reviewed-by: Qian Cai 
Acked-by: Michal Hocko 
Cc: Konstantin Khlebnikov 
Cc: Andi Kleen 
Cc: Johannes Weiner 
Cc: Matthew Wilcox 
Cc: Mel Gorman 
Cc: Kees Cook 
Link: http://lkml.kernel.org/r/20200603094804.GB89848@shbuild999.sh.intel.com
Signed-off-by: Linus Torvalds