linux-stable.git/mm/util.c, branch v5.2

prctl_set_mm: downgrade mmap_sem to read lock

2019-06-01T22:51:31+00:00

The commit a3b609ef9f8b ("proc: read mm's {arg,env}_{start,end} with mmap
semaphore taken.") added synchronization of reading argument/environment
boundaries under mmap_sem.  Later commit 88aa7cc688d4 ("mm: introduce
arg_lock to protect arg_start|end and env_start|end in mm_struct") avoided
the coarse use of mmap_sem in similar situations.  But there still
remained two places that (mis)use mmap_sem.

get_cmdline should also use arg_lock instead of mmap_sem when it reads the
boundaries.

The second place that should use arg_lock is in prctl_set_mm.  By
protecting the boundary fields with arg_lock, we can downgrade
mmap_sem to a reader lock (analogous to what we already do in
prctl_set_mm_map).
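
The locking pattern described above can be sketched in userspace. This is
only an illustrative model, not the kernel code: struct toy_mm, the
toy_spin_* helpers and toy_read_bounds()/toy_set_arg_range() are all
hypothetical names, and a C11 atomic_flag stands in for the kernel's
spinlock.  The point is that the four boundary fields are read and written
under one narrow lock rather than under a coarse address-space lock.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-in for the kernel's spinlock_t. */
struct toy_spinlock {
	atomic_flag locked;
};

static void toy_spin_lock(struct toy_spinlock *l)
{
	/* Spin until the flag was previously clear, i.e. we took the lock. */
	while (atomic_flag_test_and_set_explicit(&l->locked,
						 memory_order_acquire))
		;
}

static void toy_spin_unlock(struct toy_spinlock *l)
{
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Only the fields discussed above; the real mm_struct has many more. */
struct toy_mm {
	struct toy_spinlock arg_lock;
	unsigned long arg_start, arg_end, env_start, env_end;
};

struct toy_bounds {
	unsigned long arg_start, arg_end, env_start, env_end;
};

static void toy_mm_init(struct toy_mm *mm)
{
	atomic_flag_clear(&mm->arg_lock.locked);
	mm->arg_start = mm->arg_end = 0;
	mm->env_start = mm->env_end = 0;
}

/* Reader side: snapshot all four boundaries under the narrow lock. */
static struct toy_bounds toy_read_bounds(struct toy_mm *mm)
{
	struct toy_bounds b;

	toy_spin_lock(&mm->arg_lock);
	b.arg_start = mm->arg_start;
	b.arg_end = mm->arg_end;
	b.env_start = mm->env_start;
	b.env_end = mm->env_end;
	toy_spin_unlock(&mm->arg_lock);
	return b;
}

/* Writer side: update a pair of boundaries under the same lock. */
static void toy_set_arg_range(struct toy_mm *mm, unsigned long start,
			      unsigned long end)
{
	assert(start <= end);

	toy_spin_lock(&mm->arg_lock);
	mm->arg_start = start;
	mm->arg_end = end;
	toy_spin_unlock(&mm->arg_lock);
}
```

Because readers take a consistent snapshot and then drop the lock, any
slower work (such as copying the argument strings) happens outside the
critical section.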

[akpm@linux-foundation.org: coding style fixes]
Link: http://lkml.kernel.org/r/20190502125203.24014-3-mkoutny@suse.com
Fixes: 88aa7cc688d4 ("mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct")
Signed-off-by: Michal Koutný 
Signed-off-by: Laurent Dufour 
Co-developed-by: Laurent Dufour 
Reviewed-by: Cyrill Gorcunov 
Acked-by: Michal Hocko 
Cc: Yang Shi 
Cc: Mateusz Guzik 
Cc: Kirill Tkhai 
Cc: Konstantin Khlebnikov 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

treewide: Add SPDX license identifier for missed files

2019-05-21T08:50:45+00:00

Add SPDX license identifiers to all files which:

 - Have no license information of any form

 - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
   initial scan/conversion to ignore the file

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

  GPL-2.0-only

Signed-off-by: Thomas Gleixner 
Signed-off-by: Greg Kroah-Hartman

mm: fix false-positive OVERCOMMIT_GUESS failures

2019-05-14T16:47:50+00:00

With the default overcommit==guess we occasionally run into mmap
rejections despite plenty of memory that would get dropped under
pressure but just isn't accounted reclaimable. One example of this is
dying cgroups pinned by some page cache. A previous case was auxiliary
path name memory associated with dentries; we have since annotated
those allocations to avoid overcommit failures (see d79f7aa496fc ("mm:
treat indirectly reclaimable memory as free in overcommit logic")).

But trying to classify all allocated memory reliably as reclaimable
and unreclaimable is a bit of a fool's errand. There could be a myriad
of dependencies that constantly change with kernel versions.

The effort becomes even more questionable when considering how
this estimate of available memory is used: it's not compared to the
system-wide allocated virtual memory in any way. It's not even
compared to the allocating process's address space. It's compared to
the single allocation request at hand!

So we have an elaborate left-hand side of the equation that tries to
assess the exact breathing room the system has available down to a
page - and then compare it to an isolated allocation request with no
additional context. We could fail an allocation of N bytes, but for
two allocations of N/2 bytes we'd do this elaborate dance twice in a
row and then still let N bytes of virtual memory through. This doesn't
make a whole lot of sense.

Let's take a step back and look at the actual goal of the
heuristic. From the documentation:

   Heuristic overcommit handling. Obvious overcommits of address
   space are refused. Used for a typical system. It ensures a
   seriously wild allocation fails while allowing overcommit to
   reduce swap usage.  root is allowed to allocate slightly more
   memory in this mode. This is the default.

If all we want to do is catch clearly bogus allocation requests
irrespective of the general virtual memory situation, the physical
memory counter-part doesn't need to be that complicated, either.

When in GUESS mode, catch wild allocations by comparing their request
size to the total amount of RAM and swap in the system.
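
As a rough illustration of the simplified check, the sketch below models
it in plain C.  toy_guess_allows() and its parameters are hypothetical
names, all quantities are in pages, and the actual kernel code reads the
RAM and swap totals from its own counters instead of taking them as
arguments.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the GUESS-mode check: a request passes unless it
 * is larger than RAM plus swap.  All quantities are in pages.
 */
static bool toy_guess_allows(unsigned long request, unsigned long ram,
			     unsigned long swap)
{
	/* Written as two comparisons so that ram + swap cannot overflow. */
	if (request <= ram)
		return true;
	return request - ram <= swap;
}

/* Usage: only requests that could never be backed are refused. */
void toy_guess_usage(void)
{
	unsigned long ram = 1000, swap = 200;

	assert(toy_guess_allows(1200, ram, swap));
	assert(!toy_guess_allows(1201, ram, swap));
}
```

The check still looks at one request at a time, but it no longer pretends
to know how much memory is reclaimable; it only rejects sizes that exceed
everything the machine has.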

Link: http://lkml.kernel.org/r/20190412191418.26333-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner 
Acked-by: Roman Gushchin 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/gup: change GUP fast to use flags rather than a write 'bool'

2019-05-14T16:47:46+00:00

To facilitate additional options to get_user_pages_fast() change the
singular write parameter to be gup_flags.

This patch does not change any functionality.  New functionality will
follow in subsequent patches.

Some of the get_user_pages_fast() call sites were unchanged because they
already passed FOLL_WRITE or 0 for the write parameter.
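
A minimal sketch of what such a conversion looks like at a call site is
given below.  It is a userspace model only: the TOY_* macros and helper
functions are hypothetical, TOY_FOLL_WRITE mirrors the bit value of the
kernel's FOLL_WRITE, and TOY_FOLL_EXTRA is an invented placeholder for
the kind of additional option a flags word makes room for.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_FOLL_WRITE 0x01u	/* same bit value as the kernel's FOLL_WRITE */
#define TOY_FOLL_EXTRA 0x100u	/* hypothetical extra option, illustration only */

/* A caller that used to pass a write boolean now passes a flag word. */
static unsigned int toy_flags_from_write(bool write)
{
	return write ? TOY_FOLL_WRITE : 0;
}

/* The callee recovers the old boolean by testing one bit. */
static bool toy_wants_write(unsigned int gup_flags)
{
	return (gup_flags & TOY_FOLL_WRITE) != 0;
}

/* Usage: extra bits ride along without disturbing the write semantics. */
void toy_gup_flags_usage(void)
{
	unsigned int flags = toy_flags_from_write(true) | TOY_FOLL_EXTRA;

	assert(toy_wants_write(flags));
	assert(!toy_wants_write(TOY_FOLL_EXTRA));
}
```

Since the write bit has the value 1, a caller that passed a literal 1 or 0
for the old parameter keeps its meaning unchanged.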

NOTE: It was suggested to change the ordering of the get_user_pages_fast()
arguments to ensure that callers were converted.  This breaks the current
GUP call site convention of having the returned pages be the final
parameter.  So the suggestion was rejected.

Link: http://lkml.kernel.org/r/20190328084422.29911-4-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190317183438.2057-4-ira.weiny@intel.com
Signed-off-by: Ira Weiny 
Reviewed-by: Mike Marshall 
Cc: Aneesh Kumar K.V 
Cc: Benjamin Herrenschmidt 
Cc: Borislav Petkov 
Cc: Dan Williams 
Cc: "David S. Miller" 
Cc: Heiko Carstens 
Cc: Ingo Molnar 
Cc: James Hogan 
Cc: Jason Gunthorpe 
Cc: John Hubbard 
Cc: "Kirill A. Shutemov" 
Cc: Martin Schwidefsky 
Cc: Michal Hocko 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Ralf Baechle 
Cc: Rich Felker 
Cc: Thomas Gleixner 
Cc: Yoshinori Sato 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/util.c: fix strndup_user() comment

2019-04-06T02:02:31+00:00

The kerneldoc misdescribes strndup_user()'s return value.

Cc: Dan Carpenter 
Cc: Timur Tabi 
Cc: Mihai Caraman 
Cc: Kumar Gala 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

docs/core-api/mm: fix return value descriptions in mm/

2019-03-06T05:07:20+00:00

Many kernel-doc comments in mm/ have the return value descriptions
either misformatted or omitted entirely, which makes the kernel-doc
script unhappy:

$ make V=1 htmldocs
...
./mm/util.c:36: info: Scanning doc for kstrdup
./mm/util.c:41: warning: No description found for return value of 'kstrdup'
./mm/util.c:57: info: Scanning doc for kstrdup_const
./mm/util.c:66: warning: No description found for return value of 'kstrdup_const'
./mm/util.c:75: info: Scanning doc for kstrndup
./mm/util.c:83: warning: No description found for return value of 'kstrndup'
...

Fixing the formatting and adding the missing return value descriptions
eliminates ~100 such warnings.
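
For illustration, a kernel-doc comment with a return value section might
look like the sketch below.  toy_strdup() is a hypothetical userspace
function, not one of the functions touched by this patch; only the comment
layout (the "Return:" line) is the point.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/**
 * toy_strdup - allocate space for and copy an existing string
 * @s: the string to duplicate, may be %NULL
 *
 * Return: newly allocated copy of @s or %NULL in case of error
 */
char *toy_strdup(const char *s)
{
	size_t len;
	char *buf;

	if (!s)
		return NULL;

	len = strlen(s) + 1;	/* include the terminating NUL */
	buf = malloc(len);
	if (buf)
		memcpy(buf, s, len);
	return buf;
}

/* Usage: callers rely on the documented return value and check for NULL. */
void toy_strdup_usage(void)
{
	char *copy = toy_strdup("mm/util.c");

	assert(copy != NULL);
	assert(strcmp(copy, "mm/util.c") == 0);
	free(copy);
}
```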

Link: http://lkml.kernel.org/r/1549549644-4903-4-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport 
Reviewed-by: Andrew Morton 
Cc: Jonathan Corbet 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: don't let userspace spam allocations warnings

2019-02-21T17:01:01+00:00

memdup_user usually gets fed unchecked userspace input.  Blasting a
full backtrace into dmesg every time is a bit excessive - I'm not sure
on the kernel rule in general, but at least in drm we're trying not to
let unprivileged userspace spam the logs freely.  Definitely not entire
warning backtraces.

It also means more filtering for our CI, because our testsuite exercises
these corner cases and so hits these a lot.

Link: http://lkml.kernel.org/r/20190220204058.11676-1-daniel.vetter@ffwll.ch
Signed-off-by: Daniel Vetter 
Reviewed-by: Andrew Morton 
Acked-by: Michal Hocko 
Reviewed-by: Kees Cook 
Cc: Mike Rapoport 
Cc: Roman Gushchin 
Cc: Vlastimil Babka 
Cc: Jan Stancek 
Cc: Andrey Ryabinin 
Cc: "Michael S. Tsirkin" 
Cc: Huang Ying 
Cc: Bartosz Golaszewski 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: page_mapped: don't assume compound page is huge or THP

2019-01-09T01:15:11+00:00

LTP proc01 testcase has been observed to rarely trigger crashes
on arm64:
    page_mapped+0x78/0xb4
    stable_page_flags+0x27c/0x338
    kpageflags_read+0xfc/0x164
    proc_reg_read+0x7c/0xb8
    __vfs_read+0x58/0x178
    vfs_read+0x90/0x14c
    SyS_read+0x60/0xc0

The issue is that page_mapped() assumes that if a compound page is not
huge, then it must be THP.  But if this is a 'normal' compound page
(COMPOUND_PAGE_DTOR), then the following loop can keep running (for
HPAGE_PMD_NR iterations) until it tries to read from memory that isn't
mapped and triggers a panic:

        for (i = 0; i < hpage_nr_pages(page); i++) {
                if (atomic_read(&page[i]._mapcount) >= 0)
                        return true;
        }

I could replicate this on x86 (v4.20-rc4-98-g60b548237fed) only
with a custom kernel module [1] which:
 - allocates compound page (PAGEC) of order 1
 - allocates 2 normal pages (COPY), which are initialized to 0xff (to
   satisfy _mapcount >= 0)
 - 2 PAGEC page structs are copied to address of first COPY page
 - second page of COPY is marked as not present
 - call to page_mapped(COPY) now triggers fault on access to 2nd COPY
   page at offset 0x30 (_mapcount)

[1] https://github.com/jstancek/reproducers/blob/master/kernel/page_mapped_crash/repro.c

Fix the loop to iterate for "1 << compound_order" pages.
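
The corrected loop bound can be modelled in userspace as follows.  This is
a simplified sketch: struct toy_page and toy_any_subpage_mapped() are
hypothetical, and a plain int replaces the atomic _mapcount, with -1
meaning "not mapped" as in the loop quoted above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; -1 means "not mapped". */
struct toy_page {
	int mapcount;
};

/*
 * Walk exactly 1 << order page structs, the real size of the compound
 * page, instead of assuming a fixed huge-page-sized count.
 */
static bool toy_any_subpage_mapped(const struct toy_page *head,
				   unsigned int order)
{
	unsigned long i;

	for (i = 0; i < (1UL << order); i++) {
		if (head[i].mapcount >= 0)
			return true;
	}
	return false;
}

/* Usage: an order-1 compound page has two page structs and no more. */
void toy_page_mapped_usage(void)
{
	struct toy_page pages[2] = { { -1 }, { -1 } };

	assert(!toy_any_subpage_mapped(pages, 1));
	pages[1].mapcount = 0;
	assert(toy_any_subpage_mapped(pages, 1));
}
```

With a fixed bound such as HPAGE_PMD_NR, the same walk over this
two-element array would read far past its end, which is the failure mode
described above.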

Kirill said "IIRC, sound subsystem can produce custom mapped compound
pages".

Link: http://lkml.kernel.org/r/c440d69879e34209feba21e12d236d06bc0a25db.1543577156.git.jstancek@redhat.com
Fixes: e1534ae95004 ("mm: differentiate page_mapped() from page_mapcount() for compound pages")
Signed-off-by: Jan Stancek 
Debugged-by: Laszlo Ersek 
Suggested-by: "Kirill A. Shutemov" 
Acked-by: Michal Hocko 
Acked-by: Kirill A. Shutemov 
Reviewed-by: David Hildenbrand 
Reviewed-by: Andrea Arcangeli 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: convert totalram_pages and totalhigh_pages variables to atomic

2018-12-28T20:11:47+00:00

totalram_pages and totalhigh_pages are made static inline functions.

The main motivation was that managed_page_count_lock handling was
complicating things.  It was discussed at length at
https://lore.kernel.org/patchwork/patch/995739/#1181785
and it seems better to remove the lock and convert the variables to
atomic, which also prevents potential store-to-read tearing as a bonus.
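
The shape of such a conversion can be sketched with C11 atomics.  This is
a userspace approximation under stated assumptions: the toy_* names are
hypothetical, and <stdatomic.h> stands in for the kernel's own atomic
long type and helpers.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical counter; file-scope atomics start out as zero. */
static atomic_long toy_totalram_counter;

/* Readers call an inline accessor instead of touching a bare variable. */
static inline unsigned long toy_totalram_pages(void)
{
	return (unsigned long)atomic_load(&toy_totalram_counter);
}

/* Writers adjust the counter atomically, with no separate lock. */
static inline void toy_totalram_pages_add(long count)
{
	atomic_fetch_add(&toy_totalram_counter, count);
}

static inline void toy_totalram_pages_set(long val)
{
	atomic_store(&toy_totalram_counter, val);
}

/* Usage: every load returns a whole value, never a torn one. */
void toy_totalram_usage(void)
{
	toy_totalram_pages_set(4096);
	toy_totalram_pages_add(-96);
	assert(toy_totalram_pages() == 4000);
}
```

Hiding the variable behind an accessor also means callers cannot
accidentally perform a plain, non-atomic store to it.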

[akpm@linux-foundation.org: coding style fixes]
Link: http://lkml.kernel.org/r/1542090790-21750-4-git-send-email-arunks@codeaurora.org
Signed-off-by: Arun KS 
Suggested-by: Michal Hocko 
Suggested-by: Vlastimil Babka 
Reviewed-by: Konstantin Khlebnikov 
Reviewed-by: Pavel Tatashin 
Acked-by: Michal Hocko 
Acked-by: Vlastimil Babka 
Cc: David Hildenbrand 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

kvfree(): fix misleading comment

2018-10-26T23:26:33+00:00

vfree() might sleep if not called from interrupt context, and so might
kvfree().  Fix kvfree()'s misleading comment about the allowed context.

Link: http://lkml.kernel.org/r/20180914130512.10394-1-aryabinin@virtuozzo.com
Fixes: 04b8e946075d ("mm/util.c: improve kvfree() kerneldoc")
Signed-off-by: Andrey Ryabinin 
Reviewed-by: Andrew Morton 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds