linux-stable.git/include/linux/mm_types.h, branch v4.6.3

mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage

2016-04-04T17:41:08+00:00

Mostly direct substitution, with occasional adjustments or removal of
outdated comments.
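Before this series, the PAGE_CACHE_* macros and page_cache_{get,release}() were plain aliases of their page counterparts, so the substitution cannot change behaviour. Below is a minimal userspace sketch of that equivalence. It assumes 4 KiB pages, and the toy struct page and its refcount helpers are hypothetical stand-ins, not the kernel's definitions.

```c
/* Toy stand-ins; the real definitions live in the kernel headers. */
#define PAGE_SHIFT 12                       /* assumed 4 KiB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* The old aliases, as they existed before the removal. */
#define PAGE_CACHE_SHIFT PAGE_SHIFT
#define PAGE_CACHE_SIZE  PAGE_SIZE

struct page { int count; };                 /* hypothetical toy page */

static void get_page(struct page *p) { p->count++; }
static void put_page(struct page *p) { p->count--; }

#define page_cache_get(p)     get_page(p)
#define page_cache_release(p) put_page(p)

/* Before: index of the page-cache page holding byte offset pos. */
static unsigned long index_old(unsigned long pos) { return pos >> PAGE_CACHE_SHIFT; }

/* After: the same computation spelled with the page names. */
static unsigned long index_new(unsigned long pos) { return pos >> PAGE_SHIFT; }
```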

Signed-off-by: Kirill A. Shutemov 
Acked-by: Michal Hocko 
Signed-off-by: Linus Torvalds

Merge branch 'x86/urgent' into x86/asm, to pick up fixes

2016-02-18T08:28:03+00:00

Signed-off-by: Ingo Molnar

mm: polish virtual memory accounting

2016-02-03T16:28:43+00:00

* add VM_STACK as alias for VM_GROWSUP/DOWN depending on architecture
* always account VMAs with flag VM_STACK as stack (as it was before)
* cleanup classifying helpers
* update comments and documentation
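The VM_STACK alias selects whichever growth flag the architecture's stack uses, so the accounting helpers only need to test one flag. The following is a rough userspace sketch. It assumes the selection is keyed on CONFIG_STACK_GROWSUP, and the flag bit values are illustrative only.

```c
#include <stdbool.h>

typedef unsigned long vm_flags_t;

/* Illustrative bit values; the kernel's may differ per architecture. */
#define VM_GROWSDOWN 0x00000100UL
#define VM_GROWSUP   0x01000000UL

/* Alias the flag matching the direction this architecture's stack grows. */
#ifdef CONFIG_STACK_GROWSUP
#define VM_STACK VM_GROWSUP
#else
#define VM_STACK VM_GROWSDOWN
#endif

/* A VMA is accounted as stack whenever it carries VM_STACK. */
static bool is_stack_mapping(vm_flags_t flags)
{
	return (flags & VM_STACK) == VM_STACK;
}
```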

Signed-off-by: Konstantin Khlebnikov 
Tested-by: Sudip Mukherjee 
Cc: Cyrill Gorcunov 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

Merge tag 'v4.5-rc1' into x86/asm, to refresh the branch before merging new changes

2016-01-29T08:41:18+00:00

Signed-off-by: Ingo Molnar

mm, dax, pmem: introduce {get|put}_dev_pagemap() for dax-gup

2016-01-16T01:56:32+00:00

get_dev_pagemap() enables paths like get_user_pages() to pin a dynamically
mapped pfn-range (devm_memremap_pages()) while the resulting struct page
objects are in use.  Unlike get_page(), it may fail if the device is, or
is in the process of being, disabled.  While the initial lookup of the
range may be an expensive list walk, the result is cached to speed up
subsequent lookups, which are likely to be in the same mapped range.
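Here is a simplified userspace model of the lookup described above. If a cached range already covers the pfn, it is reused with a plain reference bump. Otherwise the ranges are walked, and a reference is taken only if the range is still live. All toy_* names are hypothetical, and a plain counter stands in for the kernel's reference counter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one devm_memremap_pages() range. */
struct toy_pagemap {
	unsigned long start_pfn, end_pfn;   /* inclusive pfn range */
	bool live;                          /* false once disabling has begun */
	long refs;                          /* stand-in for the reference counter */
};

static struct toy_pagemap *toy_ranges;  /* the "list" that gets walked */
static size_t toy_nr_ranges;

static bool covers(const struct toy_pagemap *m, unsigned long pfn)
{
	return pfn >= m->start_pfn && pfn <= m->end_pfn;
}

/*
 * Return a referenced pagemap for pfn, or NULL.  A cached pagemap that
 * covers pfn is already pinned by the caller, so a plain increment is
 * enough; otherwise walk the ranges and refuse ranges being disabled.
 */
static struct toy_pagemap *toy_get_dev_pagemap(unsigned long pfn,
					       struct toy_pagemap *cached)
{
	if (cached && covers(cached, pfn)) {
		cached->refs++;
		return cached;
	}
	for (size_t i = 0; i < toy_nr_ranges; i++) {
		struct toy_pagemap *m = &toy_ranges[i];

		if (covers(m, pfn)) {
			if (!m->live)
				return NULL;    /* device is going away */
			m->refs++;
			return m;
		}
	}
	return NULL;
}

static void toy_put_dev_pagemap(struct toy_pagemap *m)
{
	if (m)
		m->refs--;
}
```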

devm_memremap_pages() now requires a reference counter to be specified
at init time.  For pmem this means moving request_queue allocation into
pmem_alloc() so the existing queue usage counter can track "device
pages".

ZONE_DEVICE pages always have an elevated count and will never be on an
LRU reclaim list.  That space in 'struct page' can be redirected for
other uses, but for safety we introduce a poison value that will always
trip an assertion in __list_add().  This allows half of the struct
list_head storage to be reclaimed with some assurance to back up the
assumption that the page count never goes to zero and a list_add() is
never attempted.
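The following is a hypothetical model of the poisoning idea. A ZONE_DEVICE page keeps a poison value in one half of its list_head and reuses the other half for a pgmap pointer. A debug-style list insertion then refuses any entry that still carries the poison. TOY_LIST_POISON and the toy_* names are assumptions for illustration; the kernel's poison values and checks may differ.

```c
#include <stdbool.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical poison marker: never the address of a real list_head. */
#define TOY_LIST_POISON ((struct list_head *)0x100)

struct toy_dev_pagemap {
	int id;
};

/* Toy struct page: ZONE_DEVICE pages overlay ->lru with poison + pgmap. */
struct toy_page {
	union {
		struct list_head lru;
		struct {
			struct list_head *poison;       /* overlays lru.next */
			struct toy_dev_pagemap *pgmap;  /* overlays lru.prev */
		} zone_device;
	};
};

static void toy_init_device_page(struct toy_page *page,
				 struct toy_dev_pagemap *pgmap)
{
	page->zone_device.poison = TOY_LIST_POISON;
	page->zone_device.pgmap = pgmap;
}

/*
 * Debug-style list_add(): insert new after head, unless new still carries
 * the poison, i.e. someone tried to put a ZONE_DEVICE page on a list.
 * Returns false instead of warning so the model stays testable.
 */
static bool toy_list_add_checked(struct list_head *new, struct list_head *head)
{
	struct list_head *next = head->next;

	if (new->next == TOY_LIST_POISON)
		return false;
	next->prev = new;
	new->next = next;
	new->prev = head;
	head->next = new;
	return true;
}
```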

Signed-off-by: Dan Williams 
Tested-by: Logan Gunthorpe 
Cc: Dave Hansen 
Cc: Matthew Wilcox 
Cc: Ross Zwisler 
Cc: Alexander Viro 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

thp: introduce deferred_split_huge_page()

2016-01-16T01:56:32+00:00

Currently we don't split a huge page on partial unmap.  This is not
ideal: it can lead to memory overhead.

Fortunately, we can detect a partial unmap in page_remove_rmap().  But
we cannot call split_huge_page() from there due to the locking context.

It's also counterproductive to do it directly from the munmap() code
path: in many cases we will hit this from exit(2), and splitting the
huge page just to free it up in small pages is not what we really want.

This patch introduces deferred_split_huge_page(), which puts the huge
page into a queue for splitting.  The splitting itself happens when we
get memory pressure via the shrinker interface.  The page is dropped
from the list on freeing, through the compound page destructor.
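The deferred-split flow can be modelled as a small queue. A partial unmap only queues the page. A freed page leaves the queue without ever being split. A shrinker-style scan does the real work. This is a single-threaded sketch with hypothetical toy_* names; locking and the actual split are not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a THP that may sit on the deferred-split queue. */
struct toy_thp {
	bool queued;            /* already on the queue? */
	bool split;             /* has it been split into small pages? */
	struct toy_thp *next;   /* singly linked queue */
};

static struct toy_thp *split_queue;   /* head of the queue */
static size_t split_queue_len;

/* Called where a partial unmap is detected: queue it, do not split yet. */
static void toy_deferred_split(struct toy_thp *page)
{
	if (page->queued)
		return;
	page->queued = true;
	page->next = split_queue;
	split_queue = page;
	split_queue_len++;
}

/* Compound destructor: a freed page simply leaves the queue. */
static void toy_free_thp(struct toy_thp *page)
{
	struct toy_thp **pp = &split_queue;

	while (*pp && *pp != page)
		pp = &(*pp)->next;
	if (*pp) {
		*pp = page->next;
		page->queued = false;
		split_queue_len--;
	}
}

/* Shrinker scan under memory pressure: split up to nr queued pages. */
static size_t toy_split_scan(size_t nr)
{
	size_t done = 0;

	while (split_queue && done < nr) {
		struct toy_thp *page = split_queue;

		split_queue = page->next;
		page->queued = false;
		page->split = true;
		split_queue_len--;
		done++;
	}
	return done;
}
```

In the usage below, page b stands for the exit(2) case: it is freed before any memory pressure arrives, so no split work is wasted on it.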

Signed-off-by: Kirill A. Shutemov 
Tested-by: Sasha Levin 
Tested-by: Aneesh Kumar K.V 
Acked-by: Vlastimil Babka 
Acked-by: Jerome Marchand 
Cc: Andrea Arcangeli 
Cc: Hugh Dickins 
Cc: Dave Hansen 
Cc: Mel Gorman 
Cc: Rik van Riel 
Cc: Naoya Horiguchi 
Cc: Steve Capper 
Cc: Johannes Weiner 
Cc: Michal Hocko 
Cc: Christoph Lameter 
Cc: David Rientjes 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: rework mapcount accounting to enable 4k mapping of THPs

2016-01-16T01:56:32+00:00

We're going to allow mapping of individual 4k pages of a THP compound
page.  This means we need to track mapcount on a per-small-page basis.

The straightforward approach is to use ->_mapcount in all subpages to
track how many times each subpage is mapped with PMDs or PTEs combined.
But this is rather expensive: mapping or unmapping a THP page with a
PMD would require HPAGE_PMD_NR atomic operations instead of the single
one we have now.
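To put a number on "rather expensive": on x86-64 with 4 KiB base pages, a PMD maps 2 MiB. HPAGE_PMD_NR is therefore 2 MiB / 4 KiB = 512. The two helper functions below are illustrative only.

```c
/* x86-64 values: 4 KiB base pages, 2 MiB PMD mappings. */
#define PAGE_SHIFT      12
#define HPAGE_PMD_SHIFT 21
#define HPAGE_PMD_ORDER (HPAGE_PMD_SHIFT - PAGE_SHIFT)
#define HPAGE_PMD_NR    (1 << HPAGE_PMD_ORDER)

/* Atomic operations needed to map or unmap one THP with a PMD. */
static int ops_per_subpage_scheme(void) { return HPAGE_PMD_NR; }  /* touch every subpage */
static int ops_compound_scheme(void)    { return 1; }             /* one shared counter */
```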

The idea is to store separately how many times the page was mapped as
a whole: compound_mapcount.  This frees up ->_mapcount in subpages to
track PTE mapcount.

We use the same approach as with the compound page destructor and
compound order to store compound_mapcount: use space in the first tail
page, ->mapping this time.

Any time we map/unmap a whole compound page (THP or hugetlb), we
increment/decrement compound_mapcount.  When we map part of a compound
page with a PTE, we operate on ->_mapcount of the subpage.

page_mapcount() counts both PTE and PMD mappings of the page.
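Here is a minimal model of the two counters. It assumes HPAGE_PMD_NR = 512 and ignores the kernel's -1 bias on ->_mapcount. Mapping the whole page with a PMD is a single increment of compound_mapcount rather than one increment per subpage. The toy_* names are hypothetical.

```c
/* Hypothetical model; counts start at 0 here (the kernel biases by -1). */
struct toy_subpage {
	int pte_mapcount;               /* stands in for ->_mapcount */
};

struct toy_compound {
	int compound_mapcount;          /* whole-page (PMD) mappings */
	struct toy_subpage sub[512];    /* assumed HPAGE_PMD_NR = 512 */
};

static void toy_map_pmd(struct toy_compound *c)   { c->compound_mapcount++; }
static void toy_unmap_pmd(struct toy_compound *c) { c->compound_mapcount--; }
static void toy_map_pte(struct toy_compound *c, int i) { c->sub[i].pte_mapcount++; }

/* page_mapcount() analogue: PMD mappings plus this subpage's PTE mappings. */
static int toy_page_mapcount(const struct toy_compound *c, int i)
{
	return c->compound_mapcount + c->sub[i].pte_mapcount;
}
```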

Basically, the mapcount for a subpage is spread over two counters.
This makes it tricky to detect when the last mapcount for a page goes
away.

We introduced PageDoubleMap() for this.  When we split a THP PMD for
the first time and another PMD mapping is left, we offset ->_mapcount
in all subpages up by one and set PG_double_map on the compound page.
These additional references go away with the last compound_mapcount.

This approach provides a way to detect when the last mapcount goes
away on a per-small-page basis without introducing new overhead for the
most common cases.
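The double-map offset can be checked with a small model. It assumes a hypothetical 4-subpage THP and counts that start at 0 rather than the kernel's -1. Two PMD mappings exist and one of them is split. The offset keeps each subpage's own counter from reaching zero while the other PMD mapping still covers it. In this model, page_mapcount() subtracts the offset while the flag is set. All toy_* names are hypothetical.

```c
#include <stdbool.h>

#define NR_SUB 4   /* hypothetical tiny THP; the real HPAGE_PMD_NR is larger */

/* Hypothetical model; counts start at 0 here (the kernel biases by -1). */
struct toy_thp {
	int compound_mapcount;     /* PMD mappings of the whole page */
	int sub_mapcount[NR_SUB];  /* per-subpage ->_mapcount analogue */
	bool double_map;           /* PG_double_map analogue */
};

static void toy_map_pmd(struct toy_thp *p) { p->compound_mapcount++; }

/* Drop one PMD mapping; the last one also removes the double-map offset. */
static void toy_unmap_pmd(struct toy_thp *p)
{
	if (--p->compound_mapcount == 0 && p->double_map) {
		p->double_map = false;
		for (int i = 0; i < NR_SUB; i++)
			p->sub_mapcount[i]--;
	}
}

/* Turn one PMD mapping into NR_SUB PTE mappings. */
static void toy_split_pmd(struct toy_thp *p)
{
	/* First split while another PMD mapping remains: offset every subpage. */
	if (p->compound_mapcount > 1 && !p->double_map) {
		p->double_map = true;
		for (int i = 0; i < NR_SUB; i++)
			p->sub_mapcount[i]++;
	}
	for (int i = 0; i < NR_SUB; i++)
		p->sub_mapcount[i]++;   /* each subpage is now also PTE-mapped */
	toy_unmap_pmd(p);
}

/* Drop one PTE mapping of subpage i; true if that was its last mapping. */
static bool toy_unmap_pte(struct toy_thp *p, int i)
{
	return --p->sub_mapcount[i] == 0;
}

/* page_mapcount() analogue: the double-map offset is not a real mapping. */
static int toy_page_mapcount(const struct toy_thp *p, int i)
{
	return p->sub_mapcount[i] + p->compound_mapcount - (p->double_map ? 1 : 0);
}
```

Without the offset, the first toy_unmap_pte() in the usage below would report subpage 0 as unmapped, even though the remaining PMD mapping still covers it.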

[akpm@linux-foundation.org: fix typo in comment]
[mhocko@suse.com: ignore partial THP when moving task]
Signed-off-by: Kirill A. Shutemov 
Tested-by: Aneesh Kumar K.V 
Acked-by: Jerome Marchand 
Cc: Sasha Levin 
Cc: Aneesh Kumar K.V 
Cc: Jerome Marchand 
Cc: Vlastimil Babka 
Cc: Andrea Arcangeli 
Cc: Hugh Dickins 
Cc: Dave Hansen 
Cc: Mel Gorman 
Cc: Rik van Riel 
Cc: Naoya Horiguchi 
Cc: Steve Capper 
Cc: Johannes Weiner 
Cc: Christoph Lameter 
Cc: David Rientjes 
Signed-off-by: Michal Hocko 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: drop tail page refcounting

2016-01-16T01:56:32+00:00

Tail page refcounting is utterly complicated and painful to support.

It uses ->_mapcount on tail pages to store how many times this page is
pinned.  get_page() bumps ->_mapcount on the tail page in addition to
->_count on the head.  This information is required by
split_huge_page() to be able to distribute pins from the head of the
compound page to the tails during the split.

We will need ->_mapcount to account PTE mappings of subpages of the
compound page.  We eliminate the need for the current meaning of
->_mapcount in tail pages by forbidding the split entirely if the page
is pinned.
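Here is a sketch of the simplified rule. It assumes every mapping and the caller each hold one reference on the head page, so any surplus reference is treated as a pin. The toy_* names and the exact comparison are illustrative, not the kernel's split check.

```c
#include <errno.h>

/* Hypothetical model of a compound page's head counters. */
struct toy_head {
	int refcount;        /* ->_count analogue: mappings + pins + caller */
	int total_mapcount;  /* all PMD and PTE mappings */
};

/* With tail refcounting gone, get_page()/put_page() only touch the head. */
static void toy_get_page(struct toy_head *h) { h->refcount++; }
static void toy_put_page(struct toy_head *h) { h->refcount--; }

/*
 * Refuse to split a pinned page.  The caller holds one reference and each
 * mapping holds one, so any surplus must be a pin (e.g. from
 * get_user_pages()).  The split itself is not modelled.
 */
static int toy_split_huge_page(struct toy_head *h)
{
	if (h->refcount != h->total_mapcount + 1)
		return -EBUSY;
	return 0;
}
```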

The only user of tail page refcounting is THP which is marked BROKEN for
now.

Let's drop all this mess.  It makes get_page() and put_page() much
simpler.

Signed-off-by: Kirill A. Shutemov 
Tested-by: Sasha Levin 
Tested-by: Aneesh Kumar K.V 
Acked-by: Vlastimil Babka 
Acked-by: Jerome Marchand 
Cc: Andrea Arcangeli 
Cc: Hugh Dickins 
Cc: Dave Hansen 
Cc: Mel Gorman 
Cc: Rik van Riel 
Cc: Naoya Horiguchi 
Cc: Steve Capper 
Cc: Johannes Weiner 
Cc: Michal Hocko 
Cc: Christoph Lameter 
Cc: David Rientjes 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: rework virtual memory accounting

2016-01-15T00:00:49+00:00

When inspecting some vague code inside the prctl(PR_SET_MM_MEM) call
(which tests the RLIMIT_DATA value to figure out whether we're allowed
to assign new @start_brk, @brk, @start_data, @end_data in mm_struct),
it was noted that RLIMIT_DATA, in the form it's implemented now, doesn't
do anything useful, because most user-space libraries use the mmap()
syscall for dynamic memory allocations.

Linus suggested converting the RLIMIT_DATA rlimit into something
suitable for anonymous memory accounting.  But in this patch we go
further, and the changes are bundled together as:

 * keep VMA counting if CONFIG_PROC_FS=n; it will be used for limits
 * replace mm->shared_vm with better defined mm->data_vm
 * account anonymous executable areas as executable
 * account file-backed growsdown/up areas as stack
 * drop struct file* argument from vm_stat_account
 * enforce RLIMIT_DATA for size of data areas

This way the code looks cleaner: code/stack/data classification now
depends only on the vm_flags state:

 VM_EXEC & ~VM_WRITE            -> code  (VmExe + VmLib in proc)
 VM_GROWSUP | VM_GROWSDOWN      -> stack (VmStk)
 VM_WRITE & ~VM_SHARED & !stack -> data  (VmData)
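The classification table maps directly onto three flag predicates. The sketch below uses illustrative bit values. The kernel defines its own VM_* constants, and VM_GROWSUP is only meaningful on some architectures.

```c
#include <stdbool.h>

typedef unsigned long vm_flags_t;

/* Illustrative bit values for this sketch only. */
#define VM_WRITE     0x00000002UL
#define VM_EXEC      0x00000004UL
#define VM_SHARED    0x00000008UL
#define VM_GROWSDOWN 0x00000100UL
#define VM_GROWSUP   0x00000200UL

/* code: executable and not writable (VmExe + VmLib) */
static bool is_code(vm_flags_t f)
{
	return (f & (VM_EXEC | VM_WRITE)) == VM_EXEC;
}

/* stack: grows in either direction (VmStk) */
static bool is_stack(vm_flags_t f)
{
	return f & (VM_GROWSUP | VM_GROWSDOWN);
}

/* data: private, writable and not stack (VmData) */
static bool is_data(vm_flags_t f)
{
	return (f & (VM_WRITE | VM_SHARED)) == VM_WRITE && !is_stack(f);
}
```

A shared writable mapping matches none of the three predicates, so it falls into the "shared" remainder described next.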

The rest (VmSize - VmData - VmStk - VmExe - VmLib) could be called
"shared", but that might be a strange beast like a read-only private or
VM_IO area.

 - RLIMIT_AS            limits whole address space "VmSize"
 - RLIMIT_STACK         limits stack "VmStk" (but each vma individually)
 - RLIMIT_DATA          now limits "VmData"
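Here is a hypothetical check in the spirit of the new rules. Every mapping is charged against RLIMIT_AS, and data mappings are also charged against RLIMIT_DATA. It is loosely modelled on the kernel's may_expand_vm(), but the struct and function names below are assumptions. Limits are passed in bytes and compared in pages. The per-VMA RLIMIT_STACK check is not modelled.

```c
#include <stdbool.h>

#define PAGE_SHIFT 12   /* assumed 4 KiB pages */

/* Hypothetical per-mm counters, in pages. */
struct toy_mm {
	unsigned long total_vm;   /* VmSize */
	unsigned long data_vm;    /* VmData */
};

/* Hypothetical limits in bytes, standing in for rlimit(RLIMIT_AS/DATA). */
struct toy_limits {
	unsigned long as_bytes;
	unsigned long data_bytes;
};

/* May this mm grow by npages of a mapping that is (or is not) data? */
static bool toy_may_expand_vm(const struct toy_mm *mm,
			      const struct toy_limits *lim,
			      bool is_data, unsigned long npages)
{
	if (mm->total_vm + npages > lim->as_bytes >> PAGE_SHIFT)
		return false;   /* RLIMIT_AS: whole address space */
	if (is_data && mm->data_vm + npages > lim->data_bytes >> PAGE_SHIFT)
		return false;   /* RLIMIT_DATA: now VmData */
	return true;
}
```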

Signed-off-by: Konstantin Khlebnikov 
Signed-off-by: Cyrill Gorcunov 
Cc: Quentin Casasnovas 
Cc: Vegard Nossum 
Acked-by: Linus Torvalds 
Cc: Willy Tarreau 
Cc: Andy Lutomirski 
Cc: Kees Cook 
Cc: Vladimir Davydov 
Cc: Pavel Emelyanov 
Cc: Peter Zijlstra 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, shmem: add internal shmem resident memory accounting

2016-01-15T00:00:49+00:00

Currently, when looking at /proc//status or statm, there is no way to
distinguish shmem pages from pages mapped to a regular file (shmem pages
are mapped to /dev/zero), even though their implication in actual memory
use is quite different.

The internal accounting currently counts shmem pages together with
regular files.  As a preparation to extend the userspace interfaces,
this patch adds an MM_SHMEMPAGES counter to mm_rss_stat to account for
shmem pages separately from MM_FILEPAGES.  The next patch will expose it
to userspace; this patch doesn't change the exported values yet, since
it adds MM_SHMEMPAGES to MM_FILEPAGES at places where MM_FILEPAGES was
used before.  The only user-visible change after this patch is the OOM
killer message that separates the reported "shmem-rss" from "file-rss".
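Here is a small model of the split counter. It assumes the counter order shown, with MM_SHMEMPAGES appended after the existing counters; the toy_* helpers are hypothetical. Anything reported as "file" keeps adding the two counters together, while the OOM message can show them separately.

```c
/* Counter indices, assuming MM_SHMEMPAGES is appended after the others. */
enum {
	MM_FILEPAGES,   /* resident file-mapped pages */
	MM_ANONPAGES,   /* resident anonymous pages */
	MM_SWAPENTS,    /* anonymous swap entries */
	MM_SHMEMPAGES,  /* resident shared-memory pages (new) */
	NR_MM_COUNTERS
};

struct toy_rss_stat {
	long count[NR_MM_COUNTERS];
};

/* What userspace still sees as "file" RSS: shmem is folded back in. */
static long toy_reported_file_rss(const struct toy_rss_stat *s)
{
	return s->count[MM_FILEPAGES] + s->count[MM_SHMEMPAGES];
}

/* The OOM-killer line is the one place the two are shown apart. */
static long toy_oom_file_rss(const struct toy_rss_stat *s)  { return s->count[MM_FILEPAGES]; }
static long toy_oom_shmem_rss(const struct toy_rss_stat *s) { return s->count[MM_SHMEMPAGES]; }
```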

[vbabka@suse.cz: forward-porting, tweak changelog]
Signed-off-by: Jerome Marchand 
Signed-off-by: Vlastimil Babka 
Acked-by: Konstantin Khlebnikov 
Acked-by: Michal Hocko 
Acked-by: Hugh Dickins 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds