<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/include/linux/mm_types.h, branch v4.7.3</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>mm: oom_reaper: remove some bloat</title>
<updated>2016-05-26T22:35:44+00:00</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.com</email>
</author>
<published>2016-05-26T22:16:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=7ef949d77f95f0d129f0d404b336459a34a00101'/>
<id>7ef949d77f95f0d129f0d404b336459a34a00101</id>
<content type='text'>
mmput_async is currently used only from the oom_reaper which is defined
only for CONFIG_MMU.  We can save work_struct in mm_struct for
!CONFIG_MMU.

[akpm@linux-foundation.org: fix typo, per Minchan]
Link: http://lkml.kernel.org/r/20160520061658.GB19172@dhcp22.suse.cz
Reported-by: Minchan Kim &lt;minchan@kernel.org&gt;
Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Tetsuo Handa &lt;penguin-kernel@i-love.sakura.ne.jp&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
mmput_async is currently used only from the oom_reaper which is defined
only for CONFIG_MMU.  We can save work_struct in mm_struct for
!CONFIG_MMU.

[akpm@linux-foundation.org: fix typo, per Minchan]
Link: http://lkml.kernel.org/r/20160520061658.GB19172@dhcp22.suse.cz
Reported-by: Minchan Kim &lt;minchan@kernel.org&gt;
Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Tetsuo Handa &lt;penguin-kernel@i-love.sakura.ne.jp&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm, oom_reaper: do not mmput synchronously from the oom reaper context</title>
<updated>2016-05-21T00:58:30+00:00</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.com</email>
</author>
<published>2016-05-20T23:57:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=ec8d7c14ea14922fe21945b458a75e39f11dd832'/>
<id>ec8d7c14ea14922fe21945b458a75e39f11dd832</id>
<content type='text'>
Tetsuo has properly noted that mmput slow path might get blocked waiting
for another party (e.g.  exit_aio waits for an IO).  If that happens the
oom_reaper would be put out of the way and will not be able to process
next oom victim.  We should strive for making this context as reliable
and independent on other subsystems as much as possible.

Introduce mmput_async which will perform the slow path from an async
(WQ) context.  This will delay the operation but that shouldn't be a
problem because the oom_reaper has reclaimed the victim's address space
for most cases as much as possible and the remaining context shouldn't
bind too much memory anymore.  The only exception is when mmap_sem
trylock has failed which shouldn't happen too often.

The issue is only theoretical but not impossible.

Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Reported-by: Tetsuo Handa &lt;penguin-kernel@I-love.SAKURA.ne.jp&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Tetsuo has properly noted that mmput slow path might get blocked waiting
for another party (e.g.  exit_aio waits for an IO).  If that happens the
oom_reaper would be put out of the way and will not be able to process
next oom victim.  We should strive for making this context as reliable
and independent on other subsystems as much as possible.

Introduce mmput_async which will perform the slow path from an async
(WQ) context.  This will delay the operation but that shouldn't be a
problem because the oom_reaper has reclaimed the victim's address space
for most cases as much as possible and the remaining context shouldn't
bind too much memory anymore.  The only exception is when mmap_sem
trylock has failed which shouldn't happen too often.

The issue is only theoretical but not impossible.

Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Reported-by: Tetsuo Handa &lt;penguin-kernel@I-love.SAKURA.ne.jp&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: rename _count, field of the struct page, to _refcount</title>
<updated>2016-05-20T02:12:14+00:00</updated>
<author>
<name>Joonsoo Kim</name>
<email>iamjoonsoo.kim@lge.com</email>
</author>
<published>2016-05-20T00:10:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=0139aa7b7fa12ceef095d99dc36606a5b10ab83a'/>
<id>0139aa7b7fa12ceef095d99dc36606a5b10ab83a</id>
<content type='text'>
Many developers already know that field for reference count of the
struct page is _count and atomic type.  They would try to handle it
directly and this could break the purpose of page reference count
tracepoint.  To prevent direct _count modification, this patch rename it
to _refcount and add warning message on the code.  After that, developer
who need to handle reference count will find that field should not be
accessed directly.

[akpm@linux-foundation.org: fix comments, per Vlastimil]
[akpm@linux-foundation.org: Documentation/vm/transhuge.txt too]
[sfr@canb.auug.org.au: sync ethernet driver changes]
Signed-off-by: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Signed-off-by: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Johannes Berg &lt;johannes@sipsolutions.net&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Sunil Goutham &lt;sgoutham@cavium.com&gt;
Cc: Chris Metcalf &lt;cmetcalf@mellanox.com&gt;
Cc: Manish Chopra &lt;manish.chopra@qlogic.com&gt;
Cc: Yuval Mintz &lt;yuval.mintz@qlogic.com&gt;
Cc: Tariq Toukan &lt;tariqt@mellanox.com&gt;
Cc: Saeed Mahameed &lt;saeedm@mellanox.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Many developers already know that field for reference count of the
struct page is _count and atomic type.  They would try to handle it
directly and this could break the purpose of page reference count
tracepoint.  To prevent direct _count modification, this patch rename it
to _refcount and add warning message on the code.  After that, developer
who need to handle reference count will find that field should not be
accessed directly.

[akpm@linux-foundation.org: fix comments, per Vlastimil]
[akpm@linux-foundation.org: Documentation/vm/transhuge.txt too]
[sfr@canb.auug.org.au: sync ethernet driver changes]
Signed-off-by: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Signed-off-by: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Johannes Berg &lt;johannes@sipsolutions.net&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Sunil Goutham &lt;sgoutham@cavium.com&gt;
Cc: Chris Metcalf &lt;cmetcalf@mellanox.com&gt;
Cc: Manish Chopra &lt;manish.chopra@qlogic.com&gt;
Cc: Yuval Mintz &lt;yuval.mintz@qlogic.com&gt;
Cc: Tariq Toukan &lt;tariqt@mellanox.com&gt;
Cc: Saeed Mahameed &lt;saeedm@mellanox.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage</title>
<updated>2016-04-04T17:41:08+00:00</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2016-04-01T12:29:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=ea1754a084760e68886f5b725c8eaada9cc57155'/>
<id>ea1754a084760e68886f5b725c8eaada9cc57155</id>
<content type='text'>
Mostly direct substitution with occasional adjustment or removing
outdated comments.

Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Mostly direct substitution with occasional adjustment or removing
outdated comments.

Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'x86/urgent' into x86/asm, to pick up fixes</title>
<updated>2016-02-18T08:28:03+00:00</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@kernel.org</email>
</author>
<published>2016-02-18T08:28:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=3a2f2ac9b96f9a9f5538396a212d3b9fb543bfc5'/>
<id>3a2f2ac9b96f9a9f5538396a212d3b9fb543bfc5</id>
<content type='text'>
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: polish virtual memory accounting</title>
<updated>2016-02-03T16:28:43+00:00</updated>
<author>
<name>Konstantin Khlebnikov</name>
<email>koct9i@gmail.com</email>
</author>
<published>2016-02-03T00:57:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=30bdbb78009e67767983085e302bec6d97afc679'/>
<id>30bdbb78009e67767983085e302bec6d97afc679</id>
<content type='text'>
* add VM_STACK as alias for VM_GROWSUP/DOWN depending on architecture
* always account VMAs with flag VM_STACK as stack (as it was before)
* cleanup classifying helpers
* update comments and documentation

Signed-off-by: Konstantin Khlebnikov &lt;koct9i@gmail.com&gt;
Tested-by: Sudip Mukherjee &lt;sudipm.mukherjee@gmail.com&gt;
Cc: Cyrill Gorcunov &lt;gorcunov@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* add VM_STACK as alias for VM_GROWSUP/DOWN depending on architecture
* always account VMAs with flag VM_STACK as stack (as it was before)
* cleanup classifying helpers
* update comments and documentation

Signed-off-by: Konstantin Khlebnikov &lt;koct9i@gmail.com&gt;
Tested-by: Sudip Mukherjee &lt;sudipm.mukherjee@gmail.com&gt;
Cc: Cyrill Gorcunov &lt;gorcunov@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'v4.5-rc1' into x86/asm, to refresh the branch before merging new changes</title>
<updated>2016-01-29T08:41:18+00:00</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@kernel.org</email>
</author>
<published>2016-01-29T08:41:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=76b36fa896a2db64582690e085f36adc76604134'/>
<id>76b36fa896a2db64582690e085f36adc76604134</id>
<content type='text'>
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm, dax, pmem: introduce {get|put}_dev_pagemap() for dax-gup</title>
<updated>2016-01-16T01:56:32+00:00</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2016-01-16T00:56:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=5c2c2587b13235bf8b5c9027589f22eff68bdf49'/>
<id>5c2c2587b13235bf8b5c9027589f22eff68bdf49</id>
<content type='text'>
get_dev_page() enables paths like get_user_pages() to pin a dynamically
mapped pfn-range (devm_memremap_pages()) while the resulting struct page
objects are in use.  Unlike get_page() it may fail if the device is, or
is in the process of being, disabled.  While the initial lookup of the
range may be an expensive list walk, the result is cached to speed up
subsequent lookups which are likely to be in the same mapped range.

devm_memremap_pages() now requires a reference counter to be specified
at init time.  For pmem this means moving request_queue allocation into
pmem_alloc() so the existing queue usage counter can track "device
pages".

ZONE_DEVICE pages always have an elevated count and will never be on an
lru reclaim list.  That space in 'struct page' can be redirected for
other uses, but for safety introduce a poison value that will always
trip __list_add() to assert.  This allows half of the struct list_head
storage to be reclaimed with some assurance to back up the assumption
that the page count never goes to zero and a list_add() is never
attempted.

Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Tested-by: Logan Gunthorpe &lt;logang@deltatee.com&gt;
Cc: Dave Hansen &lt;dave@sr71.net&gt;
Cc: Matthew Wilcox &lt;willy@linux.intel.com&gt;
Cc: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
get_dev_page() enables paths like get_user_pages() to pin a dynamically
mapped pfn-range (devm_memremap_pages()) while the resulting struct page
objects are in use.  Unlike get_page() it may fail if the device is, or
is in the process of being, disabled.  While the initial lookup of the
range may be an expensive list walk, the result is cached to speed up
subsequent lookups which are likely to be in the same mapped range.

devm_memremap_pages() now requires a reference counter to be specified
at init time.  For pmem this means moving request_queue allocation into
pmem_alloc() so the existing queue usage counter can track "device
pages".

ZONE_DEVICE pages always have an elevated count and will never be on an
lru reclaim list.  That space in 'struct page' can be redirected for
other uses, but for safety introduce a poison value that will always
trip __list_add() to assert.  This allows half of the struct list_head
storage to be reclaimed with some assurance to back up the assumption
that the page count never goes to zero and a list_add() is never
attempted.

Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Tested-by: Logan Gunthorpe &lt;logang@deltatee.com&gt;
Cc: Dave Hansen &lt;dave@sr71.net&gt;
Cc: Matthew Wilcox &lt;willy@linux.intel.com&gt;
Cc: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>thp: introduce deferred_split_huge_page()</title>
<updated>2016-01-16T01:56:32+00:00</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2016-01-16T00:54:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=9a982250f773cc8c76f1eee68a770b7cbf2faf78'/>
<id>9a982250f773cc8c76f1eee68a770b7cbf2faf78</id>
<content type='text'>
Currently we don't split huge page on partial unmap.  It's not an ideal
situation.  It can lead to memory overhead.

Furtunately, we can detect partial unmap on page_remove_rmap().  But we
cannot call split_huge_page() from there due to locking context.

It's also counterproductive to do directly from munmap() codepath: in
many cases we will hit this from exit(2) and splitting the huge page
just to free it up in small pages is not what we really want.

The patch introduce deferred_split_huge_page() which put the huge page
into queue for splitting.  The splitting itself will happen when we get
memory pressure via shrinker interface.  The page will be dropped from
list on freeing through compound page destructor.

Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Tested-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Tested-by: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Acked-by: Jerome Marchand &lt;jmarchan@redhat.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Steve Capper &lt;steve.capper@linaro.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently we don't split huge page on partial unmap.  It's not an ideal
situation.  It can lead to memory overhead.

Furtunately, we can detect partial unmap on page_remove_rmap().  But we
cannot call split_huge_page() from there due to locking context.

It's also counterproductive to do directly from munmap() codepath: in
many cases we will hit this from exit(2) and splitting the huge page
just to free it up in small pages is not what we really want.

The patch introduce deferred_split_huge_page() which put the huge page
into queue for splitting.  The splitting itself will happen when we get
memory pressure via shrinker interface.  The page will be dropped from
list on freeing through compound page destructor.

Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Tested-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Tested-by: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Acked-by: Jerome Marchand &lt;jmarchan@redhat.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Steve Capper &lt;steve.capper@linaro.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: rework mapcount accounting to enable 4k mapping of THPs</title>
<updated>2016-01-16T01:56:32+00:00</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2016-01-16T00:53:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=53f9263baba69fc1630e3c780c4d11b72643f962'/>
<id>53f9263baba69fc1630e3c780c4d11b72643f962</id>
<content type='text'>
We're going to allow mapping of individual 4k pages of THP compound.  It
means we need to track mapcount on per small page basis.

Straight-forward approach is to use -&gt;_mapcount in all subpages to track
how many time this subpage is mapped with PMDs or PTEs combined.  But
this is rather expensive: mapping or unmapping of a THP page with PMD
would require HPAGE_PMD_NR atomic operations instead of single we have
now.

The idea is to store separately how many times the page was mapped as
whole -- compound_mapcount.  This frees up -&gt;_mapcount in subpages to
track PTE mapcount.

We use the same approach as with compound page destructor and compound
order to store compound_mapcount: use space in first tail page,
-&gt;mapping this time.

Any time we map/unmap whole compound page (THP or hugetlb) -- we
increment/decrement compound_mapcount.  When we map part of compound
page with PTE we operate on -&gt;_mapcount of the subpage.

page_mapcount() counts both: PTE and PMD mappings of the page.

Basically, we have mapcount for a subpage spread over two counters.  It
makes tricky to detect when last mapcount for a page goes away.

We introduced PageDoubleMap() for this.  When we split THP PMD for the
first time and there's other PMD mapping left we offset up -&gt;_mapcount
in all subpages by one and set PG_double_map on the compound page.
These additional references go away with last compound_mapcount.

This approach provides a way to detect when last mapcount goes away on
per small page basis without introducing new overhead for most common
cases.

[akpm@linux-foundation.org: fix typo in comment]
[mhocko@suse.com: ignore partial THP when moving task]
Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Tested-by: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Acked-by: Jerome Marchand &lt;jmarchan@redhat.com&gt;
Cc: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Cc: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Jerome Marchand &lt;jmarchan@redhat.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Steve Capper &lt;steve.capper@linaro.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We're going to allow mapping of individual 4k pages of THP compound.  It
means we need to track mapcount on per small page basis.

Straight-forward approach is to use -&gt;_mapcount in all subpages to track
how many time this subpage is mapped with PMDs or PTEs combined.  But
this is rather expensive: mapping or unmapping of a THP page with PMD
would require HPAGE_PMD_NR atomic operations instead of single we have
now.

The idea is to store separately how many times the page was mapped as
whole -- compound_mapcount.  This frees up -&gt;_mapcount in subpages to
track PTE mapcount.

We use the same approach as with compound page destructor and compound
order to store compound_mapcount: use space in first tail page,
-&gt;mapping this time.

Any time we map/unmap whole compound page (THP or hugetlb) -- we
increment/decrement compound_mapcount.  When we map part of compound
page with PTE we operate on -&gt;_mapcount of the subpage.

page_mapcount() counts both: PTE and PMD mappings of the page.

Basically, we have mapcount for a subpage spread over two counters.  It
makes tricky to detect when last mapcount for a page goes away.

We introduced PageDoubleMap() for this.  When we split THP PMD for the
first time and there's other PMD mapping left we offset up -&gt;_mapcount
in all subpages by one and set PG_double_map on the compound page.
These additional references go away with last compound_mapcount.

This approach provides a way to detect when last mapcount goes away on
per small page basis without introducing new overhead for most common
cases.

[akpm@linux-foundation.org: fix typo in comment]
[mhocko@suse.com: ignore partial THP when moving task]
Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Tested-by: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Acked-by: Jerome Marchand &lt;jmarchan@redhat.com&gt;
Cc: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Cc: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Jerome Marchand &lt;jmarchan@redhat.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Steve Capper &lt;steve.capper@linaro.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
