<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/mm/huge_memory.c, branch v4.14</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>mm/huge_memory.c: deposit page table when copying a PMD migration entry</title>
<updated>2017-11-03T14:39:19+00:00</updated>
<author>
<name>Zi Yan</name>
<email>zi.yan@cs.rutgers.edu</email>
</author>
<published>2017-11-02T22:59:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=dd8a67f9a37c74b61e5e050924ceec9ffb4f8c3c'/>
<id>dd8a67f9a37c74b61e5e050924ceec9ffb4f8c3c</id>
<content type='text'>
We need to deposit pre-allocated PTE page table when a PMD migration
entry is copied in copy_huge_pmd().  Otherwise, we will leak the
pre-allocated page and cause a NULL pointer dereference later in
zap_huge_pmd().

The missing counters during PMD migration entry copy process are added
as well.

The bug report is here: https://lkml.org/lkml/2017/10/29/214

Link: http://lkml.kernel.org/r/20171030144636.4836-1-zi.yan@sent.com
Fixes: 84c3fc4e9c563 ("mm: thp: check pmd migration entry in common path")
Signed-off-by: Zi Yan &lt;zi.yan@cs.rutgers.edu&gt;
Reported-by: Fengguang Wu &lt;fengguang.wu@intel.com&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We need to deposit pre-allocated PTE page table when a PMD migration
entry is copied in copy_huge_pmd().  Otherwise, we will leak the
pre-allocated page and cause a NULL pointer dereference later in
zap_huge_pmd().

The missing counters during PMD migration entry copy process are added
as well.

The bug report is here: https://lkml.org/lkml/2017/10/29/214

Link: http://lkml.kernel.org/r/20171030144636.4836-1-zi.yan@sent.com
Fixes: 84c3fc4e9c563 ("mm: thp: check pmd migration entry in common path")
Signed-off-by: Zi Yan &lt;zi.yan@cs.rutgers.edu&gt;
Reported-by: Fengguang Wu &lt;fengguang.wu@intel.com&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: soft-dirty: keep soft-dirty bits over thp migration</title>
<updated>2017-09-09T01:26:45+00:00</updated>
<author>
<name>Naoya Horiguchi</name>
<email>n-horiguchi@ah.jp.nec.com</email>
</author>
<published>2017-09-08T23:11:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ab6e3d0939bb332d72444a532f0f72e0dfde7b7b'/>
<id>ab6e3d0939bb332d72444a532f0f72e0dfde7b7b</id>
<content type='text'>
Soft dirty bit is designed to keep tracked over page migration.  This
patch makes it work in the same manner for thp migration too.

Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Signed-off-by: Zi Yan &lt;zi.yan@cs.rutgers.edu&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Anshuman Khandual &lt;khandual@linux.vnet.ibm.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: David Nellans &lt;dnellans@nvidia.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Soft dirty bit is designed to keep tracked over page migration.  This
patch makes it work in the same manner for thp migration too.

Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Signed-off-by: Zi Yan &lt;zi.yan@cs.rutgers.edu&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Anshuman Khandual &lt;khandual@linux.vnet.ibm.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: David Nellans &lt;dnellans@nvidia.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: thp: check pmd migration entry in common path</title>
<updated>2017-09-09T01:26:45+00:00</updated>
<author>
<name>Zi Yan</name>
<email>zi.yan@cs.rutgers.edu</email>
</author>
<published>2017-09-08T23:11:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=84c3fc4e9c563d8fb91cfdf5948da48fe1af34d3'/>
<id>84c3fc4e9c563d8fb91cfdf5948da48fe1af34d3</id>
<content type='text'>
When THP migration is being used, memory management code needs to handle
pmd migration entries properly.  This patch uses !pmd_present() or
is_swap_pmd() (depending on whether pmd_none() needs separate code or
not) to check pmd migration entries at the places where a pmd entry is
present.

Since pmd-related code uses split_huge_page(), split_huge_pmd(),
pmd_trans_huge(), pmd_trans_unstable(), or
pmd_none_or_trans_huge_or_clear_bad(), this patch:

1. adds pmd migration entry split code in split_huge_pmd(),

2. takes care of pmd migration entries whenever pmd_trans_huge() is present,

3. makes pmd_none_or_trans_huge_or_clear_bad() pmd migration entry aware.

Since split_huge_page() uses split_huge_pmd() and pmd_trans_unstable()
is equivalent to pmd_none_or_trans_huge_or_clear_bad(), we do not change
them.

Until this commit, a pmd entry should be:
1. pointing to a pte page,
2. is_swap_pmd(),
3. pmd_trans_huge(),
4. pmd_devmap(), or
5. pmd_none().

Signed-off-by: Zi Yan &lt;zi.yan@cs.rutgers.edu&gt;
Cc: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Anshuman Khandual &lt;khandual@linux.vnet.ibm.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: David Nellans &lt;dnellans@nvidia.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When THP migration is being used, memory management code needs to handle
pmd migration entries properly.  This patch uses !pmd_present() or
is_swap_pmd() (depending on whether pmd_none() needs separate code or
not) to check pmd migration entries at the places where a pmd entry is
present.

Since pmd-related code uses split_huge_page(), split_huge_pmd(),
pmd_trans_huge(), pmd_trans_unstable(), or
pmd_none_or_trans_huge_or_clear_bad(), this patch:

1. adds pmd migration entry split code in split_huge_pmd(),

2. takes care of pmd migration entries whenever pmd_trans_huge() is present,

3. makes pmd_none_or_trans_huge_or_clear_bad() pmd migration entry aware.

Since split_huge_page() uses split_huge_pmd() and pmd_trans_unstable()
is equivalent to pmd_none_or_trans_huge_or_clear_bad(), we do not change
them.

Until this commit, a pmd entry should be:
1. pointing to a pte page,
2. is_swap_pmd(),
3. pmd_trans_huge(),
4. pmd_devmap(), or
5. pmd_none().

Signed-off-by: Zi Yan &lt;zi.yan@cs.rutgers.edu&gt;
Cc: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Anshuman Khandual &lt;khandual@linux.vnet.ibm.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: David Nellans &lt;dnellans@nvidia.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: thp: enable thp migration in generic path</title>
<updated>2017-09-09T01:26:45+00:00</updated>
<author>
<name>Zi Yan</name>
<email>zi.yan@cs.rutgers.edu</email>
</author>
<published>2017-09-08T23:10:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=616b8371539a6c487404c3b8fb04078016dab4ba'/>
<id>616b8371539a6c487404c3b8fb04078016dab4ba</id>
<content type='text'>
Add thp migration's core code, including conversions between a PMD entry
and a swap entry, setting PMD migration entry, removing PMD migration
entry, and waiting on PMD migration entries.

This patch makes it possible to support thp migration.  If you fail to
allocate a destination page as a thp, you just split the source thp as
we do now, and then enter the normal page migration.  If you succeed to
allocate destination thp, you enter thp migration.  Subsequent patches
actually enable thp migration for each caller of page migration by
allowing its get_new_page() callback to allocate thps.

[zi.yan@cs.rutgers.edu: fix gcc-4.9.0 -Wmissing-braces warning]
  Link: http://lkml.kernel.org/r/A0ABA698-7486-46C3-B209-E95A9048B22C@cs.rutgers.edu
[akpm@linux-foundation.org: fix x86_64 allnoconfig warning]
Signed-off-by: Zi Yan &lt;zi.yan@cs.rutgers.edu&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Anshuman Khandual &lt;khandual@linux.vnet.ibm.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: David Nellans &lt;dnellans@nvidia.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add thp migration's core code, including conversions between a PMD entry
and a swap entry, setting PMD migration entry, removing PMD migration
entry, and waiting on PMD migration entries.

This patch makes it possible to support thp migration.  If you fail to
allocate a destination page as a thp, you just split the source thp as
we do now, and then enter the normal page migration.  If you succeed to
allocate destination thp, you enter thp migration.  Subsequent patches
actually enable thp migration for each caller of page migration by
allowing its get_new_page() callback to allocate thps.

[zi.yan@cs.rutgers.edu: fix gcc-4.9.0 -Wmissing-braces warning]
  Link: http://lkml.kernel.org/r/A0ABA698-7486-46C3-B209-E95A9048B22C@cs.rutgers.edu
[akpm@linux-foundation.org: fix x86_64 allnoconfig warning]
Signed-off-by: Zi Yan &lt;zi.yan@cs.rutgers.edu&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Anshuman Khandual &lt;khandual@linux.vnet.ibm.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: David Nellans &lt;dnellans@nvidia.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: thp: introduce separate TTU flag for thp freezing</title>
<updated>2017-09-09T01:26:45+00:00</updated>
<author>
<name>Naoya Horiguchi</name>
<email>n-horiguchi@ah.jp.nec.com</email>
</author>
<published>2017-09-08T23:10:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b5ff8161e37cef3265e186ecded23324e4dc2973'/>
<id>b5ff8161e37cef3265e186ecded23324e4dc2973</id>
<content type='text'>
TTU_MIGRATION is used to convert pte into migration entry until thp
split completes.  This behavior conflicts with thp migration added later
patches, so let's introduce a new TTU flag specifically for freezing.

try_to_unmap() is used both for thp split (via freeze_page()) and page
migration (via __unmap_and_move()).  In freeze_page(), ttu_flag given
for head page is like below (assuming anonymous thp):

    (TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS | TTU_RMAP_LOCKED | \
     TTU_MIGRATION | TTU_SPLIT_HUGE_PMD)

and ttu_flag given for tail pages is:

    (TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS | TTU_RMAP_LOCKED | \
     TTU_MIGRATION)

__unmap_and_move() calls try_to_unmap() with ttu_flag:

    (TTU_MIGRATION | TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS)

Now I'm trying to insert a branch for thp migration at the top of
try_to_unmap_one() like below

static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
                       unsigned long address, void *arg)
  {
          ...
          /* PMD-mapped THP migration entry */
          if (!pvmw.pte &amp;&amp; (flags &amp; TTU_MIGRATION)) {
              if (!PageAnon(page))
                  continue;

              set_pmd_migration_entry(&amp;pvmw, page);
              continue;
          }
	  ...
  }

so try_to_unmap() for tail pages called by thp split can go into thp
migration code path (which converts *pmd* into migration entry), while
the expectation is to freeze thp (which converts *pte* into migration
entry.)

I detected this failure as a "bad page state" error in a testcase where
split_huge_page() is called from queue_pages_pte_range().

Link: http://lkml.kernel.org/r/20170717193955.20207-4-zi.yan@sent.com
Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Signed-off-by: Zi Yan &lt;zi.yan@cs.rutgers.edu&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Anshuman Khandual &lt;khandual@linux.vnet.ibm.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: David Nellans &lt;dnellans@nvidia.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
TTU_MIGRATION is used to convert pte into migration entry until thp
split completes.  This behavior conflicts with thp migration added later
patches, so let's introduce a new TTU flag specifically for freezing.

try_to_unmap() is used both for thp split (via freeze_page()) and page
migration (via __unmap_and_move()).  In freeze_page(), ttu_flag given
for head page is like below (assuming anonymous thp):

    (TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS | TTU_RMAP_LOCKED | \
     TTU_MIGRATION | TTU_SPLIT_HUGE_PMD)

and ttu_flag given for tail pages is:

    (TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS | TTU_RMAP_LOCKED | \
     TTU_MIGRATION)

__unmap_and_move() calls try_to_unmap() with ttu_flag:

    (TTU_MIGRATION | TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS)

Now I'm trying to insert a branch for thp migration at the top of
try_to_unmap_one() like below

static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
                       unsigned long address, void *arg)
  {
          ...
          /* PMD-mapped THP migration entry */
          if (!pvmw.pte &amp;&amp; (flags &amp; TTU_MIGRATION)) {
              if (!PageAnon(page))
                  continue;

              set_pmd_migration_entry(&amp;pvmw, page);
              continue;
          }
	  ...
  }

so try_to_unmap() for tail pages called by thp split can go into thp
migration code path (which converts *pmd* into migration entry), while
the expectation is to freeze thp (which converts *pte* into migration
entry.)

I detected this failure as a "bad page state" error in a testcase where
split_huge_page() is called from queue_pages_pte_range().

Link: http://lkml.kernel.org/r/20170717193955.20207-4-zi.yan@sent.com
Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Signed-off-by: Zi Yan &lt;zi.yan@cs.rutgers.edu&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Anshuman Khandual &lt;khandual@linux.vnet.ibm.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: David Nellans &lt;dnellans@nvidia.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: hugetlb: clear target sub-page last when clearing huge page</title>
<updated>2017-09-07T00:27:30+00:00</updated>
<author>
<name>Huang Ying</name>
<email>ying.huang@intel.com</email>
</author>
<published>2017-09-06T23:25:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c79b57e462b5d2f47afa5f175cf1828f16e18612'/>
<id>c79b57e462b5d2f47afa5f175cf1828f16e18612</id>
<content type='text'>
Huge page helps to reduce TLB miss rate, but it has higher cache
footprint, sometimes this may cause some issue.  For example, when
clearing huge page on x86_64 platform, the cache footprint is 2M.  But
on a Xeon E5 v3 2699 CPU, there are 18 cores, 36 threads, and only 45M
LLC (last level cache).  That is, in average, there are 2.5M LLC for
each core and 1.25M LLC for each thread.

If the cache pressure is heavy when clearing the huge page, and we clear
the huge page from the begin to the end, it is possible that the begin
of huge page is evicted from the cache after we finishing clearing the
end of the huge page.  And it is possible for the application to access
the begin of the huge page after clearing the huge page.

To help the above situation, in this patch, when we clear a huge page,
the order to clear sub-pages is changed.  In quite some situation, we
can get the address that the application will access after we clear the
huge page, for example, in a page fault handler.  Instead of clearing
the huge page from begin to end, we will clear the sub-pages farthest
from the the sub-page to access firstly, and clear the sub-page to
access last.  This will make the sub-page to access most cache-hot and
sub-pages around it more cache-hot too.  If we cannot know the address
the application will access, the begin of the huge page is assumed to be
the the address the application will access.

With this patch, the throughput increases ~28.3% in vm-scalability
anon-w-seq test case with 72 processes on a 2 socket Xeon E5 v3 2699
system (36 cores, 72 threads).  The test case creates 72 processes, each
process mmap a big anonymous memory area and writes to it from the begin
to the end.  For each process, other processes could be seen as other
workload which generates heavy cache pressure.  At the same time, the
cache miss rate reduced from ~33.4% to ~31.7%, the IPC (instruction per
cycle) increased from 0.56 to 0.74, and the time spent in user space is
reduced ~7.9%

Christopher Lameter suggests to clear bytes inside a sub-page from end
to begin too.  But tests show no visible performance difference in the
tests.  May because the size of page is small compared with the cache
size.

Thanks Andi Kleen to propose to use address to access to determine the
order of sub-pages to clear.

The hugetlbfs access address could be improved, will do that in another
patch.

[ying.huang@intel.com: improve readability of clear_huge_page()]
  Link: http://lkml.kernel.org/r/20170830051842.1397-1-ying.huang@intel.com
Link: http://lkml.kernel.org/r/20170815014618.15842-1-ying.huang@intel.com
Suggested-by: Andi Kleen &lt;andi.kleen@intel.com&gt;
Signed-off-by: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Acked-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: "Kirill A. Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Nadia Yvette Chambers &lt;nyc@holomorphy.com&gt;
Cc: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Shaohua Li &lt;shli@fb.com&gt;
Cc: Christopher Lameter &lt;cl@linux.com&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Huge page helps to reduce TLB miss rate, but it has higher cache
footprint, sometimes this may cause some issue.  For example, when
clearing huge page on x86_64 platform, the cache footprint is 2M.  But
on a Xeon E5 v3 2699 CPU, there are 18 cores, 36 threads, and only 45M
LLC (last level cache).  That is, in average, there are 2.5M LLC for
each core and 1.25M LLC for each thread.

If the cache pressure is heavy when clearing the huge page, and we clear
the huge page from the begin to the end, it is possible that the begin
of huge page is evicted from the cache after we finishing clearing the
end of the huge page.  And it is possible for the application to access
the begin of the huge page after clearing the huge page.

To help the above situation, in this patch, when we clear a huge page,
the order to clear sub-pages is changed.  In quite some situation, we
can get the address that the application will access after we clear the
huge page, for example, in a page fault handler.  Instead of clearing
the huge page from begin to end, we will clear the sub-pages farthest
from the the sub-page to access firstly, and clear the sub-page to
access last.  This will make the sub-page to access most cache-hot and
sub-pages around it more cache-hot too.  If we cannot know the address
the application will access, the begin of the huge page is assumed to be
the the address the application will access.

With this patch, the throughput increases ~28.3% in vm-scalability
anon-w-seq test case with 72 processes on a 2 socket Xeon E5 v3 2699
system (36 cores, 72 threads).  The test case creates 72 processes, each
process mmap a big anonymous memory area and writes to it from the begin
to the end.  For each process, other processes could be seen as other
workload which generates heavy cache pressure.  At the same time, the
cache miss rate reduced from ~33.4% to ~31.7%, the IPC (instruction per
cycle) increased from 0.56 to 0.74, and the time spent in user space is
reduced ~7.9%

Christopher Lameter suggests to clear bytes inside a sub-page from end
to begin too.  But tests show no visible performance difference in the
tests.  May because the size of page is small compared with the cache
size.

Thanks Andi Kleen to propose to use address to access to determine the
order of sub-pages to clear.

The hugetlbfs access address could be improved, will do that in another
patch.

[ying.huang@intel.com: improve readability of clear_huge_page()]
  Link: http://lkml.kernel.org/r/20170830051842.1397-1-ying.huang@intel.com
Link: http://lkml.kernel.org/r/20170815014618.15842-1-ying.huang@intel.com
Suggested-by: Andi Kleen &lt;andi.kleen@intel.com&gt;
Signed-off-by: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Acked-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: "Kirill A. Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Nadia Yvette Chambers &lt;nyc@holomorphy.com&gt;
Cc: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Shaohua Li &lt;shli@fb.com&gt;
Cc: Christopher Lameter &lt;cl@linux.com&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm, THP, swap: support splitting THP for THP swap out</title>
<updated>2017-09-07T00:27:28+00:00</updated>
<author>
<name>Huang Ying</name>
<email>ying.huang@intel.com</email>
</author>
<published>2017-09-06T23:22:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=59807685a7e77e8c8fe5925613968841538d53d7'/>
<id>59807685a7e77e8c8fe5925613968841538d53d7</id>
<content type='text'>
After adding swapping out support for THP (Transparent Huge Page), it is
possible that a THP in swap cache (partly swapped out) need to be split.
To split such a THP, the swap cluster backing the THP need to be split
too, that is, the CLUSTER_FLAG_HUGE flag need to be cleared for the swap
cluster.  The patch implemented this.

And because the THP swap writing needs the THP keeps as huge page during
writing.  The PageWriteback flag is checked before splitting.

Link: http://lkml.kernel.org/r/20170724051840.2309-8-ying.huang@intel.com
Signed-off-by: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Shaohua Li &lt;shli@kernel.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: "Kirill A . Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Ross Zwisler &lt;ross.zwisler@intel.com&gt; [for brd.c, zram_drv.c, pmem.c]
Cc: Vishal L Verma &lt;vishal.l.verma@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
After adding swapping out support for THP (Transparent Huge Page), it is
possible that a THP in swap cache (partly swapped out) need to be split.
To split such a THP, the swap cluster backing the THP need to be split
too, that is, the CLUSTER_FLAG_HUGE flag need to be cleared for the swap
cluster.  The patch implemented this.

And because the THP swap writing needs the THP keeps as huge page during
writing.  The PageWriteback flag is checked before splitting.

Link: http://lkml.kernel.org/r/20170724051840.2309-8-ying.huang@intel.com
Signed-off-by: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Shaohua Li &lt;shli@kernel.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: "Kirill A . Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Ross Zwisler &lt;ross.zwisler@intel.com&gt; [for brd.c, zram_drv.c, pmem.c]
Cc: Vishal L Verma &lt;vishal.l.verma@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm, THP, swap: make reuse_swap_page() works for THP swapped out</title>
<updated>2017-09-07T00:27:27+00:00</updated>
<author>
<name>Huang Ying</name>
<email>ying.huang@intel.com</email>
</author>
<published>2017-09-06T23:22:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ba3c4ce6def4915093be80585ff69f780630f32f'/>
<id>ba3c4ce6def4915093be80585ff69f780630f32f</id>
<content type='text'>
After supporting to delay THP (Transparent Huge Page) splitting after
swapped out, it is possible that some page table mappings of the THP are
turned into swap entries.  So reuse_swap_page() need to check the swap
count in addition to the map count as before.  This patch done that.

In the huge PMD write protect fault handler, in addition to the page map
count, the swap count need to be checked too, so the page lock need to
be acquired too when calling reuse_swap_page() in addition to the page
table lock.

[ying.huang@intel.com: silence a compiler warning]
  Link: http://lkml.kernel.org/r/87bmnzizjy.fsf@yhuang-dev.intel.com
Link: http://lkml.kernel.org/r/20170724051840.2309-4-ying.huang@intel.com
Signed-off-by: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Shaohua Li &lt;shli@kernel.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: "Kirill A . Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Ross Zwisler &lt;ross.zwisler@intel.com&gt; [for brd.c, zram_drv.c, pmem.c]
Cc: Vishal L Verma &lt;vishal.l.verma@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
After supporting to delay THP (Transparent Huge Page) splitting after
swapped out, it is possible that some page table mappings of the THP are
turned into swap entries.  So reuse_swap_page() need to check the swap
count in addition to the map count as before.  This patch done that.

In the huge PMD write protect fault handler, in addition to the page map
count, the swap count need to be checked too, so the page lock need to
be acquired too when calling reuse_swap_page() in addition to the page
table lock.

[ying.huang@intel.com: silence a compiler warning]
  Link: http://lkml.kernel.org/r/87bmnzizjy.fsf@yhuang-dev.intel.com
Link: http://lkml.kernel.org/r/20170724051840.2309-4-ying.huang@intel.com
Signed-off-by: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Shaohua Li &lt;shli@kernel.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: "Kirill A . Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Ross Zwisler &lt;ross.zwisler@intel.com&gt; [for brd.c, zram_drv.c, pmem.c]
Cc: Vishal L Verma &lt;vishal.l.verma@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/huge_memory.c: constify attribute_group structures</title>
<updated>2017-09-07T00:27:27+00:00</updated>
<author>
<name>Arvind Yadav</name>
<email>arvind.yadav.cs@gmail.com</email>
</author>
<published>2017-09-06T23:22:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8aa95a21bc51205112b3a416810666616287b49b'/>
<id>8aa95a21bc51205112b3a416810666616287b49b</id>
<content type='text'>
attribute_group are not supposed to change at runtime.  All functions
working with attribute_group provided by &lt;linux/sysfs.h&gt; work with const
attribute_group.  So mark the non-const structs as const.

Link: http://lkml.kernel.org/r/1501157240-3876-1-git-send-email-arvind.yadav.cs@gmail.com
Signed-off-by: Arvind Yadav &lt;arvind.yadav.cs@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
attribute_group are not supposed to change at runtime.  All functions
working with attribute_group provided by &lt;linux/sysfs.h&gt; work with const
attribute_group.  So mark the non-const structs as const.

Link: http://lkml.kernel.org/r/1501157240-3876-1-git-send-email-arvind.yadav.cs@gmail.com
Signed-off-by: Arvind Yadav &lt;arvind.yadav.cs@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'linus' into locking/core, to pick up fixes</title>
<updated>2017-08-25T09:04:51+00:00</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@kernel.org</email>
</author>
<published>2017-08-25T09:04:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=10c9850cb2ced2ce528e5b692c639974213a64ec'/>
<id>10c9850cb2ced2ce528e5b692c639974213a64ec</id>
<content type='text'>
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
