<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/fs/proc/task_mmu.c, branch linux-3.14.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>proc: Fix ptrace-based permission checks for accessing task maps</title>
<updated>2016-03-03T23:06:45+00:00</updated>
<author>
<name>Corey Wright</name>
<email>undefined@pobox.com</email>
</author>
<published>2016-02-28T08:42:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=223c676634fcfe5a54b80bc3d58aec8e49a3c016'/>
<id>223c676634fcfe5a54b80bc3d58aec8e49a3c016</id>
<content type='text'>
Modify mm_access() calls in fs/proc/task_mmu.c and fs/proc/task_nommu.c to
have the mode include PTRACE_MODE_FSCREDS so accessing /proc/pid/maps and
/proc/pid/pagemap is not denied to all users.

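When upstream commit caaee623 was backported to pre-3.18 kernel versions,
it was overlooked that mm_access() is also used in fs/proc/task_*mmu.c:
those calls were removed in 3.18 (by upstream commit 29a40ace) and so did
not exist at the time of the original commit.  Because caaee623 makes any
ptrace access check whose mode includes neither PTRACE_MODE_FSCREDS nor
PTRACE_MODE_REALCREDS fail, the unconverted calls denied access to everyone.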
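
A minimal user-space sketch of the mode check that caaee623 introduced,
assuming the PTRACE_MODE_* values below match include/linux/ptrace.h (they
are copied for illustration only, and creds_mode_ok() is a hypothetical
name, not a kernel function).  It shows why a bare PTRACE_MODE_READ is
rejected while PTRACE_MODE_READ | PTRACE_MODE_FSCREDS passes:

<![CDATA[
```c
#include <assert.h>
#include <stdbool.h>

/* Values copied for illustration from include/linux/ptrace.h in kernels
 * that carry caaee623; check your tree before relying on them. */
#define PTRACE_MODE_READ      0x01
#define PTRACE_MODE_FSCREDS   0x08
#define PTRACE_MODE_REALCREDS 0x10

/* Hypothetical helper: the post-caaee623 check refuses the request unless
 * exactly one credential source (FSCREDS or REALCREDS) is selected. */
static bool creds_mode_ok(unsigned int mode)
{
	bool fs = (mode & PTRACE_MODE_FSCREDS) != 0;
	bool real = (mode & PTRACE_MODE_REALCREDS) != 0;

	return fs != real;
}
```
]]>
This matches the fix as described above: PTRACE_MODE_FSCREDS is added to
the mode that fs/proc/task_*mmu.c passes to mm_access().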

Signed-off-by: Corey Wright &lt;undefined@pobox.com&gt;
Acked-by: Jann Horn &lt;jann@thejh.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>proc/pagemap: walk page tables under pte lock</title>
<updated>2015-04-29T08:31:56+00:00</updated>
<author>
<name>Konstantin Khlebnikov</name>
<email>khlebnikov@yandex-team.ru</email>
</author>
<published>2015-02-11T23:27:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=3b09542675a042106bd31a84b7c86b181751d029'/>
<id>3b09542675a042106bd31a84b7c86b181751d029</id>
<content type='text'>
commit 05fbf357d94152171bc50f8a369390f1f16efd89 upstream.

Lockless access to pte in pagemap_pte_range() might race with page
migration and trigger BUG_ON(!PageLocked()) in migration_entry_to_page():

CPU A (pagemap)                           CPU B (migration)
                                          lock_page()
                                          try_to_unmap(page, TTU_MIGRATION...)
                                               make_migration_entry()
                                               set_pte_at()
&lt;read *pte&gt;
pte_to_pagemap_entry()
                                          remove_migration_ptes()
                                          unlock_page()
    if(is_migration_entry())
        migration_entry_to_page()
            BUG_ON(!PageLocked(page))

Also, a lockless read might be non-atomic if the pte is larger than the
word size.
Other pte walkers (smaps, numa_maps, clear_refs) already lock ptes.
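
A user-space model of the second problem above (a torn read of a pte wider
than the machine word), assuming a 64-bit entry split into two 32-bit
halves and a C11 atomic_flag spinlock standing in for the page table lock.
struct model_pte and the model_* helpers are hypothetical; the point is
only that a reader taking the same lock as the writer sees either the old
or the new value, never a mix of halves.  It does not model the
migration_entry_to_page() race itself.

<![CDATA[
```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 64-bit entry stored as two 32-bit halves, as on a 32-bit
 * machine with 64-bit ptes, guarded by a stand-in for the pte lock. */
struct model_pte {
	atomic_flag lock;
	uint32_t lo;
	uint32_t hi;
};

static void model_lock(struct model_pte *p)
{
	while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
		; /* spin until the holder clears the flag */
}

static void model_unlock(struct model_pte *p)
{
	atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

/* Writer (migration side): updates both halves under the lock. */
static void model_set_pte(struct model_pte *p, uint64_t val)
{
	model_lock(p);
	p->lo = (uint32_t)val;
	p->hi = (uint32_t)(val >> 32);
	model_unlock(p);
}

/* Reader (pagemap side): takes the same lock before reading both halves. */
static uint64_t model_read_pte(struct model_pte *p)
{
	uint64_t val;

	model_lock(p);
	val = ((uint64_t)p->hi << 32) | p->lo;
	model_unlock(p);
	return val;
}
```
]]>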

Fixes: 052fb0d635df ("proc: report file/anon bit in /proc/pid/pagemap")
Signed-off-by: Konstantin Khlebnikov &lt;khlebnikov@yandex-team.ru&gt;
Reported-by: Andrey Ryabinin &lt;a.ryabinin@samsung.com&gt;
Reviewed-by: Cyrill Gorcunov &lt;gorcunov@openvz.org&gt;
Acked-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[3.5+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm: softdirty: unmapped addresses between VMAs are clean</title>
<updated>2015-04-29T08:31:56+00:00</updated>
<author>
<name>Peter Feiner</name>
<email>pfeiner@google.com</email>
</author>
<published>2014-10-09T22:28:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=620d77bda9fedb4dcd1271d3cbcbe099dff01f98'/>
<id>620d77bda9fedb4dcd1271d3cbcbe099dff01f98</id>
<content type='text'>
commit 81d0fa623c5b8dbd5279d9713094b0f9b0a00fb4 upstream.

If a /proc/pid/pagemap read spans a [VMA, an unmapped region, then a
VM_SOFTDIRTY VMA], the virtual pages in the unmapped region are reported
as softdirty.  Here's a program to demonstrate the bug:

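#include &lt;assert.h&gt;
#include &lt;fcntl.h&gt;
#include &lt;stdint.h&gt;
#include &lt;sys/mman.h&gt;
#include &lt;unistd.h&gt;

int main(void) {
	const uint64_t PAGEMAP_SOFTDIRTY = 1ULL &lt;&lt; 55;
	uint64_t pme[3];
	int fd = open("/proc/self/pagemap", O_RDONLY);
	char *m = mmap(NULL, 3 * getpagesize(), PROT_READ,
	               MAP_ANONYMOUS | MAP_SHARED, -1, 0);
	if (fd &lt; 0 || m == MAP_FAILED)
		return 1;
	/* Punch a hole: unmap the middle page of the three. */
	munmap(m + getpagesize(), getpagesize());
	/* pagemap holds one 64-bit entry per virtual page. */
	pread(fd, pme, 24, (unsigned long) m / getpagesize() * 8);
	assert(pme[0] &amp; PAGEMAP_SOFTDIRTY);    /* passes */
	assert(!(pme[1] &amp; PAGEMAP_SOFTDIRTY)); /* fails */
	assert(pme[2] &amp; PAGEMAP_SOFTDIRTY);    /* passes */
	return 0;
}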

(Note that all pages in new VMAs are softdirty until cleared).
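
A simplified model of the behaviour the fix aims for, assuming VMAs are
described by hypothetical page-index ranges rather than real
struct vm_area_struct objects: pages inside a VM_SOFTDIRTY VMA get bit 55,
while pages in a gap between VMAs are reported clean instead of inheriting
the flag of the following VMA.  model_pagemap() is an illustration, not the
kernel's pagemap_pte_range() code.

<![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PM_SOFT_DIRTY (1ULL << 55) /* bit 55, as in the test program */

/* Hypothetical, simplified VMA: page-index range [start, end). */
struct model_vma {
	unsigned long start, end;
	int soft_dirty; /* nonzero for a VM_SOFTDIRTY VMA */
};

/* Fill out[0 .. hi-lo) for pages [lo, hi).  vmas must be sorted and
 * non-overlapping.  Pages in a gap between VMAs are reported clean (0);
 * only pages inside a VM_SOFTDIRTY VMA get PM_SOFT_DIRTY. */
static void model_pagemap(const struct model_vma *vmas, size_t nvmas,
			  unsigned long lo, unsigned long hi, uint64_t *out)
{
	size_t i = 0;

	for (unsigned long pg = lo; pg < hi; pg++) {
		/* Skip VMAs that end at or before this page. */
		while (i < nvmas && vmas[i].end <= pg)
			i++;
		if (i < nvmas && vmas[i].start <= pg && vmas[i].soft_dirty)
			out[pg - lo] = PM_SOFT_DIRTY;
		else
			out[pg - lo] = 0; /* hole, or a clean VMA */
	}
}
```
]]>
With the layout from the test program (three pages, middle one unmapped),
the model reports soft-dirty, clean, soft-dirty.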

Tested:
	Used the program given above. I'm going to include this code in
	a selftest in the future.

[n-horiguchi@ah.jp.nec.com: prevent pagemap_pte_range() from overrunning]
Signed-off-by: Peter Feiner &lt;pfeiner@google.com&gt;
Cc: "Kirill A. Shutemov" &lt;kirill@shutemov.name&gt;
Cc: Cyrill Gorcunov &lt;gorcunov@openvz.org&gt;
Cc: Pavel Emelyanov &lt;xemul@parallels.com&gt;
Cc: Jamie Liu &lt;jamieliu@google.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>pagemap: do not leak physical addresses to non-privileged userspace</title>
<updated>2015-03-26T14:06:57+00:00</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2015-03-09T21:11:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=26f7f4d46a2cbfa6fbb633d228ec34cf969589d5'/>
<id>26f7f4d46a2cbfa6fbb633d228ec34cf969589d5</id>
<content type='text'>
commit ab676b7d6fbf4b294bf198fb27ade5b0e865c7ce upstream.

As pointed out by a recent post[1] on exploiting DRAM physical
imperfections, /proc/PID/pagemap exposes sensitive information (physical
page frame numbers) which can be used to mount attacks.

This prevents anybody without CAP_SYS_ADMIN from reading the pagemap.

[1] http://googleprojectzero.blogspot.com/2015/03/exploiting-dram-rowhammer-bug-to-gain.html

[ Eventually we might want to do anything more finegrained, but for now
  this is the simple model.   - Linus ]
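
For context, a sketch of how a privileged reader might decode one 64-bit
pagemap entry, assuming the bit layout documented in
Documentation/vm/pagemap.txt for kernels of this era (PFN in bits 0-54,
soft-dirty in bit 55, file page or shared anon in bit 61, swapped in bit
62, present in bit 63).  The PM_* macro names and decode_pme() are
illustrative.  The PFN field is the physical information this patch stops
handing to unprivileged users.

<![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names for the documented pagemap bit layout. */
#define PM_PFN_MASK   ((1ULL << 55) - 1) /* bits 0-54: PFN if present */
#define PM_SOFT_DIRTY (1ULL << 55)
#define PM_FILE       (1ULL << 61)       /* file page or shared anon */
#define PM_SWAP       (1ULL << 62)
#define PM_PRESENT    (1ULL << 63)

struct pme_info {
	bool present, swapped, file, soft_dirty;
	uint64_t pfn; /* physical frame number; 0 unless present */
};

static struct pme_info decode_pme(uint64_t pme)
{
	struct pme_info info = {
		.present    = (pme & PM_PRESENT) != 0,
		.swapped    = (pme & PM_SWAP) != 0,
		.file       = (pme & PM_FILE) != 0,
		.soft_dirty = (pme & PM_SOFT_DIRTY) != 0,
	};

	/* PFN times the page size is the physical address that
	 * Rowhammer-style attacks want to know. */
	if (info.present)
		info.pfn = pme & PM_PFN_MASK;
	return info;
}
```
]]>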

Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Acked-by: Konstantin Khlebnikov &lt;khlebnikov@openvz.org&gt;
Acked-by: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: Pavel Emelyanov &lt;xemul@parallels.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Mark Seaborn &lt;mseaborn@chromium.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm: per-thread vma caching</title>
<updated>2014-10-09T19:21:29+00:00</updated>
<author>
<name>Davidlohr Bueso</name>
<email>davidlohr@hp.com</email>
</author>
<published>2014-04-07T22:37:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=efb5fea23009a0223996e699b54cc9533e2070e9'/>
<id>efb5fea23009a0223996e699b54cc9533e2070e9</id>
<content type='text'>
commit 615d6e8756c87149f2d4c1b93d471bca002bd849 upstream.

This patch is a continuation of efforts to optimize find_vma(), avoiding
potentially expensive rbtree walks to locate a vma upon faults.  The
original approach (https://lkml.org/lkml/2013/11/1/410), where the largest
vma was also cached, ended up being too specific and random, so further
comparison with other approaches was needed.  There are two things to
consider when dealing with this: the cache hit rate and the latency of
find_vma().  Improving the hit rate does not necessarily translate into
finding the vma any faster, as the overhead of fancier caching schemes
can be too high.

We currently cache the last used vma for the whole address space, which
provides a nice optimization, reducing the total cycles in find_vma() by
up to 250%, for workloads with good locality.  On the other hand, this
simple scheme is pretty much useless for workloads with poor locality.
Analyzing ebizzy runs shows that, no matter how many threads are
running, the mmap_cache hit rate is less than 2%, and in many situations
below 1%.

The proposed approach is to replace this scheme with a small per-thread
cache, maximizing hit rates at a very low maintenance cost.
Invalidations are performed by simply bumping up a 32-bit sequence
number.  The only expensive operation is in the rare case of a seq
number overflow, where all caches that share the same address space are
flushed.  Upon a miss, the proposed replacement policy is based on the
page number that contains the virtual address in question.  Concretely,
the following results are seen on an 80 core, 8 socket x86-64 box:

1) System bootup: Most programs are single threaded, so the per-thread
   scheme improves the ~50% baseline hit rate just by adding a few more
   slots to the cache.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 50.61%   | 19.90            |
| patched        | 73.45%   | 13.58            |
+----------------+----------+------------------+

2) Kernel build: This one is already pretty good with the current
   approach as we're dealing with good locality.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 75.28%   | 11.03            |
| patched        | 88.09%   | 9.31             |
+----------------+----------+------------------+

3) Oracle 11g Data Mining (4k pages): Similar to the kernel build workload.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 70.66%   | 17.14            |
| patched        | 91.15%   | 12.57            |
+----------------+----------+------------------+

4) Ebizzy: There's a fair amount of variation from run to run, but this
   approach always shows nearly perfect hit rates, while the baseline hit
   rate is close to zero.  The cycle count can fluctuate anywhere from
   ~60 to ~116 billion for the baseline scheme, but this approach reduces
   it considerably.  For instance, with 80 threads:

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 1.06%    | 91.54            |
| patched        | 99.97%   | 14.18            |
+----------------+----------+------------------+
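
A single-threaded user-space model of the per-thread cache described
above, assuming four slots selected by page number and 4 KiB pages (the
slot count is believed to match the upstream patch, but treat it as an
assumption).  All model_* names are hypothetical.  It keeps only the core
ideas: lookups scan the thread's slots, a miss fills the slot chosen by
the page number, and bumping the address space's sequence number lazily
invalidates every thread's cache.  The seqnum overflow flush and the check
that the thread belongs to the mm are left out.

<![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT    12 /* assumes 4 KiB pages */
#define MODEL_VMACACHE_SIZE 4  /* must be a power of two */
#define MODEL_VMACACHE_HASH(addr) \
	(((addr) >> MODEL_PAGE_SHIFT) & (MODEL_VMACACHE_SIZE - 1))

/* Hypothetical VMA covering [start, end). */
struct model_vm_area {
	uintptr_t start, end;
};

/* Per address space: bumping seqnum invalidates every thread's cache. */
struct model_mm {
	uint32_t seqnum;
};

/* Per thread: a few slots plus the seqnum they were filled under. */
struct model_thread {
	struct model_mm *mm;
	uint32_t seqnum;
	struct model_vm_area *slots[MODEL_VMACACHE_SIZE];
};

/* Lazily flush this thread's slots if the mm's seqnum has moved on. */
static void model_vmacache_validate(struct model_thread *t)
{
	if (t->seqnum != t->mm->seqnum) {
		t->seqnum = t->mm->seqnum;
		for (size_t i = 0; i < MODEL_VMACACHE_SIZE; i++)
			t->slots[i] = NULL;
	}
}

/* Scan all slots; NULL means a miss (the caller would walk the rbtree). */
static struct model_vm_area *model_vmacache_find(struct model_thread *t,
						 uintptr_t addr)
{
	model_vmacache_validate(t);
	for (size_t i = 0; i < MODEL_VMACACHE_SIZE; i++) {
		struct model_vm_area *v = t->slots[i];

		if (v && v->start <= addr && addr < v->end)
			return v;
	}
	return NULL;
}

/* After a miss, store the VMA in the slot picked by the page number. */
static void model_vmacache_update(struct model_thread *t, uintptr_t addr,
				  struct model_vm_area *v)
{
	model_vmacache_validate(t);
	t->slots[MODEL_VMACACHE_HASH(addr)] = v;
}

/* Any change to the VMA layout just bumps the sequence number. */
static void model_vmacache_invalidate(struct model_mm *mm)
{
	mm->seqnum++;
}
```
]]>
Invalidation stays cheap because no thread's slots are touched until that
thread next looks something up.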

[akpm@linux-foundation.org: fix nommu build, per Davidlohr]
[akpm@linux-foundation.org: document vmacache_valid() logic]
[akpm@linux-foundation.org: attempt to untangle header files]
[akpm@linux-foundation.org: add vmacache_find() BUG_ON]
[hughd@google.com: add vmacache_valid_mm() (from Oleg)]
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: adjust and enhance comments]
Signed-off-by: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Reviewed-by: Rik van Riel &lt;riel@redhat.com&gt;
Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Reviewed-by: Michel Lespinasse &lt;walken@google.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Tested-by: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm: add !pte_present() check on existing hugetlb_entry callbacks</title>
<updated>2014-06-11T18:54:13+00:00</updated>
<author>
<name>Naoya Horiguchi</name>
<email>n-horiguchi@ah.jp.nec.com</email>
</author>
<published>2014-06-06T14:00:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=395982ce12b3aba14a197ead3e5828e77e11aec1'/>
<id>395982ce12b3aba14a197ead3e5828e77e11aec1</id>
<content type='text'>
commit d4c54919ed86302094c0ca7d48a8cbd4ee753e92 upstream.

The page table walker doesn't check for non-present hugetlb entries in the
common path, so hugetlb_entry() callbacks must check for them.  The reason
for this behavior is that some callers want to handle them in their own way.

[ I think that reason is bogus, btw - it should just do what the regular
  code does, which is to call the "pte_hole()" function for such hugetlb
  entries  - Linus]

However, some callers don't check for them now, which causes unpredictable
results, for example when there is a race between migrating a hugepage and
reading /proc/pid/numa_maps.  This patch fixes it by adding !pte_present
checks to the buggy callbacks.

This bug has existed for years and became visible with the introduction of
hugepage migration.
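
A minimal model of the check this patch adds, assuming a hypothetical pte
encoding where bit 0 is the present bit.  The point is that a non-present
entry (for example a migration entry during hugepage migration) must be
skipped by a hugetlb_entry()-style callback rather than interpreted as a
mapped page.  model_hugetlb_entry() does not use the real kernel callback
signature.

<![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding: bit 0 is the present bit.  A non-present entry
 * may hold a migration or swap entry, whose other bits are not a PFN. */
#define MODEL_PTE_PRESENT 0x1ULL

struct model_stats {
	unsigned long pages;
};

static bool model_pte_present(uint64_t pte)
{
	return (pte & MODEL_PTE_PRESENT) != 0;
}

/* hugetlb_entry()-style callback: skip non-present entries instead of
 * treating them as mapped pages (the check this patch adds). */
static int model_hugetlb_entry(uint64_t pte, struct model_stats *st)
{
	if (!model_pte_present(pte))
		return 0;
	st->pages++;
	return 0;
}
```
]]>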

ChangeLog v2:
- fix if condition (check !pte_present() instead of pte_present())

Reported-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
[ Backported to 3.15.  Signed-off-by: Josh Boyer &lt;jwboyer@fedoraproject.org&gt; ]
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>seq_file: remove "%n" usage from seq_file users</title>
<updated>2013-11-15T00:32:20+00:00</updated>
<author>
<name>Tetsuo Handa</name>
<email>penguin-kernel@I-love.SAKURA.ne.jp</email>
</author>
<published>2013-11-14T22:31:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=652586df95e5d76b37d07a11839126dcfede1621'/>
<id>652586df95e5d76b37d07a11839126dcfede1621</id>
<content type='text'>
All seq_printf() users of "%n" use it to calculate padding size; convert
them to use the seq_setwidth() / seq_pad() pair instead.
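
A user-space sketch of the seq_setwidth() / seq_pad() pair, modeled on the
kernel helpers as we understand them: seq_setwidth() records where a padded
field should end, and seq_pad() emits spaces up to that point and then an
optional trailing character.  struct model_seq and the model_* functions
are hypothetical stand-ins for struct seq_file, not its real layout.
Unlike "%n", nothing is written back through a pointer argument.

<![CDATA[
```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct seq_file. */
struct model_seq {
	char buf[256];
	size_t count;     /* bytes written so far */
	size_t pad_until; /* set by model_seq_setwidth() */
};

static void model_seq_printf(struct model_seq *m, const char *fmt, ...)
{
	size_t room = sizeof(m->buf) - m->count; /* includes the NUL */
	va_list ap;
	int n;

	assert(room > 0);
	va_start(ap, fmt);
	n = vsnprintf(m->buf + m->count, room, fmt, ap);
	va_end(ap);
	if (n < 0)
		return;
	/* Clamp on truncation so count never passes the terminating NUL. */
	m->count += ((size_t)n < room) ? (size_t)n : room - 1;
}

/* Remember where the padded field must end. */
static void model_seq_setwidth(struct model_seq *m, size_t size)
{
	m->pad_until = m->count + size;
}

/* Pad with spaces up to the recorded width, then append c if nonzero. */
static void model_seq_pad(struct model_seq *m, char c)
{
	if (m->pad_until > m->count)
		model_seq_printf(m, "%*s", (int)(m->pad_until - m->count), "");
	if (c)
		model_seq_printf(m, "%c", c);
}
```
]]>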

Signed-off-by: Tetsuo Handa &lt;penguin-kernel@I-love.SAKURA.ne.jp&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Joe Perches &lt;joe@perches.com&gt;
Cc: David Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm, thp: change pmd_trans_huge_lock() to return taken lock</title>
<updated>2013-11-15T00:32:14+00:00</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2013-11-14T22:30:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=bf929152e9f6c49b66fad4ebf08cc95b02ce48f5'/>
<id>bf929152e9f6c49b66fad4ebf08cc95b02ce48f5</id>
<content type='text'>
With split ptlock it's important to know which lock
pmd_trans_huge_lock() took.  This patch adds one more parameter to the
function to return the lock.

In most places, migration to the new API is trivial.  The exception is
move_huge_pmd(): we need to take two locks if the pmd tables are different.
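
A single-threaded sketch of the new calling convention, assuming one C11
atomic_flag spinlock per page table page as a stand-in for split ptlocks.
model_trans_huge_lock() reports the lock it took through an out-parameter
so the caller unlocks exactly that lock, and model_lock_dest() shows the
move_huge_pmd() case, where a second lock is taken only if the destination
table uses a different one.  All names are hypothetical, and the other
return cases of the kernel function are not modeled.

<![CDATA[
```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical page table page with its own lock (split ptlock). */
struct model_table {
	atomic_flag ptl;
	int huge; /* nonzero if the entry maps a transparent huge page */
};

static void model_spin_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		; /* spin until the holder clears the flag */
}

static void model_spin_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Returns 1 and stores the lock it took in *ptl if the entry is huge;
 * otherwise drops the lock and returns 0. */
static int model_trans_huge_lock(struct model_table *t, atomic_flag **ptl)
{
	model_spin_lock(&t->ptl);
	if (t->huge) {
		*ptl = &t->ptl;
		return 1;
	}
	model_spin_unlock(&t->ptl);
	return 0;
}

/* move_huge_pmd()-style: with the source lock already held, take the
 * destination table's lock only if it is a different lock, so the same
 * lock is never acquired twice.  Returns the destination lock. */
static atomic_flag *model_lock_dest(atomic_flag *held, struct model_table *dst)
{
	if (&dst->ptl != held)
		model_spin_lock(&dst->ptl);
	return &dst->ptl;
}
```
]]>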

Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Tested-by: Alex Thorlton &lt;athorlton@sgi.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: "Eric W . Biederman" &lt;ebiederm@xmission.com&gt;
Cc: "Paul E . McKenney" &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Dave Jones &lt;davej@redhat.com&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Michael Kerrisk &lt;mtk.manpages@gmail.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Robin Holt &lt;robinmholt@gmail.com&gt;
Cc: Sedat Dilek &lt;sedat.dilek@gmail.com&gt;
Cc: Srikar Dronamraju &lt;srikar@linux.vnet.ibm.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: convert mm-&gt;nr_ptes to atomic_long_t</title>
<updated>2013-11-15T00:32:14+00:00</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2013-11-14T22:30:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e1f56c89b040134add93f686931cc266541d239a'/>
<id>e1f56c89b040134add93f686931cc266541d239a</id>
<content type='text'>
With split page table locks at the PMD level, we can't hold
mm-&gt;page_table_lock while updating nr_ptes.

Let's convert it to atomic_long_t to avoid races.
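
A sketch of the conversion using C11 atomics as a stand-in for the
kernel's atomic_long_t (the kernel side would use atomic_long_inc(),
atomic_long_dec() and atomic_long_read()).  struct model_mm_counters and
its helpers are hypothetical.  Because updaters may now hold different
split locks, the counter has to be updated with atomic read-modify-write
operations instead of a plain increment under one shared lock.

<![CDATA[
```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the counter in struct mm_struct. */
struct model_mm_counters {
	atomic_long nr_ptes;
};

/* Called when a pte table is allocated; different callers may hold
 * different split locks, so the update itself must be atomic. */
static void model_inc_nr_ptes(struct model_mm_counters *mm)
{
	atomic_fetch_add_explicit(&mm->nr_ptes, 1, memory_order_relaxed);
}

/* Called when a pte table is freed. */
static void model_dec_nr_ptes(struct model_mm_counters *mm)
{
	atomic_fetch_sub_explicit(&mm->nr_ptes, 1, memory_order_relaxed);
}

/* A statistics counter needs atomicity, not ordering, hence relaxed. */
static long model_read_nr_ptes(struct model_mm_counters *mm)
{
	return atomic_load_explicit(&mm->nr_ptes, memory_order_relaxed);
}
```
]]>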

Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Tested-by: Alex Thorlton &lt;athorlton@sgi.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: "Eric W . Biederman" &lt;ebiederm@xmission.com&gt;
Cc: "Paul E . McKenney" &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Dave Jones &lt;davej@redhat.com&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Michael Kerrisk &lt;mtk.manpages@gmail.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Robin Holt &lt;robinmholt@gmail.com&gt;
Cc: Sedat Dilek &lt;sedat.dilek@gmail.com&gt;
Cc: Srikar Dronamraju &lt;srikar@linux.vnet.ibm.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>/proc/pid/smaps: show VM_SOFTDIRTY flag in VmFlags line</title>
<updated>2013-11-13T03:09:07+00:00</updated>
<author>
<name>Naoya Horiguchi</name>
<email>n-horiguchi@ah.jp.nec.com</email>
</author>
<published>2013-11-12T23:07:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=ec8e41aec13005fed0dbee002fb8c99b4e001d50'/>
<id>ec8e41aec13005fed0dbee002fb8c99b4e001d50</id>
<content type='text'>
This flag (shown as "sd") indicates that the VMA is "newly created" and
is therefore treated as soft-dirty in the task's VM.

You can clear it with "echo 4 &gt; /proc/pid/clear_refs".
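
A small user-space helper showing how a tool might test for the flag,
assuming the VmFlags line prints two-letter mnemonics separated by spaces
and that VM_SOFTDIRTY appears as "sd".  vmflags_has_softdirty() is a
hypothetical name.

<![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if a "VmFlags:" line from /proc/pid/smaps contains the
 * mnemonic "sd" as a whole word.  Hypothetical helper. */
static bool vmflags_has_softdirty(const char *line)
{
	const char *p = strchr(line, ':');

	if (!p)
		return false;
	p++;
	while (*p) {
		while (*p == ' ')
			p++;
		/* Match "sd" followed by a separator or end of line. */
		if (p[0] == 's' && p[1] == 'd' &&
		    (p[2] == ' ' || p[2] == '\n' || p[2] == '\0'))
			return true;
		while (*p && *p != ' ')
			p++; /* skip to the end of this mnemonic */
	}
	return false;
}
```
]]>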

Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Cc: Pavel Emelyanov &lt;xemul@parallels.com&gt;
Acked-by: Cyrill Gorcunov &lt;gorcunov@openvz.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
