<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/mm/memory-failure.c, branch v4.5</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>mm: soft-offline: exit with failure for non anonymous thp</title>
<updated>2016-01-16T01:56:32+00:00</updated>
<author>
<name>Naoya Horiguchi</name>
<email>n-horiguchi@ah.jp.nec.com</email>
</author>
<published>2016-01-16T00:57:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=98fd1ef4241ce0c700a174dd3e33c643b4774690'/>
<id>98fd1ef4241ce0c700a174dd3e33c643b4774690</id>
<content type='text'>
Currently memory_failure() doesn't handle non anonymous thp case,
because we can hardly expect the error handling to be successful, and it
can just hit some corner case which results in BUG_ON or something
severe like that.  This is also the case for soft offline code, so let's
make it in the same way.

Orignal code has a MF_COUNT_INCREASED check before put_hwpoison_page(),
but it's unnecessary because get_any_page() is already called when
running on this code, which takes a refcount of the target page
regardress of the flag.  So this patch also removes it.

[akpm@linux-foundation.org: fix build]
Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Andi Kleen &lt;andi@firstfloor.org&gt;
Cc: "Kirill A. Shutemov" &lt;kirill@shutemov.name&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently memory_failure() doesn't handle non anonymous thp case,
because we can hardly expect the error handling to be successful, and it
can just hit some corner case which results in BUG_ON or something
severe like that.  This is also the case for soft offline code, so let's
make it in the same way.

Orignal code has a MF_COUNT_INCREASED check before put_hwpoison_page(),
but it's unnecessary because get_any_page() is already called when
running on this code, which takes a refcount of the target page
regardress of the flag.  So this patch also removes it.

[akpm@linux-foundation.org: fix build]
Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Andi Kleen &lt;andi@firstfloor.org&gt;
Cc: "Kirill A. Shutemov" &lt;kirill@shutemov.name&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: soft-offline: clean up soft_offline_page()</title>
<updated>2016-01-16T01:56:32+00:00</updated>
<author>
<name>Naoya Horiguchi</name>
<email>n-horiguchi@ah.jp.nec.com</email>
</author>
<published>2016-01-16T00:57:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=acc14dc4bd484f1fc8c227dd9fc2a1e592312d2b'/>
<id>acc14dc4bd484f1fc8c227dd9fc2a1e592312d2b</id>
<content type='text'>
soft_offline_page() has some deeply indented code, that's the sign of
demand for cleanup.  So let's do this.  No functionality change.

Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Andi Kleen &lt;andi@firstfloor.org&gt;
Cc: "Kirill A. Shutemov" &lt;kirill@shutemov.name&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
soft_offline_page() has some deeply indented code, that's the sign of
demand for cleanup.  So let's do this.  No functionality change.

Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Andi Kleen &lt;andi@firstfloor.org&gt;
Cc: "Kirill A. Shutemov" &lt;kirill@shutemov.name&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: hwpoison: adjust for new thp refcounting</title>
<updated>2016-01-16T01:56:32+00:00</updated>
<author>
<name>Naoya Horiguchi</name>
<email>n-horiguchi@ah.jp.nec.com</email>
</author>
<published>2016-01-16T00:54:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4e41a30c6d506c884d3da9aeb316352e70679d4b'/>
<id>4e41a30c6d506c884d3da9aeb316352e70679d4b</id>
<content type='text'>
Some mm-related BUG_ON()s could trigger from hwpoison code due to recent
changes in thp refcounting rule.  This patch fixes them up.

In the new refcounting, we no longer use tail-&gt;_mapcount to keep tail's
refcount, and thereby we can simplify get/put_hwpoison_page().

And another change is that tail's refcount is not transferred to the raw
page during thp split (more precisely, in new rule we don't take
refcount on tail page any more.) So when we need thp split, we have to
transfer the refcount properly to the 4kB soft-offlined page before
migration.

thp split code goes into core code only when precheck
(total_mapcount(head) == page_count(head) - 1) passes to avoid useless
split, where we assume that one refcount is held by the caller of thp
split and the others are taken via mapping.  To meet this assumption,
this patch moves thp split part in soft_offline_page() after
get_any_page().

[akpm@linux-foundation.org: remove unneeded #define, per Kirill]
Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Acked-by: Kirill A.  Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Some mm-related BUG_ON()s could trigger from hwpoison code due to recent
changes in thp refcounting rule.  This patch fixes them up.

In the new refcounting, we no longer use tail-&gt;_mapcount to keep tail's
refcount, and thereby we can simplify get/put_hwpoison_page().

And another change is that tail's refcount is not transferred to the raw
page during thp split (more precisely, in new rule we don't take
refcount on tail page any more.) So when we need thp split, we have to
transfer the refcount properly to the 4kB soft-offlined page before
migration.

thp split code goes into core code only when precheck
(total_mapcount(head) == page_count(head) - 1) passes to avoid useless
split, where we assume that one refcount is held by the caller of thp
split and the others are taken via mapping.  To meet this assumption,
this patch moves thp split part in soft_offline_page() after
get_any_page().

[akpm@linux-foundation.org: remove unneeded #define, per Kirill]
Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Acked-by: Kirill A.  Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: soft-offline: check return value in second __get_any_page() call</title>
<updated>2016-01-16T01:56:32+00:00</updated>
<author>
<name>Naoya Horiguchi</name>
<email>n-horiguchi@ah.jp.nec.com</email>
</author>
<published>2016-01-16T00:54:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d96b339f453997f2f08c52da3f41423be48c978f'/>
<id>d96b339f453997f2f08c52da3f41423be48c978f</id>
<content type='text'>
I saw the following BUG_ON triggered in a testcase where a process calls
madvise(MADV_SOFT_OFFLINE) on thps, along with a background process that
calls migratepages command repeatedly (doing ping-pong among different
NUMA nodes) for the first process:

   Soft offlining page 0x60000 at 0x700000600000
   __get_any_page: 0x60000 free buddy page
   page:ffffea0001800000 count:0 mapcount:-127 mapping:          (null) index:0x1
   flags: 0x1fffc0000000000()
   page dumped because: VM_BUG_ON_PAGE(atomic_read(&amp;page-&gt;_count) == 0)
   ------------[ cut here ]------------
   kernel BUG at /src/linux-dev/include/linux/mm.h:342!
   invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
   Modules linked in: cfg80211 rfkill crc32c_intel serio_raw virtio_balloon i2c_piix4 virtio_blk virtio_net ata_generic pata_acpi
   CPU: 3 PID: 3035 Comm: test_alloc_gene Tainted: G           O    4.4.0-rc8-v4.4-rc8-160107-1501-00000-rc8+ #74
   Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
   task: ffff88007c63d5c0 ti: ffff88007c210000 task.ti: ffff88007c210000
   RIP: 0010:[&lt;ffffffff8118998c&gt;]  [&lt;ffffffff8118998c&gt;] put_page+0x5c/0x60
   RSP: 0018:ffff88007c213e00  EFLAGS: 00010246
   Call Trace:
     put_hwpoison_page+0x4e/0x80
     soft_offline_page+0x501/0x520
     SyS_madvise+0x6bc/0x6f0
     entry_SYSCALL_64_fastpath+0x12/0x6a
   Code: 8b fc ff ff 5b 5d c3 48 89 df e8 b0 fa ff ff 48 89 df 31 f6 e8 c6 7d ff ff 5b 5d c3 48 c7 c6 08 54 a2 81 48 89 df e8 a4 c5 01 00 &lt;0f&gt; 0b 66 90 66 66 66 66 90 55 48 89 e5 41 55 41 54 53 48 8b 47
   RIP  [&lt;ffffffff8118998c&gt;] put_page+0x5c/0x60
    RSP &lt;ffff88007c213e00&gt;

The root cause resides in get_any_page() which retries to get a refcount
of the page to be soft-offlined.  This function calls
put_hwpoison_page(), expecting that the target page is putback to LRU
list.  But it can be also freed to buddy.  So the second check need to
care about such case.

Fixes: af8fae7c0886 ("mm/memory-failure.c: clean up soft_offline_page()")
Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Cc: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Jerome Marchand &lt;jmarchan@redhat.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Steve Capper &lt;steve.capper@linaro.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[3.9+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
I saw the following BUG_ON triggered in a testcase where a process calls
madvise(MADV_SOFT_OFFLINE) on thps, along with a background process that
calls migratepages command repeatedly (doing ping-pong among different
NUMA nodes) for the first process:

   Soft offlining page 0x60000 at 0x700000600000
   __get_any_page: 0x60000 free buddy page
   page:ffffea0001800000 count:0 mapcount:-127 mapping:          (null) index:0x1
   flags: 0x1fffc0000000000()
   page dumped because: VM_BUG_ON_PAGE(atomic_read(&amp;page-&gt;_count) == 0)
   ------------[ cut here ]------------
   kernel BUG at /src/linux-dev/include/linux/mm.h:342!
   invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
   Modules linked in: cfg80211 rfkill crc32c_intel serio_raw virtio_balloon i2c_piix4 virtio_blk virtio_net ata_generic pata_acpi
   CPU: 3 PID: 3035 Comm: test_alloc_gene Tainted: G           O    4.4.0-rc8-v4.4-rc8-160107-1501-00000-rc8+ #74
   Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
   task: ffff88007c63d5c0 ti: ffff88007c210000 task.ti: ffff88007c210000
   RIP: 0010:[&lt;ffffffff8118998c&gt;]  [&lt;ffffffff8118998c&gt;] put_page+0x5c/0x60
   RSP: 0018:ffff88007c213e00  EFLAGS: 00010246
   Call Trace:
     put_hwpoison_page+0x4e/0x80
     soft_offline_page+0x501/0x520
     SyS_madvise+0x6bc/0x6f0
     entry_SYSCALL_64_fastpath+0x12/0x6a
   Code: 8b fc ff ff 5b 5d c3 48 89 df e8 b0 fa ff ff 48 89 df 31 f6 e8 c6 7d ff ff 5b 5d c3 48 c7 c6 08 54 a2 81 48 89 df e8 a4 c5 01 00 &lt;0f&gt; 0b 66 90 66 66 66 66 90 55 48 89 e5 41 55 41 54 53 48 8b 47
   RIP  [&lt;ffffffff8118998c&gt;] put_page+0x5c/0x60
    RSP &lt;ffff88007c213e00&gt;

The root cause resides in get_any_page() which retries to get a refcount
of the page to be soft-offlined.  This function calls
put_hwpoison_page(), expecting that the target page is putback to LRU
list.  But it can be also freed to buddy.  So the second check need to
care about such case.

Fixes: af8fae7c0886 ("mm/memory-failure.c: clean up soft_offline_page()")
Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Cc: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Jerome Marchand &lt;jmarchan@redhat.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Steve Capper &lt;steve.capper@linaro.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[3.9+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>thp, mm: split_huge_page(): caller need to lock page</title>
<updated>2016-01-16T01:56:32+00:00</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2016-01-16T00:54:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4d2fa965483f4c39bd097ff9bbf3efe62d4cf367'/>
<id>4d2fa965483f4c39bd097ff9bbf3efe62d4cf367</id>
<content type='text'>
We're going to use migration entries instead of compound_lock() to
stabilize page refcounts.  Setup and remove migration entries require
page to be locked.

Some of split_huge_page() callers already have the page locked.  Let's
require everybody to lock the page before calling split_huge_page().

Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Tested-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Tested-by: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Acked-by: Jerome Marchand &lt;jmarchan@redhat.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Steve Capper &lt;steve.capper@linaro.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We're going to use migration entries instead of compound_lock() to
stabilize page refcounts.  Setup and remove migration entries require
page to be locked.

Some of split_huge_page() callers already have the page locked.  Let's
require everybody to lock the page before calling split_huge_page().

Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Tested-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Tested-by: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Acked-by: Jerome Marchand &lt;jmarchan@redhat.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Steve Capper &lt;steve.capper@linaro.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>page-flags: define PG_locked behavior on compound pages</title>
<updated>2016-01-16T01:56:32+00:00</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2016-01-16T00:51:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=48c935ad88f5be20eb5445a77c171351b1eb5111'/>
<id>48c935ad88f5be20eb5445a77c171351b1eb5111</id>
<content type='text'>
lock_page() must operate on the whole compound page.  It doesn't make
much sense to lock part of compound page.  Change code to use head
page's PG_locked, if tail page is passed.

This patch also gets rid of custom helper functions --
__set_page_locked() and __clear_page_locked().  They are replaced with
helpers generated by __SETPAGEFLAG/__CLEARPAGEFLAG.  Tail pages to these
helper would trigger VM_BUG_ON().

SLUB uses PG_locked as a bit spin locked.  IIUC, tail pages should never
appear there.  VM_BUG_ON() is added to make sure that this assumption is
correct.

[akpm@linux-foundation.org: fix fs/cifs/file.c]
Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Steve Capper &lt;steve.capper@linaro.org&gt;
Cc: "Aneesh Kumar K.V" &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Jerome Marchand &lt;jmarchan@redhat.com&gt;
Cc: Jérôme Glisse &lt;jglisse@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
lock_page() must operate on the whole compound page.  It doesn't make
much sense to lock part of compound page.  Change code to use head
page's PG_locked, if tail page is passed.

This patch also gets rid of custom helper functions --
__set_page_locked() and __clear_page_locked().  They are replaced with
helpers generated by __SETPAGEFLAG/__CLEARPAGEFLAG.  Tail pages to these
helper would trigger VM_BUG_ON().

SLUB uses PG_locked as a bit spin locked.  IIUC, tail pages should never
appear there.  VM_BUG_ON() is added to make sure that this assumption is
correct.

[akpm@linux-foundation.org: fix fs/cifs/file.c]
Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Steve Capper &lt;steve.capper@linaro.org&gt;
Cc: "Aneesh Kumar K.V" &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Jerome Marchand &lt;jmarchan@redhat.com&gt;
Cc: Jérôme Glisse &lt;jglisse@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: make compound_head() robust</title>
<updated>2015-11-07T01:50:42+00:00</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2015-11-07T00:29:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1d798ca3f16437c71ff63e36597ff07f9c12e4d6'/>
<id>1d798ca3f16437c71ff63e36597ff07f9c12e4d6</id>
<content type='text'>
Hugh has pointed that compound_head() call can be unsafe in some
context. There's one example:

	CPU0					CPU1

isolate_migratepages_block()
  page_count()
    compound_head()
      !!PageTail() == true
					put_page()
					  tail-&gt;first_page = NULL
      head = tail-&gt;first_page
					alloc_pages(__GFP_COMP)
					   prep_compound_page()
					     tail-&gt;first_page = head
					     __SetPageTail(p);
      !!PageTail() == true
    &lt;head == NULL dereferencing&gt;

The race is pure theoretical. I don't it's possible to trigger it in
practice. But who knows.

We can fix the race by changing how encode PageTail() and compound_head()
within struct page to be able to update them in one shot.

The patch introduces page-&gt;compound_head into third double word block in
front of compound_dtor and compound_order. Bit 0 encodes PageTail() and
the rest bits are pointer to head page if bit zero is set.

The patch moves page-&gt;pmd_huge_pte out of word, just in case if an
architecture defines pgtable_t into something what can have the bit 0
set.

hugetlb_cgroup uses page-&gt;lru.next in the second tail page to store
pointer struct hugetlb_cgroup. The patch switch it to use page-&gt;private
in the second tail page instead. The space is free since -&gt;first_page is
removed from the union.

The patch also opens possibility to remove HUGETLB_CGROUP_MIN_ORDER
limitation, since there's now space in first tail page to store struct
hugetlb_cgroup pointer. But that's out of scope of the patch.

That means page-&gt;compound_head shares storage space with:

 - page-&gt;lru.next;
 - page-&gt;next;
 - page-&gt;rcu_head.next;

That's too long list to be absolutely sure, but looks like nobody uses
bit 0 of the word.

page-&gt;rcu_head.next guaranteed[1] to have bit 0 clean as long as we use
call_rcu(), call_rcu_bh(), call_rcu_sched(), or call_srcu(). But future
call_rcu_lazy() is not allowed as it makes use of the bit and we can
get false positive PageTail().

[1] http://lkml.kernel.org/g/20150827163634.GD4029@linux.vnet.ibm.com

Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Reviewed-by: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Acked-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Sergey Senozhatsky &lt;sergey.senozhatsky@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Hugh has pointed that compound_head() call can be unsafe in some
context. There's one example:

	CPU0					CPU1

isolate_migratepages_block()
  page_count()
    compound_head()
      !!PageTail() == true
					put_page()
					  tail-&gt;first_page = NULL
      head = tail-&gt;first_page
					alloc_pages(__GFP_COMP)
					   prep_compound_page()
					     tail-&gt;first_page = head
					     __SetPageTail(p);
      !!PageTail() == true
    &lt;head == NULL dereferencing&gt;

The race is pure theoretical. I don't it's possible to trigger it in
practice. But who knows.

We can fix the race by changing how encode PageTail() and compound_head()
within struct page to be able to update them in one shot.

The patch introduces page-&gt;compound_head into third double word block in
front of compound_dtor and compound_order. Bit 0 encodes PageTail() and
the rest bits are pointer to head page if bit zero is set.

The patch moves page-&gt;pmd_huge_pte out of word, just in case if an
architecture defines pgtable_t into something what can have the bit 0
set.

hugetlb_cgroup uses page-&gt;lru.next in the second tail page to store
pointer struct hugetlb_cgroup. The patch switch it to use page-&gt;private
in the second tail page instead. The space is free since -&gt;first_page is
removed from the union.

The patch also opens possibility to remove HUGETLB_CGROUP_MIN_ORDER
limitation, since there's now space in first tail page to store struct
hugetlb_cgroup pointer. But that's out of scope of the patch.

That means page-&gt;compound_head shares storage space with:

 - page-&gt;lru.next;
 - page-&gt;next;
 - page-&gt;rcu_head.next;

That's too long list to be absolutely sure, but looks like nobody uses
bit 0 of the word.

page-&gt;rcu_head.next guaranteed[1] to have bit 0 clean as long as we use
call_rcu(), call_rcu_bh(), call_rcu_sched(), or call_srcu(). But future
call_rcu_lazy() is not allowed as it makes use of the bit and we can
get false positive PageTail().

[1] http://lkml.kernel.org/g/20150827163634.GD4029@linux.vnet.ibm.com

Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Reviewed-by: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Acked-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Sergey Senozhatsky &lt;sergey.senozhatsky@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: hwpoison: ratelimit messages from unpoison_memory()</title>
<updated>2015-11-06T03:34:48+00:00</updated>
<author>
<name>Naoya Horiguchi</name>
<email>n-horiguchi@ah.jp.nec.com</email>
</author>
<published>2015-11-06T02:47:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a5f65109026b35b654b94fdcd26a971185a53adc'/>
<id>a5f65109026b35b654b94fdcd26a971185a53adc</id>
<content type='text'>
Currently kernel prints out results of every single unpoison event, which
i= s not necessary because unpoison is purely a testing feature and
testers can = get little or no information from lots of lines of unpoison
log storm.  So this patch ratelimits printk in unpoison_memory().

This patch introduces a file local ratelimit_state, which adds 64 bytes to
memory-failure.o.  If we apply pr_info_ratelimited() for 8 callsite below,
2= 56 bytes is added, so it's a win.

Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Andi Kleen &lt;andi@firstfloor.org&gt;
Cc: Wanpeng Li &lt;wanpeng.li@hotmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently kernel prints out results of every single unpoison event, which
i= s not necessary because unpoison is purely a testing feature and
testers can = get little or no information from lots of lines of unpoison
log storm.  So this patch ratelimits printk in unpoison_memory().

This patch introduces a file local ratelimit_state, which adds 64 bytes to
memory-failure.o.  If we apply pr_info_ratelimited() for 8 callsite below,
2= 56 bytes is added, so it's a win.

Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Andi Kleen &lt;andi@firstfloor.org&gt;
Cc: Wanpeng Li &lt;wanpeng.li@hotmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>hwpoison: use page_cgroup_ino for filtering by memcg</title>
<updated>2015-09-10T20:29:01+00:00</updated>
<author>
<name>Vladimir Davydov</name>
<email>vdavydov@parallels.com</email>
</author>
<published>2015-09-09T22:35:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=94a59fb36ee91b3f977a9b914ec701efe40041de'/>
<id>94a59fb36ee91b3f977a9b914ec701efe40041de</id>
<content type='text'>
Hwpoison allows to filter pages by memory cgroup ino.  Currently, it
calls try_get_mem_cgroup_from_page to obtain the cgroup from a page and
then its ino using cgroup_ino, but now we have a helper method for
that, page_cgroup_ino, so use it instead.

This patch also loosens the hwpoison memcg filter dependency rules - it
makes it depend on CONFIG_MEMCG instead of CONFIG_MEMCG_SWAP, because
hwpoison memcg filter does not require anything (nor it used to) from
CONFIG_MEMCG_SWAP side.

Signed-off-by: Vladimir Davydov &lt;vdavydov@parallels.com&gt;
Reviewed-by: Andres Lagar-Cavilla &lt;andreslc@google.com&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Raghavendra K T &lt;raghavendra.kt@linux.vnet.ibm.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Greg Thelen &lt;gthelen@google.com&gt;
Cc: Michel Lespinasse &lt;walken@google.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Pavel Emelyanov &lt;xemul@parallels.com&gt;
Cc: Cyrill Gorcunov &lt;gorcunov@openvz.org&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Hwpoison allows to filter pages by memory cgroup ino.  Currently, it
calls try_get_mem_cgroup_from_page to obtain the cgroup from a page and
then its ino using cgroup_ino, but now we have a helper method for
that, page_cgroup_ino, so use it instead.

This patch also loosens the hwpoison memcg filter dependency rules - it
makes it depend on CONFIG_MEMCG instead of CONFIG_MEMCG_SWAP, because
hwpoison memcg filter does not require anything (nor it used to) from
CONFIG_MEMCG_SWAP side.

Signed-off-by: Vladimir Davydov &lt;vdavydov@parallels.com&gt;
Reviewed-by: Andres Lagar-Cavilla &lt;andreslc@google.com&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Raghavendra K T &lt;raghavendra.kt@linux.vnet.ibm.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Greg Thelen &lt;gthelen@google.com&gt;
Cc: Michel Lespinasse &lt;walken@google.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Pavel Emelyanov &lt;xemul@parallels.com&gt;
Cc: Cyrill Gorcunov &lt;gorcunov@openvz.org&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: rename alloc_pages_exact_node() to __alloc_pages_node()</title>
<updated>2015-09-08T22:35:28+00:00</updated>
<author>
<name>Vlastimil Babka</name>
<email>vbabka@suse.cz</email>
</author>
<published>2015-09-08T22:03:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=96db800f5d73cd5c49461253d45766e094f0f8c2'/>
<id>96db800f5d73cd5c49461253d45766e094f0f8c2</id>
<content type='text'>
alloc_pages_exact_node() was introduced in commit 6484eb3e2a81 ("page
allocator: do not check NUMA node ID when the caller knows the node is
valid") as an optimized variant of alloc_pages_node(), that doesn't
fallback to current node for nid == NUMA_NO_NODE.  Unfortunately the
name of the function can easily suggest that the allocation is
restricted to the given node and fails otherwise.  In truth, the node is
only preferred, unless __GFP_THISNODE is passed among the gfp flags.

The misleading name has lead to mistakes in the past, see for example
commits 5265047ac301 ("mm, thp: really limit transparent hugepage
allocation to local node") and b360edb43f8e ("mm, mempolicy:
migrate_to_node should only migrate to node").

Another issue with the name is that there's a family of
alloc_pages_exact*() functions where 'exact' means exact size (instead
of page order), which leads to more confusion.

To prevent further mistakes, this patch effectively renames
alloc_pages_exact_node() to __alloc_pages_node() to better convey that
it's an optimized variant of alloc_pages_node() not intended for general
usage.  Both functions get described in comments.

It has been also considered to really provide a convenience function for
allocations restricted to a node, but the major opinion seems to be that
__GFP_THISNODE already provides that functionality and we shouldn't
duplicate the API needlessly.  The number of users would be small
anyway.

Existing callers of alloc_pages_exact_node() are simply converted to
call __alloc_pages_node(), with the exception of sba_alloc_coherent()
which open-codes the check for NUMA_NO_NODE, so it is converted to use
alloc_pages_node() instead.  This means it no longer performs some
VM_BUG_ON checks, and since the current check for nid in
alloc_pages_node() uses a 'nid &lt; 0' comparison (which includes
NUMA_NO_NODE), it may hide wrong values which would be previously
exposed.

Both differences will be rectified by the next patch.

To sum up, this patch makes no functional changes, except temporarily
hiding potentially buggy callers.  Restricting the checks in
alloc_pages_node() is left for the next patch which can in turn expose
more existing buggy callers.

Signed-off-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Robin Holt &lt;robinmholt@gmail.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: Christoph Lameter &lt;cl@linux.com&gt;
Acked-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Greg Thelen &lt;gthelen@google.com&gt;
Cc: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: Fenghua Yu &lt;fenghua.yu@intel.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Gleb Natapov &lt;gleb@kernel.org&gt;
Cc: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Cliff Whickman &lt;cpw@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
alloc_pages_exact_node() was introduced in commit 6484eb3e2a81 ("page
allocator: do not check NUMA node ID when the caller knows the node is
valid") as an optimized variant of alloc_pages_node(), that doesn't
fallback to current node for nid == NUMA_NO_NODE.  Unfortunately the
name of the function can easily suggest that the allocation is
restricted to the given node and fails otherwise.  In truth, the node is
only preferred, unless __GFP_THISNODE is passed among the gfp flags.

The misleading name has lead to mistakes in the past, see for example
commits 5265047ac301 ("mm, thp: really limit transparent hugepage
allocation to local node") and b360edb43f8e ("mm, mempolicy:
migrate_to_node should only migrate to node").

Another issue with the name is that there's a family of
alloc_pages_exact*() functions where 'exact' means exact size (instead
of page order), which leads to more confusion.

To prevent further mistakes, this patch effectively renames
alloc_pages_exact_node() to __alloc_pages_node() to better convey that
it's an optimized variant of alloc_pages_node() not intended for general
usage.  Both functions get described in comments.

It has been also considered to really provide a convenience function for
allocations restricted to a node, but the major opinion seems to be that
__GFP_THISNODE already provides that functionality and we shouldn't
duplicate the API needlessly.  The number of users would be small
anyway.

Existing callers of alloc_pages_exact_node() are simply converted to
call __alloc_pages_node(), with the exception of sba_alloc_coherent()
which open-codes the check for NUMA_NO_NODE, so it is converted to use
alloc_pages_node() instead.  This means it no longer performs some
VM_BUG_ON checks, and since the current check for nid in
alloc_pages_node() uses a 'nid &lt; 0' comparison (which includes
NUMA_NO_NODE), it may hide wrong values which would be previously
exposed.

Both differences will be rectified by the next patch.

To sum up, this patch makes no functional changes, except temporarily
hiding potentially buggy callers.  Restricting the checks in
alloc_pages_node() is left for the next patch which can in turn expose
more existing buggy callers.

Signed-off-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Robin Holt &lt;robinmholt@gmail.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: Christoph Lameter &lt;cl@linux.com&gt;
Acked-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Greg Thelen &lt;gthelen@google.com&gt;
Cc: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: Fenghua Yu &lt;fenghua.yu@intel.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Gleb Natapov &lt;gleb@kernel.org&gt;
Cc: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Cliff Whickman &lt;cpw@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
