linux-stable.git/include/linux/hugetlb.h, branch v3.14.2

mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE

2014-01-24T00:36:50+00:00

Most of the VM_BUG_ON assertions are performed on a page.  Usually, when
one of these assertions fails we'll get a BUG_ON with a call stack and
the registers.

Requests to add a small piece of code that dumps the page at various
VM_BUG_ON sites have made me notice that the page dump is quite useful
to people debugging issues in mm.

This patch adds a VM_BUG_ON_PAGE(cond, page) which, beyond doing what
VM_BUG_ON() does, also dumps the page before executing the actual
BUG_ON.
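
To illustrate the ordering described above (dump first, then BUG), here
is a minimal userspace sketch.  It is not the kernel macro: struct
mock_page, mock_dump_page(), mock_bug_hits and MOCK_VM_BUG_ON_PAGE are
hypothetical names, and the mock only counts hits where the real BUG()
would never return.

```c
#include <stdio.h>

/* Hypothetical stand-in for struct page; only what this sketch prints. */
struct mock_page {
	unsigned long flags;
	int count;
	int mapcount;
};

/* Number of times the mock BUG fired; the real BUG() never returns. */
static int mock_bug_hits;

static void mock_dump_page(const struct mock_page *page)
{
	fprintf(stderr, "page:%p count:%d mapcount:%d flags:%#lx\n",
		(const void *)page, page->count, page->mapcount, page->flags);
}

/* Dump the page first, then trigger the (mock) BUG. */
#define MOCK_VM_BUG_ON_PAGE(cond, page)			\
	do {						\
		if (cond) {				\
			mock_dump_page(page);		\
			mock_bug_hits++;		\
		}					\
	} while (0)

/* Example assertion site: a reference count must not be negative. */
int mock_check_page(const struct mock_page *page)
{
	int before = mock_bug_hits;

	MOCK_VM_BUG_ON_PAGE(page->count < 0, page);
	return mock_bug_hits - before;	/* 1 if the assertion fired */
}
```

The point of the design is that the page state reaches the log before
control is lost, so the person reading the oops does not have to add
the dump by hand and reproduce.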

[akpm@linux-foundation.org: fix up includes]
Signed-off-by: Sasha Levin 
Cc: "Kirill A. Shutemov" 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: tail page refcounting optimization for slab and hugetlbfs

2014-01-22T00:19:43+00:00

This skips the _mapcount mangling for slab and hugetlbfs pages.

The main trouble in doing this is to guarantee that PageSlab and
PageHeadHuge remain constant for all get_page/put_page calls run on the
tail of slab or hugetlbfs compound pages.  Otherwise, if they're set
during get_page but not set during put_page, the _mapcount of the tail
page would underflow.

PageHeadHuge will remain true until the compound page is released and
enters the buddy allocator, so it cannot change even if the tail pin is
the last reference left on the page.

PG_slab instead is cleared before the slab frees the head page with
put_page, so if the tail pin is released after the slab freed the page,
we would have a problem.  But in the slab case the tail pin cannot be
the last reference left on the page.  This is because the slab code is
free to reuse the compound page after a kfree/kmem_cache_free without
having to check if there's any tail pin left.  In turn, all tail pins
must always be released while the head is still pinned by the slab code,
so we know PG_slab will still be set too.
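
The underflow risk can be shown with a small userspace model.  This is
only a sketch under simplifying assumptions: struct mock_tail_page and
the mock_get_tail()/mock_put_tail() helpers are hypothetical, the
counter starts at 0 rather than the kernel's -1, and a single bool
stands in for the PageSlab/PageHeadHuge test.

```c
#include <stdbool.h>

/* Hypothetical tail page; mapcount doubles as the tail refcount. */
struct mock_tail_page {
	int mapcount;		/* simplified: starts at 0, not -1 */
	bool slab_or_huge;	/* stands in for the PageSlab/PageHeadHuge test */
};

/* Take a tail pin; the counter is skipped for slab/hugetlbfs pages. */
void mock_get_tail(struct mock_tail_page *t)
{
	if (!t->slab_or_huge)
		t->mapcount++;
}

/* Drop a tail pin; this must see the same flag value as the get did. */
void mock_put_tail(struct mock_tail_page *t)
{
	if (!t->slab_or_huge)
		t->mapcount--;
}
```

If slab_or_huge is true at mock_get_tail() time and false at
mock_put_tail() time, the counter is decremented without ever having
been incremented, which is the underflow the guarantee above rules out.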

Signed-off-by: Andrea Arcangeli 
Reviewed-by: Khalid Aziz 
Cc: Pravin Shelar 
Cc: Greg Kroah-Hartman 
Cc: Ben Hutchings 
Cc: Christoph Lameter 
Cc: Johannes Weiner 
Cc: Mel Gorman 
Cc: Rik van Riel 
Cc: Andi Kleen 
Cc: Minchan Kim 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: hugetlbfs: Add some VM_BUG_ON()s to catch non-hugetlbfs pages

2014-01-22T00:19:43+00:00

Dave Jiang reported that he was seeing oopses on NUMA systems booted
with default_hugepagesz=1G.  I traced the issue down to
migrate_page_copy() trying to use the same code for hugetlb pages and
transparent hugepages.  It should not have been trying to pass thp pages
in there.

So, add some VM_BUG_ON()s for the next hapless VM developer that tries
the same thing.

Signed-off-by: Dave Hansen 
Reviewed-by: Naoya Horiguchi 
Tested-by: Dave Jiang 
Acked-by: Mel Gorman 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

include/linux/hugetlb.h: make isolate_huge_page() an inline

2013-12-13T02:19:25+00:00

With CONFIG_HUGETLBFS=n:

  mm/migrate.c: In function `do_move_page_to_node_array':
  include/linux/hugetlb.h:140:33: warning: statement with no effect [-Wunused-value]
   #define isolate_huge_page(p, l) false
                                   ^
  mm/migrate.c:1170:4: note: in expansion of macro `isolate_huge_page'
      isolate_huge_page(page, &pagelist);
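
For illustration, a sketch of the difference between the two forms.
The macro expands to a bare "false;" statement at the call site, which
is what the compiler warns about; a static inline stub is an ordinary
call whose result may be discarded, and it also type-checks its
arguments.  The mock_ names and struct mock_list_head are hypothetical
stand-ins so that the example builds outside the kernel.

```c
#include <stdbool.h>
#include <stddef.h>

struct page;	/* opaque here; never dereferenced */
struct mock_list_head { struct mock_list_head *next, *prev; };

/*
 * Macro form from the warning above; a call used as a statement
 * expands to the bare expression `false;`:
 *
 *   #define isolate_huge_page(p, l) false
 */

/* Inline stub: discarding a call result does not trip -Wunused-value. */
static inline bool mock_isolate_huge_page(struct page *page,
					  struct mock_list_head *list)
{
	(void)page;
	(void)list;
	return false;
}

/* Caller that ignores the result, as the mm/migrate.c call site does. */
int mock_collect(struct page *page, struct mock_list_head *pagelist)
{
	mock_isolate_huge_page(page, pagelist);
	return 0;
}
```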

Reported-by: Borislav Petkov 
Tested-by: Borislav Petkov 
Signed-off-by: Naoya Horiguchi 
Acked-by: David Rientjes 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: hugetlbfs: fix hugetlbfs optimization

2013-11-22T00:42:27+00:00

Commit 7cb2ef56e6a8 ("mm: fix aio performance regression for database
caused by THP") can cause a dangling pointer dereference if
split_huge_page runs during PageHuge() and there are updates to the
tail_page->private field.

It also runs compound_head twice for hugetlbfs, and runs
compound_head+compound_trans_head for THP, when a single call is needed
in both cases.

The new code within the PageSlab() check doesn't need to verify that the
THP page size is never bigger than the smallest hugetlbfs page size, to
avoid memory corruption.

A longstanding theoretical race condition was found while fixing the
above (see the change right after the skip_unlock label, that is
relevant for the compound_lock path too).

By re-establishing the _mapcount tail refcounting for all compound
pages, this also fixes the below problem:

  echo 0 >/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

  BUG: Bad page state in process bash  pfn:59a01
  page:ffffea000139b038 count:0 mapcount:10 mapping:          (null) index:0x0
  page flags: 0x1c00000000008000(tail)
  Modules linked in:
  CPU: 6 PID: 2018 Comm: bash Not tainted 3.12.0+ #25
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  Call Trace:
    dump_stack+0x55/0x76
    bad_page+0xd5/0x130
    free_pages_prepare+0x213/0x280
    __free_pages+0x36/0x80
    update_and_free_page+0xc1/0xd0
    free_pool_huge_page+0xc2/0xe0
    set_max_huge_pages.part.58+0x14c/0x220
    nr_hugepages_store_common.isra.60+0xd0/0xf0
    nr_hugepages_store+0x13/0x20
    kobj_attr_store+0xf/0x20
    sysfs_write_file+0x189/0x1e0
    vfs_write+0xc5/0x1f0
    SyS_write+0x55/0xb0
    system_call_fastpath+0x16/0x1b

Signed-off-by: Khalid Aziz 
Signed-off-by: Andrea Arcangeli 
Tested-by: Khalid Aziz 
Cc: Pravin Shelar 
Cc: Greg Kroah-Hartman 
Cc: Ben Hutchings 
Cc: Christoph Lameter 
Cc: Johannes Weiner 
Cc: Mel Gorman 
Cc: Rik van Riel 
Cc: Andi Kleen 
Cc: Minchan Kim 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: thp: give transparent hugepage code a separate copy_page

2013-11-22T00:42:27+00:00

Right now, the migration code in migrate_page_copy() uses copy_huge_page()
for hugetlbfs and thp pages:

        if (PageHuge(page) || PageTransHuge(page))
                copy_huge_page(newpage, page);

So, yay for code reuse.  But:

  void copy_huge_page(struct page *dst, struct page *src)
  {
        struct hstate *h = page_hstate(src);

and a non-hugetlbfs page has no page_hstate().  This works 99% of the
time because page_hstate() determines the hstate from the page order
alone.  Since the page order of a THP page matches the default hugetlbfs
page order, it works.

But, if you change the default huge page size on the boot command-line
(say default_hugepagesz=1G), then we might not even *have* a 2MB hstate,
so page_hstate() returns NULL and copy_huge_page() oopses pretty fast
since copy_huge_page() dereferences the hstate:

  void copy_huge_page(struct page *dst, struct page *src)
  {
        struct hstate *h = page_hstate(src);
        if (unlikely(pages_per_huge_page(h) > MAX_ORDER_NR_PAGES)) {
  ...

Mel noticed that the migration code is really the only user of these
functions.  This moves all the copy code over to migrate.c and makes
copy_huge_page() work for THP by checking for it explicitly.
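
A userspace sketch of the failure and of the explicit check, under
stated assumptions: the mock_ types and helpers are hypothetical, the
hstate lookup is modelled as a search by order alone, and
MOCK_THP_ORDER assumes 2MB THP on 4KB base pages.

```c
#include <stdbool.h>
#include <stddef.h>

#define MOCK_THP_ORDER 9	/* assumes 2MB THP with 4KB base pages */

struct mock_hstate { unsigned int order; };

struct mock_page {
	bool is_hugetlbfs;
	bool is_thp;
	unsigned int order;
};

/* Look up an hstate by order alone; NULL if that size is not configured. */
const struct mock_hstate *mock_find_hstate(const struct mock_hstate *hstates,
					   size_t n, unsigned int order)
{
	for (size_t i = 0; i < n; i++)
		if (hstates[i].order == order)
			return &hstates[i];
	return NULL;
}

/*
 * Number of base pages to copy.  Only hugetlbfs pages consult the
 * hstate; THP is checked for explicitly and never touches it.
 * Returns 0 for anything else, or for a missing hstate.
 */
unsigned long mock_nr_pages_to_copy(const struct mock_hstate *hstates,
				    size_t n, const struct mock_page *page)
{
	if (page->is_hugetlbfs) {
		const struct mock_hstate *h =
			mock_find_hstate(hstates, n, page->order);

		return h ? 1UL << h->order : 0;
	}
	if (page->is_thp)
		return 1UL << MOCK_THP_ORDER;
	return 0;
}
```

With only a 1GB hstate configured, the lookup for an order-9 page
returns NULL, which is the pointer the old code dereferenced; the
explicit THP branch avoids the lookup entirely.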

I believe the bug was introduced in commit b32967ff101a ("mm: numa: Add
THP migration for the NUMA working set scanning fault case").

[akpm@linux-foundation.org: fix coding-style and comment text, per Naoya Horiguchi]
Signed-off-by: Dave Hansen 
Acked-by: Mel Gorman 
Reviewed-by: Naoya Horiguchi 
Cc: Hillf Danton 
Cc: Andrea Arcangeli 
Tested-by: Dave Jiang 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, hugetlb: convert hugetlbfs to use split pmd lock

2013-11-15T00:32:14+00:00

Hugetlb supports multiple page sizes.  We use the split lock only at the
PMD level, not at the PUD level.
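
The lock selection can be sketched in userspace as follows.  This is an
illustration only: struct mock_lock, struct mock_mm, struct
mock_pmd_table and mock_huge_pte_lockptr() are hypothetical names, and
MOCK_PMD_SIZE assumes a 2MB PMD as on x86_64.

```c
#define MOCK_PMD_SIZE (1UL << 21)	/* assumes a 2MB PMD, as on x86_64 */

struct mock_lock { int unused; };

/* One lock for the whole address space ... */
struct mock_mm { struct mock_lock page_table_lock; };

/* ... versus one lock per PMD page table (the split lock). */
struct mock_pmd_table { struct mock_lock ptl; };

/* Pick the lock that protects a huge PTE of the given size. */
struct mock_lock *mock_huge_pte_lockptr(unsigned long huge_page_size,
					struct mock_mm *mm,
					struct mock_pmd_table *table)
{
	if (huge_page_size == MOCK_PMD_SIZE)
		return &table->ptl;		/* PMD level: split lock */
	return &mm->page_table_lock;		/* other sizes: mm-wide lock */
}
```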

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Naoya Horiguchi 
Signed-off-by: Kirill A. Shutemov 
Tested-by: Alex Thorlton 
Cc: Ingo Molnar 
Cc: "Eric W . Biederman" 
Cc: "Paul E . McKenney" 
Cc: Al Viro 
Cc: Andi Kleen 
Cc: Andrea Arcangeli 
Cc: Dave Hansen 
Cc: Dave Jones 
Cc: David Howells 
Cc: Frederic Weisbecker 
Cc: Johannes Weiner 
Cc: Kees Cook 
Cc: Mel Gorman 
Cc: Michael Kerrisk 
Cc: Oleg Nesterov 
Cc: Peter Zijlstra 
Cc: Rik van Riel 
Cc: Robin Holt 
Cc: Sedat Dilek 
Cc: Srikar Dronamraju 
Cc: Thomas Gleixner 
Cc: Hugh Dickins 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: migrate: check movability of hugepage in unmap_and_move_huge_page()

2013-09-11T22:57:49+00:00

Currently hugepage migration works well only for pmd-based hugepages
(mainly due to lack of testing), so we had better not enable migration of
other levels of hugepages until we are ready for it.

Some users of hugepage migration (mbind, move_pages, and migrate_pages) do
page table walk and check pud/pmd_huge() there, so they are safe.  But the
other users (softoffline and memory hotremove) don't do this, so without
this patch they can try to migrate unexpected types of hugepages.

To prevent this, we introduce hugepage_migration_support() as an
architecture-dependent check of whether hugepages are implemented on a pmd
basis or not.  Some architectures offer multiple hugepage sizes, so
hugepage_migration_support() also checks the hugepage size.
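
A minimal sketch of the two-part check, assuming a 2MB PMD (shift 21).
mock_hugepage_migration_support() and its parameters are hypothetical;
the architecture's answer is passed in as a plain bool rather than
queried from arch code.

```c
#include <stdbool.h>

#define MOCK_PMD_SHIFT 21	/* assumes a 2MB PMD, as on x86_64 */

/*
 * Migration is allowed only if the architecture implements hugepages
 * on a pmd basis AND this particular hugepage size is the pmd size.
 */
bool mock_hugepage_migration_support(bool arch_pmd_huge,
				     unsigned int huge_page_shift)
{
	return arch_pmd_huge && huge_page_shift == MOCK_PMD_SHIFT;
}
```

Callers such as soft offline or memory hotremove, which do not walk the
page tables themselves, would consult a check of this shape before
attempting migration.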

Signed-off-by: Naoya Horiguchi 
Cc: Andi Kleen 
Cc: Hillf Danton 
Cc: Wanpeng Li 
Cc: Mel Gorman 
Cc: Hugh Dickins 
Cc: KOSAKI Motohiro 
Cc: Michal Hocko 
Cc: Rik van Riel 
Cc: "Aneesh Kumar K.V" 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: memory-hotplug: enable memory hotplug to handle hugepage

2013-09-11T22:57:48+00:00

Until now we couldn't offline memory blocks which contain hugepages
because a hugepage was considered an unmovable page.  With this patch
series a hugepage has become movable, so by using hugepage migration we
can offline such memory blocks.

What's different from other users of hugepage migration is that we need to
decompose all the hugepages inside the target memory block into free buddy
pages after hugepage migration, because otherwise free hugepages remaining
in the memory block get in the way of memory offlining.  For this reason we
introduce new functions dissolve_free_huge_page() and
dissolve_free_huge_pages().
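
As a rough userspace model of the dissolve step: walk the pfn range in
hugepage-sized steps and hand every free hugepage back to the buddy
allocator, leaving in-use hugepages alone.  Everything here is a
hypothetical simplification: enum mock_state, the chunks array with one
entry per aligned hugepage-sized chunk, and the assumption that
start_pfn is aligned to the hugepage size.

```c
#include <stddef.h>

enum mock_state { MOCK_BUDDY, MOCK_FREE_HUGE, MOCK_INUSE_HUGE };

/*
 * chunks[i] describes the hugepage-sized chunk starting at pfn
 * (i << order).  Returns how many free hugepages were dissolved.
 * Assumes start_pfn is aligned to (1 << order).
 */
size_t mock_dissolve_free_huge_pages(enum mock_state *chunks,
				     unsigned long start_pfn,
				     unsigned long end_pfn,
				     unsigned int order)
{
	size_t dissolved = 0;
	unsigned long pfn;

	for (pfn = start_pfn; pfn < end_pfn; pfn += 1UL << order) {
		enum mock_state *s = &chunks[pfn >> order];

		if (*s == MOCK_FREE_HUGE) {
			*s = MOCK_BUDDY;	/* back to free buddy pages */
			dissolved++;
		}
	}
	return dissolved;
}
```

In this model an in-use hugepage is expected to have been migrated
away before the dissolve pass runs, so only free ones remain to be
converted.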

Other than that, this patch straightforwardly adds hugepage migration
code: it adds hugepage handling to the functions which scan over pfns and
collect hugepages to be migrated, and adds a hugepage allocation function
to alloc_migrate_target().

As for larger hugepages (1GB for x86_64), it's not easy to do hotremove
over them because they are larger than a memory block.  So for now we
simply leave it failing, as it does today.

[yongjun_wei@trendmicro.com.cn: remove duplicated include]
Signed-off-by: Naoya Horiguchi 
Acked-by: Andi Kleen 
Cc: Hillf Danton 
Cc: Wanpeng Li 
Cc: Mel Gorman 
Cc: Hugh Dickins 
Cc: KOSAKI Motohiro 
Cc: Michal Hocko 
Cc: Rik van Riel 
Cc: "Aneesh Kumar K.V" 
Signed-off-by: Wei Yongjun 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: mbind: add hugepage migration code to mbind()

2013-09-11T22:57:48+00:00

Extend do_mbind() to handle a vma with VM_HUGETLB set.  We will be able
to migrate hugepages with mbind(2) after applying the enablement patch
which comes later in this series.

Signed-off-by: Naoya Horiguchi 
Acked-by: Andi Kleen 
Reviewed-by: Wanpeng Li 
Acked-by: Hillf Danton 
Cc: Mel Gorman 
Cc: Hugh Dickins 
Cc: KOSAKI Motohiro 
Cc: Michal Hocko 
Cc: Rik van Riel 
Cc: "Aneesh Kumar K.V" 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds