linux-stable.git/mm, branch v3.12.8

aio/migratepages: make aio migrate pages sane

2014-01-09T20:25:16+00:00

commit 8e321fefb0e60bae4e2a28d20fc4fa30758d27c6 upstream.

The arbitrary restriction on page counts offered by the core
migrate_page_move_mapping() code results in rather suspicious looking
fiddling with page reference counts in the aio_migratepage() operation.
To fix this, make migrate_page_move_mapping() take an extra_count parameter
that allows aio to tell the code about its own reference count on the page
being migrated.

While cleaning up aio_migratepage(), make it validate that the old page
being passed in is actually what aio_migratepage() expects to prevent
misbehaviour in the case of races.

Signed-off-by: Benjamin LaHaise 
Signed-off-by: Greg Kroah-Hartman

memcg: fix memcg_size() calculation

2014-01-09T20:25:15+00:00

commit 695c60830764945cf61a2cc623eb1392d137223e upstream.

The mem_cgroup structure contains nr_node_ids pointers to
mem_cgroup_per_node objects, not the objects themselves.

Signed-off-by: Vladimir Davydov 
Acked-by: Michal Hocko 
Cc: Glauber Costa 
Cc: Johannes Weiner 
Cc: Balbir Singh 
Cc: KAMEZAWA Hiroyuki 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

mm/memory-failure.c: transfer page count from head page to tail page after split thp

2014-01-09T20:25:14+00:00

commit a3e0f9e47d5ef7858a26cc12d90ad5146e802d47 upstream.

Memory failures on thp tail pages cause kernel panic like below:

   mce: [Hardware Error]: Machine check events logged
   MCE exception done on CPU 7
   BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
   IP: [] dequeue_hwpoisoned_huge_page+0x131/0x1e0
   PGD bae42067 PUD ba47d067 PMD 0
   Oops: 0000 [#1] SMP
  ...
   CPU: 7 PID: 128 Comm: kworker/7:2 Tainted: G   M       O 3.13.0-rc4-131217-1558-00003-g83b7df08e462 #25
  ...
   Call Trace:
     me_huge_page+0x3e/0x50
     memory_failure+0x4bb/0xc20
     mce_process_work+0x3e/0x70
     process_one_work+0x171/0x420
     worker_thread+0x11b/0x3a0
     ? manage_workers.isra.25+0x2b0/0x2b0
     kthread+0xe4/0x100
     ? kthread_create_on_node+0x190/0x190
     ret_from_fork+0x7c/0xb0
     ? kthread_create_on_node+0x190/0x190
  ...
   RIP   dequeue_hwpoisoned_huge_page+0x131/0x1e0
   CR2: 0000000000000058

The reasoning of this problem is shown below:
 - when we have a memory error on a thp tail page, the memory error
   handler grabs a refcount of the head page to keep the thp under us.
 - Before unmapping the error page from processes, we split the thp,
   where page refcounts of both of head/tail pages don't change.
 - Then we call try_to_unmap() over the error page (which was a tail
   page before). We didn't pin the error page to handle the memory error,
   this error page is freed and removed from LRU list.
 - We never have the error page on LRU list, so the first page state
   check returns "unknown page," then we move to the second check
   with the saved page flag.
 - The saved page flag have PG_tail set, so the second page state check
   returns "hugepage."
 - We call me_huge_page() for freed error page, then we hit the above panic.

The root cause is that we didn't move refcount from the head page to the
tail page after split thp.  So this patch suggests to do this.

This panic was introduced by commit 524fca1e73 ("HWPOISON: fix
misjudgement of page_action() for errors on mlocked pages").  Note that we
did have the same refcount problem before this commit, but it was just
ignored because we had only first page state check which returned "unknown
page." The commit changed the refcount problem from "doesn't work" to
"kernel panic."

Signed-off-by: Naoya Horiguchi 
Reviewed-by: Wanpeng Li 
Cc: Andi Kleen 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

mm: fix use-after-free in sys_remap_file_pages

2014-01-09T20:25:14+00:00

commit 4eb919825e6c3c7fb3630d5621f6d11e98a18b3a upstream.

remap_file_pages calls mmap_region, which may merge the VMA with other
existing VMAs, and free "vma".  This can lead to a use-after-free bug.
Avoid the bug by remembering vm_flags before calling mmap_region, and
not trying to dereference vma later.

Signed-off-by: Rik van Riel 
Reported-by: Dmitry Vyukov 
Cc: PaX Team 
Cc: Kees Cook 
Cc: Michel Lespinasse 
Cc: Cyrill Gorcunov 
Cc: Hugh Dickins 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

mm: munlock: fix deadlock in __munlock_pagevec()

2014-01-09T20:25:14+00:00

commit 3b25df93c6e37e323b86a2a8c1e00c0a2821c6c9 upstream.

Commit 7225522bb429 ("mm: munlock: batch non-THP page isolation and
munlock+putback using pagevec" introduced __munlock_pagevec() to speed
up munlock by holding lru_lock over multiple isolated pages.  Pages that
fail to be isolated are put_page()d immediately, also within the lock.

This can lead to deadlock when __munlock_pagevec() becomes the holder of
the last page pin and put_page() leads to __page_cache_release() which
also locks lru_lock.  The deadlock has been observed by Sasha Levin
using trinity.

This patch avoids the deadlock by deferring put_page() operations until
lru_lock is released.  Another pagevec (which is also used by later
phases of the function is reused to gather the pages for put_page()
operation.

Signed-off-by: Vlastimil Babka 
Reported-by: Sasha Levin 
Cc: Michel Lespinasse 
Cc: Andrea Arcangeli 
Cc: Rik van Riel 
Cc: Mel Gorman 
Cc: Hugh Dickins 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

mm: munlock: fix a bug where THP tail page is encountered

2014-01-09T20:25:14+00:00

commit c424be1cbbf852e46acc84d73162af3066cd2c86 upstream.

Since commit ff6a6da60b89 ("mm: accelerate munlock() treatment of THP
pages") munlock skips tail pages of a munlocked THP page.  However, when
the head page already has PageMlocked unset, it will not skip the tail
pages.

Commit 7225522bb429 ("mm: munlock: batch non-THP page isolation and
munlock+putback using pagevec") has added a PageTransHuge() check which
contains VM_BUG_ON(PageTail(page)).  Sasha Levin found this triggered
using trinity, on the first tail page of a THP page without PageMlocked
flag.

This patch fixes the issue by skipping tail pages also in the case when
PageMlocked flag is unset.  There is still a possibility of race with
THP page split between clearing PageMlocked and determining how many
pages to skip.  The race might result in former tail pages not being
skipped, which is however no longer a bug, as during the skip the
PageTail flags are cleared.

However this race also affects correctness of NR_MLOCK accounting, which
is to be fixed in a separate patch.

Signed-off-by: Vlastimil Babka 
Reported-by: Sasha Levin 
Cc: Michel Lespinasse 
Cc: Andrea Arcangeli 
Cc: Rik van Riel 
Cc: Mel Gorman 
Cc: Hugh Dickins 
Cc: Bob Liu 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

mm: page_alloc: revert NUMA aspect of fair allocation policy

2014-01-09T20:25:14+00:00

commit fff4068cba484e6b0abe334ed6b15d5a215a3b25 upstream.

Commit 81c0a2bb515f ("mm: page_alloc: fair zone allocator policy") meant
to bring aging fairness among zones in system, but it was overzealous
and badly regressed basic workloads on NUMA systems.

Due to the way kswapd and page allocator interacts, we still want to
make sure that all zones in any given node are used equally for all
allocations to maximize memory utilization and prevent thrashing on the
highest zone in the node.

While the same principle applies to NUMA nodes - memory utilization is
obviously improved by spreading allocations throughout all nodes -
remote references can be costly and so many workloads prefer locality
over memory utilization.  The original change assumed that
zone_reclaim_mode would be a good enough predictor for that, but it
turned out to be as indicative as a coin flip.

Revert the NUMA aspect of the fairness until we can find a proper way to
make it configurable and agree on a sane default.

Signed-off-by: Johannes Weiner 
Reviewed-by: Michal Hocko 
Signed-off-by: Mel Gorman 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

mm/hugetlb: check for pte NULL pointer in __page_check_address()

2014-01-09T20:25:14+00:00

commit 98398c32f6687ee1e1f3ae084effb4b75adb0747 upstream.

In __page_check_address(), if address's pud is not present,
huge_pte_offset() will return NULL, we should check the return value.

Signed-off-by: Jianguo Wu 
Cc: Naoya Horiguchi 
Cc: Mel Gorman 
Cc: qiuxishi 
Cc: Hanjun Guo 
Acked-by: Kirill A. Shutemov 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

mm/memory-failure.c: recheck PageHuge() after hugetlb page migrate successfully

2014-01-09T20:25:14+00:00

commit a49ecbcd7b0d5a1cda7d60e03df402dd0ef76ac8 upstream.

After a successful hugetlb page migration by soft offline, the source
page will either be freed into hugepage_freelists or buddy(over-commit
page).  If page is in buddy, page_hstate(page) will be NULL.  It will
hit a NULL pointer dereference in dequeue_hwpoisoned_huge_page().

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
  IP: [] dequeue_hwpoisoned_huge_page+0x131/0x1d0
  PGD c23762067 PUD c24be2067 PMD 0
  Oops: 0000 [#1] SMP

So check PageHuge(page) after call migrate_pages() successfully.

Signed-off-by: Jianguo Wu 
Tested-by: Naoya Horiguchi 
Reviewed-by: Naoya Horiguchi 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

mm/compaction: respect ignore_skip_hint in update_pageblock_skip

2014-01-09T20:25:14+00:00

commit 6815bf3f233e0b10c99a758497d5d236063b010b upstream.

update_pageblock_skip() only fits to compaction which tries to isolate
by pageblock unit.  If isolate_migratepages_range() is called by CMA, it
try to isolate regardless of pageblock unit and it don't reference
get_pageblock_skip() by ignore_skip_hint.  We should also respect it on
update_pageblock_skip() to prevent from setting the wrong information.

Signed-off-by: Joonsoo Kim 
Acked-by: Vlastimil Babka 
Reviewed-by: Naoya Horiguchi 
Reviewed-by: Wanpeng Li 
Cc: Christoph Lameter 
Cc: Rafael Aquini 
Cc: Vlastimil Babka 
Cc: Wanpeng Li 
Cc: Mel Gorman 
Cc: Rik van Riel 
Cc: Zhang Yanfei 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman