linux.git/mm, branch v4.0-rc4

Merge branch 'gadget' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

2015-03-13T17:55:32+00:00

Pull gadgetfs fixes from Al Viro:
 "Assorted fixes around AIO on gadgetfs: leaks, use-after-free, troubles
  caused by ->f_op flipping"

* 'gadget' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  gadgetfs: really get rid of switching ->f_op
  gadgetfs: get rid of flipping ->f_op in ep_config()
  gadget: switch ep_io_operations to ->read_iter/->write_iter
  gadgetfs: use-after-free in ->aio_read()
  gadget/function/f_fs.c: switch to ->{read,write}_iter()
  gadget/function/f_fs.c: use put iov_iter into io_data
  gadget/function/f_fs.c: close leaks
  move iov_iter.c from mm/ to lib/
  new helper: dup_iter()

Merge branch 'akpm' (patches from Andrew)

2015-03-13T01:46:19+00:00

Merge misc fixes from Andrew Morton:
 "13 fixes"

* emailed patches from Andrew Morton :
  memcg: disable hierarchy support if bound to the legacy cgroup hierarchy
  mm: reorder can_do_mlock to fix audit denial
  kasan, module: move MODULE_ALIGN macro into 
  kasan, module, vmalloc: rework shadow allocation for modules
  fanotify: fix event filtering with FAN_ONDIR set
  mm/nommu.c: export symbol max_mapnr
  arch/c6x/include/asm/pgtable.h: define dummy pgprot_writecombine for !MMU
  nilfs2: fix deadlock of segment constructor during recovery
  mm: cma: fix CMA aligned offset calculation
  mm, hugetlb: close race when setting PageTail for gigantic pages
  mm, oom: do not fail __GFP_NOFAIL allocation if oom killer is disabled
  drivers/rtc/rtc-s3c.c: add .needs_src_clk to s3c6410 RTC data
  ocfs2: make append_dio an incompat feature

memcg: disable hierarchy support if bound to the legacy cgroup hierarchy

2015-03-13T01:46:08+00:00

If the memory cgroup controller is initially mounted in the scope of the
default cgroup hierarchy and then remounted to a legacy hierarchy, it will
still have hierarchy support enabled, which is incorrect.  We should
disable hierarchy support if bound to the legacy cgroup hierarchy.

Signed-off-by: Vladimir Davydov 
Signed-off-by: Johannes Weiner 
Acked-by: Michal Hocko 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: reorder can_do_mlock to fix audit denial

2015-03-13T01:46:08+00:00

A userspace call to mmap(MAP_LOCKED) may result in the successful locking
of memory while also producing a confusing audit log denial.  can_do_mlock
checks capable and rlimit.  If either of these return positive
can_do_mlock returns true.  The capable check leads to an LSM hook used by
apparmour and selinux which produce the audit denial.  Reordering so
rlimit is checked first eliminates the denial on success, only recording a
denial when the lock is unsuccessful as a result of the denial.

Signed-off-by: Jeff Vander Stoep 
Acked-by: Nick Kralevich 
Cc: Jeff Vander Stoep 
Cc: Sasha Levin 
Cc: "Paul E. McKenney" 
Cc: Rik van Riel 
Cc: Vlastimil Babka 
Cc: Paul Cassella 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

kasan, module, vmalloc: rework shadow allocation for modules

2015-03-13T01:46:08+00:00

Current approach in handling shadow memory for modules is broken.

Shadow memory could be freed only after memory shadow corresponds it is no
longer used.  vfree() called from interrupt context could use memory its
freeing to store 'struct llist_node' in it:

    void vfree(const void *addr)
    {
    ...
        if (unlikely(in_interrupt())) {
            struct vfree_deferred *p = this_cpu_ptr(&vfree_deferred);
            if (llist_add((struct llist_node *)addr, &p->list))
                    schedule_work(&p->wq);

Later this list node used in free_work() which actually frees memory.
Currently module_memfree() called in interrupt context will free shadow
before freeing module's memory which could provoke kernel crash.

So shadow memory should be freed after module's memory.  However, such
deallocation order could race with kasan_module_alloc() in module_alloc().

Free shadow right before releasing vm area.  At this point vfree()'d
memory is not used anymore and yet not available for other allocations.
New VM_KASAN flag used to indicate that vm area has dynamically allocated
shadow memory so kasan frees shadow only if it was previously allocated.

Signed-off-by: Andrey Ryabinin 
Acked-by: Rusty Russell 
Cc: Dmitry Vyukov 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/nommu.c: export symbol max_mapnr

2015-03-13T01:46:08+00:00

Several modules may need max_mapnr, so export, the related error with
allmodconfig under c6x:

  MODPOST 3327 modules
  ERROR: "max_mapnr" [fs/pstore/ramoops.ko] undefined!
  ERROR: "max_mapnr" [drivers/media/v4l2-core/videobuf2-dma-contig.ko] undefined!

Signed-off-by: Chen Gang 
Cc: Mark Salter 
Cc: Aurelien Jacquiot 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: cma: fix CMA aligned offset calculation

2015-03-13T01:46:07+00:00

The CMA aligned offset calculation is incorrect for non-zero order_per_bit
values.

For example, if cma->order_per_bit=1, cma->base_pfn= 0x2f800000 and
align_order=12, the function returns a value of 0x17c00 instead of 0x400.

This patch fixes the CMA aligned offset calculation.

The previous calculation was wrong and would return too-large values for
the offset, so that when cma_alloc looks for free pages in the bitmap with
the requested alignment > order_per_bit, it starts too far into the bitmap
and so CMA allocations will fail despite there actually being plenty of
free pages remaining.  It will also probably have the wrong alignment.
With this change, we will get the correct offset into the bitmap.

One affected user is powerpc KVM, which has kvm_cma->order_per_bit set to
KVM_CMA_CHUNK_ORDER - PAGE_SHIFT, or 18 - 12 = 6.

[gregory.0xf0@gmail.com: changelog additions]
Signed-off-by: Danesh Petigara 
Reviewed-by: Gregory Fong 
Acked-by: Michal Nazarewicz 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, hugetlb: close race when setting PageTail for gigantic pages

2015-03-13T01:46:07+00:00

Now that gigantic pages are dynamically allocatable, care must be taken to
ensure that p->first_page is valid before setting PageTail.

If this isn't done, then it is possible to race and have compound_head()
return NULL.

Signed-off-by: David Rientjes 
Acked-by: Davidlohr Bueso 
Cc: Luiz Capitulino 
Cc: Joonsoo Kim 
Acked-by: Hillf Danton 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, oom: do not fail __GFP_NOFAIL allocation if oom killer is disabled

2015-03-13T01:46:07+00:00

Tetsuo Handa has pointed out that __GFP_NOFAIL allocations might fail
after OOM killer is disabled if the allocation is performed by a kernel
thread.  This behavior was introduced from the very beginning by
7f33d49a2ed5 ("mm, PM/Freezer: Disable OOM killer when tasks are frozen").
 This means that the basic contract for the allocation request is broken
and the context requesting such an allocation might blow up unexpectedly.

There are basically two ways forward.

1) move oom_killer_disable after kernel threads are frozen.  This has a
   risk that the OOM victim wouldn't be able to finish because it would
   depend on an already frozen kernel thread.  This would be really tricky
   to debug.

2) do not fail GFP_NOFAIL allocation no matter what and risk a
   potential Freezable kernel threads will loop and fail the suspend.
   Incidental allocations after kernel threads are frozen will at least
   dump a warning - if we are lucky and the serial console is still active
   of course...

This patch implements the later option because it is safer.  We would see
warning rather than allocation failures for the kernel threads which would
blow up otherwise and have a higher chances to identify __GFP_NOFAIL users
from deeper pm code.

Signed-off-by: Michal Hocko 
Acked-by: David Rientjes 
Cc: Johannes Weiner 
Cc: Tetsuo Handa 
Cc: "Rafael J. Wysocki" 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: thp: Return the correct value for change_huge_pmd

2015-03-12T21:07:41+00:00

The wrong value is being returned by change_huge_pmd since commit
10c1045f28e8 ("mm: numa: avoid unnecessary TLB flushes when setting
NUMA hinting entries") which allows a fallthrough that tries to adjust
non-existent PTEs. This patch corrects it.

Signed-off-by: Mel Gorman 
Signed-off-by: Linus Torvalds