linux-stable.git/fs/userfaultfd.c, branch v4.19.78

userfaultfd_release: always remove uffd flags and clear vm_userfaultfd_ctx

2019-08-29T06:28:52+00:00

commit 46d0b24c5ee10a15dfb25e20642f5a5ed59c5003 upstream.

userfaultfd_release() should clear vm_flags/vm_userfaultfd_ctx even if
mm->core_state != NULL.

Otherwise a page fault can see userfaultfd_missing() == T and use an
already freed userfaultfd_ctx.

Link: http://lkml.kernel.org/r/20190820160237.GB4983@redhat.com
Fixes: 04f5866e41fb ("coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping")
Signed-off-by: Oleg Nesterov 
Reported-by: Kefeng Wang 
Reviewed-by: Andrea Arcangeli 
Tested-by: Kefeng Wang 
Cc: Peter Xu 
Cc: Mike Rapoport 
Cc: Jann Horn 
Cc: Jason Gunthorpe 
Cc: Michal Hocko 
Cc: Tetsuo Handa 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

fs/userfaultfd.c: disable irqs for fault_pending and event locks

2019-07-10T07:53:42+00:00

commit cbcfa130a911c613a1d9d921af2eea171c414172 upstream.

When IOCB_CMD_POLL is used on a userfaultfd, aio_poll() disables IRQs
and takes kioctx::ctx_lock, then userfaultfd_ctx::fd_wqh.lock.

This may have to wait for userfaultfd_ctx::fd_wqh.lock to be released by
userfaultfd_ctx_read(), which in turn can be waiting for
userfaultfd_ctx::fault_pending_wqh.lock or
userfaultfd_ctx::event_wqh.lock.

But elsewhere the fault_pending_wqh and event_wqh locks are taken with
IRQs enabled.  Since the IRQ handler may take kioctx::ctx_lock, lockdep
reports that a deadlock is possible.

Fix it by always disabling IRQs when taking the fault_pending_wqh and
event_wqh locks.

Commit ae62c16e105a ("userfaultfd: disable irqs when taking the
waitqueue lock") didn't fix this because it only accounted for the
fd_wqh lock, not the other locks nested inside it.

Link: http://lkml.kernel.org/r/20190627075004.21259-1-ebiggers@kernel.org
Fixes: bfe4037e722e ("aio: implement IOCB_CMD_POLL")
Signed-off-by: Eric Biggers 
Reported-by: syzbot+fab6de82892b6b9c6191@syzkaller.appspotmail.com
Reported-by: syzbot+53c0b767f7ca0dc0c451@syzkaller.appspotmail.com
Reported-by: syzbot+a3accb352f9c22041cfa@syzkaller.appspotmail.com
Reviewed-by: Andrew Morton 
Cc: Christoph Hellwig 
Cc: Andrea Arcangeli 
Cc: 	[4.19+]
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping

2019-04-27T07:36:37+00:00

commit 04f5866e41fb70690e28397487d8bd8eea7d712a upstream.

The core dumping code has always run without holding the mmap_sem for
writing, despite that is the only way to ensure that the entire vma
layout will not change from under it.  Only using some signal
serialization on the processes belonging to the mm is not nearly enough.
This was pointed out earlier.  For example in Hugh's post from Jul 2017:

  https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils

  "Not strictly relevant here, but a related note: I was very surprised
   to discover, only quite recently, how handle_mm_fault() may be called
   without down_read(mmap_sem) - when core dumping. That seems a
   misguided optimization to me, which would also be nice to correct"

In particular because the growsdown and growsup can move the
vm_start/vm_end the various loops the core dump does around the vma will
not be consistent if page faults can happen concurrently.

Pretty much all users calling mmget_not_zero()/get_task_mm() and then
taking the mmap_sem had the potential to introduce unexpected side
effects in the core dumping code.

Adding mmap_sem for writing around the ->core_dump invocation is a
viable long term fix, but it requires removing all copy user and page
faults and to replace them with get_dump_page() for all binary formats
which is not suitable as a short term fix.

For the time being this solution manually covers the places that can
confuse the core dump either by altering the vma layout or the vma flags
while it runs.  Once ->core_dump runs under mmap_sem for writing the
function mmget_still_valid() can be dropped.

Allowing mmap_sem protected sections to run in parallel with the
coredump provides some minor parallelism advantage to the swapoff code
(which seems to be safe enough by never mangling any vma field and can
keep doing swapins in parallel to the core dumping) and to some other
corner case.

In order to facilitate the backporting I added "Fixes: 86039bd3b4e6"
however the side effect of this same race condition in /proc/pid/mem
should be reproducible since before 2.6.12-rc2 so I couldn't add any
other "Fixes:" because there's no hash beyond the git genesis commit.

Because find_extend_vma() is the only location outside of the process
context that could modify the "mm" structures under mmap_sem for
reading, by adding the mmget_still_valid() check to it, all other cases
that take the mmap_sem for reading don't need the new check after
mmget_not_zero()/get_task_mm().  The expand_stack() in page fault
context also doesn't need the new check, because all tasks under core
dumping are frozen.

Link: http://lkml.kernel.org/r/20190325224949.11068-1-aarcange@redhat.com
Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization")
Signed-off-by: Andrea Arcangeli 
Reported-by: Jann Horn 
Suggested-by: Oleg Nesterov 
Acked-by: Peter Xu 
Reviewed-by: Mike Rapoport 
Reviewed-by: Oleg Nesterov 
Reviewed-by: Jann Horn 
Acked-by: Jason Gunthorpe 
Acked-by: Michal Hocko 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

userfaultfd: clear flag if remap event not enabled

2019-01-26T08:32:43+00:00

[ Upstream commit 3cfd22be0ad663248fadfc8f6ffa3e255c394552 ]

When the process being tracked does mremap() without
UFFD_FEATURE_EVENT_REMAP on the corresponding tracking uffd file handle,
we should not generate the remap event, and at the same time we should
clear all the uffd flags on the new VMA.  Without this patch, we can still
have the VM_UFFD_MISSING|VM_UFFD_WP flags on the new VMA even the fault
handling process does not even know the existance of the VMA.

Link: http://lkml.kernel.org/r/20181211053409.20317-1-peterx@redhat.com
Signed-off-by: Peter Xu 
Reviewed-by: Andrea Arcangeli 
Acked-by: Mike Rapoport 
Reviewed-by: William Kucharski 
Cc: Andrea Arcangeli 
Cc: Mike Rapoport 
Cc: Kirill A. Shutemov 
Cc: Hugh Dickins 
Cc: Pavel Emelyanov 
Cc: Pravin Shedge 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Sasha Levin

userfaultfd: check VM_MAYWRITE was set after verifying the uffd is registered

2018-12-19T18:19:50+00:00

commit 01e881f5a1fca4677e82733061868c6d6ea05ca7 upstream.

Calling UFFDIO_UNREGISTER on virtual ranges not yet registered in uffd
could trigger an harmless false positive WARN_ON.  Check the vma is
already registered before checking VM_MAYWRITE to shut off the false
positive warning.

Link: http://lkml.kernel.org/r/20181206212028.18726-2-aarcange@redhat.com
Cc: 
Fixes: 29ec90660d68 ("userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas")
Signed-off-by: Andrea Arcangeli 
Reported-by: syzbot+06c7092e7d71218a2c16@syzkaller.appspotmail.com
Acked-by: Mike Rapoport 
Acked-by: Hugh Dickins 
Acked-by: Peter Xu 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas

2018-12-05T18:32:04+00:00

commit 29ec90660d68bbdd69507c1c8b4e33aa299278b1 upstream.

After the VMA to register the uffd onto is found, check that it has
VM_MAYWRITE set before allowing registration.  This way we inherit all
common code checks before allowing to fill file holes in shmem and
hugetlbfs with UFFDIO_COPY.

The userfaultfd memory model is not applicable for readonly files unless
it's a MAP_PRIVATE.

Link: http://lkml.kernel.org/r/20181126173452.26955-4-aarcange@redhat.com
Fixes: ff62a3421044 ("hugetlb: implement memfd sealing")
Signed-off-by: Andrea Arcangeli 
Reviewed-by: Mike Rapoport 
Reviewed-by: Hugh Dickins 
Reported-by: Jann Horn 
Fixes: 4c27fe4c4c84 ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support")
Cc: 
Cc: "Dr. David Alan Gilbert" 
Cc: Mike Kravetz 
Cc: Peter Xu 
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

userfaultfd: disable irqs when taking the waitqueue lock

2018-11-13T19:08:46+00:00

commit ae62c16e105a869524afcf8a07ee85c5ae5d0479 upstream.

userfaultfd contains howe-grown locking of the waitqueue lock, and does
not disable interrupts.  This relies on the fact that no one else takes it
from interrupt context and violates an invariat of the normal waitqueue
locking scheme.  With aio poll it is easy to trigger other locks that
disable interrupts (or are called from interrupt context).

Link: http://lkml.kernel.org/r/20181018154101.18750-1-hch@lst.de
Signed-off-by: Christoph Hellwig 
Reviewed-by: Andrea Arcangeli 
Reviewed-by: Andrew Morton 
Cc: 	[4.19.x]
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

mm: Change return type int to vm_fault_t for fault handlers

2018-08-24T01:48:44+00:00

Use new return type vm_fault_t for fault handler.  For now, this is just
documenting that the function returns a VM_FAULT value rather than an
errno.  Once all instances are converted, vm_fault_t will become a
distinct type.

Ref-> commit 1c8f422059ae ("mm: change return type to vm_fault_t")

The aim is to change the return type of finish_fault() and
handle_mm_fault() to vm_fault_t type.  As part of that clean up return
type of all other recursively called functions have been changed to
vm_fault_t type.

The places from where handle_mm_fault() is getting invoked will be
change to vm_fault_t type but in a separate patch.

vmf_error() is the newly introduce inline function in 4.17-rc6.

[akpm@linux-foundation.org: don't shadow outer local `ret' in __do_huge_pmd_anonymous_page()]
Link: http://lkml.kernel.org/r/20180604171727.GA20279@jordon-HP-15-Notebook-PC
Signed-off-by: Souptick Joarder 
Reviewed-by: Matthew Wilcox 
Reviewed-by: Andrew Morton 
Cc: Matthew Wilcox 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

userfaultfd: use fault_wqh lock

2018-08-22T17:52:47+00:00

The userfaultfd code currently uses the unlocked waitqueue helpers for
managing fault_wqh, but instead of holding the waitqueue lock for this
waitqueue around these calls, it the waitqueue lock of
fault_pending_wq, which is a different waitqueue instance.  Given that
the waitqueue is not exposed to the rest of the kernel this actually
works ok at the moment, but prevents the userfaultfd locking rules from
being enforced using lockdep.

Switch to the internally locked waitqueue helpers instead.  This means
that the lock inside fault_wqh now nests inside the fault_pending_wqh
lock, but that's not a problem since it was entirely unused before.

[hch@lst.de: slight changelog updates]
[rppt@linux.vnet.ibm.com: spotted changelog spellos]
Link: http://lkml.kernel.org/r/20171214152344.6880-3-hch@lst.de
Signed-off-by: Matthew Wilcox 
Signed-off-by: Christoph Hellwig 
Reviewed-by: Mike Rapoport 
Cc: Al Viro 
Cc: Andrea Arcangeli 
Cc: Ingo Molnar 
Cc: Jason Baron 
Cc: Peter Zijlstra 
Cc: Davidlohr Bueso 
Cc: Matthew Wilcox 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

fs/userfaultfd.c: remove redundant pointer uwq

2018-08-17T23:20:32+00:00

Pointer uwq is being assigned but is never used hence it is redundant
and can be removed.

Cleans up clang warning:
  warning: variable 'uwq' set but not used [-Wunused-but-set-variable]

Link: http://lkml.kernel.org/r/20180717090802.18357-1-colin.king@canonical.com
Signed-off-by: Colin Ian King 
Reviewed-by: Andrew Morton 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds