linux-stable.git/arch/x86/include, branch linux-5.0.y

x86/uaccess: Dont leak the AC flag into __put_user() argument evaluation

2019-05-31T13:45:01+00:00

[ Upstream commit 6ae865615fc43d014da2fd1f1bba7e81ee622d1b ]

The __put_user() macro evaluates its @ptr argument inside the
__uaccess_begin() / __uaccess_end() region. While this would normally
not be expected to be an issue, a UBSAN bug (it ignored -fwrapv,
fixed in GCC 8+) would transform the @ptr evaluation for:

  drivers/gpu/drm/i915/i915_gem_execbuffer.c: if (unlikely(__put_user(offset, &urelocs[r-stack].presumed_offset))) {

into a signed-overflow-UB check and trigger the objtool AC validation.

Finish this commit:

  2a418cf3f5f1 ("x86/uaccess: Don't leak the AC flag into __put_user() value evaluation")

and explicitly evaluate all 3 arguments early.
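
As a rough userspace illustration of the pattern (not the kernel macro
itself): the hypothetical toy_put_user() below copies all three of its
arguments into locals before the guarded region opens, so any work hidden
in an argument expression runs while the simulated AC flag is clear.
ac_flag, toy_uaccess_begin(), toy_uaccess_end() and checked_ptr() are
stand-ins invented for this sketch, and the statement expression is a
GNU C extension.

```c
#include <assert.h>
#include <stddef.h>

/* Simulated AC flag: nonzero only inside the guarded region. */
static int ac_flag;
/* Counts argument evaluations that happened while AC was set. */
static int leaks;

static void toy_uaccess_begin(void) { ac_flag = 1; }
static void toy_uaccess_end(void)   { ac_flag = 0; }

/* Stand-in for an argument expression with work hidden inside it. */
static int *checked_ptr(int *p)
{
	if (ac_flag)
		leaks++;	/* evaluated inside the region: a leak */
	return p;
}

/* Leaky variant: (ptr) is evaluated after the region has opened. */
#define toy_put_user_leaky(x, ptr)			\
({							\
	toy_uaccess_begin();				\
	*(ptr) = (x);					\
	toy_uaccess_end();				\
	0;						\
})

/* Fixed variant: every argument is evaluated before the region. */
#define toy_put_user(x, ptr, size)			\
({							\
	__typeof__(ptr) __ptr = (ptr);			\
	__typeof__(*(ptr)) __val = (x);			\
	size_t __size = (size);				\
	int __ret = 0;					\
	toy_uaccess_begin();				\
	if (__size == sizeof(*__ptr))			\
		*__ptr = __val;				\
	else						\
		__ret = -1;				\
	toy_uaccess_end();				\
	__ret;						\
})

static void toy_put_user_demo(void)
{
	int v = 0;
	int ret;

	leaks = 0;
	ret = toy_put_user_leaky(5, checked_ptr(&v));
	assert(ret == 0 && v == 5 && leaks == 1);

	leaks = 0;
	ret = toy_put_user(7, checked_ptr(&v), sizeof(v));
	assert(ret == 0 && v == 7 && leaks == 0);
}
```

The locals cost nothing after optimization, and they pin down where each
argument expression is evaluated relative to the region.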

Reported-by: Randy Dunlap 
Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Randy Dunlap  # build-tested
Acked-by: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: luto@kernel.org
Fixes: 2a418cf3f5f1 ("x86/uaccess: Don't leak the AC flag into __put_user() value evaluation")
Link: http://lkml.kernel.org/r/20190424072208.695962771@infradead.org
Signed-off-by: Ingo Molnar 
Signed-off-by: Sasha Levin

x86: Hide the int3_emulate_call/jmp functions from UML

2019-05-31T13:44:44+00:00

commit 693713cbdb3a4bda5a8a678c31f06560bbb14657 upstream.

User Mode Linux does not have access to the ip or sp fields of the pt_regs,
and accessing them causes UML to fail to build. Hide the int3_emulate_jmp()
and int3_emulate_call() functions from UML, as it doesn't need them
anyway.

Reported-by: kbuild test robot 
Signed-off-by: Steven Rostedt (VMware) 
Signed-off-by: Greg Kroah-Hartman

x86/mpx, mm/core: Fix recursive munmap() corruption

2019-05-25T16:22:14+00:00

commit 5a28fc94c9143db766d1ba5480cae82d856ad080 upstream.

This is a bit of a mess, to put it mildly.  But, it's a bug
that only seems to have shown up in 4.20 but wasn't noticed
until now, because nobody uses MPX.

MPX has the arch_unmap() hook inside of munmap() because MPX
uses bounds tables that protect other areas of memory.  When
memory is unmapped, there is also a need to unmap the MPX
bounds tables.  Barring this, unused bounds tables can eat 80%
of the address space.

But, the recursive do_munmap() that gets called via arch_unmap()
wreaks havoc with __do_munmap()'s state.  It can result in
freeing populated page tables, accessing bogus VMA state,
double-freed VMAs and more.

See the "long story" further below for the gory details.

To fix this, call arch_unmap() before __do_munmap() has a chance
to do anything meaningful.  Also, remove the 'vma' argument
and force the MPX code to do its own, independent VMA lookup.
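
The new ordering can be modeled in a few lines of userspace C.  This is
a hypothetical sketch, not the mm/ code: toy_arch_unmap() stands in for
the MPX hook that may itself unmap other regions, it takes no 'vma'
argument, and toy_do_munmap() calls it before looking anything up, so
the caller holds no region pointer across the hook.  The region table
and every toy_* name are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_REGIONS 4

/* A toy "VMA": the address range [start, end). */
struct toy_region {
	unsigned long start, end;
	int mapped;
	/* For a bounds table: start of the region it guards, else 0. */
	unsigned long covers;
};

static struct toy_region toy_regions[TOY_NR_REGIONS];

/* Independent lookup: the caller never hands in a region pointer. */
static struct toy_region *toy_find(unsigned long addr)
{
	for (size_t i = 0; i < TOY_NR_REGIONS; i++) {
		struct toy_region *r = &toy_regions[i];

		if (r->mapped && addr >= r->start && addr < r->end)
			return r;
	}
	return NULL;
}

/* Stand-in for the arch hook: drops bounds tables guarding the range. */
static void toy_arch_unmap(unsigned long start, unsigned long end)
{
	for (size_t i = 0; i < TOY_NR_REGIONS; i++) {
		struct toy_region *r = &toy_regions[i];

		if (r->mapped && r->covers &&
		    r->covers >= start && r->covers < end)
			r->mapped = 0;
	}
}

static int toy_do_munmap(unsigned long start, unsigned long end)
{
	struct toy_region *r;

	/* Hook first: no region pointer is live while it runs. */
	toy_arch_unmap(start, end);

	r = toy_find(start);
	if (!r)
		return -1;
	r->mapped = 0;
	return 0;
}

static void toy_munmap_demo(void)
{
	toy_regions[0] = (struct toy_region){ 0x1000, 0x2000, 1, 0 };
	toy_regions[1] = (struct toy_region){ 0x8000, 0x9000, 1, 0x1000 };

	assert(toy_do_munmap(0x1000, 0x2000) == 0);
	assert(toy_find(0x1000) == NULL);	/* the region is gone */
	assert(toy_find(0x8000) == NULL);	/* so is its bounds table */
	assert(toy_do_munmap(0x5000, 0x6000) == -1);
}
```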

== UML / unicore32 impact ==

Remove unused 'vma' argument to arch_unmap().  No functional
change.

I compile tested this on UML but not unicore32.

== powerpc impact ==

powerpc also uses arch_unmap(): it watches for munmap() on the
VDSO and zeroes out 'current->mm->context.vdso_base'.  Moving
arch_unmap() makes this happen earlier in __do_munmap().  But,
'vdso_base' seems to only be used in perf and in the signal
delivery that happens near the return to userspace.  I can not
find any likely impact to powerpc, other than the zeroing
happening a little earlier.

powerpc does not use the 'vma' argument and is unaffected by
its removal.

I compile-tested a 64-bit powerpc defconfig.

== x86 impact ==

For the common success case this is functionally identical to
what was there before.  For the munmap() failure case, it's
possible that some MPX tables will be zapped for memory that
continues to be in use.  But, this is an extraordinarily
unlikely scenario and the harm would be that MPX provides no
protection since the bounds table got reset (zeroed).

I can't imagine anyone doing this:

	ptr = mmap();
	// use ptr
	ret = munmap(ptr);
	if (ret)
		// oh, there was an error, I'll
		// keep using ptr.

Because if you're doing munmap(), you are *done* with the
memory.  There's probably no good data in there _anyway_.

This passes the original reproducer from Richard Biener as
well as the existing mpx selftests/.

The long story:

munmap() has a couple of pieces:

 1. Find the affected VMA(s)
 2. Split the start/end one(s) if necessary
 3. Pull the VMAs out of the rbtree
 4. Actually zap the memory via unmap_region(), including
    freeing page tables (or queueing them to be freed).
 5. Fix up some of the accounting (like fput()) and actually
    free the VMA itself.

This specific ordering was actually introduced by:

  dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")

during the 4.20 merge window.  The previous __do_munmap() code
was actually safe because the only thing after arch_unmap() was
remove_vma_list().  arch_unmap() could not see 'vma' in the
rbtree because it was detached, so it is not even capable of
doing operations unsafe for remove_vma_list()'s use of 'vma'.

Richard Biener reported a test that shows this in dmesg:

  [1216548.787498] BUG: Bad rss-counter state mm:0000000017ce560b idx:1 val:551
  [1216548.787500] BUG: non-zero pgtables_bytes on freeing mm: 24576

What triggered this was the recursive do_munmap() called via
arch_unmap().  It was freeing page tables that had not been
properly zapped.

But, the problem was bigger than this.  For one, arch_unmap()
can free VMAs.  But, the calling __do_munmap() has variables
that *point* to VMAs and obviously can't handle them just
getting freed while the pointer is still in use.

I tried a couple of things here.  First, I tried to fix the page
table freeing problem in isolation, but I then found the VMA
issue.  I also tried having the MPX code return a flag if it
modified the rbtree which would force __do_munmap() to re-walk
to restart.  That spiralled out of control in complexity pretty
fast.

Just moving arch_unmap() and accepting that the bonkers failure
case might eat some bounds tables seems like the simplest viable
fix.

This was also reported in the following kernel bugzilla entry:

  https://bugzilla.kernel.org/show_bug.cgi?id=203123

There are some reports that this commit triggered this bug:

  dd2283f2605 ("mm: mmap: zap pages with read mmap_sem in munmap")

While that commit certainly made the issues easier to hit, I believe
the fundamental issue has been with us as long as MPX itself, thus
the Fixes: tag below is for one of the original MPX commits.

[ mingo: Minor edits to the changelog and the patch. ]

Reported-by: Richard Biener 
Reported-by: H.J. Lu 
Signed-off-by: Dave Hansen 
Reviewed-by: Thomas Gleixner 
Reviewed-by: Yang Shi 
Acked-by: Michael Ellerman 
Cc: Andrew Morton 
Cc: Andy Lutomirski 
Cc: Anton Ivanov 
Cc: Benjamin Herrenschmidt 
Cc: Borislav Petkov 
Cc: Guan Xuetao 
Cc: H. Peter Anvin 
Cc: Jeff Dike 
Cc: Linus Torvalds 
Cc: Michal Hocko 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Richard Weinberger 
Cc: Rik van Riel 
Cc: Vlastimil Babka 
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-um@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: stable@vger.kernel.org
Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")
Link: http://lkml.kernel.org/r/20190419194747.5E1AD6DC@viggo.jf.intel.com
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

x86_64: Allow breakpoints to emulate call instructions

2019-05-25T16:22:11+00:00

commit 4b33dadf37666c0860b88f9e52a16d07bf6d0b03 upstream.

In order to allow breakpoints to emulate call instructions, they need to push
the return address onto the stack. The x86_64 int3 handler adds a small gap
to allow the stack to grow some. Use this gap to add the return address to
be able to emulate a call instruction at the breakpoint location.

These helper functions are added:

  int3_emulate_jmp(): changes the location of the regs->ip to return there.

 (The next two are only for x86_64)
  int3_emulate_push(): to push the address onto the gap in the stack
  int3_emulate_call(): push the return address and change regs->ip
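
A userspace model of the three helpers, for illustration only.  struct
toy_regs, the toy_* names and the two size constants are assumptions of
this sketch: it assumes the saved ip points just past the one-byte int3,
and that the breakpoint replaced the first byte of a five-byte call, so
the emulated return address is the instruction after that call.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_INT3_INSN_SIZE 1	/* int3 is a one-byte instruction */
#define TOY_CALL_INSN_SIZE 5	/* call rel32: opcode plus 4-byte offset */

/* Only the two register fields the helpers touch. */
struct toy_regs {
	uintptr_t ip;
	uintptr_t sp;
};

/* Resume execution at 'ip' when the handler returns. */
static void toy_emulate_jmp(struct toy_regs *regs, uintptr_t ip)
{
	regs->ip = ip;
}

/* Grow the stack by one word (into the gap) and store 'val' there. */
static void toy_emulate_push(struct toy_regs *regs, uintptr_t val)
{
	regs->sp -= sizeof(uintptr_t);
	*(uintptr_t *)regs->sp = val;
}

/* Push the return address, then redirect ip to 'func'. */
static void toy_emulate_call(struct toy_regs *regs, uintptr_t func)
{
	toy_emulate_push(regs, regs->ip - TOY_INT3_INSN_SIZE +
			       TOY_CALL_INSN_SIZE);
	toy_emulate_jmp(regs, func);
}

static void toy_int3_demo(void)
{
	uintptr_t stack[8] = { 0 };
	struct toy_regs regs = {
		.ip = 0x1001,			/* just past an int3 at 0x1000 */
		.sp = (uintptr_t)&stack[8],	/* empty stack, grows down */
	};

	toy_emulate_call(&regs, 0x2000);

	assert(regs.ip == 0x2000);
	assert(regs.sp == (uintptr_t)&stack[7]);
	assert(stack[7] == 0x1005);	/* 0x1000 + size of the call */
}
```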

Cc: Andy Lutomirski 
Cc: Nicolai Stange 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: "H. Peter Anvin" 
Cc: the arch/x86 maintainers 
Cc: Josh Poimboeuf 
Cc: Jiri Kosina 
Cc: Miroslav Benes 
Cc: Petr Mladek 
Cc: Joe Lawrence 
Cc: Shuah Khan 
Cc: Konrad Rzeszutek Wilk 
Cc: Tim Chen 
Cc: Sebastian Andrzej Siewior 
Cc: Mimi Zohar 
Cc: Juergen Gross 
Cc: Nick Desaulniers 
Cc: Nayna Jain 
Cc: Masahiro Yamada 
Cc: Joerg Roedel 
Cc: "open list:KERNEL SELFTEST FRAMEWORK" 
Cc: stable@vger.kernel.org
Fixes: b700e7f03df5 ("livepatch: kernel: add support for live patching")
Tested-by: Nicolai Stange 
Reviewed-by: Nicolai Stange 
Reviewed-by: Masami Hiramatsu 
Signed-off-by: Peter Zijlstra (Intel) 
[ Modified to only work for x86_64 and added comment to int3_emulate_push() ]
Signed-off-by: Steven Rostedt (VMware) 
Signed-off-by: Greg Kroah-Hartman

mm/gup: Remove the 'write' parameter from gup_fast_permitted()

2019-05-25T16:21:57+00:00

commit ad8cfb9c42ef83ecf4079bc7d77e6557648e952b upstream.

The 'write' parameter is unused in gup_fast_permitted() so remove it.

Signed-off-by: Ira Weiny 
Acked-by: Kirill A. Shutemov 
Reviewed-by: Thomas Gleixner 
Cc: Andrew Morton 
Cc: Borislav Petkov 
Cc: Dan Williams 
Cc: Dave Hansen 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20190210223424.13934-1-ira.weiny@intel.com
Signed-off-by: Ingo Molnar 
Cc: Justin Forbes 
Signed-off-by: Greg Kroah-Hartman

x86/MCE: Group AMD function prototypes in

2019-05-22T05:38:37+00:00

commit 9308fd4074551f222f30322d1ee8c5aff18e9747 upstream.

There are two groups of "ifdef CONFIG_X86_MCE_AMD" function prototypes
in . Merge these two groups.

No functional change.

 [ bp: align vertically. ]

Signed-off-by: Yazen Ghannam 
Signed-off-by: Borislav Petkov 
Cc: Arnd Bergmann 
Cc: "clemej@gmail.com" 
Cc: "H. Peter Anvin" 
Cc: Ingo Molnar 
Cc: Pu Wen 
Cc: Qiuxu Zhuo 
Cc: "rafal@milecki.pl" 
Cc: Thomas Gleixner 
Cc: Tony Luck 
Cc: Vishal Verma 
Cc: x86-ml 
Link: https://lkml.kernel.org/r/20190322202848.20749-3-Yazen.Ghannam@amd.com
Signed-off-by: Greg Kroah-Hartman

sched/x86: Save [ER]FLAGS on context switch

2019-05-22T05:38:36+00:00

commit 6690e86be83ac75832e461c141055b5d601c0a6d upstream.

Effectively reverts commit:

  2c7577a75837 ("sched/x86_64: Don't save flags on context switch")

Specifically because SMAP uses FLAGS.AC which invalidates the claim
that the kernel has clean flags.

In particular; while preemption from interrupt return is fine (the
IRET frame on the exception stack contains FLAGS) it breaks any code
that does synchronous scheduling, including preempt_enable().

This has become a significant issue ever since commit:

  5b24a7a2aa20 ("Add 'unsafe' user access functions for batched accesses")

provided a means of having 'normal' C code between STAC / CLAC,
exposing the FLAGS.AC state. So far this hasn't led to trouble,
however fix it before it comes apart.
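
A toy model of why saving the flags matters.  This is a sketch under
stated assumptions, not the switch code: toy_cpu_flags simulates the
flags register, struct toy_task and toy_switch_to() are invented names,
and the save/restore pair only mimics pushing and popping the flags
around a synchronous switch.  AC is bit 18 of [ER]FLAGS.

```c
#include <assert.h>

#define TOY_FLAGS_AC (1UL << 18)	/* AC is bit 18 of [ER]FLAGS */

/* Simulated flags register of the one CPU in this model. */
static unsigned long toy_cpu_flags;

struct toy_task {
	unsigned long saved_flags;
};

/* Save the outgoing task's flags and load the incoming task's. */
static void toy_switch_to(struct toy_task *prev, struct toy_task *next)
{
	prev->saved_flags = toy_cpu_flags;
	toy_cpu_flags = next->saved_flags;
}

static void toy_flags_demo(void)
{
	struct toy_task a = { 0 }, b = { 0 };

	/* Task a is between STAC and CLAC when it schedules. */
	toy_cpu_flags = TOY_FLAGS_AC;

	toy_switch_to(&a, &b);
	assert(!(toy_cpu_flags & TOY_FLAGS_AC));	/* b starts clean */

	toy_switch_to(&b, &a);
	assert(toy_cpu_flags & TOY_FLAGS_AC);		/* a gets AC back */
}
```

Without the save and restore, task b would inherit AC from task a, and
task a would resume with whatever b left behind.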

Reported-by: Julien Thierry 
Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Josh Poimboeuf 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: stable@kernel.org
Fixes: 5b24a7a2aa20 ("Add 'unsafe' user access functions for batched accesses")
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

KVM: nVMX: always use early vmcs check when EPT is disabled

2019-05-16T17:40:20+00:00

[ Upstream commit 2b27924bb1d48e3775f432b70bdad5e6dd4e7798 ]

The remaining failures of vmx.flat when EPT is disabled are caused by
incorrectly reflecting VMfails to the L1 hypervisor.  What happens is
that nested_vmx_restore_host_state corrupts the guest CR3, reloading it
with the host's shadow CR3 instead, because it blindly loads GUEST_CR3
from the vmcs01.

For simplicity let's just always use hardware VMCS checks when EPT is
disabled.  This way, nested_vmx_restore_host_state is not reached at
all (or at least shouldn't be reached).

Signed-off-by: Paolo Bonzini 
Signed-off-by: Sasha Levin

x86/speculation/mds: Add mitigation mode VMWERV

2019-05-14T17:17:11+00:00

commit 22dd8365088b6403630b82423cf906491859b65e upstream

In virtualized environments it can happen that the host has the microcode
update which utilizes the VERW instruction to clear CPU buffers, but the
hypervisor is not yet updated to expose the X86_FEATURE_MD_CLEAR CPUID bit
to guests.

Introduce an internal mitigation mode VMWERV which enables the invocation
of the CPU buffer clearing even if X86_FEATURE_MD_CLEAR is not set. If the
system has no updated microcode this results in a pointless execution of
the VERW instruction wasting a few CPU cycles. If the microcode is updated,
but not exposed to a guest then the CPU buffers will be cleared.
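
The selection logic described above might look roughly like the
following sketch.  The enum, the function names and the boolean inputs
are hypothetical stand-ins for the real mitigation state and CPU feature
checks; only the behaviour (fall back to VMWERV when full mitigation is
requested but MD_CLEAR is not advertised) is taken from the text.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_mds_mode {
	TOY_MDS_OFF,
	TOY_MDS_FULL,
	TOY_MDS_VMWERV,
};

/* Pick the effective mode from the request and the CPU's state. */
static enum toy_mds_mode toy_mds_select(enum toy_mds_mode requested,
					bool cpu_affected,
					bool has_md_clear)
{
	if (!cpu_affected)
		return TOY_MDS_OFF;

	/* MD_CLEAR not advertised: still issue VERW, just in case. */
	if (requested == TOY_MDS_FULL && !has_md_clear)
		return TOY_MDS_VMWERV;

	return requested;
}

/* Buffers are cleared in both FULL and VMWERV mode. */
static bool toy_mds_clears_buffers(enum toy_mds_mode mode)
{
	return mode != TOY_MDS_OFF;
}

static void toy_mds_select_demo(void)
{
	assert(toy_mds_select(TOY_MDS_FULL, true, true) == TOY_MDS_FULL);
	assert(toy_mds_select(TOY_MDS_FULL, true, false) == TOY_MDS_VMWERV);
	assert(toy_mds_select(TOY_MDS_OFF, true, false) == TOY_MDS_OFF);
	assert(toy_mds_select(TOY_MDS_FULL, false, true) == TOY_MDS_OFF);
	assert(toy_mds_clears_buffers(TOY_MDS_VMWERV));
}
```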

That said: Virtual Machines Will Eventually Receive Vaccine

Signed-off-by: Thomas Gleixner 
Reviewed-by: Borislav Petkov 
Reviewed-by: Jon Masters 
Tested-by: Jon Masters 
Signed-off-by: Greg Kroah-Hartman

x86/speculation/mds: Add mitigation control for MDS

2019-05-14T17:17:11+00:00

commit bc1241700acd82ec69fde98c5763ce51086269f8 upstream

Now that the mitigations are in place, add a command line parameter to
control the mitigation, a mitigation selector function and a SMT update
mechanism.

This is the minimal straightforward initial implementation which just
provides an always on/off mode. The command line parameter is:

  mds=[full|off]

This is consistent with the existing mitigations for other speculative
hardware vulnerabilities.
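
A minimal sketch of parsing the two documented values.  The function and
enum names are hypothetical, and rejecting unknown strings with -1 is a
choice made for this sketch rather than a statement about how the kernel
treats them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum toy_mds_opt {
	TOY_MDS_OPT_OFF,
	TOY_MDS_OPT_FULL,
};

/*
 * Parse the value given to mds=.  Returns 0 and sets *opt on success,
 * -1 (leaving *opt untouched) for a missing or unknown value.
 */
static int toy_parse_mds(const char *str, enum toy_mds_opt *opt)
{
	if (!str)
		return -1;

	if (!strcmp(str, "off"))
		*opt = TOY_MDS_OPT_OFF;
	else if (!strcmp(str, "full"))
		*opt = TOY_MDS_OPT_FULL;
	else
		return -1;

	return 0;
}

static void toy_parse_mds_demo(void)
{
	enum toy_mds_opt opt = TOY_MDS_OPT_FULL;

	assert(toy_parse_mds("off", &opt) == 0 && opt == TOY_MDS_OPT_OFF);
	assert(toy_parse_mds("full", &opt) == 0 && opt == TOY_MDS_OPT_FULL);
	assert(toy_parse_mds("bogus", &opt) == -1 && opt == TOY_MDS_OPT_FULL);
	assert(toy_parse_mds(NULL, &opt) == -1);
}
```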

The idle invocation is dynamically updated according to the SMT state of
the system similar to the dynamic update of the STIBP mitigation. The idle
mitigation is limited to CPUs which are only affected by MSBDS and not any
other variant, because the other variants cannot be mitigated on SMT
enabled systems.

Signed-off-by: Thomas Gleixner 
Reviewed-by: Borislav Petkov 
Reviewed-by: Jon Masters 
Tested-by: Jon Masters 
Signed-off-by: Greg Kroah-Hartman