linux-stable.git/arch/x86, branch linux-4.1.y

x86/smpboot: Don't use mwait_play_dead() on AMD systems

2018-05-23T01:36:39+00:00

[ Upstream commit da6fa7ef67f07108a1b0cb9fd9e7fcaabd39c051 ]

Recent AMD systems support using MWAIT for C1 state. However, MWAIT will
not allow deeper cstates than C1 on current systems.

play_dead() expects to use the deepest state available.  The deepest state
available on AMD systems is reached through SystemIO or HALT. If MWAIT is
available, it is preferred over the other methods, so the CPU never reaches
the deepest possible state.

Don't try to use MWAIT to play_dead() on AMD systems. Instead, use CPUIDLE
to enter the deepest state advertised by firmware. If CPUIDLE is not
available then fallback to HALT.

Signed-off-by: Yazen Ghannam 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Borislav Petkov 
Cc: stable@vger.kernel.org
Cc: Yazen Ghannam 
Link: https://lkml.kernel.org/r/20180403140228.58540-1-Yazen.Ghannam@amd.com
Signed-off-by: Sasha Levin

x86/tsc: Prevent 32bit truncation in calc_hpet_ref()

2018-05-23T01:36:35+00:00

[ Upstream commit d3878e164dcd3925a237a20e879432400e369172 ]

The TSC calibration code uses HPET as reference. The conversion normalizes
the delta of two HPET timestamps:

    hpetref = ((tshpet1 - tshpet2) * HPET_PERIOD) / 1e6

and then divides the normalized delta of the corresponding TSC timestamps
by the result to calulate the TSC frequency.

    tscfreq = ((tstsc1 - tstsc2 ) * 1e6) / hpetref

This uses do_div() which takes an u32 as the divisor, which worked so far
because the HPET frequency was low enough that 'hpetref' never exceeded
32bit.

On Skylake machines the HPET frequency increased so 'hpetref' can exceed
32bit. do_div() truncates the divisor, which causes the calibration to
fail.

Use div64_u64() to avoid the problem.

[ tglx: Fixes whitespace mangled patch and rewrote changelog ]

Signed-off-by: Xiaoming Gao 
Signed-off-by: Thomas Gleixner 
Cc: stable@vger.kernel.org
Cc: peterz@infradead.org
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/38894564-4fc9-b8ec-353f-de702839e44e@gmail.com
Signed-off-by: Sasha Levin

um: Use POSIX ucontext_t instead of struct ucontext

2018-05-23T01:36:33+00:00

[ Upstream commit 4d1a535b8ec5e74b42dfd9dc809142653b2597f6 ]

glibc 2.26 removed the 'struct ucontext' to "improve" POSIX compliance
and break programs, including User Mode Linux. Fix User Mode Linux
by using POSIX ucontext_t.

This fixes:

arch/um/os-Linux/signal.c: In function 'hard_handler':
arch/um/os-Linux/signal.c:163:22: error: dereferencing pointer to incomplete type 'struct ucontext'
  mcontext_t *mc = &uc->uc_mcontext;
arch/x86/um/stub_segv.c: In function 'stub_segv_handler':
arch/x86/um/stub_segv.c:16:13: error: dereferencing pointer to incomplete type 'struct ucontext'
          &uc->uc_mcontext);

Cc: stable@vger.kernel.org
Signed-off-by: Krzysztof Mazur 
Signed-off-by: Richard Weinberger 
Signed-off-by: Sasha Levin

KVM: SVM: do not zero out segment attributes if segment is unusable or not present

2018-05-23T01:36:27+00:00

[ Upstream commit d9c1b5431d5f0e07575db785a022bce91051ac1d ]

This is a fix for the problem [1], where VMCB.CPL was set to 0 and interrupt
was taken on userspace stack.  The root cause lies in the specific AMD CPU
behaviour which manifests itself as unusable segment attributes on SYSRET.
The corresponding work around for the kernel is the following:

61f01dd941ba ("x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue")

In other turn virtualization side treated unusable segment incorrectly and
restored CPL from SS attributes, which were zeroed out few lines above.

In current patch it is assured only that P bit is cleared in VMCB.save state
and segment attributes are not zeroed out if segment is not presented or is
unusable, therefore CPL can be safely restored from DPL field.

This is only one part of the fix, since QEMU side should be fixed accordingly
not to zero out attributes on its side.  Corresponding patch will follow.

[1] Message id: CAJrWOzD6Xq==b-zYCDdFLgSRMPM-NkNuTSDFEtX=7MreT45i7Q@mail.gmail.com

Signed-off-by: Roman Pen 
Signed-off-by: Mikhail Sennikovskii 
Cc: Paolo Bonzini 
Cc: Radim KrÄmÃ¡Å™ 
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Paolo Bonzini 
Signed-off-by: Sasha Levin

KVM: nVMX: Fix handling of lmsw instruction

2018-05-23T01:36:26+00:00

[ Upstream commit e1d39b17e044e8ae819827810d87d809ba5f58c0 ]

The decision whether or not to exit from L2 to L1 on an lmsw instruction is
based on bogus values: instead of using the information encoded within the
exit qualification, it uses the data also used for the mov-to-cr
instruction, which boils down to using whatever is in %eax at that point.

Use the correct values instead.

Without this fix, an L1 may not get notified when a 32-bit Linux L2
switches its secondary CPUs to protected mode; the L1 is only notified on
the next modification of CR0. This short time window poses a problem, when
there is some other reason to exit to L1 in between. Then, L2 will be
resumed in real mode and chaos ensues.

Signed-off-by: Jan H. Schönherr 
Reviewed-by: Wanpeng Li 
Signed-off-by: Paolo Bonzini 
Signed-off-by: Sasha Levin

x86/tsc: Provide 'tsc=unstable' boot parameter

2018-05-23T01:36:24+00:00

[ Upstream commit 8309f86cd41e8714526867177facf7a316d9be53 ]

Since the clocksource watchdog will only detect broken TSC after the
fact, all TSC based clocks will likely have observed non-continuous
values before/when switching away from TSC.

Therefore only thing to fully avoid random clock movement when your
BIOS randomly mucks with TSC values from SMI handlers is reporting the
TSC as unstable at boot.

Signed-off-by: Peter Zijlstra (Intel) 
Cc: Linus Torvalds 
Cc: Mike Galbraith 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar 
Signed-off-by: Sasha Levin

crypto: x86/cast5-avx - fix ECB encryption when long sg follows short one

2018-05-23T01:33:57+00:00

[ Upstream commit 8f461b1e02ed546fbd0f11611138da67fd85a30f ]

With ecb-cast5-avx, if a 128+ byte scatterlist element followed a
shorter one, then the algorithm accidentally encrypted/decrypted only 8
bytes instead of the expected 128 bytes.  Fix it by setting the
encryption/decryption 'fn' correctly.

Fixes: c12ab20b162c ("crypto: cast5/avx - avoid using temporary stack buffers")
Cc:  # v3.8+
Signed-off-by: Eric Biggers 
Signed-off-by: Herbert Xu 
Signed-off-by: Sasha Levin

kprobes/x86: Fix to set RWX bits correctly before releasing trampoline

2018-05-23T01:33:55+00:00

[ Upstream commit c93f5cf571e7795f97d49ef51b766cf25e328545 ]

Fix kprobes to set(recover) RWX bits correctly on trampoline
buffer before releasing it. Releasing readonly page to
module_memfree() crash the kernel.

Without this fix, if kprobes user register a bunch of kprobes
in function body (since kprobes on function entry usually
use ftrace) and unregister it, kernel hits a BUG and crash.

Link: http://lkml.kernel.org/r/149570868652.3518.14120169373590420503.stgit@devbox

Signed-off-by: Masami Hiramatsu 
Fixes: d0381c81c2f7 ("kprobes/x86: Set kprobes pages read-only")
Signed-off-by: Steven Rostedt (VMware) 
Signed-off-by: Sasha Levin

bpf, x64: increase number of passes

2018-05-23T01:33:54+00:00

[ Upstream commit 6007b080d2e2adb7af22bf29165f0594ea12b34c ]

In Cilium some of the main programs we run today are hitting 9 passes
on x64's JIT compiler, and we've had cases already where we surpassed
the limit where the JIT then punts the program to the interpreter
instead, leading to insertion failures due to CONFIG_BPF_JIT_ALWAYS_ON
or insertion failures due to the prog array owner being JITed but the
program to insert not (both must have the same JITed/non-JITed property).

One concrete case the program image shrunk from 12,767 bytes down to
10,288 bytes where the image converged after 16 steps. I've measured
that this took 340us in the JIT until it converges on my i7-6600U. Thus,
increase the original limit we had from day one where the JIT covered
cBPF only back then before we run into the case (as similar with the
complexity limit) where we trip over this and hit program rejections.
Also add a cond_resched() into the compilation loop, the JIT process
runs without any locks and may sleep anyway.

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
Reviewed-by: Eric Dumazet 
Signed-off-by: Alexei Starovoitov 
Signed-off-by: Sasha Levin

perf/x86/intel: Don't accidentally clear high bits in bdw_limit_period()

2018-05-23T01:33:54+00:00

[ Upstream commit e5ea9b54a055619160bbfe527ebb7d7191823d66 ]

We intended to clear the lowest 6 bits but because of a type bug we
clear the high 32 bits as well.  Andi says that periods are rarely more
than U32_MAX so this bug probably doesn't have a huge runtime impact.

Signed-off-by: Dan Carpenter 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: H. Peter Anvin 
Cc: Jiri Olsa 
Cc: Kan Liang 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Sebastian Andrzej Siewior 
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: Vince Weaver 
Fixes: 294fe0f52a44 ("perf/x86/intel: Add INST_RETIRED.ALL workarounds")
Link: http://lkml.kernel.org/r/20180317115216.GB4035@mwanda
Signed-off-by: Ingo Molnar 
Signed-off-by: Sasha Levin