linux-stable.git/arch/x86/include, branch linux-4.5.y

x86/mm/xen: Suppress hugetlbfs in PV guests

2016-05-04T21:49:01+00:00

commit 103f6112f253017d7062cd74d17f4a514ed4485c upstream.

Huge pages are not normally available to PV guests. Not suppressing
hugetlbfs use results in an endless loop of page faults when user mode
code tries to access a hugetlbfs mapped area (since the hypervisor
denies such PTEs to be created, but error indications can't be
propagated out of xen_set_pte_at(), just like for various of its
siblings), and - once killed in an oops like this:

  kernel BUG at .../fs/hugetlbfs/inode.c:428!
  invalid opcode: 0000 [#1] SMP
  ...
  RIP: e030:[]  [] remove_inode_hugepages+0x25b/0x320
  ...
  Call Trace:
   [] hugetlbfs_evict_inode+0x15/0x40
   [] evict+0xbd/0x1b0
   [] __dentry_kill+0x19a/0x1f0
   [] dput+0x1fe/0x220
   [] __fput+0x155/0x200
   [] task_work_run+0x60/0xa0
   [] do_exit+0x160/0x400
   [] do_group_exit+0x3b/0xa0
   [] get_signal+0x1ed/0x470
   [] do_signal+0x14/0x110
   [] prepare_exit_to_usermode+0xe9/0xf0
   [] retint_user+0x8/0x13

This is CVE-2016-3961 / XSA-174.

Reported-by: Vitaly Kuznetsov 
Signed-off-by: Jan Beulich 
Cc: Andrew Morton 
Cc: Andy Lutomirski 
Cc: Boris Ostrovsky 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: David Vrabel 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Juergen Gross 
Cc: Linus Torvalds 
Cc: Luis R. Rodriguez 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Toshi Kani 
Cc: xen-devel 
Link: http://lkml.kernel.org/r/57188ED802000078000E431C@prv-mh.provo.novell.com
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

KVM: x86: reduce default value of halt_poll_ns parameter

2016-04-20T06:45:18+00:00

commit 14ebda3394fd3e5388747e742e510b0802a65d24 upstream.

Windows lets applications choose the frequency of the timer tick,
and in Windows 10 the maximum rate was changed from 1024 Hz to
2048 Hz.  Unfortunately, because of the way the Windows API
works, most applications who need a higher rate than the default
64 Hz will just do

   timeGetDevCaps(&tc, sizeof(tc));
   timeBeginPeriod(tc.wPeriodMin);

and pick the maximum rate.  This causes very high CPU usage when
playing media or games on Windows 10, even if the guest does not
actually use the CPU very much, because the frequent timer tick
causes halt_poll_ns to kick in.

There is no really good solution, especially because Microsoft
could sooner or later bump the limit to 4096 Hz, but for now
the best we can do is lower a bit the upper limit for
halt_poll_ns. :-(

Reported-by: Jon Panozzo 
Signed-off-by: Paolo Bonzini 
Signed-off-by: Greg Kroah-Hartman

x86/iopl/64: Properly context-switch IOPL on Xen PV

2016-04-12T14:33:16+00:00

commit b7a584598aea7ca73140cb87b40319944dd3393f upstream.

On Xen PV, regs->flags doesn't reliably reflect IOPL and the
exit-to-userspace code doesn't change IOPL.  We need to context
switch it manually.

I'm doing this without going through paravirt because this is
specific to Xen PV.  After the dust settles, we can merge this with
the 32-bit code, tidy up the iopl syscall implementation, and remove
the set_iopl pvop entirely.

Fixes XSA-171.

Reviewewd-by: Jan Beulich 
Signed-off-by: Andy Lutomirski 
Cc: Andrew Cooper 
Cc: Andy Lutomirski 
Cc: Boris Ostrovsky 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: David Vrabel 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Jan Beulich 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/693c3bd7aeb4d3c27c92c622b7d0f554a458173c.1458162709.git.luto@kernel.org
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

x86/apic: Fix suspicious RCU usage in smp_trace_call_function_interrupt()

2016-04-12T14:33:16+00:00

commit 7834c10313fb823e538f2772be78edcdeed2e6e3 upstream.

Since 4.4, I've been able to trigger this occasionally:

===============================
[ INFO: suspicious RCU usage. ]
4.5.0-rc7-think+ #3 Not tainted
Cc: Andi Kleen 
Link: http://lkml.kernel.org/r/20160315012054.GA17765@codemonkey.org.uk
Signed-off-by: Thomas Gleixner 
Signed-off-by: Greg Kroah-Hartman 

-------------------------------
./arch/x86/include/asm/msr-trace.h:47 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

RCU used illegally from idle CPU!
rcu_scheduler_active = 1, debug_locks = 1
RCU used illegally from extended quiescent state!
no locks held by swapper/3/0.

stack backtrace:
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.5.0-rc7-think+ #3
 ffffffff92f821e0 1f3e5c340597d7fc ffff880468e07f10 ffffffff92560c2a
 ffff880462145280 0000000000000001 ffff880468e07f40 ffffffff921376a6
 ffffffff93665ea0 0000cc7c876d28da 0000000000000005 ffffffff9383dd60
Call Trace:
   [] dump_stack+0x67/0x9d
 [] lockdep_rcu_suspicious+0xe6/0x100
 [] do_trace_write_msr+0x127/0x1a0
 [] native_apic_msr_eoi_write+0x23/0x30
 [] smp_trace_call_function_interrupt+0x38/0x360
 [] trace_call_function_interrupt+0x90/0xa0
   [] ? cpuidle_enter_state+0x1b4/0x520

Move the entering_irq() call before ack_APIC_irq(), because entering_irq()
tells the RCU susbstems to end the extended quiescent state, so that the
following trace call in ack_APIC_irq() works correctly.

Suggested-by: Andi Kleen 
Fixes: 4787c368a9bc "x86/tracing: Add irq_enter/exit() in smp_trace_reschedule_interrupt()"
Signed-off-by: Dave Jones 
Signed-off-by: Thomas Gleixner

x86/irq: Cure live lock in fixup_irqs()

2016-04-12T14:33:16+00:00

commit 551adc60573cb68e3d55cacca9ba1b7437313df7 upstream.

Harry reported, that he's able to trigger a system freeze with cpu hot
unplug. The freeze turned out to be a live lock caused by recent changes in
irq_force_complete_move().

When fixup_irqs() and from there irq_force_complete_move() is called on the
dying cpu, then all other cpus are in stop machine an wait for the dying cpu
to complete the teardown. If there is a move of an interrupt pending then
irq_force_complete_move() sends the cleanup IPI to the cpus in the old_domain
mask and waits for them to clear the mask. That's obviously impossible as
those cpus are firmly stuck in stop machine with interrupts disabled.

I should have known that, but I completely overlooked it being concentrated on
the locking issues around the vectors. And the existance of the call to
__irq_complete_move() in the code, which actually sends the cleanup IPI made
it reasonable to wait for that cleanup to complete. That call was bogus even
before the recent changes as it was just a pointless distraction.

We have to look at two cases:

1) The move_in_progress flag of the interrupt is set

   This means the ioapic has been updated with the new vector, but it has not
   fired yet. In theory there is a race:

   set_ioapic(new_vector) <-- Interrupt is raised before update is effective,
   			      i.e. it's raised on the old vector.

   So if the target cpu cannot handle that interrupt before the old vector is
   cleaned up, we get a spurious interrupt and in the worst case the ioapic
   irq line becomes stale, but my experiments so far have only resulted in
   spurious interrupts.

   But in case of cpu hotplug this should be a non issue because if the
   affinity update happens right before all cpus rendevouz in stop machine,
   there is no way that the interrupt can be blocked on the target cpu because
   all cpus loops first with interrupts enabled in stop machine, so the old
   vector is not yet cleaned up when the interrupt fires.

   So the only way to run into this issue is if the delivery of the interrupt
   on the apic/system bus would be delayed beyond the point where the target
   cpu disables interrupts in stop machine. I doubt that it can happen, but at
   least there is a theroretical chance. Virtualization might be able to
   expose this, but AFAICT the IOAPIC emulation is not as stupid as the real
   hardware.

   I've spent quite some time over the weekend to enforce that situation,
   though I was not able to trigger the delayed case.

2) The move_in_progress flag is not set and the old_domain cpu mask is not
   empty.

   That means, that an interrupt was delivered after the change and the
   cleanup IPI has been sent to the cpus in old_domain, but not all CPUs have
   responded to it yet.

In both cases we can assume that the next interrupt will arrive on the new
vector, so we can cleanup the old vectors on the cpus in the old_domain cpu
mask.

Fixes: 98229aa36caa "x86/irq: Plug vector cleanup race"
Reported-by: Harry Junior 
Tested-by: Tony Luck 
Signed-off-by: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Joe Lawrence 
Cc: Borislav Petkov 
Cc: Ben Hutchings 
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1603140931430.3657@nanos
Signed-off-by: Thomas Gleixner 
Signed-off-by: Greg Kroah-Hartman

perf/x86/intel: Add definition for PT PMI bit

2016-04-12T14:33:09+00:00

commit 5690ae28e472d25e330ad0c637a5cea3fc39fb32 upstream.

This patch adds a definition for GLOBAL_OVFL_STATUS bit 55
which is used with the Processor Trace (PT) feature.

Signed-off-by: Stephane Eranian 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Vince Weaver 
Cc: adrian.hunter@intel.com
Cc: kan.liang@intel.com
Cc: namhyung@kernel.org
Link: http://lkml.kernel.org/r/1457034642-21837-2-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

x86/microcode: Untangle from BLK_DEV_INITRD

2016-04-12T14:33:09+00:00

commit 5f9c01aa7c49a2d74474d6d879a797b8badf29e6 upstream.

Thomas Voegtle reported that doing oldconfig with a .config which has
CONFIG_MICROCODE enabled but BLK_DEV_INITRD disabled prevents the
microcode loading mechanism from being built.

So untangle it from the BLK_DEV_INITRD dependency so that oldconfig
doesn't turn it off and add an explanatory text to its Kconfig help what
the supported methods for supplying microcode are.

Reported-by: Thomas Voegtle 
Tested-by: Thomas Voegtle 
Signed-off-by: Borislav Petkov 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/1454499225-21544-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

x86/fpu: Revert ("x86/fpu: Disable AVX when eagerfpu is off")

2016-03-10T09:15:58+00:00

Leonid Shatz noticed that the SDM interpretation of the following
recent commit:

  394db20ca240741 ("x86/fpu: Disable AVX when eagerfpu is off")

... is incorrect and that the original behavior of the FPU code was correct.

Because AVX is not stated in CR0 TS bit description, it was mistakenly
believed to be not supported for lazy context switch. This turns out
to be false:

  Intel Software Developer's Manual Vol. 3A, Sec. 2.5 Control Registers:

   'TS Task Switched bit (bit 3 of CR0) -- Allows the saving of the x87 FPU/
    MMX/SSE/SSE2/SSE3/SSSE3/SSE4 context on a task switch to be delayed until
    an x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 instruction is actually executed
    by the new task.'

  Intel Software Developer's Manual Vol. 2A, Sec. 2.4 Instruction Exception
  Specification:

   'AVX instructions refer to exceptions by classes that include #NM
    "Device Not Available" exception for lazy context switch.'

So revert the commit.

Reported-by: Leonid Shatz 
Signed-off-by: Yu-cheng Yu 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: Fenghua Yu 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Ravi V. Shankar 
Cc: Sai Praneeth Prakhya 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/1457569734-3785-1-git-send-email-yu-cheng.yu@intel.com
Signed-off-by: Ingo Molnar

Merge tag 'pci-v4.5-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

2016-02-27T20:33:42+00:00

Pull PCI fixes from Bjorn Helgaas:
 "Enumeration:
    Revert x86 pcibios_alloc_irq() to fix regression (Bjorn Helgaas)

  Marvell MVEBU host bridge driver:
    Restrict build to 32-bit ARM (Thierry Reding)"

* tag 'pci-v4.5-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: mvebu: Restrict build to 32-bit ARM
  Revert "PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()"
  Revert "PCI: Add helpers to manage pci_dev->irq and pci_dev->irq_managed"
  Revert "x86/PCI: Don't alloc pcibios-irq when MSI is enabled"

Revert "PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()"

2016-02-27T14:52:20+00:00

991de2e59090 ("PCI, x86: Implement pcibios_alloc_irq() and
pcibios_free_irq()") appeared in v4.3 and helps support IOAPIC hotplug.

Олег reported that the Elcus-1553 TA1-PCI driver worked in v4.2 but not
v4.3 and bisected it to 991de2e59090.  Sunjin reported that the RocketRAID
272x driver worked in v4.2 but not v4.3.  In both cases booting with
"pci=routirq" is a workaround.

I think the problem is that after 991de2e59090, we no longer call
pcibios_enable_irq() for upstream bridges.  Prior to 991de2e59090, when a
driver called pci_enable_device(), we recursively called
pcibios_enable_irq() for upstream bridges via pci_enable_bridge().

After 991de2e59090, we call pcibios_enable_irq() from pci_device_probe()
instead of the pci_enable_device() path, which does *not* call
pcibios_enable_irq() for upstream bridges.

Revert 991de2e59090 to fix these driver regressions.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=111211
Fixes: 991de2e59090 ("PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()")
Reported-and-tested-by: Олег Мороз 
Reported-by: Sunjin Yang 
Signed-off-by: Bjorn Helgaas 
Acked-by: Rafael J. Wysocki 
CC: Jiang Liu