linux-stable.git/arch/x86/include, branch linux-3.13.y

x86: fix boot on uniprocessor systems

2014-04-03T19:02:36+00:00

commit 825600c0f20e595daaa7a6dd8970f84fa2a2ee57 upstream.

On x86 uniprocessor systems topology_physical_package_id() returns -1
which causes rapl_cpu_prepare() to leave rapl_pmu variable uninitialized
which leads to GPF in rapl_pmu_init().

See arch/x86/kernel/cpu/perf_event_intel_rapl.c.

It turns out that physical_package_id and core_id can actually be
retreived for uniprocessor systems too.  Enabling them also fixes
rapl_pmu code.

Signed-off-by: Artem Fetishev 
Cc: Stephane Eranian 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

Revert "xen: properly account for _PAGE_NUMA during xen pte translations"

2014-04-03T19:02:35+00:00

commit 5926f87fdaad4be3ed10cec563bf357915e55a86 upstream.

This reverts commit a9c8e4beeeb64c22b84c803747487857fe424b68.

PTEs in Xen PV guests must contain machine addresses if _PAGE_PRESENT
is set and pseudo-physical addresses is _PAGE_PRESENT is clear.

This is because during a domain save/restore (migration) the page
table entries are "canonicalised" and uncanonicalised". i.e., MFNs are
converted to PFNs during domain save so that on a restore the page
table entries may be rewritten with the new MFNs on the destination.
This canonicalisation is only done for PTEs that are present.

This change resulted in writing PTEs with MFNs if _PAGE_PROTNONE (or
_PAGE_NUMA) was set but _PAGE_PRESENT was clear.  These PTEs would be
migrated as-is which would result in unexpected behaviour in the
destination domain.  Either a) the MFN would be translated to the
wrong PFN/page; b) setting the _PAGE_PRESENT bit would clear the PTE
because the MFN is no longer owned by the domain; or c) the present
bit would not get set.

Symptoms include "Bad page" reports when munmapping after migrating a
domain.

Signed-off-by: David Vrabel 
Acked-by: Konrad Rzeszutek Wilk 
Signed-off-by: Greg Kroah-Hartman

xen: properly account for _PAGE_NUMA during xen pte translations

2014-02-22T21:34:40+00:00

commit a9c8e4beeeb64c22b84c803747487857fe424b68 upstream.

Steven Noonan forwarded a users report where they had a problem starting
vsftpd on a Xen paravirtualized guest, with this in dmesg:

  BUG: Bad page map in process vsftpd  pte:8000000493b88165 pmd:e9cc01067
  page:ffffea00124ee200 count:0 mapcount:-1 mapping:     (null) index:0x0
  page flags: 0x2ffc0000000014(referenced|dirty)
  addr:00007f97eea74000 vm_flags:00100071 anon_vma:ffff880e98f80380 mapping:          (null) index:7f97eea74
  CPU: 4 PID: 587 Comm: vsftpd Not tainted 3.12.7-1-ec2 #1
  Call Trace:
    dump_stack+0x45/0x56
    print_bad_pte+0x22e/0x250
    unmap_single_vma+0x583/0x890
    unmap_vmas+0x65/0x90
    exit_mmap+0xc5/0x170
    mmput+0x65/0x100
    do_exit+0x393/0x9e0
    do_group_exit+0xcc/0x140
    SyS_exit_group+0x14/0x20
    system_call_fastpath+0x1a/0x1f
  Disabling lock debugging due to kernel taint
  BUG: Bad rss-counter state mm:ffff880e9ca60580 idx:0 val:-1
  BUG: Bad rss-counter state mm:ffff880e9ca60580 idx:1 val:1

The issue could not be reproduced under an HVM instance with the same
kernel, so it appears to be exclusive to paravirtual Xen guests.  He
bisected the problem to commit 1667918b6483 ("mm: numa: clear numa
hinting information on mprotect") that was also included in 3.12-stable.

The problem was related to how xen translates ptes because it was not
accounting for the _PAGE_NUMA bit.  This patch splits pte_present to add
a pteval_present helper for use by xen so both bare metal and xen use
the same code when checking if a PTE is present.

[mgorman@suse.de: wrote changelog, proposed minor modifications]
[akpm@linux-foundation.org: fix typo in comment]
Reported-by: Steven Noonan 
Tested-by: Steven Noonan 
Signed-off-by: Elena Ufimtseva 
Signed-off-by: Mel Gorman 
Reviewed-by: David Vrabel 
Acked-by: Konrad Rzeszutek Wilk 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

mm: don't lose the SOFT_DIRTY flag on mprotect

2014-02-13T21:55:29+00:00

commit 24f91eba18bbfdb27e71a1aae5b3a61b67fcd091 upstream.

The SOFT_DIRTY bit shows that the content of memory was changed after a
defined point in the past.  mprotect() doesn't change the content of
memory, so it must not change the SOFT_DIRTY bit.

This bug causes a malfunction: on the first iteration all pages are
dumped.  On other iterations only pages with the SOFT_DIRTY bit are
dumped.  So if the SOFT_DIRTY bit is cleared from a page by mistake, the
page is not dumped and its content will be restored incorrectly.

This patch does nothing with _PAGE_SWP_SOFT_DIRTY, becase pte_modify()
is called only for present pages.

Fixes commit 0f8975ec4db2 ("mm: soft-dirty bits for user memory changes
tracking").

Signed-off-by: Andrey Vagin 
Acked-by: Cyrill Gorcunov 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Pavel Emelyanov 
Cc: Borislav Petkov 
Cc: Wen Congyang 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

x86, cpu, amd: Add workaround for family 16h, erratum 793

2014-02-06T19:34:12+00:00

commit 3b56496865f9f7d9bcb2f93b44c63f274f08e3b6 upstream.

This adds the workaround for erratum 793 as a precaution in case not
every BIOS implements it.  This addresses CVE-2013-6885.

Erratum text:

[Revision Guide for AMD Family 16h Models 00h-0Fh Processors,
document 51810 Rev. 3.04 November 2013]

793 Specific Combination of Writes to Write Combined Memory Types and
Locked Instructions May Cause Core Hang

Description

Under a highly specific and detailed set of internal timing
conditions, a locked instruction may trigger a timing sequence whereby
the write to a write combined memory type is not flushed, causing the
locked instruction to stall indefinitely.

Potential Effect on System

Processor core hang.

Suggested Workaround

BIOS should set MSR
C001_1020[15] = 1b.

Fix Planned

No fix planned

[ hpa: updated description, fixed typo in MSR name ]

Signed-off-by: Borislav Petkov 
Link: http://lkml.kernel.org/r/20140114230711.GS29865@pd.tnic
Tested-by: Aravind Gopalakrishnan 
Signed-off-by: H. Peter Anvin 
Signed-off-by: Greg Kroah-Hartman

x86, kvm: correctly access the KVM_CPUID_FEATURES leaf at 0x40000101

2014-02-06T19:33:51+00:00

commit 77f01bdfa5e55dc19d3eb747181d2730a9bb3ca8 upstream.

When Hyper-V hypervisor leaves are present, KVM must relocate
its own leaves at 0x40000100, because Windows does not look for
Hyper-V leaves at indices other than 0x40000000.  In this case,
the KVM features are at 0x40000101, but the old code would always
look at 0x40000001.

Fix by using kvm_cpuid_base().  This also requires making the
function non-inline, since kvm_cpuid_base() is static.

Fixes: 1085ba7f552d84aa8ac0ae903fa8d0cc2ff9f79d
Cc: mtosatti@redhat.com
Signed-off-by: Paolo Bonzini 
Signed-off-by: Greg Kroah-Hartman

x86, kvm: cache the base of the KVM cpuid leaves

2014-02-06T19:33:51+00:00

commit 1c300a40772dae829b91dad634999a6a522c0829 upstream.

It is unnecessary to go through hypervisor_cpuid_base every time
a leaf is found (which will be every time a feature is requested
after the next patch).

Fixes: 1085ba7f552d84aa8ac0ae903fa8d0cc2ff9f79d
Cc: mtosatti@redhat.com
Signed-off-by: Paolo Bonzini 
Signed-off-by: Greg Kroah-Hartman

x86, fpu, amd: Clear exceptions in AMD FXSAVE workaround

2014-01-12T03:15:52+00:00

Before we do an EMMS in the AMD FXSAVE information leak workaround we
need to clear any pending exceptions, otherwise we trap with a
floating-point exception inside this code.

Reported-by: halfdog 
Tested-by: Borislav Petkov 
Link: http://lkml.kernel.org/r/CA%2B55aFxQnY_PCG_n4=0w-VG=YLXL-yr7oMxyy0WU2gCBAf3ydg@mail.gmail.com
Signed-off-by: H. Peter Anvin

mm: fix TLB flush race between migration, and change_protection_range

2013-12-19T03:04:51+00:00

There are a few subtle races, between change_protection_range (used by
mprotect and change_prot_numa) on one side, and NUMA page migration and
compaction on the other side.

The basic race is that there is a time window between when the PTE gets
made non-present (PROT_NONE or NUMA), and the TLB is flushed.

During that time, a CPU may continue writing to the page.

This is fine most of the time, however compaction or the NUMA migration
code may come in, and migrate the page away.

When that happens, the CPU may continue writing, through the cached
translation, to what is no longer the current memory location of the
process.

This only affects x86, which has a somewhat optimistic pte_accessible.
All other architectures appear to be safe, and will either always flush,
or flush whenever there is a valid mapping, even with no permissions
(SPARC).

The basic race looks like this:

CPU A			CPU B			CPU C

						load TLB entry
make entry PTE/PMD_NUMA
			fault on entry
						read/write old page
			start migrating page
			change PTE/PMD to new page
						read/write old page [*]
flush TLB
						reload TLB from new entry
						read/write new page
						lose data

[*] the old page may belong to a new user at this point!

The obvious fix is to flush remote TLB entries, by making sure that
pte_accessible aware of the fact that PROT_NONE and PROT_NUMA memory may
still be accessible if there is a TLB flush pending for the mm.

This should fix both NUMA migration and compaction.

[mgorman@suse.de: fix build]
Signed-off-by: Rik van Riel 
Signed-off-by: Mel Gorman 
Cc: Alex Thorlton 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

sched: Remove PREEMPT_NEED_RESCHED from generic code

2013-12-11T14:52:32+00:00

While hunting a preemption issue with Alexander, Ben noticed that the
currently generic PREEMPT_NEED_RESCHED stuff is horribly broken for
load-store architectures.

We currently rely on the IPI to fold TIF_NEED_RESCHED into
PREEMPT_NEED_RESCHED, but when this IPI lands while we already have
a load for the preempt-count but before the store, the store will erase
the PREEMPT_NEED_RESCHED change.

The current preempt-count only works on load-store archs because
interrupts are assumed to be completely balanced wrt their preempt_count
fiddling; the previous preempt_count load will match the preempt_count
state after the interrupt and therefore nothing gets lost.

This patch removes the PREEMPT_NEED_RESCHED usage from generic code and
pushes it into x86 arch code; the generic code goes back to relying on
TIF_NEED_RESCHED.

Boot tested on x86_64 and compile tested on ppc64.

Reported-by: Benjamin Herrenschmidt 
Reported-and-Tested-by: Alexander Graf 
Signed-off-by: Peter Zijlstra 
Cc: Linus Torvalds 
Link: http://lkml.kernel.org/r/20131128132641.GP10022@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar