linux-stable.git/arch/arm64/include, branch linux-3.19.y

arm64: errata: add workaround for cortex-a53 erratum #845719

2015-05-06T20:01:56+00:00

commit 905e8c5dcaa147163672b06fe9dcb5abaacbc711 upstream.

When running a compat (AArch32) userspace on Cortex-A53, a load at EL0
from a virtual address that matches the bottom 32 bits of the virtual
address used by a recent load at (AArch64) EL1 might return incorrect
data.

This patch works around the issue by writing to the contextidr_el1
register on the exception return path when returning to a 32-bit task.
This workaround is patched in at runtime based on the MIDR value of the
processor.

Reviewed-by: Marc Zyngier 
Tested-by: Mark Rutland 
Signed-off-by: Will Deacon 
Signed-off-by: Kevin Hilman 
Signed-off-by: Greg Kroah-Hartman

arm64: apply alternatives for !SMP kernels

2015-05-06T20:01:56+00:00

commit 137650aad96c9594683445e41afa8ac5a2097520 upstream.

Currently we only perform alternative patching for kernels built with
CONFIG_SMP, as we call apply_alternatives_all() in smp.c, which is only
built for CONFIG_SMP. Thus !SMP kernels may not have necessary
alternatives patched in.

This patch ensures that we call apply_alternatives_all() once all CPUs
are booted, even for !SMP kernels, by having the smp_init_cpus() stub
call this for !SMP kernels via up_late_init. A new wrapper,
do_post_cpus_up_work, is added so we can hook other calls here later
(e.g. boot mode logging).

Cc: Andre Przywara 
Cc: Catalin Marinas 
Fixes: e039ee4ee3fcf174 ("arm64: add alternative runtime patching")
Tested-by: Ard Biesheuvel 
Reviewed-by: Ard Biesheuvel 
Signed-off-by: Mark Rutland 
Signed-off-by: Will Deacon 
Signed-off-by: Greg Kroah-Hartman

arm64: KVM: Do not use pgd_index to index stage-2 pgd

2015-05-06T20:01:44+00:00

commit 04b8dc85bf4a64517e3cf20e409eeaa503b15cc1 upstream.

The kernel's pgd_index macro is designed to index a normal, page
sized array. KVM is a bit diffferent, as we can use concatenated
pages to have a bigger address space (for example 40bit IPA with
4kB pages gives us an 8kB PGD.

In the above case, the use of pgd_index will always return an index
inside the first 4kB, which makes a guest that has memory above
0x8000000000 rather unhappy, as it spins forever in a page fault,
whist the host happilly corrupts the lower pgd.

The obvious fix is to get our own kvm_pgd_index that does the right
thing(tm).

Tested on X-Gene with a hacked kvmtool that put memory at a stupidly
high address.

Reviewed-by: Christoffer Dall 
Signed-off-by: Marc Zyngier 
Signed-off-by: Christoffer Dall 
Signed-off-by: Shannon Zhao 
Signed-off-by: Greg Kroah-Hartman

arm64: KVM: Fix stage-2 PGD allocation to have per-page refcounting

2015-05-06T20:01:44+00:00

commit a987370f8e7a1677ae385042644326d9cd145a20 upstream.

We're using __get_free_pages with to allocate the guest's stage-2
PGD. The standard behaviour of this function is to return a set of
pages where only the head page has a valid refcount.

This behaviour gets us into trouble when we're trying to increment
the refcount on a non-head page:

page:ffff7c00cfb693c0 count:0 mapcount:0 mapping:          (null) index:0x0
flags: 0x4000000000000000()
page dumped because: VM_BUG_ON_PAGE((*({ __attribute__((unused)) typeof((&page->_count)->counter) __var = ( typeof((&page->_count)->counter)) 0; (volatile typeof((&page->_count)->counter) *)&((&page->_count)->counter); })) <= 0)
BUG: failure at include/linux/mm.h:548/get_page()!
Kernel panic - not syncing: BUG!
CPU: 1 PID: 1695 Comm: kvm-vcpu-0 Not tainted 4.0.0-rc1+ #3825
Hardware name: APM X-Gene Mustang board (DT)
Call trace:
[] dump_backtrace+0x0/0x13c
[] show_stack+0x10/0x1c
[] dump_stack+0x74/0x94
[] panic+0x100/0x240
[] stage2_get_pmd+0x17c/0x2bc
[] kvm_handle_guest_abort+0x4b4/0x6b0
[] handle_exit+0x58/0x180
[] kvm_arch_vcpu_ioctl_run+0x114/0x45c
[] kvm_vcpu_ioctl+0x2e0/0x754
[] do_vfs_ioctl+0x424/0x5c8
[] SyS_ioctl+0x40/0x78
CPU0: stopping

A possible approach for this is to split the compound page using
split_page() at allocation time, and change the teardown path to
free one page at a time.  It turns out that alloc_pages_exact() and
free_pages_exact() does exactly that.

While we're at it, the PGD allocation code is reworked to reduce
duplication.

This has been tested on an X-Gene platform with a 4kB/48bit-VA host
kernel, and kvmtool hacked to place memory in the second page of
the hardware PGD (PUD for the host kernel). Also regression-tested
on a Cubietruck (Cortex-A7).

 [ Reworked to use alloc_pages_exact() and free_pages_exact() and to
   return pointers directly instead of by reference as arguments
    - Christoffer ]

Reported-by: Mark Rutland 
Signed-off-by: Marc Zyngier 
Signed-off-by: Christoffer Dall 
Signed-off-by: Shannon Zhao 
Signed-off-by: Greg Kroah-Hartman

KVM: arm/arm64: check IRQ number on userland injection

2015-05-06T20:01:43+00:00

commit fd1d0ddf2ae92fb3df42ed476939861806c5d785 upstream.

When userland injects a SPI via the KVM_IRQ_LINE ioctl we currently
only check it against a fixed limit, which historically is set
to 127. With the new dynamic IRQ allocation the effective limit may
actually be smaller (64).
So when now a malicious or buggy userland injects a SPI in that
range, we spill over on our VGIC bitmaps and bytemaps memory.
I could trigger a host kernel NULL pointer dereference with current
mainline by injecting some bogus IRQ number from a hacked kvmtool:
-----------------
....
DEBUG: kvm_vgic_inject_irq(kvm, cpu=0, irq=114, level=1)
DEBUG: vgic_update_irq_pending(kvm, cpu=0, irq=114, level=1)
DEBUG: IRQ #114 still in the game, writing to bytemap now...
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = ffffffc07652e000
[00000000] *pgd=00000000f658b003, *pud=00000000f658b003, *pmd=0000000000000000
Internal error: Oops: 96000006 [#1] PREEMPT SMP
Modules linked in:
CPU: 1 PID: 1053 Comm: lkvm-msi-irqinj Not tainted 4.0.0-rc7+ #3027
Hardware name: FVP Base (DT)
task: ffffffc0774e9680 ti: ffffffc0765a8000 task.ti: ffffffc0765a8000
PC is at kvm_vgic_inject_irq+0x234/0x310
LR is at kvm_vgic_inject_irq+0x30c/0x310
pc : [] lr : [] pstate: 80000145
.....

So this patch fixes this by checking the SPI number against the
actual limit. Also we remove the former legacy hard limit of
127 in the ioctl code.

Signed-off-by: Andre Przywara 
Reviewed-by: Christoffer Dall 
[maz: wrap KVM_ARM_IRQ_GIC_MAX with #ifndef __KERNEL__,
as suggested by Christopher Covington]
Signed-off-by: Marc Zyngier 
Signed-off-by: Greg Kroah-Hartman

arm64: percpu: Make this_cpu accessors pre-empt safe

2015-04-13T12:03:56+00:00

commit f3eab7184ddcd4867cf42e3274ba24a66e1e093d upstream.

this_cpu operations were implemented for arm64 in:
 5284e1b arm64: xchg: Implement cmpxchg_double
 f97fc81 arm64: percpu: Implement this_cpu operations

Unfortunately, it is possible for pre-emption to take place between
address generation and data access. This can lead to cases where data
is being manipulated by this_cpu for a different CPU than it was
called on. Which effectively breaks the spec.

This patch disables pre-emption for the this_cpu operations
guaranteeing that address generation and data manipulation take place
without a pre-emption in-between.

Fixes: 5284e1b4bc8a ("arm64: xchg: Implement cmpxchg_double")
Fixes: f97fc810798c ("arm64: percpu: Implement this_cpu operations")
Reported-by: Mark Rutland 
Acked-by: Will Deacon 
Signed-off-by: Steve Capper 
[catalin.marinas@arm.com: remove space after type cast]
Signed-off-by: Catalin Marinas 
Signed-off-by: Greg Kroah-Hartman

arm64: Use the reserved TTBR0 if context switching to the init_mm

2015-04-13T12:03:56+00:00

commit e53f21bce4d35a93b23d8fa1a840860f6c74f59e upstream.

The idle_task_exit() function may call switch_mm() with next ==
&init_mm. On arm64, init_mm.pgd cannot be used for user mappings, so
this patch simply sets the reserved TTBR0.

Reported-by: Jon Medhurst (Tixy) 
Tested-by: Jon Medhurst (Tixy) 
Signed-off-by: Catalin Marinas 
Signed-off-by: Greg Kroah-Hartman

arm64: Invalidate the TLB corresponding to intermediate page table levels

2015-03-26T12:59:36+00:00

commit 285994a62c80f1d72c6924282bcb59608098d5ec upstream.

The ARM architecture allows the caching of intermediate page table
levels and page table freeing requires a sequence like:

	pmd_clear()
	TLB invalidation
	pte page freeing

With commit 5e5f6dc10546 (arm64: mm: enable HAVE_RCU_TABLE_FREE logic),
the page table freeing batching was moved from tlb_remove_page() to
tlb_remove_table(). The former takes care of TLB invalidation as this is
also shared with pte clearing and page cache page freeing. The latter,
however, does not invalidate the TLBs for intermediate page table levels
as it probably relies on the architecture code to do it if required.
When the mm->mm_users < 2, tlb_remove_table() does not do any batching
and page table pages are freed before tlb_finish_mmu() which performs
the actual TLB invalidation.

This patch introduces __tlb_flush_pgtable() for arm64 and calls it from
the {pte,pmd,pud}_free_tlb() directly without relying on deferred page
table freeing.

Fixes: 5e5f6dc10546 arm64: mm: enable HAVE_RCU_TABLE_FREE logic
Reported-by: Jon Masters 
Tested-by: Jon Masters 
Tested-by: Steve Capper 
Signed-off-by: Catalin Marinas 
Signed-off-by: Greg Kroah-Hartman

arm/arm64: KVM: Use kernel mapping to perform invalidation on page fault

2015-01-29T22:24:57+00:00

When handling a fault in stage-2, we need to resync I$ and D$, just
to be sure we don't leave any old cache line behind.

That's very good, except that we do so using the *user* address.
Under heavy load (swapping like crazy), we may end up in a situation
where the page gets mapped in stage-2 while being unmapped from
userspace by another CPU.

At that point, the DC/IC instructions can generate a fault, which
we handle with kvm->mmu_lock held. The box quickly deadlocks, user
is unhappy.

Instead, perform this invalidation through the kernel mapping,
which is guaranteed to be present. The box is much happier, and so
am I.

Signed-off-by: Marc Zyngier 
Signed-off-by: Christoffer Dall

arm/arm64: KVM: Invalidate data cache on unmap

2015-01-29T22:24:56+00:00

Let's assume a guest has created an uncached mapping, and written
to that page. Let's also assume that the host uses a cache-coherent
IO subsystem. Let's finally assume that the host is under memory
pressure and starts to swap things out.

Before this "uncached" page is evicted, we need to make sure
we invalidate potential speculated, clean cache lines that are
sitting there, or the IO subsystem is going to swap out the
cached view, loosing the data that has been written directly
into memory.

Signed-off-by: Marc Zyngier 
Signed-off-by: Christoffer Dall