linux-stable.git/virt/kvm, branch linux-3.10.y

KVM: kvm_io_bus_unregister_dev() should never fail

2017-06-07T22:47:11+00:00

commit 90db10434b163e46da413d34db8d0e77404cc645 upstream.

No caller currently checks the return value of
kvm_io_bus_unregister_dev(). This is evil, as all callers silently go on
freeing their device. A stale reference will remain in the io_bus,
getting at least used again, when the iobus gets teared down on
kvm_destroy_vm() - leading to use after free errors.

There is nothing the callers could do, except retrying over and over
again.

So let's simply remove the bus altogether, print an error and make
sure no one can access this broken bus again (returning -ENOMEM on any
attempt to access it).

Fixes: e93f8a0f821e ("KVM: convert io_bus to SRCU")
Reported-by: Dmitry Vyukov 
Reviewed-by: Cornelia Huck 
Signed-off-by: David Hildenbrand 
Signed-off-by: Paolo Bonzini 
[wt: no kvm_io_bus_read_cookie in 3.10, slightly different constructs]

Signed-off-by: Willy Tarreau

kvm: exclude ioeventfd from counting kvm_io_range limit

2017-06-07T22:47:11+00:00

commit 6ea34c9b78c10289846db0abeebd6b84d5aca084 upstream.

We can easily reach the 1000 limit by start VM with a couple
hundred I/O devices (multifunction=on). The hardcode limit
already been adjusted 3 times (6 ~ 200 ~ 300 ~ 1000).

In userspace, we already have maximum file descriptor to
limit ioeventfd count. But kvm_io_bus devices also are used
for pit, pic, ioapic, coalesced_mmio. They couldn't be limited
by maximum file descriptor.

Currently only ioeventfds take too much kvm_io_bus devices,
so just exclude it from counting kvm_io_range limit.

Also fixed one indent issue in kvm_host.h

Signed-off-by: Amos Kong 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Gleb Natapov 
[wt: next patch depends on this one]
Signed-off-by: Willy Tarreau

KVM: x86: clear bus pointer when destroyed

2017-06-07T22:47:11+00:00

commit df630b8c1e851b5e265dc2ca9c87222e342c093b upstream.

When releasing the bus, let's clear the bus pointers to mark it out. If
any further device unregister happens on this bus, we know that we're
done if we found the bus being released already.

Signed-off-by: Peter Xu 
Signed-off-by: Radim KrÄmÃ¡Å 
Signed-off-by: Willy Tarreau

kvm: Fix irq route entries exceeding KVM_MAX_IRQ_ROUTES

2016-08-27T09:40:24+00:00

commit caf1ff26e1aa178133df68ac3d40815fed2187d9 upstream.

These days, we experienced one guest crash with 8 cores and 3 disks,
with qemu error logs as bellow:

qemu-system-x86_64: /build/qemu-2.0.0/kvm-all.c:984:
kvm_irqchip_commit_routes: Assertion `ret == 0' failed.

And then we found one patch(bdf026317d) in qemu tree, which said
could fix this bug.

Execute the following script will reproduce the BUG quickly:

irq_affinity.sh
========================================================================

vda_irq_num=25
vdb_irq_num=27
while [ 1 ]
do
    for irq in {1,2,4,8,10,20,40,80}
        do
            echo $irq > /proc/irq/$vda_irq_num/smp_affinity
            echo $irq > /proc/irq/$vdb_irq_num/smp_affinity
            dd if=/dev/vda of=/dev/zero bs=4K count=100 iflag=direct
            dd if=/dev/vdb of=/dev/zero bs=4K count=100 iflag=direct
        done
done
========================================================================

The following qemu log is added in the qemu code and is displayed when
this bug reproduced:

kvm_irqchip_commit_routes: max gsi: 1008, nr_allocated_irq_routes: 1024,
irq_routes->nr: 1024, gsi_count: 1024.

That's to say when irq_routes->nr == 1024, there are 1024 routing entries,
but in the kernel code when routes->nr >= 1024, will just return -EINVAL;

The nr is the number of the routing entries which is in of
[1 ~ KVM_MAX_IRQ_ROUTES], not the index in [0 ~ KVM_MAX_IRQ_ROUTES - 1].

This patch fix the BUG above.

Cc: stable@vger.kernel.org
Signed-off-by: Xiubo Li 
Signed-off-by: Wei Tang 
Signed-off-by: Zhang Zhuoyu 
Signed-off-by: Paolo Bonzini 
Signed-off-by: Willy Tarreau

KVM: fix spin_lock_init order on x86

2016-06-07T08:42:44+00:00

commit e9ad4ec8379ad1ba6f68b8ca1c26b50b5ae0a327 upstream.

Moving the initialization earlier is needed in 4.6 because
kvm_arch_init_vm is now using mmu_lock, causing lockdep to
complain:

[  284.440294] INFO: trying to register non-static key.
[  284.445259] the code is fine but needs lockdep annotation.
[  284.450736] turning off the locking correctness validator.
...
[  284.528318]  [] lock_acquire+0xd3/0x240
[  284.533733]  [] ? kvm_page_track_register_notifier+0x20/0x60 [kvm]
[  284.541467]  [] _raw_spin_lock+0x41/0x80
[  284.546960]  [] ? kvm_page_track_register_notifier+0x20/0x60 [kvm]
[  284.554707]  [] kvm_page_track_register_notifier+0x20/0x60 [kvm]
[  284.562281]  [] kvm_mmu_init_vm+0x20/0x30 [kvm]
[  284.568381]  [] kvm_arch_init_vm+0x1ea/0x200 [kvm]
[  284.574740]  [] kvm_dev_ioctl+0xbf/0x4d0 [kvm]

However, it also helps fixing a preexisting problem, which is why this
patch is also good for stable kernels: kvm_create_vm was incrementing
current->mm->mm_count but not decrementing it at the out_err label (in
case kvm_init_mmu_notifier failed).  The new initialization order makes
it possible to add the required mmdrop without adding a new error label.

Reported-by: Borislav Petkov 
Signed-off-by: Paolo Bonzini 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Willy Tarreau

KVM: async_pf: do not warn on page allocation failures

2016-03-03T23:06:24+00:00

commit d7444794a02ff655eda87e3cc54e86b940e7736f upstream.

In async_pf we try to allocate with NOWAIT to get an element quickly
or fail. This code also handle failures gracefully. Lets silence
potential page allocation failures under load.

qemu-system-s39: page allocation failure: order:0,mode:0x2200000
[...]
Call Trace:
([<00000000001146b8>] show_trace+0xf8/0x148)
[<000000000011476a>] show_stack+0x62/0xe8
[<00000000004a36b8>] dump_stack+0x70/0x98
[<0000000000272c3a>] warn_alloc_failed+0xd2/0x148
[<000000000027709e>] __alloc_pages_nodemask+0x94e/0xb38
[<00000000002cd36a>] new_slab+0x382/0x400
[<00000000002cf7ac>] ___slab_alloc.constprop.30+0x2dc/0x378
[<00000000002d03d0>] kmem_cache_alloc+0x160/0x1d0
[<0000000000133db4>] kvm_setup_async_pf+0x6c/0x198
[<000000000013dee8>] kvm_arch_vcpu_ioctl_run+0xd48/0xd58
[<000000000012fcaa>] kvm_vcpu_ioctl+0x372/0x690
[<00000000002f66f6>] do_vfs_ioctl+0x3be/0x510
[<00000000002f68ec>] SyS_ioctl+0xa4/0xb8
[<0000000000781c5e>] system_call+0xd6/0x264
[<000003ffa24fa06a>] 0x3ffa24fa06a

Signed-off-by: Christian Borntraeger 
Reviewed-by: Dominik Dingel 
Signed-off-by: Paolo Bonzini 
Signed-off-by: Greg Kroah-Hartman

KVM: use slowpath for cross page cached accesses

2015-05-06T19:56:21+00:00

commit ca3f0874723fad81d0c701b63ae3a17a408d5f25 upstream.

kvm_write_guest_cached() does not mark all written pages as dirty and
code comments in kvm_gfn_to_hva_cache_init() talk about NULL memslot
with cross page accesses.  Fix all the easy way.

The check is '<= 1' to have the same result for 'len = 0' cache anywhere
in the page.  (nr_pages_needed is 0 on page boundary.)

Fixes: 8f964525a121 ("KVM: Allow cross page reads and writes from cached translations.")
Signed-off-by: Radim Krčmář 
Message-Id: <20150408121648.GA3519@potion.brq.redhat.com>
Reviewed-by: Wanpeng Li 
Signed-off-by: Paolo Bonzini 
Signed-off-by: Greg Kroah-Hartman

kvm: fix excessive pages un-pinning in kvm_iommu_map error path.

2014-11-14T16:47:56+00:00

commit 3d32e4dbe71374a6780eaf51d719d76f9a9bf22f upstream.

The third parameter of kvm_unpin_pages() when called from
kvm_iommu_map_pages() is wrong, it should be the number of pages to un-pin
and not the page size.

This error was facilitated with an inconsistent API: kvm_pin_pages() takes
a size, but kvn_unpin_pages() takes a number of pages, so fix the problem
by matching the two.

This was introduced by commit 350b8bd ("kvm: iommu: fix the third parameter
of kvm_iommu_put_pages (CVE-2014-3601)"), which fixes the lack of
un-pinning for pages intended to be un-pinned (i.e. memory leak) but
unfortunately potentially aggravated the number of pages we un-pin that
should have stayed pinned. As far as I understand though, the same
practical mitigations apply.

This issue was found during review of Red Hat 6.6 patches to prepare
Ksplice rebootless updates.

Thanks to Vegard for his time on a late Friday evening to help me in
understanding this code.

Fixes: 350b8bd ("kvm: iommu: fix the third parameter of... (CVE-2014-3601)")
Signed-off-by: Quentin Casasnovas 
Signed-off-by: Vegard Nossum 
Signed-off-by: Jamie Iles 
Reviewed-by: Sasha Levin 
Signed-off-by: Paolo Bonzini 
Signed-off-by: Greg Kroah-Hartman

kvm: don't take vcpu mutex for obviously invalid vcpu ioctls

2014-10-30T16:35:10+00:00

commit 2ea75be3219571d0ec009ce20d9971e54af96e09 upstream.

vcpu ioctls can hang the calling thread if issued while a vcpu is running.
However, invalid ioctls can happen when userspace tries to probe the kind
of file descriptors (e.g. isatty() calls ioctl(TCGETS)); in that case,
we know the ioctl is going to be rejected as invalid anyway and we can
fail before trying to take the vcpu mutex.

This patch does not change functionality, it just makes invalid ioctls
fail faster.

Signed-off-by: David Matlack 
Signed-off-by: Paolo Bonzini 
Signed-off-by: Greg Kroah-Hartman

kvm: iommu: fix the third parameter of kvm_iommu_put_pages (CVE-2014-3601)

2014-09-05T23:28:36+00:00

commit 350b8bdd689cd2ab2c67c8a86a0be86cfa0751a7 upstream.

The third parameter of kvm_iommu_put_pages is wrong,
It should be 'gfn - slot->base_gfn'.

By making gfn very large, malicious guest or userspace can cause kvm to
go to this error path, and subsequently to pass a huge value as size.
Alternatively if gfn is small, then pages would be pinned but never
unpinned, causing host memory leak and local DOS.

Passing a reasonable but large value could be the most dangerous case,
because it would unpin a page that should have stayed pinned, and thus
allow the device to DMA into arbitrary memory.  However, this cannot
happen because of the condition that can trigger the error:

- out of memory (where you can't allocate even a single page)
  should not be possible for the attacker to trigger

- when exceeding the iommu's address space, guest pages after gfn
  will also exceed the iommu's address space, and inside
  kvm_iommu_put_pages() the iommu_iova_to_phys() will fail.  The
  page thus would not be unpinned at all.

Reported-by: Jack Morgenstein 
Signed-off-by: Michael S. Tsirkin 
Signed-off-by: Paolo Bonzini 
Signed-off-by: Greg Kroah-Hartman