<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/virt, branch v4.12</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>KVM: arm/arm64: Handle possible NULL stage2 pud when ageing pages</title>
<updated>2017-06-06T13:28:40+00:00</updated>
<author>
<name>Marc Zyngier</name>
<email>marc.zyngier@arm.com</email>
</author>
<published>2017-06-05T18:17:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d6dbdd3c8558cad3b6d74cc357b408622d122331'/>
<id>d6dbdd3c8558cad3b6d74cc357b408622d122331</id>
<content type='text'>
Under memory pressure, we start ageing pages, which amounts to walking
the page tables. Since we don't want to allocate any extra level,
we pass NULL for our private allocation cache, which means that
stage2_get_pud() is allowed to fail. This results in the following
splat:

[ 1520.409577] Unable to handle kernel NULL pointer dereference at virtual address 00000008
[ 1520.417741] pgd = ffff810f52fef000
[ 1520.421201] [00000008] *pgd=0000010f636c5003, *pud=0000010f56f48003, *pmd=0000000000000000
[ 1520.429546] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[ 1520.435156] Modules linked in:
[ 1520.438246] CPU: 15 PID: 53550 Comm: qemu-system-aar Tainted: G        W       4.12.0-rc4-00027-g1885c397eaec #7205
[ 1520.448705] Hardware name: FOXCONN R2-1221R-A4/C2U4N_MB, BIOS G31FB12A 10/26/2016
[ 1520.463726] task: ffff800ac5fb4e00 task.stack: ffff800ce04e0000
[ 1520.469666] PC is at stage2_get_pmd+0x34/0x110
[ 1520.474119] LR is at kvm_age_hva_handler+0x44/0xf0
[ 1520.478917] pc : [&lt;ffff0000080b137c&gt;] lr : [&lt;ffff0000080b149c&gt;] pstate: 40000145
[ 1520.486325] sp : ffff800ce04e33d0
[ 1520.489644] x29: ffff800ce04e33d0 x28: 0000000ffff40064
[ 1520.494967] x27: 0000ffff27e00000 x26: 0000000000000000
[ 1520.500289] x25: ffff81051ba65008 x24: 0000ffff40065000
[ 1520.505618] x23: 0000ffff40064000 x22: 0000000000000000
[ 1520.510947] x21: ffff810f52b20000 x20: 0000000000000000
[ 1520.516274] x19: 0000000058264000 x18: 0000000000000000
[ 1520.521603] x17: 0000ffffa6fe7438 x16: ffff000008278b70
[ 1520.526940] x15: 000028ccd8000000 x14: 0000000000000008
[ 1520.532264] x13: ffff7e0018298000 x12: 0000000000000002
[ 1520.537582] x11: ffff000009241b93 x10: 0000000000000940
[ 1520.542908] x9 : ffff0000092ef800 x8 : 0000000000000200
[ 1520.548229] x7 : ffff800ce04e36a8 x6 : 0000000000000000
[ 1520.553552] x5 : 0000000000000001 x4 : 0000000000000000
[ 1520.558873] x3 : 0000000000000000 x2 : 0000000000000008
[ 1520.571696] x1 : ffff000008fd5000 x0 : ffff0000080b149c
[ 1520.577039] Process qemu-system-aar (pid: 53550, stack limit = 0xffff800ce04e0000)
[...]
[ 1521.510735] [&lt;ffff0000080b137c&gt;] stage2_get_pmd+0x34/0x110
[ 1521.516221] [&lt;ffff0000080b149c&gt;] kvm_age_hva_handler+0x44/0xf0
[ 1521.522054] [&lt;ffff0000080b0610&gt;] handle_hva_to_gpa+0xb8/0xe8
[ 1521.527716] [&lt;ffff0000080b3434&gt;] kvm_age_hva+0x44/0xf0
[ 1521.532854] [&lt;ffff0000080a58b0&gt;] kvm_mmu_notifier_clear_flush_young+0x70/0xc0
[ 1521.539992] [&lt;ffff000008238378&gt;] __mmu_notifier_clear_flush_young+0x88/0xd0
[ 1521.546958] [&lt;ffff00000821eca0&gt;] page_referenced_one+0xf0/0x188
[ 1521.552881] [&lt;ffff00000821f36c&gt;] rmap_walk_anon+0xec/0x250
[ 1521.558370] [&lt;ffff000008220f78&gt;] rmap_walk+0x78/0xa0
[ 1521.563337] [&lt;ffff000008221104&gt;] page_referenced+0x164/0x180
[ 1521.569002] [&lt;ffff0000081f1af0&gt;] shrink_active_list+0x178/0x3b8
[ 1521.574922] [&lt;ffff0000081f2058&gt;] shrink_node_memcg+0x328/0x600
[ 1521.580758] [&lt;ffff0000081f23f4&gt;] shrink_node+0xc4/0x328
[ 1521.585986] [&lt;ffff0000081f2718&gt;] do_try_to_free_pages+0xc0/0x340
[ 1521.592000] [&lt;ffff0000081f2a64&gt;] try_to_free_pages+0xcc/0x240
[...]

The trivial fix is to handle this NULL pud value early, rather than
dereferencing it blindly.
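The shape of the fix, sketched as a toy Python model of the walker (illustrative names and types; the real change is an early NULL check in C, in stage2_get_pmd()):

```python
def stage2_get_pmd(pud):
    """Toy model: descend from a pud to a pmd.

    The ageing path walks with no allocation cache, so the pud lookup
    may legitimately come back empty; bail out early instead of
    dereferencing it (the NULL-pointer oops in the splat above).
    """
    if pud is None:           # the early check the fix adds
        return None           # nothing mapped here, nothing to age
    return pud["pmd"]         # placeholder for the real descent
```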

Cc: stable@vger.kernel.org
Signed-off-by: Marc Zyngier &lt;marc.zyngier@arm.com&gt;
Reviewed-by: Christoffer Dall &lt;cdall@linaro.org&gt;
Signed-off-by: Christoffer Dall &lt;cdall@linaro.org&gt;
</content>
</entry>
<entry>
<title>KVM: arm/arm64: vgic-v3: Fix nr_pre_bits bitfield extraction</title>
<updated>2017-06-06T08:16:53+00:00</updated>
<author>
<name>Christoffer Dall</name>
<email>cdall@linaro.org</email>
</author>
<published>2017-06-04T20:17:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d68356cc51e304ff9a389f006b6249d41f2c2319'/>
<id>d68356cc51e304ff9a389f006b6249d41f2c2319</id>
<content type='text'>
We used to extract PRIbits from ICH_VTR_EL2, which is the uppermost
field in the register word, so a mask wasn't necessary. But now that we
look at PREbits, which sits at bits 28:26 with the potentially non-zero
PRIbits field above it, we really do need to mask off the field value,
otherwise fun things may happen.
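The gist, as a toy Python model: ICH_VTR_EL2 carries PRIbits in bits 31:29, directly above PREbits in bits 28:26, so a bare shift drags PRIbits along (the modulo below stands in for the bitwise mask the fix adds):

```python
PRE_SHIFT = 26   # ICH_VTR_EL2.PREbits sits at bits 28:26
PRE_WIDTH = 3    # a 3-bit field, so 8 possible values

def pre_bits_buggy(vtr):
    # A bare shift keeps whatever sits above the field (PRIbits).
    return vtr >> PRE_SHIFT

def pre_bits_fixed(vtr):
    # Shift, then reduce to the 3-bit field (equivalent to masking
    # with 0x7, written here without a bitwise AND).
    return (vtr >> PRE_SHIFT) % (2 ** PRE_WIDTH)
```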

Signed-off-by: Christoffer Dall &lt;cdall@linaro.org&gt;
Acked-by: Marc Zyngier &lt;marc.zyngier@arm.com&gt;
</content>
</entry>
<entry>
<title>KVM: arm/arm64: Fix issues with GICv2 on GICv3 migration</title>
<updated>2017-05-24T07:44:07+00:00</updated>
<author>
<name>Christoffer Dall</name>
<email>cdall@linaro.org</email>
</author>
<published>2017-05-20T12:12:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=28232a4317be7ad615f0f1b69dc8583fd580a8e3'/>
<id>28232a4317be7ad615f0f1b69dc8583fd580a8e3</id>
<content type='text'>
We have been a little loose with our intermediate VMCR representation
where we had a 'ctlr' field, but we failed to differentiate between the
GICv2 GICC_CTLR and ICC_CTLR_EL1 layouts, and therefore ended up mapping
the wrong bits into the individual fields of the ICH_VMCR_EL2 when
emulating a GICv2 on a GICv3 system.

Fix this by using explicit fields for the VMCR bits instead.
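A toy illustration of the layout mismatch (the field positions used here, bit 9 in a GICv2-style word versus bit 1 in a GICv3-style word, are examples, not a full account of the architected layouts; decoding into a named field first, as the fix does, makes the target layout explicit):

```python
# Example positions for one conceptual field in the two views.
V2_EOI_SHIFT = 9   # where the field sits in the GICv2-style word
V3_EOI_SHIFT = 1   # where the same field sits in the GICv3-style word

def eoimode_from_v2(ctlr):
    # Decode: extract the single-bit field from the GICv2-style word.
    return (ctlr >> V2_EOI_SHIFT) % 2

def vmcr_from_fields(eoimode):
    # Encode: place the named field at its position in the target
    # layout, instead of copying a raw 'ctlr' word bit-for-bit.
    return eoimode * 2 ** V3_EOI_SHIFT
```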

Cc: Eric Auger &lt;eric.auger@redhat.com&gt;
Reported-by: wanghaibin &lt;wanghaibin.wang@huawei.com&gt;
Signed-off-by: Christoffer Dall &lt;cdall@linaro.org&gt;
Reviewed-by: Marc Zyngier &lt;marc.zyngier@arm.com&gt;
Tested-by: Marc Zyngier &lt;marc.zyngier@arm.com&gt;
</content>
</entry>
<entry>
<title>KVM: arm/arm64: Hold slots_lock when unregistering kvm io bus devices</title>
<updated>2017-05-18T09:18:16+00:00</updated>
<author>
<name>Christoffer Dall</name>
<email>cdall@linaro.org</email>
</author>
<published>2017-05-17T19:16:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=fa472fa91a5a0b241f5ddae927d2e235d07545df'/>
<id>fa472fa91a5a0b241f5ddae927d2e235d07545df</id>
<content type='text'>
We were not holding the kvm-&gt;slots_lock when calling
kvm_io_bus_unregister_dev(), as required.

This only affects the error path, but still, let's do our due
diligence.
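A toy sketch of the locking contract involved (simplified names; in the kernel this is taking and releasing the slots lock around the unregister call):

```python
class Bus:
    """Toy model of the contract: unregister must only run with the
    slots lock held (a boolean stands in for the kernel mutex)."""
    def __init__(self):
        self.slots_lock_held = False
        self.ndevs = 1

    def io_bus_unregister_dev(self):
        # lockdep-style contract check
        assert self.slots_lock_held, "slots lock must be held"
        self.ndevs -= 1

def error_path(bus):
    bus.slots_lock_held = True    # mutex_lock(slots lock)
    bus.io_bus_unregister_dev()
    bus.slots_lock_held = False   # mutex_unlock(slots lock)
```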

Reported by: Eric Auger &lt;eric.auger@redhat.com&gt;
Signed-off-by: Christoffer Dall &lt;cdall@linaro.org&gt;
Reviewed-by: Eric Auger &lt;eric.auger@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: arm/arm64: Fix bug when registering redist iodevs</title>
<updated>2017-05-18T09:18:12+00:00</updated>
<author>
<name>Christoffer Dall</name>
<email>cdall@linaro.org</email>
</author>
<published>2017-05-17T11:12:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=552c9f47f8d451830a6b47151c6d2db77f77cc3e'/>
<id>552c9f47f8d451830a6b47151c6d2db77f77cc3e</id>
<content type='text'>
If userspace creates the VCPUs after initializing the VGIC, then we end
up in a situation where we trigger a bug in kvm_vcpu_get_idx(), because
it is called prior to adding the VCPU into the vcpus array on the VM.

There is no tight coupling between the VCPU index and the area of the
redistributor region used for the VCPU, so we can simply ensure that all
creations of redistributors are serialized per VM, and increment an
offset when we successfully add a redistributor.

The vgic_register_redist_iodev() function can be called from two paths.
The first is vgic_register_all_redist_iodevs(), which is called via the
kvm_vgic_addr() device attribute handler; this path already holds the
kvm-&gt;lock mutex.

The other path is via kvm_vgic_vcpu_init, which is called through a
longer chain from kvm_vm_ioctl_create_vcpu(), which releases the
kvm-&gt;lock mutex just before calling kvm_arch_vcpu_create(), so we can
simply take this mutex again later for our purposes.
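The serialization idea, as a toy Python model (illustrative names; a per-VM lock plus an offset that grows by one redistributor stride per successful registration, so the placement no longer depends on the possibly not-yet-valid VCPU index):

```python
REDIST_STRIDE = 0x20000   # two 64K frames per GICv3 redistributor

class VM:
    """Toy per-VM state: registrations are serialized by a lock, and
    each successful one bumps the offset into the redist region."""
    def __init__(self):
        self.lock_held = False          # stand-in for the per-VM mutex
        self.next_redist_offset = 0

    def register_redist_iodev(self):
        assert self.lock_held, "callers must hold the per-VM lock"
        off = self.next_redist_offset
        self.next_redist_offset += REDIST_STRIDE  # bump only on success
        return off
```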

Fixes: ab6f468c10 ("KVM: arm/arm64: Register iodevs when setting redist base and creating VCPUs")
Signed-off-by: Christoffer Dall &lt;cdall@linaro.org&gt;
Tested-by: Jean-Philippe Brucker &lt;jean-philippe.brucker@arm.com&gt;
Reviewed-by: Eric Auger &lt;eric.auger@redhat.com&gt;
</content>
</entry>
<entry>
<title>kvm: arm/arm64: Fix use after free of stage2 page table</title>
<updated>2017-05-16T09:54:25+00:00</updated>
<author>
<name>Suzuki K Poulose</name>
<email>suzuki.poulose@arm.com</email>
</author>
<published>2017-05-16T09:34:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0c428a6a9256fcd66817e12db32a50b405ed2e5c'/>
<id>0c428a6a9256fcd66817e12db32a50b405ed2e5c</id>
<content type='text'>
We yield the kvm-&gt;mmu_lock occasionally while performing an operation
(e.g. unmap or permission changes) on a large area of stage2 mappings.
However, this can allow another thread to clear and free up the stage2
page tables while we are waiting to regain the lock, so the original
thread can end up accessing memory that has been freed. This patch fixes
the problem by making sure that the stage2 page table is still valid
after we regain the lock. The fact that mmu_notifier-&gt;release() can be
called twice (via __mmu_notifier_release and mmu_notifier_unregister)
increases the chance of hitting this race, with two threads trying to
unmap the entire guest shadow pages.

While at it, clean up the redundant checks around cond_resched_lock in
stage2_wp_range(), as cond_resched_lock already performs the same
checks.
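The shape of the fix, as a toy single-threaded Python model: after any point where the walk may have dropped and re-taken the lock, re-validate the PGD before touching it:

```python
class KVM:
    """Toy model of a stage2 walk that may yield the mmu lock."""
    def __init__(self):
        self.pgd = object()   # stand-in for a live stage2 PGD

def walk_range(kvm, entries, freed_while_yielding=False):
    """Visit `entries` slots, potentially yielding between slots.

    Another thread may free the page tables in the yield window, so
    re-check the PGD after each potential yield (the check the fix
    adds) instead of touching freed memory.
    """
    visited = 0
    for i in range(entries):
        if freed_while_yielding and i == 1:
            kvm.pgd = None    # what can happen while the lock is dropped
        if kvm.pgd is None:   # re-validate after regaining the lock
            break
        visited += 1
    return visited
```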

Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Radim Krčmář &lt;rkrcmar@redhat.com&gt;
Cc: andreyknvl@google.com
Cc: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Cc: stable@vger.kernel.org
Acked-by: Marc Zyngier &lt;marc.zyngier@arm.com&gt;
Signed-off-by: Suzuki K Poulose &lt;suzuki.poulose@arm.com&gt;
Reviewed-by: Christoffer Dall &lt;cdall@linaro.org&gt;
Signed-off-by: Christoffer Dall &lt;cdall@linaro.org&gt;
</content>
</entry>
<entry>
<title>kvm: arm/arm64: Force reading uncached stage2 PGD</title>
<updated>2017-05-16T09:54:00+00:00</updated>
<author>
<name>Suzuki K Poulose</name>
<email>suzuki.poulose@arm.com</email>
</author>
<published>2017-05-16T09:34:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2952a6070e07ebdd5896f1f5b861acad677caded'/>
<id>2952a6070e07ebdd5896f1f5b861acad677caded</id>
<content type='text'>
Make sure we don't use a cached value of the KVM stage2 PGD while
resetting the PGD.
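A toy Python sketch of the idea (in the C fix the fresh value is obtained with an uncached, READ_ONCE-style load; Python has no compiler caching, so this only models the read-at-point-of-use structure):

```python
class MMU:
    """Toy model: the PGD is shared state other threads may clear."""
    def __init__(self):
        self.pgd = object()

def reset_pgd(mmu):
    """Re-read mmu.pgd at the point of use rather than trusting a copy
    taken earlier, then tear it down only if it is still live."""
    pgd = mmu.pgd             # fresh read, not a stale cached copy
    if pgd is None:
        return False          # someone already reset it
    mmu.pgd = None
    return True
```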

Cc: Marc Zyngier &lt;marc.zyngier@arm.com&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Suzuki K Poulose &lt;suzuki.poulose@arm.com&gt;
Reviewed-by: Christoffer Dall &lt;cdall@linaro.org&gt;
Signed-off-by: Christoffer Dall &lt;cdall@linaro.org&gt;
</content>
</entry>
<entry>
<title>kvm: arm/arm64: Fix race in resetting stage2 PGD</title>
<updated>2017-05-15T10:05:25+00:00</updated>
<author>
<name>Suzuki K Poulose</name>
<email>suzuki.poulose@arm.com</email>
</author>
<published>2017-05-03T14:17:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6c0d706b563af732adb094c5bf807437e8963e84'/>
<id>6c0d706b563af732adb094c5bf807437e8963e84</id>
<content type='text'>
In kvm_free_stage2_pgd() we check the stage2 PGD before holding
the lock and proceed to take the lock if it is valid. And we unmap
the page tables, followed by releasing the lock. We reset the PGD
only after dropping this lock, which could cause a race condition
where another thread waiting on or even holding the lock, could
potentially see that the PGD is still valid and proceed to perform
a stage2 operation and later encounter a NULL PGD.

[223090.242280] Unable to handle kernel NULL pointer dereference at
virtual address 00000040
[223090.262330] PC is at unmap_stage2_range+0x8c/0x428
[223090.262332] LR is at kvm_unmap_hva_handler+0x2c/0x3c
[223090.262531] Call trace:
[223090.262533] [&lt;ffff0000080adb78&gt;] unmap_stage2_range+0x8c/0x428
[223090.262535] [&lt;ffff0000080adf40&gt;] kvm_unmap_hva_handler+0x2c/0x3c
[223090.262537] [&lt;ffff0000080ace2c&gt;] handle_hva_to_gpa+0xb0/0x104
[223090.262539] [&lt;ffff0000080af988&gt;] kvm_unmap_hva+0x5c/0xbc
[223090.262543] [&lt;ffff0000080a2478&gt;]
kvm_mmu_notifier_invalidate_page+0x50/0x8c
[223090.262547] [&lt;ffff0000082274f8&gt;]
__mmu_notifier_invalidate_page+0x5c/0x84
[223090.262551] [&lt;ffff00000820b700&gt;] try_to_unmap_one+0x1d0/0x4a0
[223090.262553] [&lt;ffff00000820c5c8&gt;] rmap_walk+0x1cc/0x2e0
[223090.262555] [&lt;ffff00000820c90c&gt;] try_to_unmap+0x74/0xa4
[223090.262557] [&lt;ffff000008230ce4&gt;] migrate_pages+0x31c/0x5ac
[223090.262561] [&lt;ffff0000081f869c&gt;] compact_zone+0x3fc/0x7ac
[223090.262563] [&lt;ffff0000081f8ae0&gt;] compact_zone_order+0x94/0xb0
[223090.262564] [&lt;ffff0000081f91c0&gt;] try_to_compact_pages+0x108/0x290
[223090.262569] [&lt;ffff0000081d5108&gt;] __alloc_pages_direct_compact+0x70/0x1ac
[223090.262571] [&lt;ffff0000081d64a0&gt;] __alloc_pages_nodemask+0x434/0x9f4
[223090.262572] [&lt;ffff0000082256f0&gt;] alloc_pages_vma+0x230/0x254
[223090.262574] [&lt;ffff000008235e5c&gt;] do_huge_pmd_anonymous_page+0x114/0x538
[223090.262576] [&lt;ffff000008201bec&gt;] handle_mm_fault+0xd40/0x17a4
[223090.262577] [&lt;ffff0000081fb324&gt;] __get_user_pages+0x12c/0x36c
[223090.262578] [&lt;ffff0000081fb804&gt;] get_user_pages_unlocked+0xa4/0x1b8
[223090.262579] [&lt;ffff0000080a3ce8&gt;] __gfn_to_pfn_memslot+0x280/0x31c
[223090.262580] [&lt;ffff0000080a3dd0&gt;] gfn_to_pfn_prot+0x4c/0x5c
[223090.262582] [&lt;ffff0000080af3f8&gt;] kvm_handle_guest_abort+0x240/0x774
[223090.262584] [&lt;ffff0000080b2bac&gt;] handle_exit+0x11c/0x1ac
[223090.262586] [&lt;ffff0000080ab99c&gt;] kvm_arch_vcpu_ioctl_run+0x31c/0x648
[223090.262587] [&lt;ffff0000080a1d78&gt;] kvm_vcpu_ioctl+0x378/0x768
[223090.262590] [&lt;ffff00000825df5c&gt;] do_vfs_ioctl+0x324/0x5a4
[223090.262591] [&lt;ffff00000825e26c&gt;] SyS_ioctl+0x90/0xa4
[223090.262595] [&lt;ffff000008085d84&gt;] el0_svc_naked+0x38/0x3c

This patch moves the stage2 PGD manipulation under the lock.
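In outline, the fixed ordering looks like this (toy Python model; a boolean stands in for the mmu spinlock, and the actual freeing still happens after the unlock):

```python
class KVM2:
    def __init__(self):
        self.mmu_lock_held = False   # stand-in for the mmu spinlock
        self.pgd = object()          # stand-in for a live stage2 PGD

def free_stage2_pgd(kvm):
    """Toy model of the fixed free path: the PGD check, the unmap and
    the PGD reset all happen under the lock, so no other lock holder
    can ever observe a stale non-empty PGD."""
    to_free = None
    kvm.mmu_lock_held = True         # lock
    if kvm.pgd is not None:
        to_free = kvm.pgd            # the range unmap goes here
        kvm.pgd = None               # reset while still holding the lock
    kvm.mmu_lock_held = False        # unlock
    return to_free                   # caller frees outside the lock
```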

Reported-by: Alexander Graf &lt;agraf@suse.de&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Marc Zyngier &lt;marc.zyngier@arm.com&gt;
Cc: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Cc: Radim Krčmář &lt;rkrcmar@redhat.com&gt;
Reviewed-by: Christoffer Dall &lt;cdall@linaro.org&gt;
Reviewed-by: Marc Zyngier &lt;marc.zyngier@arm.com&gt;
Signed-off-by: Suzuki K Poulose &lt;suzuki.poulose@arm.com&gt;
Signed-off-by: Christoffer Dall &lt;cdall@linaro.org&gt;
</content>
</entry>
<entry>
<title>KVM: arm/arm64: vgic-v3: Use PREbits to infer the number of ICH_APxRn_EL2 registers</title>
<updated>2017-05-15T09:32:04+00:00</updated>
<author>
<name>Marc Zyngier</name>
<email>marc.zyngier@arm.com</email>
</author>
<published>2017-05-02T13:30:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=15d2bffdde6268883647c6112970f74d3e1af651'/>
<id>15d2bffdde6268883647c6112970f74d3e1af651</id>
<content type='text'>
The GICv3 documentation is extremely confusing, as it talks about
the number of priorities represented by the ICH_APxRn_EL2 registers,
while it should really talk about the number of preemption levels.

This leads to a bug where we may access undefined ICH_APxRn_EL2
registers, since PREbits is allowed to be smaller than PRIbits.
Thankfully, nobody seems to have taken this path so far...

The fix is to use ICH_VTR_EL2.PREbits instead.
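The register-count arithmetic, as a small sketch (the PREbits field encodes the number of preemption bits minus one, and each 32-bit ICH_APxRn_EL2 register tracks 32 preemption levels):

```python
def nr_apr_regs(prebits_field):
    """Number of ICH_APxRn_EL2 registers needed for a given
    ICH_VTR_EL2.PREbits field value: prebits_field + 1 preemption
    bits give 2 ** (prebits_field + 1) preemption levels, and each
    32-bit register covers 32 of them."""
    levels = 2 ** (prebits_field + 1)
    return levels // 32
```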

Signed-off-by: Marc Zyngier &lt;marc.zyngier@arm.com&gt;
Reviewed-by: Christoffer Dall &lt;cdall@linaro.org&gt;
Signed-off-by: Christoffer Dall &lt;cdall@linaro.org&gt;
</content>
</entry>
<entry>
<title>KVM: arm/arm64: vgic-v3: Do not use Active+Pending state for a HW interrupt</title>
<updated>2017-05-15T09:31:51+00:00</updated>
<author>
<name>Marc Zyngier</name>
<email>marc.zyngier@arm.com</email>
</author>
<published>2017-05-02T13:30:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3d6e77ad1489650afa20da92bb589c8778baa8da'/>
<id>3d6e77ad1489650afa20da92bb589c8778baa8da</id>
<content type='text'>
When an interrupt is injected with the HW bit set (indicating that
deactivation should be propagated to the physical distributor),
special care must be taken so that we never mark the corresponding
LR with the Active+Pending state (as the pending state is kept in
the physical distributor).
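Schematically, the LR state selection looks like this (toy model with illustrative state names; the real code clears the pending bit in the LR word for HW-mapped interrupts):

```python
def lr_state(hw, pending, active):
    """Toy model of LR state selection: for a HW-mapped interrupt the
    pending state lives in the physical distributor, so an interrupt
    that is both active and pending goes into the LR as plain 'active',
    never as Active+Pending."""
    if hw and pending and active:
        return "active"           # pending stays in the physical GIC
    if pending and active:
        return "active+pending"   # fine for purely virtual interrupts
    if active:
        return "active"
    return "pending" if pending else "inactive"
```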

Cc: stable@vger.kernel.org
Fixes: 59529f69f504 ("KVM: arm/arm64: vgic-new: Add GICv3 world switch backend")
Signed-off-by: Marc Zyngier &lt;marc.zyngier@arm.com&gt;
Reviewed-by: Christoffer Dall &lt;cdall@linaro.org&gt;
Signed-off-by: Christoffer Dall &lt;cdall@linaro.org&gt;
</content>
</entry>
</feed>
