linux-stable.git/drivers/vfio, branch linux-4.1.y

vfio/pci: Virtualize Maximum Read Request Size

2018-05-23T01:36:34+00:00

[ Upstream commit cf0d53ba4947aad6e471491d5b20a567cbe92e56 ]

MRRS defines the maximum read request size a device is allowed to
make.  Drivers will often increase this to allow more data transfer
with a single request.  Completions to this request are bound by the
MPS setting for the bus.  Aside from device quirks (none known), it
doesn't seem to make sense to set an MRRS value less than MPS, yet
this is a likely scenario given that user drivers do not have a
system-wide view of the PCI topology.  Virtualize MRRS such that the
user can set MRRS >= MPS, but use MPS as the floor value that we'll
write to hardware.
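Below is a minimal userspace sketch of the MRRS floor described above. It assumes the standard PCIe Device Control encoding, in which the MRRS field (bits 14:12) encodes 128 << n bytes. The mask is a local copy whose value matches PCI_EXP_DEVCTL_READRQ. Both function names are hypothetical, and this is not the kernel implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Local copy of the MRRS field mask; value matches PCI_EXP_DEVCTL_READRQ. */
#define DEVCTL_READRQ 0x7000u /* MRRS, Device Control bits 14:12 */

/* Hypothetical helper: decode the MRRS field into bytes.
 * Encoding: 0 -> 128, 1 -> 256, ... 5 -> 4096 (6 and 7 are reserved). */
static unsigned int devctl_mrrs_bytes(uint16_t devctl)
{
	return 128u << ((devctl & DEVCTL_READRQ) >> 12);
}

/* Hypothetical helper: given the Device Control value the user wrote and
 * the physical MPS in bytes, return the MRRS to program into hardware.
 * The user may pick any MRRS >= MPS, but hardware never gets less than MPS. */
static unsigned int hw_mrrs_bytes(uint16_t user_devctl, unsigned int phys_mps)
{
	unsigned int mrrs = devctl_mrrs_bytes(user_devctl);

	assert(phys_mps >= 128 && phys_mps <= 4096);
	return mrrs > phys_mps ? mrrs : phys_mps;
}
```

In the kernel, the physical MPS would plausibly come from pcie_get_mps() and the result would be applied with pcie_set_readrq(). Treat that mapping as an assumption, not as a description of the patch.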

Signed-off-by: Alex Williamson 
Signed-off-by: Sasha Levin

vfio/pci: Virtualize Maximum Payload Size

2018-05-23T01:36:34+00:00

[ Upstream commit 523184972b282cd9ca17a76f6ca4742394856818 ]

With virtual PCI-Express chipsets, we now see userspace/guest drivers
trying to match the physical MPS setting to a virtual downstream port.
Of course a lone physical device surrounded by virtual interconnects
cannot make a correct decision for a proper MPS setting.  Instead,
let's virtualize the MPS control register so that writes through to
hardware are disallowed.  Userspace drivers like QEMU assume they can
write anything to the device and we'll filter out anything dangerous.
Since mismatched MPS can lead to AER and other faults, let's enforce
this on the kernel side rather than relying on userspace virtualization to
handle it.
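Virtualizing a register field can be sketched as a write filter with two masks. Virtualized bits are stored only in a virtual copy of config space, and the remaining writable bits pass through to hardware. This is a hypothetical userspace model: struct vreg, vreg_write() and vreg_read() are illustrative names. The MPS field mask is a local copy matching PCI_EXP_DEVCTL_PAYLOAD.

```c
#include <assert.h>
#include <stdint.h>

/* Local copy of the MPS field mask; value matches PCI_EXP_DEVCTL_PAYLOAD. */
#define DEVCTL_PAYLOAD 0x00e0u /* MPS, Device Control bits 7:5 */

/* Hypothetical model of one emulated 16-bit config register. */
struct vreg {
	uint16_t vconfig; /* what the user reads back for virtualized bits */
	uint16_t hw;      /* what is actually in the device register */
};

/* virt:  bits served from vconfig instead of hardware
 * write: bits the user is permitted to change */
static void vreg_write(struct vreg *r, uint16_t val, uint16_t virt,
		       uint16_t write)
{
	uint16_t to_virt = virt & write;          /* land only in vconfig */
	uint16_t to_hw = write & (uint16_t)~virt; /* pass through to hardware */

	r->vconfig = (uint16_t)((r->vconfig & ~to_virt) | (val & to_virt));
	r->hw = (uint16_t)((r->hw & ~to_hw) | (val & to_hw));
}

/* Reads merge virtualized bits from vconfig with the rest from hardware. */
static uint16_t vreg_read(const struct vreg *r, uint16_t virt)
{
	return (uint16_t)((r->vconfig & virt) | (r->hw & (uint16_t)~virt));
}
```

With DEVCTL_PAYLOAD as the virtualized mask, a guest that writes a larger MPS sees its value read back, while the physical MPS field is left alone.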

Signed-off-by: Alex Williamson 
Reviewed-by: Eric Auger 
Signed-off-by: Sasha Levin

vfio-pci: Virtualize PCIe & AF FLR

2018-05-23T01:36:34+00:00

[ Upstream commit ddf9dc0eb5314d6dac8b19b1cc37c739c6896e7e ]

We use a BAR restore trick to try to detect when a user has performed
a device reset, possibly through FLR or other backdoors, to put things
back into a working state.  This is important for backdoor resets, but
we can actually just virtualize the "front door" resets provided via
PCIe and AF FLR.  Set these bits as virtualized + writable, allowing
the default write to set them in vconfig, then we can simply check the
bit, perform an FLR of our own, and clear the bit.  We don't actually
have the granularity in PCI to specify the type of reset we want to
do, but generally devices don't implement both PCIe and AF FLR and
we'll favor these over other types of reset, so we should generally
line up.  We do test whether the device provides the requested FLR type
to stay consistent with hardware capabilities though.
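The check of a virtualized FLR bit after a write can be sketched as follows. The sketch assumes the PCIe encodings for Initiate FLR in Device Control (bit 15) and for FLR capability in Device Capabilities (bit 28). These values match PCI_EXP_DEVCTL_BCR_FLR and PCI_EXP_DEVCAP_FLR. virt_flr_pending() is a hypothetical name, and the actual reset is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies; values match PCI_EXP_DEVCTL_BCR_FLR and PCI_EXP_DEVCAP_FLR. */
#define DEVCTL_BCR_FLR 0x8000u     /* Initiate FLR, Device Control */
#define DEVCAP_FLR     0x10000000u /* FLR capable, Device Capabilities */

/* Hypothetical helper, called after a user write has landed in the virtual
 * Device Control register.  Returns true if the caller should now reset the
 * function.  The FLR bit is cleared either way, since it must read as zero. */
static bool virt_flr_pending(uint16_t *vdevctl, uint32_t devcap)
{
	bool do_reset = false;

	if (*vdevctl & DEVCTL_BCR_FLR) {
		/* only honour the request if the device advertises PCIe FLR */
		do_reset = (devcap & DEVCAP_FLR) != 0;
		*vdevctl &= (uint16_t)~DEVCTL_BCR_FLR;
	}
	return do_reset;
}
```

AF FLR would presumably follow the same pattern, using the FLR control and capability bits of the Advanced Features capability.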

This seems to fix several instances of devices getting into bad states
with userspace drivers, like dpdk, running inside a VM.

Signed-off-by: Alex Williamson 
Reviewed-by: Greg Rose 
Signed-off-by: Sasha Levin

vfio-pci: Handle error from pci_iomap

2017-09-10T20:35:56+00:00

[ Upstream commit e19f32da5ded958238eac1bbe001192acef191a2 ]

pci_iomap() can fail; handle this case by releasing the selected
PCI regions and returning -ENOMEM.
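The error-handling pattern undoes the region request when the mapping step fails. The sketch below models it in userspace. request_region(), map_region() and release_region() are hypothetical stand-ins for the PCI calls, not their real signatures. fail_map is a hypothetical knob that forces the mapping to fail.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the region request, iomap and release calls. */
static bool region_held;
static bool fail_map; /* force map_region() to fail, for testing */
static char fake_bar[64];

static int request_region(void) { region_held = true; return 0; }
static void release_region(void) { region_held = false; }
static void *map_region(void) { return fail_map ? NULL : fake_bar; }

/* Acquire the region, then map it; if mapping fails, undo the request
 * before returning so the region is not leaked. */
static int setup_bar(void **out)
{
	void *io;
	int ret = request_region();

	if (ret)
		return ret;

	io = map_region();
	if (!io) {
		release_region();
		return -ENOMEM;
	}
	*out = io;
	return 0;
}
```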

Signed-off-by: Arvind Yadav 
Signed-off-by: Alex Williamson 
Signed-off-by: Sasha Levin

vfio-pci: use 32-bit comparisons for register address for gcc-4.5

2017-09-10T20:35:55+00:00

[ Upstream commit 45e869714489431625c569d21fc952428d761476 ]

Using ancient compilers (gcc-4.5 or older) on ARM, we get a link
failure with the vfio-pci driver:

ERROR: "__aeabi_lcmp" [drivers/vfio/pci/vfio-pci.ko] undefined!

The reason is that the compiler implements a comparison against a
64-bit range with a call to the __aeabi_lcmp helper, which is not
available to the module.  This patch converts the value to a 32-bit
number explicitly first, as newer compilers do on their own.
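The following sketch narrows a signed 64-bit offset to 32 bits before any range comparison. It assumes the caller has already bounded pos to the 32-bit range, because otherwise the truncating cast would alias large offsets. The legacy VGA windows serve only as example ranges, and vga_classify() is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical classification of an access by address. */
enum vga_range { VGA_MEM, VGA_IO_3B0, VGA_IO_3C0 };

/* pos is a signed 64-bit offset, like loff_t.  Assumption: the caller has
 * already checked that it lies within the 32-bit range. */
static int vga_classify(int64_t pos)
{
	/* narrow once, so every comparison below is a 32-bit comparison */
	uint32_t addr = (uint32_t)pos;

	if (addr >= 0xa0000 && addr <= 0xbffff)
		return VGA_MEM;
	if (addr >= 0x3b0 && addr <= 0x3bb)
		return VGA_IO_3B0;
	if (addr >= 0x3c0 && addr <= 0x3df)
		return VGA_IO_3C0;
	return -EINVAL;
}
```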

Signed-off-by: Arnd Bergmann 
Signed-off-by: Alex Williamson 
Signed-off-by: Sasha Levin

vfio: New external user group/file match

2017-09-10T14:59:18+00:00

[ Upstream commit 5d6dee80a1e94cc284d03e06d930e60e8d3ecf7d ]

At the point where the kvm-vfio pseudo device wants to release its
vfio group reference, we can't always acquire a new reference to make
that happen.  The group can be in a state where we wouldn't allow a
new reference to be added.  This new helper function allows a caller
to match a file to a group to facilitate this: given a file and a
group, it reports whether they match.  The caller therefore needs to
already hold a group reference to match against the file.  This allows
a group to be deleted without acquiring a new reference.
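The matching rule can be modelled as a pointer comparison that takes no reference of its own. The structs and group_match_file() below are hypothetical minimal models of the kernel objects. The sketch assumes that a group file is recognised by its file operations and that it points to its group through private_data.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical minimal models of the kernel objects involved. */
struct file_ops { int unused; };
struct file {
	const struct file_ops *f_op;
	void *private_data;
};
struct vfio_group { int refs; };

static const struct file_ops group_fops; /* ops of a vfio group file */

/* Report whether filep is a vfio group file referring to group.  No
 * reference is taken; the caller must already hold one on group. */
static bool group_match_file(const struct vfio_group *group,
			     const struct file *filep)
{
	return filep->f_op == &group_fops && filep->private_data == group;
}
```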

Signed-off-by: Alex Williamson 
Reviewed-by: Eric Auger 
Reviewed-by: Paolo Bonzini 
Tested-by: Eric Auger 
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin

vfio: Fix group release deadlock

2017-09-10T14:59:18+00:00

[ Upstream commit 811642d8d8a82c0cce8dc2debfdaf23c5a144839 ]

If vfio_iommu_group_notifier() acquires a group reference and that
reference becomes the last reference to the group, then vfio_group_put
introduces a deadlock code path where we're trying to unregister from
the iommu notifier chain from within a callout of that chain.  Use a
work_struct to release this reference asynchronously.
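The deferred final put can be sketched with a minimal stand-in for a work item. In this userspace model, put_work, group_put_async() and in_notifier are hypothetical. The assert in group_release() marks the point where the synchronous path would have deadlocked, according to the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical group model; releasing it would unregister from the
 * notifier chain, which must not happen inside a callout of that chain. */
struct group {
	int refs;
	bool released;
};

/* Hypothetical minimal stand-in for a work_struct carrying a deferred put. */
struct put_work {
	struct group *group;
	bool pending;
};

static bool in_notifier; /* true while the notifier chain is running */

static void group_release(struct group *g)
{
	/* unregistering here from inside the chain is the deadlock path */
	assert(!in_notifier);
	g->released = true;
}

static void group_put(struct group *g)
{
	if (--g->refs == 0)
		group_release(g);
}

/* Called from inside the notifier: queue the put instead of doing it. */
static void group_put_async(struct put_work *w, struct group *g)
{
	w->group = g;
	w->pending = true;
}

/* Runs later, from worker context outside the notifier chain. */
static void put_work_fn(struct put_work *w)
{
	if (w->pending) {
		w->pending = false;
		group_put(w->group);
	}
}
```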

Signed-off-by: Alex Williamson 
Reviewed-by: Eric Auger 
Tested-by: Eric Auger 
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin

vfio/type1: Remove locked page accounting workqueue

2017-06-13T13:29:21+00:00

[ Upstream commit 0cfef2b7410b64d7a430947e0b533314c4f97153 ]

If the mmap_sem is contended then the vfio type1 IOMMU backend will
defer locked page accounting updates to a workqueue task.  This has a
few problems and depending on which side the user tries to play, they
might be over-penalized for unmaps that haven't yet been accounted or
race the workqueue to enter more mappings than they're allowed.  The
original intent of this workqueue mechanism seems to be focused on
reducing latency through the ioctl, but we cannot do so at the cost
of correctness.  Remove this workqueue mechanism and update the
callers to allow for failure.  We can also now recheck the limit under
write lock to make sure we don't exceed it.

vfio_pin_pages_remote() also now necessarily includes an unwind path
which we can jump to directly if the consecutive page pinning finds
that we're exceeding the user's memory limits.  This avoids the
current lazy approach which does accounting and mapping up to the
fault, only to return an error on the next iteration to unwind the
entire vfio_dma.
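The next sketch shows synchronous accounting with a direct unwind path. It assumes that locked pages and the limit are simple page counts. acct, lock_acct() and pin_pages() are hypothetical names. The kernel would hold mmap_sem for write around the check, and this single-threaded model omits that lock.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the mm's locked_vm and RLIMIT_MEMLOCK, in
 * pages.  Locking is omitted in this single-threaded model. */
struct acct {
	long locked;
	long limit;
};

/* Synchronous accounting: re-check the limit against the current count
 * and report failure to the caller instead of deferring the update.
 * A negative npage (unmap) always succeeds. */
static int lock_acct(struct acct *a, long npage)
{
	if (npage > 0 && a->locked + npage > a->limit)
		return -ENOMEM;
	a->locked += npage;
	return 0;
}

/* Pin npage consecutive pages, marking each in pinned[].  If the next page
 * would exceed the limit, jump straight to the unwind path. */
static long pin_pages(struct acct *a, bool *pinned, long npage)
{
	long i;

	for (i = 0; i < npage; i++) {
		if (a->locked + i + 1 > a->limit)
			goto unwind;
		pinned[i] = true; /* stands in for pinning one page */
	}
	if (lock_acct(a, npage) == 0)
		return npage;
unwind:
	while (i-- > 0)
		pinned[i] = false; /* stands in for unpinning */
	return -ENOMEM;
}
```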

Cc: stable@vger.kernel.org
Reviewed-by: Peter Xu 
Reviewed-by: Kirti Wankhede 
Signed-off-by: Alex Williamson 
Signed-off-by: Sasha Levin

vfio/pci: Fix integer overflows, bitmask check

2017-06-13T13:29:17+00:00

[ Upstream commit 05692d7005a364add85c6e25a6c4447ce08f913a ]

The VFIO_DEVICE_SET_IRQS ioctl did not sufficiently sanitize
user-supplied integers, potentially allowing memory corruption. This
patch adds appropriate integer overflow checks, checks the range bounds
for VFIO_IRQ_SET_DATA_NONE, and also verifies that only a single element
in the VFIO_IRQ_SET_DATA_TYPE_MASK bitmask is set.
VFIO_IRQ_SET_ACTION_TYPE_MASK is already correctly checked later in
vfio_pci_set_irqs_ioctl().

Furthermore, a kzalloc() is changed to kcalloc(): the kzalloc() size
was computed with an integer multiplication that could overflow, and
kcalloc() checks for overflow, which should prevent a similar
occurrence.
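These checks can be sketched as a standalone validator. The flag values are local copies matching VFIO_IRQ_SET_DATA_NONE, _BOOL and _EVENTFD. struct irq_set_hdr and check_irq_set() are hypothetical, and the upstream checks may differ in their exact order and form.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies; values match VFIO_IRQ_SET_DATA_NONE/BOOL/EVENTFD. */
#define SET_DATA_NONE    (1u << 0)
#define SET_DATA_BOOL    (1u << 1)
#define SET_DATA_EVENTFD (1u << 2)
#define SET_DATA_TYPE_MASK (SET_DATA_NONE | SET_DATA_BOOL | SET_DATA_EVENTFD)

/* Hypothetical subset of the ioctl header fields that matter here. */
struct irq_set_hdr {
	uint32_t argsz;
	uint32_t flags;
	uint32_t start;
	uint32_t count;
};

/* minsz: size of the fixed header; max: number of IRQs for this index. */
static int check_irq_set(const struct irq_set_hdr *h, uint32_t minsz,
			 uint32_t max)
{
	size_t size;

	if (h->argsz < minsz)
		return -EINVAL;
	/* start + count must not wrap around */
	if (h->count > UINT32_MAX - h->start)
		return -EINVAL;
	/* range bounds apply to every data type, DATA_NONE included */
	if (h->start >= max || h->start + h->count > max)
		return -EINVAL;

	/* exactly one data-type bit: any other combination hits default */
	switch (h->flags & SET_DATA_TYPE_MASK) {
	case SET_DATA_NONE:
		size = 0;
		break;
	case SET_DATA_BOOL:
		size = sizeof(uint8_t);
		break;
	case SET_DATA_EVENTFD:
		size = sizeof(int32_t);
		break;
	default:
		return -EINVAL;
	}

	/* widen before multiplying so count * size cannot overflow */
	if ((uint64_t)h->argsz - minsz < (uint64_t)h->count * size)
		return -EINVAL;
	return 0;
}
```

The kcalloc() change addresses the same overflow concern. Like userspace calloc(), kcalloc(n, size, flags) returns NULL when n * size overflows, so it never allocates a wrapped-around size.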

Signed-off-by: Vlad Tsyrklevich 
Signed-off-by: Alex Williamson 
Signed-off-by: Sasha Levin

vfio: fix ioctl error handling

2016-03-09T18:15:13+00:00

[ Upstream commit 8160c4e455820d5008a1116d2dca35f0363bb062 ]

Calling return copy_to_user(...) in an ioctl will not
do the right thing if there's a pagefault:
copy_to_user returns the number of bytes not copied
in this case.

Fix up vfio to do
	return copy_to_user(...) ?
		-EFAULT : 0;

everywhere.
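The two return patterns can be compared using a stand-in for copy_to_user() that, like the real helper, returns the number of bytes not copied. fake_copy_to_user(), ioctl_buggy() and ioctl_fixed() are hypothetical names used only for this illustration. The avail parameter simulates a fault partway through the copy.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for copy_to_user(): copies at most 'avail' bytes
 * and, like the real helper, returns the number of bytes NOT copied. */
static size_t fake_copy_to_user(void *dst, const void *src, size_t n,
				size_t avail)
{
	size_t done = n < avail ? n : avail;

	memcpy(dst, src, done);
	return n - done;
}

/* Buggy pattern: a short copy leaks a positive byte count to userspace
 * as the ioctl return value. */
static long ioctl_buggy(void *dst, const void *src, size_t n, size_t avail)
{
	return (long)fake_copy_to_user(dst, src, n, avail);
}

/* Fixed pattern: any short copy becomes -EFAULT. */
static long ioctl_fixed(void *dst, const void *src, size_t n, size_t avail)
{
	return fake_copy_to_user(dst, src, n, avail) ? -EFAULT : 0;
}
```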

Cc: stable@vger.kernel.org
Signed-off-by: Michael S. Tsirkin 
Signed-off-by: Alex Williamson 
Signed-off-by: Sasha Levin