linux-stable.git/drivers/vhost, branch v4.12.2

mm: support __GFP_REPEAT in kvmalloc_node for >32kB

2017-05-09T00:15:12+00:00

vhost code uses __GFP_REPEAT when allocating vhost_virtqueue and
vhost_vsock because it would really like to prefer kmalloc to the
vmalloc fallback - see 23cc5a991c7a ("vhost-net: extend device
allocation to vmalloc") for more context.  Michael Tsirkin has also
noted:

 "__GFP_REPEAT overhead is during allocation time. Using vmalloc means
  all accesses are slowed down. Allocation is not on data path, accesses
  are."

The same applies to other vhost_kvzalloc users.

Let's teach kvmalloc_node to handle __GFP_REPEAT properly.  There are
two things to be careful about.  First, we should avoid invoking the OOM
killer, and so have to add __GFP_NORETRY by default.  Second, we have to
override __GFP_REPEAT for !costly order requests, as __GFP_REPEAT is
ignored for !costly orders.

Supporting __GFP_REPEAT-like semantics for !costly requests is possible,
but it would require changes in the page allocator.  This is out of scope
of this patch.

This patch shouldn't introduce any functional change.
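
For illustration, the flag handling described above can be modelled in
user space roughly as follows.  This is a sketch, not the kernel code:
the sk_-prefixed names and the flag bit values are placeholders, and a
4kB page size with a costly order of 3 (hence the 32kB threshold) is
assumed.

```c
#include <stddef.h>

/* Placeholder bit values for illustration; the kernel's gfp bits differ. */
#define SK_GFP_NOWARN   0x1u
#define SK_GFP_NORETRY  0x2u
#define SK_GFP_REPEAT   0x4u

#define SK_PAGE_SIZE     4096u  /* assumed page size */
#define SK_COSTLY_ORDER  3      /* assumed costly order: 4kB << 3 = 32kB */

/*
 * Compute the flags that would be used for the kmalloc attempt before
 * falling back to vmalloc.
 */
unsigned int sk_kmalloc_flags(size_t size, unsigned int flags)
{
	unsigned int kmalloc_flags = flags;

	if (size > SK_PAGE_SIZE) {
		/* A fallback exists, so do not warn on failure. */
		kmalloc_flags |= SK_GFP_NOWARN;

		/*
		 * Fail fast instead of triggering the OOM killer, unless the
		 * caller asked for __GFP_REPEAT on a costly (>32kB) request.
		 */
		if (!(kmalloc_flags & SK_GFP_REPEAT) ||
		    size <= ((size_t)SK_PAGE_SIZE << SK_COSTLY_ORDER))
			kmalloc_flags |= SK_GFP_NORETRY;
	}

	return kmalloc_flags;
}
```

In this model only a request that is both larger than 32kB and marked
with the repeat flag is spared the no-retry flag; everything else above
one page fails fast and relies on the vmalloc fallback.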

Link: http://lkml.kernel.org/r/20170306103032.2540-3-mhocko@kernel.org
Signed-off-by: Michal Hocko 
Acked-by: Vlastimil Babka 
Acked-by: Michael S. Tsirkin 
Cc: David Miller 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

VSOCK: Add virtio vsock vsockmon hooks

2017-04-24T16:35:56+00:00

The virtio drivers deal with struct virtio_vsock_pkt.  Add
virtio_transport_deliver_tap_pkt(pkt) for handing packets to the
vsockmon device.

We call virtio_transport_deliver_tap_pkt(pkt) from
net/vmw_vsock/virtio_transport.c and drivers/vhost/vsock.c instead of
common code.  This is because the drivers may drop packets before
handing them to common code - we still want to capture them.

Signed-off-by: Gerard Garcia 
Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Jorgen Hansen 
Signed-off-by: David S. Miller

vhost-vsock: add pkt cancel capability

2017-03-21T21:41:46+00:00

Allow canceling all packets of a connection.

Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Jorgen Hansen 
Signed-off-by: Peng Tao 
Signed-off-by: David S. Miller

Merge branch 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

2017-03-03T18:16:38+00:00

Pull sched.h split-up from Ingo Molnar:
 "The point of these changes is to significantly reduce the
   header footprint, to speed up the kernel build and to
  have a cleaner header structure.

  After these changes the new 's typical preprocessed
  size goes down from a previous ~0.68 MB (~22K lines) to ~0.45 MB (~15K
  lines), which is around 40% faster to build on typical configs.

  Not much changed from the last version (-v2) posted three weeks ago: I
  eliminated quirks, backmerged fixes plus I rebased it to an upstream
  SHA1 from yesterday that includes most changes queued up in -next plus
  all sched.h changes that were pending from Andrew.

  I've re-tested the series both on x86 and on cross-arch defconfigs,
  and did a bisectability test at a number of random points.

  I tried to test as many build configurations as possible, but some
  build breakage is probably still left - but it should be mostly
  limited to architectures that have no cross-compiler binaries
  available on kernel.org, and non-default configurations"

* 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (146 commits)
  sched/headers: Clean up 
  sched/headers: Remove #ifdefs from 
  sched/headers: Remove the  include from 
  sched/headers, hrtimer: Remove the  include from 
  sched/headers, x86/apic: Remove the  header inclusion from 
  sched/headers, timers: Remove the  include from 
  sched/headers: Remove  from 
  sched/headers: Remove  from 
  sched/core: Remove unused prefetch_stack()
  sched/headers: Remove  from 
  sched/headers: Remove the 'init_pid_ns' prototype from 
  sched/headers: Remove  from 
  sched/headers: Remove  from 
  sched/headers: Remove the runqueue_is_locked() prototype
  sched/headers: Remove  from 
  sched/headers: Remove  from 
  sched/headers: Remove  from 
  sched/headers: Remove  from 
  sched/headers: Remove the  include from 
  sched/headers: Remove  from 
  ...

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

2017-03-02T21:53:13+00:00

Pull vhost updates from Michael Tsirkin:
 "virtio, vhost: optimizations, fixes

  Looks like a quiet cycle for vhost/virtio, just a couple of minor
  tweaks. Most notable is automatic interrupt affinity for blk and scsi.
  Hopefully other devices are not far behind"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio-console: avoid DMA from stack
  vhost: introduce O(1) vq metadata cache
  virtio_scsi: use virtio IRQ affinity
  virtio_blk: use virtio IRQ affinity
  blk-mq: provide a default queue mapping for virtio device
  virtio: provide a method to get the IRQ affinity mask for a virtqueue
  virtio: allow drivers to request IRQ affinity when creating VQs
  virtio_pci: simplify MSI-X setup
  virtio_pci: don't duplicate the msix_enable flag in struct pci_dev
  virtio_pci: use shared interrupts for virtqueues
  virtio_pci: remove struct virtio_pci_vq_info
  vhost: try avoiding avail index access when getting descriptor
  virtio_mmio: expose header to userspace

sched/headers: Prepare to move signal wakeup & sigpending methods from into

2017-03-02T07:42:32+00:00

Fix up affected files that include this signal functionality via sched.h.

Acked-by: Linus Torvalds 
Cc: Mike Galbraith 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar

sched/headers: Prepare for new header dependencies before moving code to

2017-03-02T07:42:28+00:00

We are going to split  out of , which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder  file that just
maps to  to make this patch obviously correct and
bisectable.

The APIs that are going to be moved first are:

   mm_alloc()
   __mmdrop()
   mmdrop()
   mmdrop_async_fn()
   mmdrop_async()
   mmget_not_zero()
   mmput()
   mmput_async()
   get_task_mm()
   mm_access()
   mm_release()

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds 
Cc: Mike Galbraith 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar

sched/headers: Prepare for new header dependencies before moving code to

2017-03-02T07:42:27+00:00

We are going to split  out of , which
will have to be picked up from other headers and .c files.

Create a trivial placeholder  file that just
maps to  to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds 
Cc: Mike Galbraith 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar

vhost: introduce O(1) vq metadata cache

2017-03-01T23:35:06+00:00

When device IOTLB is enabled, all address translations are stored in an
interval tree. The O(lgN) search time can be slow for virtqueue
metadata (avail, used and descriptors), since it is accessed much more
often than other addresses. So this patch introduces an O(1) array
which points to the interval tree nodes that store the translations of
vq metadata. The array is updated during vq IOTLB prefetching and is
reset on each invalidation and TLB update. Each time we want to access
vq metadata, this small array is queried before the interval tree. This
is sufficient for static mappings but not for dynamic mappings; we
could do optimizations on top.

Tests were done with l2fwd in guest (2M hugepage):

   | noiommu  | before        | after
tx | 1.32Mpps | 1.06Mpps(82%) | 1.30Mpps(98%)
rx | 2.33Mpps | 1.46Mpps(63%) | 2.29Mpps(98%)

We can almost reach the same performance as noiommu mode.
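
The cache-before-tree lookup can be sketched in user space as below.
This is an illustration under assumptions, not the vhost code: all
sk_-prefixed names are hypothetical, a linear scan stands in for the
interval tree, and the cache is filled on a miss rather than during
prefetching, which is a simplification.

```c
#include <stddef.h>
#include <stdint.h>

/* One cache slot per kind of virtqueue metadata. */
enum sk_meta { SK_META_DESC, SK_META_AVAIL, SK_META_USED, SK_META_NUM };

/* One translation: guest range [start, last] maps to host_addr. */
struct sk_map {
	uint64_t start, last, host_addr;
};

struct sk_vq {
	const struct sk_map *maps;	/* stand-in for the interval tree */
	size_t nmaps;
	const struct sk_map *meta_cache[SK_META_NUM];	/* O(1) cache */
	unsigned long slow_lookups;	/* counts slow-path searches */
};

/* Slow path: a linear scan standing in for the O(lgN) tree search. */
static const struct sk_map *sk_slow_lookup(struct sk_vq *vq, uint64_t addr)
{
	vq->slow_lookups++;
	for (size_t i = 0; i < vq->nmaps; i++)
		if (addr >= vq->maps[i].start && addr <= vq->maps[i].last)
			return &vq->maps[i];
	return NULL;
}

/* Translate a metadata address, consulting the cache slot first. */
int sk_translate_meta(struct sk_vq *vq, enum sk_meta type,
		      uint64_t addr, uint64_t *out)
{
	const struct sk_map *m = vq->meta_cache[type];

	if (!m || addr < m->start || addr > m->last) {
		m = sk_slow_lookup(vq, addr);
		if (!m)
			return -1;	/* no translation available */
		vq->meta_cache[type] = m;
	}
	*out = m->host_addr + (addr - m->start);
	return 0;
}

/* Any invalidation or update of the mappings drops every cached slot. */
void sk_invalidate(struct sk_vq *vq)
{
	for (int i = 0; i < SK_META_NUM; i++)
		vq->meta_cache[i] = NULL;
}
```

Dropping all slots on every invalidation is what makes the scheme
cheap for static mappings and of limited help for dynamic ones.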

Signed-off-by: Jason Wang 
Signed-off-by: Michael S. Tsirkin

vhost: try avoiding avail index access when getting descriptor

2017-02-27T18:37:27+00:00

If last avail idx is not equal to cached avail idx, we're sure there
are still available buffers in the virtqueue, so there's no need to
re-read avail idx. So let's skip this to avoid an unnecessary userspace
memory access and memory barrier. A pktgen test shows about 3%
improvement on rx pps.
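
A minimal user-space sketch of the idea follows.  The sk_-prefixed
names are hypothetical, the shared_avail_idx field stands in for the
index the guest writes, and the memory barrier that real code needs
after re-reading the index is only marked by a comment.

```c
#include <stdbool.h>
#include <stdint.h>

struct sk_ring {
	uint16_t shared_avail_idx;	/* stand-in for the guest-written index */
	uint16_t avail_idx;		/* cached copy of the shared index */
	uint16_t last_avail_idx;	/* next entry to consume */
	unsigned long shared_reads;	/* counts reads of the shared index */
};

/* Return true if at least one buffer is available. */
bool sk_has_avail(struct sk_ring *r)
{
	/* Only re-read the shared index once the cached one is used up. */
	if (r->avail_idx == r->last_avail_idx) {
		r->avail_idx = r->shared_avail_idx;
		r->shared_reads++;
		if (r->avail_idx == r->last_avail_idx)
			return false;	/* ring is empty */
		/* Real code needs a read barrier here before using entries. */
	}
	return true;
}

/* Consume one buffer if any is available. */
bool sk_consume(struct sk_ring *r)
{
	if (!sk_has_avail(r))
		return false;
	r->last_avail_idx++;
	return true;
}
```

With three buffers published, this sketch reads the shared index once
for all three and a second time only to discover that the ring is
empty.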

Signed-off-by: Jason Wang 
Signed-off-by: Michael S. Tsirkin