<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/drivers/vhost, branch v4.9.232</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>vhost: vsock: kick send_pkt worker once device is started</title>
<updated>2020-05-10T08:28:00+00:00</updated>
<author>
<name>Jia He</name>
<email>justin.he@arm.com</email>
</author>
<published>2020-05-01T04:38:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=619453716db429d68b68b5d846c260ded7846819'/>
<id>619453716db429d68b68b5d846c260ded7846819</id>
<content type='text'>
commit 0b841030625cde5f784dd62aec72d6a766faae70 upstream.

Ning Bo reported an abnormal 2-second gap when booting Kata container [1].
The unconditional timeout was caused by VSOCK_DEFAULT_CONNECT_TIMEOUT of
connecting from the client side. The vhost vsock client tries to connect
an initializing virtio vsock server.

The abnormal flow looks like:
host-userspace           vhost vsock                       guest vsock
==============           ===========                       ============
connect()     --------&gt;  vhost_transport_send_pkt_work()   initializing
   |                     vq-&gt;private_data==NULL
   |                     will not be queued
   V
schedule_timeout(2s)
                         vhost_vsock_start()  &lt;---------   device ready
                         set vq-&gt;private_data

wait for 2s and failed
connect() again          vq-&gt;private_data!=NULL         recv connecting pkt

Details:
1. Host userspace sends a connect pkt, at that time, guest vsock is under
   initializing, hence the vhost_vsock_start has not been called. So
   vq-&gt;private_data==NULL, and the pkt is not been queued to send to guest
2. Then it sleeps for 2s
3. After guest vsock finishes initializing, vq-&gt;private_data is set
4. When host userspace wakes up after 2s, send connecting pkt again,
   everything is fine.

As suggested by Stefano Garzarella, this fixes it by additional kicking the
send_pkt worker in vhost_vsock_start once the virtio device is started. This
makes the pending pkt sent again.

After this patch, kata-runtime (with vsock enabled) boot time is reduced
from 3s to 1s on a ThunderX2 arm64 server.

[1] https://github.com/kata-containers/runtime/issues/1917

Reported-by: Ning Bo &lt;n.b@live.com&gt;
Suggested-by: Stefano Garzarella &lt;sgarzare@redhat.com&gt;
Signed-off-by: Jia He &lt;justin.he@arm.com&gt;
Link: https://lore.kernel.org/r/20200501043840.186557-1-justin.he@arm.com
Signed-off-by: Michael S. Tsirkin &lt;mst@redhat.com&gt;
Reviewed-by: Stefano Garzarella &lt;sgarzare@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 0b841030625cde5f784dd62aec72d6a766faae70 upstream.

Ning Bo reported an abnormal 2-second gap when booting Kata container [1].
The unconditional timeout was caused by VSOCK_DEFAULT_CONNECT_TIMEOUT of
connecting from the client side. The vhost vsock client tries to connect
an initializing virtio vsock server.

The abnormal flow looks like:
host-userspace           vhost vsock                       guest vsock
==============           ===========                       ============
connect()     --------&gt;  vhost_transport_send_pkt_work()   initializing
   |                     vq-&gt;private_data==NULL
   |                     will not be queued
   V
schedule_timeout(2s)
                         vhost_vsock_start()  &lt;---------   device ready
                         set vq-&gt;private_data

wait for 2s and failed
connect() again          vq-&gt;private_data!=NULL         recv connecting pkt

Details:
1. Host userspace sends a connect pkt, at that time, guest vsock is under
   initializing, hence the vhost_vsock_start has not been called. So
   vq-&gt;private_data==NULL, and the pkt is not been queued to send to guest
2. Then it sleeps for 2s
3. After guest vsock finishes initializing, vq-&gt;private_data is set
4. When host userspace wakes up after 2s, send connecting pkt again,
   everything is fine.

As suggested by Stefano Garzarella, this fixes it by additional kicking the
send_pkt worker in vhost_vsock_start once the virtio device is started. This
makes the pending pkt sent again.

After this patch, kata-runtime (with vsock enabled) boot time is reduced
from 3s to 1s on a ThunderX2 arm64 server.

[1] https://github.com/kata-containers/runtime/issues/1917

Reported-by: Ning Bo &lt;n.b@live.com&gt;
Suggested-by: Stefano Garzarella &lt;sgarzare@redhat.com&gt;
Signed-off-by: Jia He &lt;justin.he@arm.com&gt;
Link: https://lore.kernel.org/r/20200501043840.186557-1-justin.he@arm.com
Signed-off-by: Michael S. Tsirkin &lt;mst@redhat.com&gt;
Reviewed-by: Stefano Garzarella &lt;sgarzare@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>vhost: Check docket sk_family instead of call getname</title>
<updated>2020-03-11T06:53:08+00:00</updated>
<author>
<name>Eugenio Pérez</name>
<email>eperezma@redhat.com</email>
</author>
<published>2020-03-05T16:30:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=7f574e92e4474f1b33425c4ee9ba48cd4fbe7d75'/>
<id>7f574e92e4474f1b33425c4ee9ba48cd4fbe7d75</id>
<content type='text'>
[ Upstream commit 42d84c8490f9f0931786f1623191fcab397c3d64 ]

Doing so, we save one call to get data we already have in the struct.

Also, since there is no guarantee that getname use sockaddr_ll
parameter beyond its size, we add a little bit of security here.
It should do not do beyond MAX_ADDR_LEN, but syzbot found that
ax25_getname writes more (72 bytes, the size of full_sockaddr_ax25,
versus 20 + 32 bytes of sockaddr_ll + MAX_ADDR_LEN in syzbot repro).

Fixes: 3a4d5c94e9593 ("vhost_net: a kernel-level virtio server")
Reported-by: syzbot+f2a62d07a5198c819c7b@syzkaller.appspotmail.com
Signed-off-by: Eugenio Pérez &lt;eperezma@redhat.com&gt;
Acked-by: Michael S. Tsirkin &lt;mst@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 42d84c8490f9f0931786f1623191fcab397c3d64 ]

Doing so, we save one call to get data we already have in the struct.

Also, since there is no guarantee that getname use sockaddr_ll
parameter beyond its size, we add a little bit of security here.
It should do not do beyond MAX_ADDR_LEN, but syzbot found that
ax25_getname writes more (72 bytes, the size of full_sockaddr_ax25,
versus 20 + 32 bytes of sockaddr_ll + MAX_ADDR_LEN in syzbot repro).

Fixes: 3a4d5c94e9593 ("vhost_net: a kernel-level virtio server")
Reported-by: syzbot+f2a62d07a5198c819c7b@syzkaller.appspotmail.com
Signed-off-by: Eugenio Pérez &lt;eperezma@redhat.com&gt;
Acked-by: Michael S. Tsirkin &lt;mst@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vhost/vsock: accept only packets with the right dst_cid</title>
<updated>2020-01-04T12:41:12+00:00</updated>
<author>
<name>Stefano Garzarella</name>
<email>sgarzare@redhat.com</email>
</author>
<published>2019-12-06T14:39:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=0a8f421b7a224447ac4ebdef466ad6151023b33d'/>
<id>0a8f421b7a224447ac4ebdef466ad6151023b33d</id>
<content type='text'>
[ Upstream commit 8a3cc29c316c17de590e3ff8b59f3d6cbfd37b0a ]

When we receive a new packet from the guest, we check if the
src_cid is correct, but we forgot to check the dst_cid.

The host should accept only packets where dst_cid is
equal to the host CID.

Signed-off-by: Stefano Garzarella &lt;sgarzare@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 8a3cc29c316c17de590e3ff8b59f3d6cbfd37b0a ]

When we receive a new packet from the guest, we check if the
src_cid is correct, but we forgot to check the dst_cid.

The host should accept only packets where dst_cid is
equal to the host CID.

Signed-off-by: Stefano Garzarella &lt;sgarzare@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vhost: make sure log_num &lt; in_num</title>
<updated>2019-09-16T06:19:34+00:00</updated>
<author>
<name>yongduan</name>
<email>yongduan@tencent.com</email>
</author>
<published>2019-09-11T09:44:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=8d8276867b5ac539f1d6e166a028b51c8b1ceda8'/>
<id>8d8276867b5ac539f1d6e166a028b51c8b1ceda8</id>
<content type='text'>
commit 060423bfdee3f8bc6e2c1bac97de24d5415e2bc4 upstream.

The code assumes log_num &lt; in_num everywhere, and that is true as long as
in_num is incremented by descriptor iov count, and log_num by 1. However
this breaks if there's a zero sized descriptor.

As a result, if a malicious guest creates a vring desc with desc.len = 0,
it may cause the host kernel to crash by overflowing the log array. This
bug can be triggered during the VM migration.

There's no need to log when desc.len = 0, so just don't increment log_num
in this case.

Fixes: 3a4d5c94e959 ("vhost_net: a kernel-level virtio server")
Cc: stable@vger.kernel.org
Reviewed-by: Lidong Chen &lt;lidongchen@tencent.com&gt;
Signed-off-by: ruippan &lt;ruippan@tencent.com&gt;
Signed-off-by: yongduan &lt;yongduan@tencent.com&gt;
Acked-by: Michael S. Tsirkin &lt;mst@redhat.com&gt;
Reviewed-by: Tyler Hicks &lt;tyhicks@canonical.com&gt;
Signed-off-by: Michael S. Tsirkin &lt;mst@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 060423bfdee3f8bc6e2c1bac97de24d5415e2bc4 upstream.

The code assumes log_num &lt; in_num everywhere, and that is true as long as
in_num is incremented by descriptor iov count, and log_num by 1. However
this breaks if there's a zero sized descriptor.

As a result, if a malicious guest creates a vring desc with desc.len = 0,
it may cause the host kernel to crash by overflowing the log array. This
bug can be triggered during the VM migration.

There's no need to log when desc.len = 0, so just don't increment log_num
in this case.

Fixes: 3a4d5c94e959 ("vhost_net: a kernel-level virtio server")
Cc: stable@vger.kernel.org
Reviewed-by: Lidong Chen &lt;lidongchen@tencent.com&gt;
Signed-off-by: ruippan &lt;ruippan@tencent.com&gt;
Signed-off-by: yongduan &lt;yongduan@tencent.com&gt;
Acked-by: Michael S. Tsirkin &lt;mst@redhat.com&gt;
Reviewed-by: Tyler Hicks &lt;tyhicks@canonical.com&gt;
Signed-off-by: Michael S. Tsirkin &lt;mst@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>vhost/test: fix build for vhost test</title>
<updated>2019-09-16T06:19:33+00:00</updated>
<author>
<name>Tiwei Bie</name>
<email>tiwei.bie@intel.com</email>
</author>
<published>2019-08-28T05:37:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=aac633e9015d2f13f07522d024f906986ea50a06'/>
<id>aac633e9015d2f13f07522d024f906986ea50a06</id>
<content type='text'>
commit 264b563b8675771834419057cbe076c1a41fb666 upstream.

Since vhost_exceeds_weight() was introduced, callers need to specify
the packet weight and byte weight in vhost_dev_init(). Note that, the
packet weight isn't counted in this patch to keep the original behavior
unchanged.

Fixes: e82b9b0727ff ("vhost: introduce vhost_exceeds_weight()")
Cc: stable@vger.kernel.org
Signed-off-by: Tiwei Bie &lt;tiwei.bie@intel.com&gt;
Signed-off-by: Michael S. Tsirkin &lt;mst@redhat.com&gt;
Acked-by: Jason Wang &lt;jasowang@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 264b563b8675771834419057cbe076c1a41fb666 upstream.

Since vhost_exceeds_weight() was introduced, callers need to specify
the packet weight and byte weight in vhost_dev_init(). Note that, the
packet weight isn't counted in this patch to keep the original behavior
unchanged.

Fixes: e82b9b0727ff ("vhost: introduce vhost_exceeds_weight()")
Cc: stable@vger.kernel.org
Signed-off-by: Tiwei Bie &lt;tiwei.bie@intel.com&gt;
Signed-off-by: Michael S. Tsirkin &lt;mst@redhat.com&gt;
Acked-by: Jason Wang &lt;jasowang@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>vhost: scsi: add weight support</title>
<updated>2019-08-25T08:51:42+00:00</updated>
<author>
<name>Jason Wang</name>
<email>jasowang@redhat.com</email>
</author>
<published>2019-08-16T23:01:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=02b40edda9fd2e42abae40f5dd85122f13dbe7b8'/>
<id>02b40edda9fd2e42abae40f5dd85122f13dbe7b8</id>
<content type='text'>
commit c1ea02f15ab5efb3e93fc3144d895410bf79fcf2 upstream.

This patch will check the weight and exit the loop if we exceeds the
weight. This is useful for preventing scsi kthread from hogging cpu
which is guest triggerable.

This addresses CVE-2019-3900.

Cc: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Cc: Stefan Hajnoczi &lt;stefanha@redhat.com&gt;
Fixes: 057cbf49a1f0 ("tcm_vhost: Initial merge for vhost level target fabric driver")
Signed-off-by: Jason Wang &lt;jasowang@redhat.com&gt;
Reviewed-by: Stefan Hajnoczi &lt;stefanha@redhat.com&gt;
Signed-off-by: Michael S. Tsirkin &lt;mst@redhat.com&gt;
Reviewed-by: Stefan Hajnoczi &lt;stefanha@redhat.com&gt;
[bwh: Backported to 4.9:
 - Drop changes in vhost_scsi_ctl_handle_vq()
 - Adjust context]
Signed-off-by: Ben Hutchings &lt;ben.hutchings@codethink.co.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit c1ea02f15ab5efb3e93fc3144d895410bf79fcf2 upstream.

This patch will check the weight and exit the loop if we exceeds the
weight. This is useful for preventing scsi kthread from hogging cpu
which is guest triggerable.

This addresses CVE-2019-3900.

Cc: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Cc: Stefan Hajnoczi &lt;stefanha@redhat.com&gt;
Fixes: 057cbf49a1f0 ("tcm_vhost: Initial merge for vhost level target fabric driver")
Signed-off-by: Jason Wang &lt;jasowang@redhat.com&gt;
Reviewed-by: Stefan Hajnoczi &lt;stefanha@redhat.com&gt;
Signed-off-by: Michael S. Tsirkin &lt;mst@redhat.com&gt;
Reviewed-by: Stefan Hajnoczi &lt;stefanha@redhat.com&gt;
[bwh: Backported to 4.9:
 - Drop changes in vhost_scsi_ctl_handle_vq()
 - Adjust context]
Signed-off-by: Ben Hutchings &lt;ben.hutchings@codethink.co.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vhost_net: fix possible infinite loop</title>
<updated>2019-08-25T08:51:42+00:00</updated>
<author>
<name>Jason Wang</name>
<email>jasowang@redhat.com</email>
</author>
<published>2019-08-16T23:00:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=4b586288578a3a2aa4efb969feed86f2d760f082'/>
<id>4b586288578a3a2aa4efb969feed86f2d760f082</id>
<content type='text'>
commit e2412c07f8f3040593dfb88207865a3cd58680c0 upstream.

When the rx buffer is too small for a packet, we will discard the vq
descriptor and retry it for the next packet:

while ((sock_len = vhost_net_rx_peek_head_len(net, sock-&gt;sk,
					      &amp;busyloop_intr))) {
...
	/* On overrun, truncate and discard */
	if (unlikely(headcount &gt; UIO_MAXIOV)) {
		iov_iter_init(&amp;msg.msg_iter, READ, vq-&gt;iov, 1, 1);
		err = sock-&gt;ops-&gt;recvmsg(sock, &amp;msg,
					 1, MSG_DONTWAIT | MSG_TRUNC);
		pr_debug("Discarded rx packet: len %zd\n", sock_len);
		continue;
	}
...
}

This makes it possible to trigger a infinite while..continue loop
through the co-opreation of two VMs like:

1) Malicious VM1 allocate 1 byte rx buffer and try to slow down the
   vhost process as much as possible e.g using indirect descriptors or
   other.
2) Malicious VM2 generate packets to VM1 as fast as possible

Fixing this by checking against weight at the end of RX and TX
loop. This also eliminate other similar cases when:

- userspace is consuming the packets in the meanwhile
- theoretical TOCTOU attack if guest moving avail index back and forth
  to hit the continue after vhost find guest just add new buffers

This addresses CVE-2019-3900.

Fixes: d8316f3991d20 ("vhost: fix total length when packets are too short")
Fixes: 3a4d5c94e9593 ("vhost_net: a kernel-level virtio server")
Signed-off-by: Jason Wang &lt;jasowang@redhat.com&gt;
Reviewed-by: Stefan Hajnoczi &lt;stefanha@redhat.com&gt;
Signed-off-by: Michael S. Tsirkin &lt;mst@redhat.com&gt;
[bwh: Backported to 4.9:
 - Both Tx modes are handled in one loop in handle_tx()
 - Adjust context]
Signed-off-by: Ben Hutchings &lt;ben.hutchings@codethink.co.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit e2412c07f8f3040593dfb88207865a3cd58680c0 upstream.

When the rx buffer is too small for a packet, we will discard the vq
descriptor and retry it for the next packet:

while ((sock_len = vhost_net_rx_peek_head_len(net, sock-&gt;sk,
					      &amp;busyloop_intr))) {
...
	/* On overrun, truncate and discard */
	if (unlikely(headcount &gt; UIO_MAXIOV)) {
		iov_iter_init(&amp;msg.msg_iter, READ, vq-&gt;iov, 1, 1);
		err = sock-&gt;ops-&gt;recvmsg(sock, &amp;msg,
					 1, MSG_DONTWAIT | MSG_TRUNC);
		pr_debug("Discarded rx packet: len %zd\n", sock_len);
		continue;
	}
...
}

This makes it possible to trigger a infinite while..continue loop
through the co-opreation of two VMs like:

1) Malicious VM1 allocate 1 byte rx buffer and try to slow down the
   vhost process as much as possible e.g using indirect descriptors or
   other.
2) Malicious VM2 generate packets to VM1 as fast as possible

Fixing this by checking against weight at the end of RX and TX
loop. This also eliminate other similar cases when:

- userspace is consuming the packets in the meanwhile
- theoretical TOCTOU attack if guest moving avail index back and forth
  to hit the continue after vhost find guest just add new buffers

This addresses CVE-2019-3900.

Fixes: d8316f3991d20 ("vhost: fix total length when packets are too short")
Fixes: 3a4d5c94e9593 ("vhost_net: a kernel-level virtio server")
Signed-off-by: Jason Wang &lt;jasowang@redhat.com&gt;
Reviewed-by: Stefan Hajnoczi &lt;stefanha@redhat.com&gt;
Signed-off-by: Michael S. Tsirkin &lt;mst@redhat.com&gt;
[bwh: Backported to 4.9:
 - Both Tx modes are handled in one loop in handle_tx()
 - Adjust context]
Signed-off-by: Ben Hutchings &lt;ben.hutchings@codethink.co.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vhost: introduce vhost_exceeds_weight()</title>
<updated>2019-08-25T08:51:41+00:00</updated>
<author>
<name>Jason Wang</name>
<email>jasowang@redhat.com</email>
</author>
<published>2019-08-16T23:00:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=66c8d9d53e657d5068d9f234bc4ec1d703107a48'/>
<id>66c8d9d53e657d5068d9f234bc4ec1d703107a48</id>
<content type='text'>
commit e82b9b0727ff6d665fff2d326162b460dded554d upstream.

We used to have vhost_exceeds_weight() for vhost-net to:

- prevent vhost kthread from hogging the cpu
- balance the time spent between TX and RX

This function could be useful for vsock and scsi as well. So move it
to vhost.c. Device must specify a weight which counts the number of
requests, or it can also specific a byte_weight which counts the
number of bytes that has been processed.

Signed-off-by: Jason Wang &lt;jasowang@redhat.com&gt;
Reviewed-by: Stefan Hajnoczi &lt;stefanha@redhat.com&gt;
Signed-off-by: Michael S. Tsirkin &lt;mst@redhat.com&gt;
[bwh: Backported to 4.9:
 - In vhost_net, both Tx modes are handled in one loop in handle_tx()
 - Adjust context]
Signed-off-by: Ben Hutchings &lt;ben.hutchings@codethink.co.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit e82b9b0727ff6d665fff2d326162b460dded554d upstream.

We used to have vhost_exceeds_weight() for vhost-net to:

- prevent vhost kthread from hogging the cpu
- balance the time spent between TX and RX

This function could be useful for vsock and scsi as well. So move it
to vhost.c. Device must specify a weight which counts the number of
requests, or it can also specific a byte_weight which counts the
number of bytes that has been processed.

Signed-off-by: Jason Wang &lt;jasowang@redhat.com&gt;
Reviewed-by: Stefan Hajnoczi &lt;stefanha@redhat.com&gt;
Signed-off-by: Michael S. Tsirkin &lt;mst@redhat.com&gt;
[bwh: Backported to 4.9:
 - In vhost_net, both Tx modes are handled in one loop in handle_tx()
 - Adjust context]
Signed-off-by: Ben Hutchings &lt;ben.hutchings@codethink.co.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vhost_net: introduce vhost_exceeds_weight()</title>
<updated>2019-08-25T08:51:41+00:00</updated>
<author>
<name>Jason Wang</name>
<email>jasowang@redhat.com</email>
</author>
<published>2019-08-16T23:00:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=6214511f226c684d0da4a6cd1f24175d255ff061'/>
<id>6214511f226c684d0da4a6cd1f24175d255ff061</id>
<content type='text'>
commit 272f35cba53d088085e5952fd81d7a133ab90789 upstream.

Signed-off-by: Jason Wang &lt;jasowang@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
[bwh: Backported to 4.9: adjust context]
Signed-off-by: Ben Hutchings &lt;ben.hutchings@codethink.co.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 272f35cba53d088085e5952fd81d7a133ab90789 upstream.

Signed-off-by: Jason Wang &lt;jasowang@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
[bwh: Backported to 4.9: adjust context]
Signed-off-by: Ben Hutchings &lt;ben.hutchings@codethink.co.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vhost_net: use packet weight for rx handler, too</title>
<updated>2019-08-25T08:51:41+00:00</updated>
<author>
<name>Paolo Abeni</name>
<email>pabeni@redhat.com</email>
</author>
<published>2019-08-16T23:00:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=73f768b7684cf2e3ff99e74bc99c9ff253aa979f'/>
<id>73f768b7684cf2e3ff99e74bc99c9ff253aa979f</id>
<content type='text'>
commit db688c24eada63b1efe6d0d7d835e5c3bdd71fd3 upstream.

Similar to commit a2ac99905f1e ("vhost-net: set packet weight of
tx polling to 2 * vq size"), we need a packet-based limit for
handler_rx, too - elsewhere, under rx flood with small packets,
tx can be delayed for a very long time, even without busypolling.

The pkt limit applied to handle_rx must be the same applied by
handle_tx, or we will get unfair scheduling between rx and tx.
Tying such limit to the queue length makes it less effective for
large queue length values and can introduce large process
scheduler latencies, so a constant valued is used - likewise
the existing bytes limit.

The selected limit has been validated with PVP[1] performance
test with different queue sizes:

queue size		256	512	1024

baseline		366	354	362
weight 128		715	723	670
weight 256		740	745	733
weight 512		600	460	583
weight 1024		423	427	418

A packet weight of 256 gives peek performances in under all the
tested scenarios.

No measurable regression in unidirectional performance tests has
been detected.

[1] https://developers.redhat.com/blog/2017/06/05/measuring-and-comparing-open-vswitch-performance/

Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Acked-by: Jason Wang &lt;jasowang@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Ben Hutchings &lt;ben.hutchings@codethink.co.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit db688c24eada63b1efe6d0d7d835e5c3bdd71fd3 upstream.

Similar to commit a2ac99905f1e ("vhost-net: set packet weight of
tx polling to 2 * vq size"), we need a packet-based limit for
handler_rx, too - elsewhere, under rx flood with small packets,
tx can be delayed for a very long time, even without busypolling.

The pkt limit applied to handle_rx must be the same applied by
handle_tx, or we will get unfair scheduling between rx and tx.
Tying such limit to the queue length makes it less effective for
large queue length values and can introduce large process
scheduler latencies, so a constant valued is used - likewise
the existing bytes limit.

The selected limit has been validated with PVP[1] performance
test with different queue sizes:

queue size		256	512	1024

baseline		366	354	362
weight 128		715	723	670
weight 256		740	745	733
weight 512		600	460	583
weight 1024		423	427	418

A packet weight of 256 gives peek performances in under all the
tested scenarios.

No measurable regression in unidirectional performance tests has
been detected.

[1] https://developers.redhat.com/blog/2017/06/05/measuring-and-comparing-open-vswitch-performance/

Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Acked-by: Jason Wang &lt;jasowang@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Ben Hutchings &lt;ben.hutchings@codethink.co.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
