linux.git/drivers/vhost, branch v7.2-rc1

Merge tag 'mm-nonmm-stable-2026-06-21-10-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

2026-06-21T20:20:19+00:00

Pull non-MM updates from Andrew Morton:

 - "taskstats: fix TGID dead-thread stat retention" (Yiyang Chen)

   Fix a taskstats TGID aggregation bug where fields added in the TGID
   query path were not preserved after thread exit, and adds a kselftest
   covering the regression.

 - "lib/tests: string_helpers: Slight improvements" (Andy Shevchenko)

   Improve lib/tests/string_helpers_kunit.c a little

 - "lib/base64: decode fixes" (Josh Law)

   Address minor issues in lib/base64.c

 - "selftests/filelock: Make output more kselftestish" (Mark Brown)

   Make the output from the ofdlocks test a bit easier for tooling to
   work with. Also ignore the generated file

 - "uaccess: unify inline vs outline copy_{from,to}_user() selection"
   (Yury Norov)

   Simplify the usercopy code by removing the selectability of inlining
   copy_{from,to}_user().

 - "ocfs2: validate inline xattr header consumers" (ZhengYuan Huang)

   Fix a number of possible issues in the ocfs2 xattr code

 - "lib and lib/cmdline enhancements" (Dmitry Antipov)

   Provide additional robustness checking in the cmdline handling code
   and its in-kernel testing and selftests

 - "cleanup the RAID6 P/Q library" (Christoph Hellwig)

   Clean up the RAID6 P/Q library to match the recent updates to the
   RAID 5 XOR library and other CRC/crypto libraries

 - "ocfs2: harden inode validators against forged metadata" (Michael
   Bommarito)

   Add three structural checks to OCFS2 dinode validation so malformed
   on-disk fields are rejected before ocfs2_populate_inode() copies them
   into the in-core inode

 - "lib/raid: replace __get_free_pages() call with kmalloc()" (Mike
   Rapoport)

   Clean up the lib/raid code by using kmalloc() in more places

* tag 'mm-nonmm-stable-2026-06-21-10-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (108 commits)
  ocfs2: fix circular locking dependency in ocfs2_dio_end_io_write
  ocfs2: fix NULL h_transaction deref in ocfs2_assure_trans_credits
  lib: interval_tree_test: validate benchmark parameters
  ocfs2: avoid moving extents to occupied clusters
  treewide: fix transposed "sign" typos and update spelling.txt
  ocfs2: fix UBSAN array-index-out-of-bounds in ocfs2_sum_rightmost_rec
  fat: reject BPB volumes whose data area starts beyond total sectors
  selftests/uevent: increase __UEVENT_BUFFER_SIZE to avoid ENOBUFS on busy systems
  lib/test_firmware: allocate the configured into_buf size
  fs: efs: remove unneeded debug prints
  checkpatch: cuppress warnings when Reported-by: is followed by Link:
  MAINTAINERS: add Alexander as a kcov reviewer
  mailmap: update Alexander Sverdlin's Email addresses
  fs: fat: inode: replace sprintf() with scnprintf()
  ocfs2: fix out-of-bounds write in ocfs2_remove_refcount_extent
  ocfs2: fix race between ocfs2_control_install_private() and ocfs2_control_release()
  ocfs2/dlm: require a ref for locking_state debugfs open
  ocfs2: reject FITRIM ranges shorter than a cluster
  ocfs2: validate fast symlink target during inode read
  ocfs2: add journal NULL check in ocfs2_checkpoint_inode()
  ...

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

2026-06-17T18:49:00+00:00

Pull virtio updates from Michael Tsirkin:

 - new virtio CAN driver

 - support for LoongArch architecture in fw_cfg

 - support for firmware notifications in vdpa/octeon_ep

 - support for VFs in virtio core

 - fixes, cleanups all over the place, notably:

    - vhost: fix vhost_get_avail_idx for a non empty ring
      fixing an significant old perf regression

    - READ_ONCE() annotations mean virtio ring is now
      free of KCSAN warnings

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (37 commits)
  can: virtio: Fix comment in UAPI header
  can: virtio: Add virtio CAN driver
  virtio: add num_vf callback to virtio_bus
  fw_cfg: Add support for LoongArch architecture
  vdpa/octeon_ep: fix IRQ-to-ring mapping in interrupt handler
  vdpa/octeon_ep: Add vDPA device event handling for firmware notifications
  vdpa/octeon_ep: Use 4 bytes for mailbox signature
  vdpa/octeon_ep: Fix PF->VF mailbox data address calculation
  vhost_task_create: kill unnecessary .exit_signal initialization
  vhost: remove unnecessary module_init/exit functions
  vdpa/mlx5: Use kvzalloc_flex() for MTT command memory
  vdpa_sim_net: switch to dynamic root device
  vdpa_sim_blk: switch to dynamic root device
  virtio-mem: Destroy mutex before freeing virtio_mem
  virtio-balloon: Destroy mutex before freeing virtio_balloon
  tools/virtio: fix build for kmalloc_obj API and missing stubs
  virtio_ring: Add READ_ONCE annotations for device-writable fields
  vduse: fix compat handling for VDUSE_IOTLB_GET_FD/VDUSE_VQ_GET_INFO
  tools/virtio: check mmap return value in vringh_test
  vhost/net: complete zerocopy ubufs only once
  ...

vhost: remove unnecessary module_init/exit functions

2026-06-10T06:17:00+00:00

The vhost driver has unnecessary empty module_init and
module_exit functions. Remove them. Note that if a module_init function
exists, a module_exit function must also exist; otherwise, the module
cannot be unloaded.

Signed-off-by: Ethan Nelson-Moore 
Signed-off-by: Michael S. Tsirkin 
Message-ID: <20260131020010.45647-1-enelsonmoore@gmail.com>

vhost/net: complete zerocopy ubufs only once

2026-06-10T06:16:59+00:00

vhost-net initializes one ubuf_info per outstanding zerocopy TX
descriptor and hands it to the backend socket.  The networking stack may
then clone a zerocopy skb before all skb references are released.  For
example, batman-adv fragmentation reaches skb_split(), which calls
skb_zerocopy_clone() and increments the same ubuf_info refcount.

vhost_zerocopy_complete() currently treats every ubuf callback as a
completed vhost descriptor.  It dereferences ubuf->ctx, writes the
descriptor completion state, and drops the vhost_net_ubuf_ref even when
the callback only releases a cloned skb reference.  A backend reset can
therefore wait for and free the vhost_net_ubuf_ref while another cloned
skb still carries the same ubuf_info.  A later completion then
dereferences the freed ubufs pointer.

KASAN reports the stale completion as:

  BUG: KASAN: slab-use-after-free in vhost_zerocopy_complete+0x1d7/0x1f0
  BUG: KASAN: slab-use-after-free in vhost_zerocopy_complete+0x101/0x1f0
  vhost_zerocopy_complete
  skb_copy_ubufs
  __dev_forward_skb2
  veth_xmit

The freed object was allocated from vhost_net_ioctl() while setting the
backend and freed through kfree_rcu()/kvfree_rcu_bulk after backend
removal, while delayed skb completion still reached
vhost_zerocopy_complete().

Honor the generic ubuf_info refcount before touching vhost state, and run
the vhost descriptor completion only for the final ubuf reference.  This
matches the msg_zerocopy_complete() ownership rule for cloned zerocopy
skbs.

Fixes: bab632d69ee4 ("vhost: vhost TX zero-copy support")
Signed-off-by: Qing Ming 
Signed-off-by: Michael S. Tsirkin 
Message-ID: <20260601104300.197210-1-a0yami@mailbox.org>

vhost/vdpa: validate virtqueue index in mmap and fault paths

2026-06-10T06:14:01+00:00

vhost_vdpa_mmap() and vhost_vdpa_fault() use vma->vm_pgoff as a
virtqueue index for get_vq_notification(), but they do not validate
that the index is smaller than v->nvqs.

The ioctl path already performs both a bounds check and
array_index_nospec(), but the mmap/fault path only checks that the
index fits in u16. This allows an out-of-range queue index to reach
driver-specific get_vq_notification() callbacks.

Fix this by extracting a unified vhost_vdpa_get_vq_notification()
helper that validates the queue index against v->nvqs and applies
array_index_nospec() before calling the driver callback. Both the
mmap and fault paths use this helper, and the bounds checking is
consolidated into a single location.

From source inspection, the most defensible impact is out-of-bounds
access in the callback path, potentially leading to invalid PFN
remaps and crash/DoS.

Fixes: ddd89d0a059d ("vhost_vdpa: support doorbell mapping via mmap")
Acked-by: Eugenio Pérez 
Acked-by: Michael S. Tsirkin 
Signed-off-by: Qihang Tang 
Signed-off-by: Michael S. Tsirkin 
Message-ID: <20260508075821.92656-1-q.h.hack.winter@gmail.com>

vhost/vsock: Refuse the connection immediately when guest isn't ready

2026-06-10T06:14:01+00:00

When the host initiates an AF_VSOCK connect() to a guest that has not
yet loaded the virtio-vsock transport (i.e. still booting), the caller
blocks for VSOCK_DEFAULT_CONNECT_TIMEOUT.

A caller that wants to know if the guest is up yet instead of waiting
could theoretically tune SO_VM_SOCKETS_CONNECT_TIMEOUT, but it's tricky
to find the right timeout, if not impossible: there's no way to
distinguish "guest won't reply because it's not up yet" vs "guest is up
and tried to reply, but was too slow".

Furthermore, this delay is pointless:
- If the guest doesn't initialize within this timeout, connect()
  returns ETIMEDOUT.
- If the guest **does** initialize, it'll reply with RST immediately,
  because there won't be a listener on the port yet; connect() returns
  ECONNRESET.

That's also inconsistent with the behavior at other initialization
stages: if a connection is attempted when the guest driver is already
loaded, but nothing is listening yet, we return ECONNRESET immediately
without waiting.

Fix this by checking the RX virtqueue backend in
vhost_transport_send_pkt() before queuing. If it's NULL, return
-EHOSTUNREACH immediately.

Callers that used to get ETIMEDOUT will now usually get EHOSTUNREACH.

Signed-off-by: Denis V. Lunev 
Co-developed-by: Polina Vishneva 
Signed-off-by: Polina Vishneva 
Reviewed-by: Stefano Garzarella 
Signed-off-by: Michael S. Tsirkin 
Message-ID: <20260513145842.809404-1-polina.vishneva@virtuozzo.com>

vhost: fix vhost_get_avail_idx for a non empty ring

2026-06-04T04:25:54+00:00

vhost_get_avail_idx is supposed to report whether it has updated
vq->avail_idx. Instead, it returns whether all entries have been
consumed, which is usually the same. But not always - in
drivers/vhost/net.c and when mergeable buffers have been enabled, the
driver checks whether the combined entries are big enough to store an
incoming packet. If not, the driver re-enables notifications with
available entries still in the ring. The incorrect return value from
vhost_get_avail_idx propagates through vhost_enable_notify and causes
the host to livelock if the guest is not making progress, as vhost will
immediately disable notifications and retry using the available entries.

This goes back to commit d3bb267bbdcb ("vhost: cache avail index in
vhost_enable_notify()") which changed vhost_enable_notify() to compare
the freshly read avail index against vq->last_avail_idx instead of the
previously cached vq->avail_idx. Commit 7ad472397667 ("vhost: move
smp_rmb() into vhost_get_avail_idx()") then carried over the same
comparison when refactoring vhost_enable_notify() to call the unified
vhost_get_avail_idx().

The obvious fix is to make vhost_get_avail_idx do what the comment
says it does and report whether new entries have been added.

Reported-by: ShuangYu 
Fixes: d3bb267bbdcb ("vhost: cache avail index in vhost_enable_notify()")
Cc: Stefan Hajnoczi 
Acked-by: Jason Wang 
Reviewed-by: Stefano Garzarella 
Signed-off-by: Michael S. Tsirkin 
Message-Id: <559b04ae6ce52973c535dc47e461638b7f4c3d63.1772441455.git.mst@redhat.com>

kcov: refactor common handle ID into kcov_common_handle_id

2026-05-29T04:24:45+00:00

Store common handle IDs in "struct kcov_common_handle_id", which consumes
no space in non-KCOV builds.

This cleanup removes #ifdef boilerplate code from subsystems that
integrate with KCOV (in particular in usbip_common.h and skbuff.h, see the
diffstat).

This should also make it easier to add KCOV remote coverage to more
subsystems in the future.

Link: https://lore.kernel.org/20260430-kcov-refactor-common-handle-v1-1-23a0c7a0ba38@google.com
Signed-off-by: Jann Horn 
Acked-by: Greg Kroah-Hartman 
Reviewed-by: Dmitry Vyukov 
Acked-by: Jakub Kicinski 
Cc: Alexander Potapenko 
Cc: Andrey Konovalov 
Cc: Eugenio Pérez 
Cc: Hongren (Zenithal) Zheng 
Cc: Jann Horn 
Cc: Jason Wang 
Cc: "Michael S. Tsirkin" 
Cc: Shuah Khan 
Cc: Valentina Manea 
Signed-off-by: Andrew Morton

vhost-net: wake queue of tun/tap after ptr_ring consume

2026-05-14T00:52:55+00:00

Add tun_wake_queue() to tun.c and export it for use by vhost-net. The
function validates that the file belongs to a tun/tap device and that
the tfile exists, dereferences the tun_struct under RCU, and delegates
to __tun_wake_queue().

vhost_net_buf_produce() now calls tun_wake_queue() after a successful
batched consume of the ring to allow the netdev subqueue to be woken up.
The point is to allow the queue to be stopped when it gets full, which
is required for traffic shaping - implemented by the following
"avoid ptr_ring tail-drop when a qdisc is present".

Without the corresponding queue stopping, this patch alone causes no
throughput regression for a tap+vhost-net setup sending to a qemu VM:
3.857 Mpps to 3.891 Mpps.

Details: AMD Ryzen 5 5600X at 4.3 GHz, 3200 MHz RAM, isolated QEMU
threads, XDP drop program active in VM, pktgen sender; Avg over
50 runs @ 100,000,000 packets. SRSO and spectre v2 mitigations disabled.

Co-developed-by: Tim Gebauer 
Signed-off-by: Tim Gebauer 
Signed-off-by: Simon Schippers 
Acked-by: Michael S. Tsirkin 
Link: https://patch.msgid.link/20260510151529.43895-3-simon.schippers@tu-dortmund.de
Signed-off-by: Jakub Kicinski

Merge tag 'net-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

2026-04-23T23:50:42+00:00

Pull  networking fixes from Jakub Kicinski:
 "Including fixes from Netfilter.

  Steady stream of fixes. Last two weeks feel comparable to the two
  weeks before the merge window. Lots of AI-aided bug discovery. A newer
  big source is Sashiko/Gemini (Roman Gushchin's system), which points
  out issues in existing code during patch review (maybe 25% of fixes
  here likely originating from Sashiko). Nice thing is these are often
  fixed by the respective maintainers, not drive-bys.

  Current release - new code bugs:

   - kconfig: MDIO_PIC64HPSC should depend on ARCH_MICROCHIP

  Previous releases - regressions:

   - add async ndo_set_rx_mode and switch drivers which we promised to
     be called under the per-netdev mutex to it

   - dsa: remove duplicate netdev_lock_ops() for conduit ethtool ops

   - hv_sock: report EOF instead of -EIO for FIN

   - vsock/virtio: fix MSG_PEEK calculation on bytes to copy

  Previous releases - always broken:

   - ipv6: fix possible UAF in icmpv6_rcv()

   - icmp: validate reply type before using icmp_pointers

   - af_unix: drop all SCM attributes for SOCKMAP

   - netfilter: fix a number of bugs in the osf (OS fingerprinting)

   - eth: intel: fix timestamp interrupt configuration for E825C

  Misc:

   - bunch of data-race annotations"

* tag 'net-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (148 commits)
  rxrpc: Fix error handling in rxgk_extract_token()
  rxrpc: Fix re-decryption of RESPONSE packets
  rxrpc: Fix rxrpc_input_call_event() to only unshare DATA packets
  rxrpc: Fix missing validation of ticket length in non-XDR key preparsing
  rxgk: Fix potential integer overflow in length check
  rxrpc: Fix conn-level packet handling to unshare RESPONSE packets
  rxrpc: Fix potential UAF after skb_unshare() failure
  rxrpc: Fix rxkad crypto unalignment handling
  rxrpc: Fix memory leaks in rxkad_verify_response()
  net: rds: fix MR cleanup on copy error
  m68k: mvme147: Make me the maintainer
  net: txgbe: fix firmware version check
  selftests/bpf: check epoll readiness during reuseport migration
  tcp: call sk_data_ready() after listener migration
  vhost_net: fix sleeping with preempt-disabled in vhost_net_busy_poll()
  ipv6: Cap TLV scan in ip6_tnl_parse_tlv_enc_lim
  tipc: fix double-free in tipc_buf_append()
  llc: Return -EINPROGRESS from llc_ui_connect()
  ipv4: icmp: validate reply type before using icmp_pointers
  selftests/net: packetdrill: cover RFC 5961 5.2 challenge ACK on both edges
  ...