linux-stable.git/drivers/hv, branch linux-6.14.y

Drivers: hv: vmbus: Remove vmbus_sendpacket_pagebuffer()

2025-05-22T12:31:52+00:00

commit 45a442fe369e6c4e0b4aa9f63b31c3f2f9e2090e upstream.

With the netvsc driver changed to use vmbus_sendpacket_mpb_desc()
instead of vmbus_sendpacket_pagebuffer(), the latter has no remaining
callers. Remove it.

Cc:  # 6.1.x
Signed-off-by: Michael Kelley 
Link: https://patch.msgid.link/20250513000604.1396-6-mhklinux@outlook.com
Signed-off-by: Jakub Kicinski 
Signed-off-by: Greg Kroah-Hartman

Drivers: hv: Allow vmbus_sendpacket_mpb_desc() to create multiple ranges

2025-05-22T12:31:52+00:00

commit 380b75d3078626aadd0817de61f3143f5db6e393 upstream.

vmbus_sendpacket_mpb_desc() is currently used only by the storvsc driver
and is hardcoded to create a single GPA range. To allow it to also be
used by the netvsc driver to create multiple GPA ranges, no longer
hardcode as having a single GPA range. Allow the calling driver to
specify the rangecount in the supplied descriptor.

Update the storvsc driver to reflect this new approach.

Cc:  # 6.1.x
Signed-off-by: Michael Kelley 
Link: https://patch.msgid.link/20250513000604.1396-2-mhklinux@outlook.com
Signed-off-by: Jakub Kicinski 
Signed-off-by: Greg Kroah-Hartman

uio_hv_generic: Fix sysfs creation path for ring buffer

2025-05-18T06:26:00+00:00

commit f31fe8165d365379d858c53bef43254c7d6d1cfd upstream.

On regular bootup, devices get registered to VMBus first, so when
uio_hv_generic driver for a particular device type is probed,
the device is already initialized and added, so sysfs creation in
hv_uio_probe() works fine. However, when the device is removed
and brought back, the channel gets rescinded and the device again gets
registered to VMBus. However this time, the uio_hv_generic driver is
already registered to probe for that device and in this case sysfs
creation is tried before the device's kobject gets initialized
completely.

Fix this by moving the core logic of sysfs creation of ring buffer,
from uio_hv_generic to HyperV's VMBus driver, where the rest of the
sysfs attributes for the channels are defined. While doing that, make
use of attribute groups and macros, instead of creating sysfs
directly, to ensure better error handling and code flow.

Problematic path:
vmbus_process_offer (A new offer comes for the VMBus device)
  vmbus_add_channel_work
    vmbus_device_register
      |-> device_register
      |     |...
      |     |-> hv_uio_probe
      |           |...
      |           |-> sysfs_create_bin_file (leads to a warning as
      |                 the primary channel's kobject, which is used to
      |                 create the sysfs file, is not yet initialized)
      |-> kset_create_and_add
      |-> vmbus_add_channel_kobj (initialization of the primary
                                  channel's kobject happens later)

Above code flow is sequential and the warning is always reproducible in
this path.

Fixes: 9ab877a6ccf8 ("uio_hv_generic: make ring buffer attribute for primary channel")
Cc: stable@kernel.org
Suggested-by: Saurabh Sengar 
Suggested-by: Michael Kelley 
Reviewed-by: Michael Kelley 
Tested-by: Michael Kelley 
Reviewed-by: Dexuan Cui 
Signed-off-by: Naman Jain 
Link: https://lore.kernel.org/r/20250502074811.2022-2-namjain@linux.microsoft.com
Signed-off-by: Greg Kroah-Hartman

Drivers: hv: vmbus: Don't release fb_mmio resource in vmbus_free_mmio()

2025-03-10T16:54:06+00:00

The VMBus driver manages the MMIO space it owns via the hyperv_mmio
resource tree. Because the synthetic video framebuffer portion of the
MMIO space is initially setup by the Hyper-V host for each guest, the
VMBus driver does an early reserve of that portion of MMIO space in the
hyperv_mmio resource tree. It saves a pointer to that resource in
fb_mmio. When a VMBus driver requests MMIO space and passes "true"
for the "fb_overlap_ok" argument, the reserved framebuffer space is
used if possible. In that case it's not necessary to do another request
against the "shadow" hyperv_mmio resource tree because that resource
was already requested in the early reserve steps.

However, the vmbus_free_mmio() function currently does no special
handling for the fb_mmio resource. When a framebuffer device is
removed, or the driver is unbound, the current code for
vmbus_free_mmio() releases the reserved resource, leaving fb_mmio
pointing to memory that has been freed. If the same or another
driver is subsequently bound to the device, vmbus_allocate_mmio()
checks against fb_mmio, and potentially gets garbage. Furthermore
a second unbind operation produces this "nonexistent resource" error
because of the unbalanced behavior between vmbus_allocate_mmio() and
vmbus_free_mmio():

[   55.499643] resource: Trying to free nonexistent
			resource <0x00000000f0000000-0x00000000f07fffff>

Fix this by adding logic to vmbus_free_mmio() to recognize when
MMIO space in the fb_mmio reserved area would be released, and don't
release it. This filtering ensures the fb_mmio resource always exists,
and makes vmbus_free_mmio() more parallel with vmbus_allocate_mmio().

Fixes: be000f93e5d7 ("drivers:hv: Track allocations of children of hv_vmbus in private resource tree")
Signed-off-by: Michael Kelley 
Tested-by: Saurabh Sengar 
Reviewed-by: Saurabh Sengar 
Link: https://lore.kernel.org/r/20250310035208.275764-1-mhklinux@outlook.com
Signed-off-by: Wei Liu 
Message-ID: <20250310035208.275764-1-mhklinux@outlook.com>

treewide: const qualify ctl_tables where applicable

2025-01-28T12:48:37+00:00

Add the const qualifier to all the ctl_tables in the tree except for
watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls,
loadpin_sysctl_table and the ones calling register_net_sysctl (./net,
drivers/inifiniband dirs). These are special cases as they use a
registration function with a non-const qualified ctl_table argument or
modify the arrays before passing them on to the registration function.

Constifying ctl_table structs will prevent the modification of
proc_handler function pointers as the arrays would reside in .rodata.
This is made possible after commit 78eb4ea25cd5 ("sysctl: treewide:
constify the ctl_table argument of proc_handlers") constified all the
proc_handlers.

Created this by running an spatch followed by a sed command:
Spatch:
    virtual patch

    @
    depends on !(file in "net")
    disable optional_qualifier
    @

    identifier table_name != {
      watchdog_hardlockup_sysctl,
      iwcm_ctl_table,
      ucma_ctl_table,
      memory_allocation_profiling_sysctls,
      loadpin_sysctl_table
    };
    @@

    + const
    struct ctl_table table_name [] = { ... };

sed:
    sed --in-place \
      -e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table = \&uts_kern/" \
      kernel/utsname_sysctl.c

Reviewed-by: Song Liu 
Acked-by: Steven Rostedt (Google)  # for kernel/trace/
Reviewed-by: Martin K. Petersen  # SCSI
Reviewed-by: Darrick J. Wong  # xfs
Acked-by: Jani Nikula 
Acked-by: Corey Minyard 
Acked-by: Wei Liu 
Acked-by: Thomas Gleixner 
Reviewed-by: Bill O'Donnell 
Acked-by: Baoquan He 
Acked-by: Ashutosh Dixit 
Acked-by: Anna Schumaker 
Signed-off-by: Joel Granados

hyperv: Enable the hypercall output page for the VTL mode

2025-01-10T00:54:21+00:00

Due to the hypercall page not being allocated in the VTL mode,
the code resorts to using a part of the input page.

Allocate the hypercall output page in the VTL mode thus enabling
it to use it for output and share code with dom0.

Signed-off-by: Roman Kisel 
Reviewed-by: Nuno Das Neves 
Link: https://lore.kernel.org/r/20250108222138.1623703-4-romank@linux.microsoft.com
Signed-off-by: Wei Liu 
Message-ID: <20250108222138.1623703-4-romank@linux.microsoft.com>

hv_balloon: Fallback to generic_online_page() for non-HV hot added mem

2025-01-10T00:54:21+00:00

The Hyper-V balloon driver installs a custom callback for handling page
onlining operations performed by the memory hotplug subsystem. This
custom callback is global, and overrides the default callback
(generic_online_page) that Linux otherwise uses. The custom callback
properly handles memory that is hot-added by the balloon driver as part
of a Hyper-V hot-add region.

But memory can also be hot-added directly by a device driver for a vPCI
device, particularly GPUs. In such a case, the custom callback installed by
the balloon driver runs, but won't find the page in its hot-add region list
and doesn't online it, which could cause driver initialization failures.

Fix this by having the balloon custom callback run generic_online_page()
when the page isn't part of a Hyper-V hot-add region, thereby doing the
default Linux behavior. This allows device driver hot-adds to work
properly. Similar cases are handled the same way in the virtio-mem driver.

Suggested-by: Vikram Sethi 
Tested-by: Michael Frohlich 
Reviewed-by: Michael Kelley 
Signed-off-by: Jacob Pan 
Link: https://lore.kernel.org/r/20250107180918.1053933-1-jacob.pan@linux.microsoft.com
Signed-off-by: Wei Liu 
Message-ID: <20250107180918.1053933-1-jacob.pan@linux.microsoft.com>

Drivers: hv: vmbus: Log on missing offers if any

2025-01-10T00:54:21+00:00

When resuming from hibernation, log any channels that were present
before hibernation but now are gone.
In general, the boot-time devices configured for a resuming VM should be
the same as the devices in the VM at the time of hibernation. It's
uncommon for the configuration to have been changed such that offers
are missing. Changing the configuration violates the rules for
hibernation anyway.
The cleanup of missing channels is not straight-forward and dependent
on individual device driver functionality and implementation,
so it can be added in future with separate changes.

Signed-off-by: John Starks 
Co-developed-by: Naman Jain 
Signed-off-by: Naman Jain 
Reviewed-by: Easwar Hariharan 
Reviewed-by: Saurabh Sengar 
Reviewed-by: Michael Kelley 
Link: https://lore.kernel.org/r/20250102130712.1661-3-namjain@linux.microsoft.com
Signed-off-by: Wei Liu 
Message-ID: <20250102130712.1661-3-namjain@linux.microsoft.com>

Drivers: hv: vmbus: Wait for boot-time offers during boot and resume

2025-01-10T00:54:21+00:00

Channel offers are requested during VMBus initialization and resume from
hibernation. Add support to wait for all boot-time channel offers to
be delivered and processed before returning from vmbus_request_offers.

This is in analogy to a PCI bus not returning from probe until it has
scanned all devices on the bus.

Without this, user mode can race with VMBus initialization and miss
channel offers. User mode has no way to work around this other than
sleeping for a while, since there is no way to know when VMBus has
finished processing boot-time offers.

With this added functionality, remove earlier logic which keeps track
of count of offered channels post resume from hibernation. Once all
offers delivered message is received, no further boot-time offers are
going to be received. Consequently, logic to prevent suspend from
happening after previous resume had missing offers, is also removed.

Co-developed-by: John Starks 
Signed-off-by: John Starks 
Signed-off-by: Naman Jain 
Reviewed-by: Easwar Hariharan 
Reviewed-by: Saurabh Sengar 
Reviewed-by: Michael Kelley 
Link: https://lore.kernel.org/r/20250102130712.1661-2-namjain@linux.microsoft.com
Signed-off-by: Wei Liu 
Message-ID: <20250102130712.1661-2-namjain@linux.microsoft.com>

Drivers: hv: Don't assume cpu_possible_mask is dense

2025-01-10T00:54:21+00:00

Current code allocates the hv_vp_index array with size
num_possible_cpus(). This code assumes cpu_possible_mask is dense,
which is not true in the general case per [1]. If cpu_possible_mask
is sparse, the array might be indexed by a value beyond the size of
the array.

However, the configurations that Hyper-V provides to guest VMs on x86
and ARM64 hardware, in combination with how architecture specific code
assigns Linux CPU numbers, *does* always produce a dense cpu_possible_mask.
So the dense assumption is not currently causing failures. But for
robustness against future changes in how cpu_possible_mask is populated,
update the code to no longer assume dense.

The correct approach is to allocate and initialize the array using size
"nr_cpu_ids". While this leaves unused array entries corresponding to
holes in cpu_possible_mask, the holes are assumed to be minimal and hence
the amount of memory wasted by unused entries is minimal.

Using nr_cpu_ids also reduces initialization time, in that the loop to
initialize the array currently rescans cpu_possible_mask on each
iteration. This is n-squared in the number of CPUs, which could be
significant for large CPU counts.

[1] https://lore.kernel.org/lkml/SN6PR02MB4157210CC36B2593F8572E5ED4692@SN6PR02MB4157.namprd02.prod.outlook.com/

Signed-off-by: Michael Kelley 
Acked-by: Peter Zijlstra (Intel) 
Link: https://lore.kernel.org/r/20241003035333.49261-3-mhklinux@outlook.com
Signed-off-by: Wei Liu 
Message-ID: <20241003035333.49261-3-mhklinux@outlook.com>