<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/drivers/hv, branch v6.14</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>Drivers: hv: vmbus: Don't release fb_mmio resource in vmbus_free_mmio()</title>
<updated>2025-03-10T16:54:06+00:00</updated>
<author>
<name>Michael Kelley</name>
<email>mhklinux@outlook.com</email>
</author>
<published>2025-03-10T03:52:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=73fe9073c0cc28056cb9de0c8a516dac070f1d1f'/>
<id>73fe9073c0cc28056cb9de0c8a516dac070f1d1f</id>
<content type='text'>
The VMBus driver manages the MMIO space it owns via the hyperv_mmio
resource tree. Because the synthetic video framebuffer portion of the
MMIO space is initially setup by the Hyper-V host for each guest, the
VMBus driver does an early reserve of that portion of MMIO space in the
hyperv_mmio resource tree. It saves a pointer to that resource in
fb_mmio. When a VMBus driver requests MMIO space and passes "true"
for the "fb_overlap_ok" argument, the reserved framebuffer space is
used if possible. In that case it's not necessary to do another request
against the "shadow" hyperv_mmio resource tree because that resource
was already requested in the early reserve steps.

However, the vmbus_free_mmio() function currently does no special
handling for the fb_mmio resource. When a framebuffer device is
removed, or the driver is unbound, the current code for
vmbus_free_mmio() releases the reserved resource, leaving fb_mmio
pointing to memory that has been freed. If the same or another
driver is subsequently bound to the device, vmbus_allocate_mmio()
checks against fb_mmio, and potentially gets garbage. Furthermore
a second unbind operation produces this "nonexistent resource" error
because of the unbalanced behavior between vmbus_allocate_mmio() and
vmbus_free_mmio():

[   55.499643] resource: Trying to free nonexistent
			resource &lt;0x00000000f0000000-0x00000000f07fffff&gt;

Fix this by adding logic to vmbus_free_mmio() to recognize when
MMIO space in the fb_mmio reserved area would be released, and don't
release it. This filtering ensures the fb_mmio resource always exists,
and makes vmbus_free_mmio() more parallel with vmbus_allocate_mmio().

Fixes: be000f93e5d7 ("drivers:hv: Track allocations of children of hv_vmbus in private resource tree")
Signed-off-by: Michael Kelley &lt;mhklinux@outlook.com&gt;
Tested-by: Saurabh Sengar &lt;ssengar@linux.microsoft.com&gt;
Reviewed-by: Saurabh Sengar &lt;ssengar@linux.microsoft.com&gt;
Link: https://lore.kernel.org/r/20250310035208.275764-1-mhklinux@outlook.com
Signed-off-by: Wei Liu &lt;wei.liu@kernel.org&gt;
Message-ID: &lt;20250310035208.275764-1-mhklinux@outlook.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The VMBus driver manages the MMIO space it owns via the hyperv_mmio
resource tree. Because the synthetic video framebuffer portion of the
MMIO space is initially setup by the Hyper-V host for each guest, the
VMBus driver does an early reserve of that portion of MMIO space in the
hyperv_mmio resource tree. It saves a pointer to that resource in
fb_mmio. When a VMBus driver requests MMIO space and passes "true"
for the "fb_overlap_ok" argument, the reserved framebuffer space is
used if possible. In that case it's not necessary to do another request
against the "shadow" hyperv_mmio resource tree because that resource
was already requested in the early reserve steps.

However, the vmbus_free_mmio() function currently does no special
handling for the fb_mmio resource. When a framebuffer device is
removed, or the driver is unbound, the current code for
vmbus_free_mmio() releases the reserved resource, leaving fb_mmio
pointing to memory that has been freed. If the same or another
driver is subsequently bound to the device, vmbus_allocate_mmio()
checks against fb_mmio, and potentially gets garbage. Furthermore
a second unbind operation produces this "nonexistent resource" error
because of the unbalanced behavior between vmbus_allocate_mmio() and
vmbus_free_mmio():

[   55.499643] resource: Trying to free nonexistent
			resource &lt;0x00000000f0000000-0x00000000f07fffff&gt;

Fix this by adding logic to vmbus_free_mmio() to recognize when
MMIO space in the fb_mmio reserved area would be released, and don't
release it. This filtering ensures the fb_mmio resource always exists,
and makes vmbus_free_mmio() more parallel with vmbus_allocate_mmio().

Fixes: be000f93e5d7 ("drivers:hv: Track allocations of children of hv_vmbus in private resource tree")
Signed-off-by: Michael Kelley &lt;mhklinux@outlook.com&gt;
Tested-by: Saurabh Sengar &lt;ssengar@linux.microsoft.com&gt;
Reviewed-by: Saurabh Sengar &lt;ssengar@linux.microsoft.com&gt;
Link: https://lore.kernel.org/r/20250310035208.275764-1-mhklinux@outlook.com
Signed-off-by: Wei Liu &lt;wei.liu@kernel.org&gt;
Message-ID: &lt;20250310035208.275764-1-mhklinux@outlook.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>treewide: const qualify ctl_tables where applicable</title>
<updated>2025-01-28T12:48:37+00:00</updated>
<author>
<name>Joel Granados</name>
<email>joel.granados@kernel.org</email>
</author>
<published>2025-01-28T12:48:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=1751f872cc97f992ed5c4c72c55588db1f0021e1'/>
<id>1751f872cc97f992ed5c4c72c55588db1f0021e1</id>
<content type='text'>
Add the const qualifier to all the ctl_tables in the tree except for
watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls,
loadpin_sysctl_table and the ones calling register_net_sysctl (./net,
drivers/inifiniband dirs). These are special cases as they use a
registration function with a non-const qualified ctl_table argument or
modify the arrays before passing them on to the registration function.

Constifying ctl_table structs will prevent the modification of
proc_handler function pointers as the arrays would reside in .rodata.
This is made possible after commit 78eb4ea25cd5 ("sysctl: treewide:
constify the ctl_table argument of proc_handlers") constified all the
proc_handlers.

Created this by running an spatch followed by a sed command:
Spatch:
    virtual patch

    @
    depends on !(file in "net")
    disable optional_qualifier
    @

    identifier table_name != {
      watchdog_hardlockup_sysctl,
      iwcm_ctl_table,
      ucma_ctl_table,
      memory_allocation_profiling_sysctls,
      loadpin_sysctl_table
    };
    @@

    + const
    struct ctl_table table_name [] = { ... };

sed:
    sed --in-place \
      -e "s/struct ctl_table .table = &amp;uts_kern/const struct ctl_table *table = \&amp;uts_kern/" \
      kernel/utsname_sysctl.c

Reviewed-by: Song Liu &lt;song@kernel.org&gt;
Acked-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt; # for kernel/trace/
Reviewed-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt; # SCSI
Reviewed-by: Darrick J. Wong &lt;djwong@kernel.org&gt; # xfs
Acked-by: Jani Nikula &lt;jani.nikula@intel.com&gt;
Acked-by: Corey Minyard &lt;cminyard@mvista.com&gt;
Acked-by: Wei Liu &lt;wei.liu@kernel.org&gt;
Acked-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Bill O'Donnell &lt;bodonnel@redhat.com&gt;
Acked-by: Baoquan He &lt;bhe@redhat.com&gt;
Acked-by: Ashutosh Dixit &lt;ashutosh.dixit@intel.com&gt;
Acked-by: Anna Schumaker &lt;anna.schumaker@oracle.com&gt;
Signed-off-by: Joel Granados &lt;joel.granados@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add the const qualifier to all the ctl_tables in the tree except for
watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls,
loadpin_sysctl_table and the ones calling register_net_sysctl (./net,
drivers/inifiniband dirs). These are special cases as they use a
registration function with a non-const qualified ctl_table argument or
modify the arrays before passing them on to the registration function.

Constifying ctl_table structs will prevent the modification of
proc_handler function pointers as the arrays would reside in .rodata.
This is made possible after commit 78eb4ea25cd5 ("sysctl: treewide:
constify the ctl_table argument of proc_handlers") constified all the
proc_handlers.

Created this by running an spatch followed by a sed command:
Spatch:
    virtual patch

    @
    depends on !(file in "net")
    disable optional_qualifier
    @

    identifier table_name != {
      watchdog_hardlockup_sysctl,
      iwcm_ctl_table,
      ucma_ctl_table,
      memory_allocation_profiling_sysctls,
      loadpin_sysctl_table
    };
    @@

    + const
    struct ctl_table table_name [] = { ... };

sed:
    sed --in-place \
      -e "s/struct ctl_table .table = &amp;uts_kern/const struct ctl_table *table = \&amp;uts_kern/" \
      kernel/utsname_sysctl.c

Reviewed-by: Song Liu &lt;song@kernel.org&gt;
Acked-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt; # for kernel/trace/
Reviewed-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt; # SCSI
Reviewed-by: Darrick J. Wong &lt;djwong@kernel.org&gt; # xfs
Acked-by: Jani Nikula &lt;jani.nikula@intel.com&gt;
Acked-by: Corey Minyard &lt;cminyard@mvista.com&gt;
Acked-by: Wei Liu &lt;wei.liu@kernel.org&gt;
Acked-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Bill O'Donnell &lt;bodonnel@redhat.com&gt;
Acked-by: Baoquan He &lt;bhe@redhat.com&gt;
Acked-by: Ashutosh Dixit &lt;ashutosh.dixit@intel.com&gt;
Acked-by: Anna Schumaker &lt;anna.schumaker@oracle.com&gt;
Signed-off-by: Joel Granados &lt;joel.granados@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>hyperv: Enable the hypercall output page for the VTL mode</title>
<updated>2025-01-10T00:54:21+00:00</updated>
<author>
<name>Roman Kisel</name>
<email>romank@linux.microsoft.com</email>
</author>
<published>2025-01-08T22:21:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=9263abc7fd5d753dbf4cd4bf994bcf9c8c999918'/>
<id>9263abc7fd5d753dbf4cd4bf994bcf9c8c999918</id>
<content type='text'>
Due to the hypercall page not being allocated in the VTL mode,
the code resorts to using a part of the input page.

Allocate the hypercall output page in the VTL mode thus enabling
it to use it for output and share code with dom0.

Signed-off-by: Roman Kisel &lt;romank@linux.microsoft.com&gt;
Reviewed-by: Nuno Das Neves &lt;nunodasneves@linux.microsoft.com&gt;
Link: https://lore.kernel.org/r/20250108222138.1623703-4-romank@linux.microsoft.com
Signed-off-by: Wei Liu &lt;wei.liu@kernel.org&gt;
Message-ID: &lt;20250108222138.1623703-4-romank@linux.microsoft.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Due to the hypercall page not being allocated in the VTL mode,
the code resorts to using a part of the input page.

Allocate the hypercall output page in the VTL mode thus enabling
it to use it for output and share code with dom0.

Signed-off-by: Roman Kisel &lt;romank@linux.microsoft.com&gt;
Reviewed-by: Nuno Das Neves &lt;nunodasneves@linux.microsoft.com&gt;
Link: https://lore.kernel.org/r/20250108222138.1623703-4-romank@linux.microsoft.com
Signed-off-by: Wei Liu &lt;wei.liu@kernel.org&gt;
Message-ID: &lt;20250108222138.1623703-4-romank@linux.microsoft.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>hv_balloon: Fallback to generic_online_page() for non-HV hot added mem</title>
<updated>2025-01-10T00:54:21+00:00</updated>
<author>
<name>Jacob Pan</name>
<email>jacob.pan@linux.microsoft.com</email>
</author>
<published>2025-01-07T18:09:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=1da602ec36a3e208c070ec23895e84cbb621a12e'/>
<id>1da602ec36a3e208c070ec23895e84cbb621a12e</id>
<content type='text'>
The Hyper-V balloon driver installs a custom callback for handling page
onlining operations performed by the memory hotplug subsystem. This
custom callback is global, and overrides the default callback
(generic_online_page) that Linux otherwise uses. The custom callback
properly handles memory that is hot-added by the balloon driver as part
of a Hyper-V hot-add region.

But memory can also be hot-added directly by a device driver for a vPCI
device, particularly GPUs. In such a case, the custom callback installed by
the balloon driver runs, but won't find the page in its hot-add region list
and doesn't online it, which could cause driver initialization failures.

Fix this by having the balloon custom callback run generic_online_page()
when the page isn't part of a Hyper-V hot-add region, thereby doing the
default Linux behavior. This allows device driver hot-adds to work
properly. Similar cases are handled the same way in the virtio-mem driver.

Suggested-by: Vikram Sethi &lt;vsethi@nvidia.com&gt;
Tested-by: Michael Frohlich &lt;mfrohlich@microsoft.com&gt;
Reviewed-by: Michael Kelley &lt;mhklinux@outlook.com&gt;
Signed-off-by: Jacob Pan &lt;jacob.pan@linux.microsoft.com&gt;
Link: https://lore.kernel.org/r/20250107180918.1053933-1-jacob.pan@linux.microsoft.com
Signed-off-by: Wei Liu &lt;wei.liu@kernel.org&gt;
Message-ID: &lt;20250107180918.1053933-1-jacob.pan@linux.microsoft.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The Hyper-V balloon driver installs a custom callback for handling page
onlining operations performed by the memory hotplug subsystem. This
custom callback is global, and overrides the default callback
(generic_online_page) that Linux otherwise uses. The custom callback
properly handles memory that is hot-added by the balloon driver as part
of a Hyper-V hot-add region.

But memory can also be hot-added directly by a device driver for a vPCI
device, particularly GPUs. In such a case, the custom callback installed by
the balloon driver runs, but won't find the page in its hot-add region list
and doesn't online it, which could cause driver initialization failures.

Fix this by having the balloon custom callback run generic_online_page()
when the page isn't part of a Hyper-V hot-add region, thereby doing the
default Linux behavior. This allows device driver hot-adds to work
properly. Similar cases are handled the same way in the virtio-mem driver.

Suggested-by: Vikram Sethi &lt;vsethi@nvidia.com&gt;
Tested-by: Michael Frohlich &lt;mfrohlich@microsoft.com&gt;
Reviewed-by: Michael Kelley &lt;mhklinux@outlook.com&gt;
Signed-off-by: Jacob Pan &lt;jacob.pan@linux.microsoft.com&gt;
Link: https://lore.kernel.org/r/20250107180918.1053933-1-jacob.pan@linux.microsoft.com
Signed-off-by: Wei Liu &lt;wei.liu@kernel.org&gt;
Message-ID: &lt;20250107180918.1053933-1-jacob.pan@linux.microsoft.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Drivers: hv: vmbus: Log on missing offers if any</title>
<updated>2025-01-10T00:54:21+00:00</updated>
<author>
<name>John Starks</name>
<email>jostarks@microsoft.com</email>
</author>
<published>2025-01-02T13:07:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=fcf5203e289ca0ef75a18ce74a9eb716f7f1f569'/>
<id>fcf5203e289ca0ef75a18ce74a9eb716f7f1f569</id>
<content type='text'>
When resuming from hibernation, log any channels that were present
before hibernation but now are gone.
In general, the boot-time devices configured for a resuming VM should be
the same as the devices in the VM at the time of hibernation. It's
uncommon for the configuration to have been changed such that offers
are missing. Changing the configuration violates the rules for
hibernation anyway.
The cleanup of missing channels is not straight-forward and dependent
on individual device driver functionality and implementation,
so it can be added in future with separate changes.

Signed-off-by: John Starks &lt;jostarks@microsoft.com&gt;
Co-developed-by: Naman Jain &lt;namjain@linux.microsoft.com&gt;
Signed-off-by: Naman Jain &lt;namjain@linux.microsoft.com&gt;
Reviewed-by: Easwar Hariharan &lt;eahariha@linux.microsoft.com&gt;
Reviewed-by: Saurabh Sengar &lt;ssengar@linux.microsoft.com&gt;
Reviewed-by: Michael Kelley &lt;mhklinux@outlook.com&gt;
Link: https://lore.kernel.org/r/20250102130712.1661-3-namjain@linux.microsoft.com
Signed-off-by: Wei Liu &lt;wei.liu@kernel.org&gt;
Message-ID: &lt;20250102130712.1661-3-namjain@linux.microsoft.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When resuming from hibernation, log any channels that were present
before hibernation but now are gone.
In general, the boot-time devices configured for a resuming VM should be
the same as the devices in the VM at the time of hibernation. It's
uncommon for the configuration to have been changed such that offers
are missing. Changing the configuration violates the rules for
hibernation anyway.
The cleanup of missing channels is not straight-forward and dependent
on individual device driver functionality and implementation,
so it can be added in future with separate changes.

Signed-off-by: John Starks &lt;jostarks@microsoft.com&gt;
Co-developed-by: Naman Jain &lt;namjain@linux.microsoft.com&gt;
Signed-off-by: Naman Jain &lt;namjain@linux.microsoft.com&gt;
Reviewed-by: Easwar Hariharan &lt;eahariha@linux.microsoft.com&gt;
Reviewed-by: Saurabh Sengar &lt;ssengar@linux.microsoft.com&gt;
Reviewed-by: Michael Kelley &lt;mhklinux@outlook.com&gt;
Link: https://lore.kernel.org/r/20250102130712.1661-3-namjain@linux.microsoft.com
Signed-off-by: Wei Liu &lt;wei.liu@kernel.org&gt;
Message-ID: &lt;20250102130712.1661-3-namjain@linux.microsoft.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Drivers: hv: vmbus: Wait for boot-time offers during boot and resume</title>
<updated>2025-01-10T00:54:21+00:00</updated>
<author>
<name>Naman Jain</name>
<email>namjain@linux.microsoft.com</email>
</author>
<published>2025-01-02T13:07:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=113386ca981c3997db6b83272c7ecf47456aeddb'/>
<id>113386ca981c3997db6b83272c7ecf47456aeddb</id>
<content type='text'>
Channel offers are requested during VMBus initialization and resume from
hibernation. Add support to wait for all boot-time channel offers to
be delivered and processed before returning from vmbus_request_offers.

This is in analogy to a PCI bus not returning from probe until it has
scanned all devices on the bus.

Without this, user mode can race with VMBus initialization and miss
channel offers. User mode has no way to work around this other than
sleeping for a while, since there is no way to know when VMBus has
finished processing boot-time offers.

With this added functionality, remove earlier logic which keeps track
of count of offered channels post resume from hibernation. Once all
offers delivered message is received, no further boot-time offers are
going to be received. Consequently, logic to prevent suspend from
happening after previous resume had missing offers, is also removed.

Co-developed-by: John Starks &lt;jostarks@microsoft.com&gt;
Signed-off-by: John Starks &lt;jostarks@microsoft.com&gt;
Signed-off-by: Naman Jain &lt;namjain@linux.microsoft.com&gt;
Reviewed-by: Easwar Hariharan &lt;eahariha@linux.microsoft.com&gt;
Reviewed-by: Saurabh Sengar &lt;ssengar@linux.microsoft.com&gt;
Reviewed-by: Michael Kelley &lt;mhklinux@outlook.com&gt;
Link: https://lore.kernel.org/r/20250102130712.1661-2-namjain@linux.microsoft.com
Signed-off-by: Wei Liu &lt;wei.liu@kernel.org&gt;
Message-ID: &lt;20250102130712.1661-2-namjain@linux.microsoft.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Channel offers are requested during VMBus initialization and resume from
hibernation. Add support to wait for all boot-time channel offers to
be delivered and processed before returning from vmbus_request_offers.

This is in analogy to a PCI bus not returning from probe until it has
scanned all devices on the bus.

Without this, user mode can race with VMBus initialization and miss
channel offers. User mode has no way to work around this other than
sleeping for a while, since there is no way to know when VMBus has
finished processing boot-time offers.

With this added functionality, remove earlier logic which keeps track
of count of offered channels post resume from hibernation. Once all
offers delivered message is received, no further boot-time offers are
going to be received. Consequently, logic to prevent suspend from
happening after previous resume had missing offers, is also removed.

Co-developed-by: John Starks &lt;jostarks@microsoft.com&gt;
Signed-off-by: John Starks &lt;jostarks@microsoft.com&gt;
Signed-off-by: Naman Jain &lt;namjain@linux.microsoft.com&gt;
Reviewed-by: Easwar Hariharan &lt;eahariha@linux.microsoft.com&gt;
Reviewed-by: Saurabh Sengar &lt;ssengar@linux.microsoft.com&gt;
Reviewed-by: Michael Kelley &lt;mhklinux@outlook.com&gt;
Link: https://lore.kernel.org/r/20250102130712.1661-2-namjain@linux.microsoft.com
Signed-off-by: Wei Liu &lt;wei.liu@kernel.org&gt;
Message-ID: &lt;20250102130712.1661-2-namjain@linux.microsoft.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Drivers: hv: Don't assume cpu_possible_mask is dense</title>
<updated>2025-01-10T00:54:21+00:00</updated>
<author>
<name>Michael Kelley</name>
<email>mhklinux@outlook.com</email>
</author>
<published>2024-10-03T03:53:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=16b18fdf6bc7292ae0edbf33d2d693af3240e49d'/>
<id>16b18fdf6bc7292ae0edbf33d2d693af3240e49d</id>
<content type='text'>
Current code allocates the hv_vp_index array with size
num_possible_cpus(). This code assumes cpu_possible_mask is dense,
which is not true in the general case per [1]. If cpu_possible_mask
is sparse, the array might be indexed by a value beyond the size of
the array.

However, the configurations that Hyper-V provides to guest VMs on x86
and ARM64 hardware, in combination with how architecture specific code
assigns Linux CPU numbers, *does* always produce a dense cpu_possible_mask.
So the dense assumption is not currently causing failures. But for
robustness against future changes in how cpu_possible_mask is populated,
update the code to no longer assume dense.

The correct approach is to allocate and initialize the array using size
"nr_cpu_ids". While this leaves unused array entries corresponding to
holes in cpu_possible_mask, the holes are assumed to be minimal and hence
the amount of memory wasted by unused entries is minimal.

Using nr_cpu_ids also reduces initialization time, in that the loop to
initialize the array currently rescans cpu_possible_mask on each
iteration. This is n-squared in the number of CPUs, which could be
significant for large CPU counts.

[1] https://lore.kernel.org/lkml/SN6PR02MB4157210CC36B2593F8572E5ED4692@SN6PR02MB4157.namprd02.prod.outlook.com/

Signed-off-by: Michael Kelley &lt;mhklinux@outlook.com&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20241003035333.49261-3-mhklinux@outlook.com
Signed-off-by: Wei Liu &lt;wei.liu@kernel.org&gt;
Message-ID: &lt;20241003035333.49261-3-mhklinux@outlook.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Current code allocates the hv_vp_index array with size
num_possible_cpus(). This code assumes cpu_possible_mask is dense,
which is not true in the general case per [1]. If cpu_possible_mask
is sparse, the array might be indexed by a value beyond the size of
the array.

However, the configurations that Hyper-V provides to guest VMs on x86
and ARM64 hardware, in combination with how architecture specific code
assigns Linux CPU numbers, *does* always produce a dense cpu_possible_mask.
So the dense assumption is not currently causing failures. But for
robustness against future changes in how cpu_possible_mask is populated,
update the code to no longer assume dense.

The correct approach is to allocate and initialize the array using size
"nr_cpu_ids". While this leaves unused array entries corresponding to
holes in cpu_possible_mask, the holes are assumed to be minimal and hence
the amount of memory wasted by unused entries is minimal.

Using nr_cpu_ids also reduces initialization time, in that the loop to
initialize the array currently rescans cpu_possible_mask on each
iteration. This is n-squared in the number of CPUs, which could be
significant for large CPU counts.

[1] https://lore.kernel.org/lkml/SN6PR02MB4157210CC36B2593F8572E5ED4692@SN6PR02MB4157.namprd02.prod.outlook.com/

Signed-off-by: Michael Kelley &lt;mhklinux@outlook.com&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20241003035333.49261-3-mhklinux@outlook.com
Signed-off-by: Wei Liu &lt;wei.liu@kernel.org&gt;
Message-ID: &lt;20241003035333.49261-3-mhklinux@outlook.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>hyperv: Switch from hyperv-tlfs.h to hyperv/hvhdk.h</title>
<updated>2025-01-10T00:54:21+00:00</updated>
<author>
<name>Nuno Das Neves</name>
<email>nunodasneves@linux.microsoft.com</email>
</author>
<published>2024-11-25T23:24:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=ef5a3c92a81a1a892ae9edf949625beb68b4bd43'/>
<id>ef5a3c92a81a1a892ae9edf949625beb68b4bd43</id>
<content type='text'>
Switch to using hvhdk.h everywhere in the kernel. This header
includes all the new Hyper-V headers in include/hyperv, which form a
superset of the definitions found in hyperv-tlfs.h.

This makes it easier to add new Hyper-V interfaces without being
restricted to those in the TLFS doc (reflected in hyperv-tlfs.h).

To be more consistent with the original Hyper-V code, the names of
some definitions are changed slightly. Update those where needed.

Update comments in mshyperv.h files to point to include/hyperv for
adding new definitions.

Signed-off-by: Nuno Das Neves &lt;nunodasneves@linux.microsoft.com&gt;
Reviewed-by: Michael Kelley &lt;mhklinux@outlook.com&gt;
Reviewed-by: Easwar Hariharan &lt;eahariha@linux.microsoft.com&gt;
Signed-off-by: Roman Kisel &lt;romank@linux.microsoft.com&gt;
Reviewed-by: Easwar Hariharan &lt;eahariha@linux.microsoft.com&gt;
Link: https://lore.kernel.org/r/1732577084-2122-5-git-send-email-nunodasneves@linux.microsoft.com
Link: https://lore.kernel.org/r/20250108222138.1623703-3-romank@linux.microsoft.com
Signed-off-by: Wei Liu &lt;wei.liu@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Switch to using hvhdk.h everywhere in the kernel. This header
includes all the new Hyper-V headers in include/hyperv, which form a
superset of the definitions found in hyperv-tlfs.h.

This makes it easier to add new Hyper-V interfaces without being
restricted to those in the TLFS doc (reflected in hyperv-tlfs.h).

To be more consistent with the original Hyper-V code, the names of
some definitions are changed slightly. Update those where needed.

Update comments in mshyperv.h files to point to include/hyperv for
adding new definitions.

Signed-off-by: Nuno Das Neves &lt;nunodasneves@linux.microsoft.com&gt;
Reviewed-by: Michael Kelley &lt;mhklinux@outlook.com&gt;
Reviewed-by: Easwar Hariharan &lt;eahariha@linux.microsoft.com&gt;
Signed-off-by: Roman Kisel &lt;romank@linux.microsoft.com&gt;
Reviewed-by: Easwar Hariharan &lt;eahariha@linux.microsoft.com&gt;
Link: https://lore.kernel.org/r/1732577084-2122-5-git-send-email-nunodasneves@linux.microsoft.com
Link: https://lore.kernel.org/r/20250108222138.1623703-3-romank@linux.microsoft.com
Signed-off-by: Wei Liu &lt;wei.liu@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Drivers: hv: util: Avoid accessing a ringbuffer not initialized yet</title>
<updated>2024-12-09T18:44:15+00:00</updated>
<author>
<name>Michael Kelley</name>
<email>mhklinux@outlook.com</email>
</author>
<published>2024-11-06T15:42:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=07a756a49f4b4290b49ea46e089cbe6f79ff8d26'/>
<id>07a756a49f4b4290b49ea46e089cbe6f79ff8d26</id>
<content type='text'>
If the KVP (or VSS) daemon starts before the VMBus channel's ringbuffer is
fully initialized, we can hit the panic below:

hv_utils: Registering HyperV Utility Driver
hv_vmbus: registering driver hv_utils
...
BUG: kernel NULL pointer dereference, address: 0000000000000000
CPU: 44 UID: 0 PID: 2552 Comm: hv_kvp_daemon Tainted: G E 6.11.0-rc3+ #1
RIP: 0010:hv_pkt_iter_first+0x12/0xd0
Call Trace:
...
 vmbus_recvpacket
 hv_kvp_onchannelcallback
 vmbus_on_event
 tasklet_action_common
 tasklet_action
 handle_softirqs
 irq_exit_rcu
 sysvec_hyperv_stimer0
 &lt;/IRQ&gt;
 &lt;TASK&gt;
 asm_sysvec_hyperv_stimer0
...
 kvp_register_done
 hvt_op_read
 vfs_read
 ksys_read
 __x64_sys_read

This can happen because the KVP/VSS channel callback can be invoked
even before the channel is fully opened:
1) as soon as hv_kvp_init() -&gt; hvutil_transport_init() creates
/dev/vmbus/hv_kvp, the kvp daemon can open the device file immediately and
register itself to the driver by writing a message KVP_OP_REGISTER1 to the
file (which is handled by kvp_on_msg() -&gt;kvp_handle_handshake()) and
reading the file for the driver's response, which is handled by
hvt_op_read(), which calls hvt-&gt;on_read(), i.e. kvp_register_done().

2) the problem with kvp_register_done() is that it can cause the
channel callback to be called even before the channel is fully opened,
and when the channel callback is starting to run, util_probe()-&gt;
vmbus_open() may have not initialized the ringbuffer yet, so the
callback can hit the panic of NULL pointer dereference.

To reproduce the panic consistently, we can add a "ssleep(10)" for KVP in
__vmbus_open(), just before the first hv_ringbuffer_init(), and then we
unload and reload the driver hv_utils, and run the daemon manually within
the 10 seconds.

Fix the panic by reordering the steps in util_probe() so the char dev
entry used by the KVP or VSS daemon is not created until after
vmbus_open() has completed. This reordering prevents the race condition
from happening.

Reported-by: Dexuan Cui &lt;decui@microsoft.com&gt;
Fixes: e0fa3e5e7df6 ("Drivers: hv: utils: fix a race on userspace daemons registration")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Kelley &lt;mhklinux@outlook.com&gt;
Acked-by: Wei Liu &lt;wei.liu@kernel.org&gt;
Link: https://lore.kernel.org/r/20241106154247.2271-3-mhklinux@outlook.com
Signed-off-by: Wei Liu &lt;wei.liu@kernel.org&gt;
Message-ID: &lt;20241106154247.2271-3-mhklinux@outlook.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If the KVP (or VSS) daemon starts before the VMBus channel's ringbuffer is
fully initialized, we can hit the panic below:

hv_utils: Registering HyperV Utility Driver
hv_vmbus: registering driver hv_utils
...
BUG: kernel NULL pointer dereference, address: 0000000000000000
CPU: 44 UID: 0 PID: 2552 Comm: hv_kvp_daemon Tainted: G E 6.11.0-rc3+ #1
RIP: 0010:hv_pkt_iter_first+0x12/0xd0
Call Trace:
...
 vmbus_recvpacket
 hv_kvp_onchannelcallback
 vmbus_on_event
 tasklet_action_common
 tasklet_action
 handle_softirqs
 irq_exit_rcu
 sysvec_hyperv_stimer0
 &lt;/IRQ&gt;
 &lt;TASK&gt;
 asm_sysvec_hyperv_stimer0
...
 kvp_register_done
 hvt_op_read
 vfs_read
 ksys_read
 __x64_sys_read

This can happen because the KVP/VSS channel callback can be invoked
even before the channel is fully opened:
1) as soon as hv_kvp_init() -&gt; hvutil_transport_init() creates
/dev/vmbus/hv_kvp, the kvp daemon can open the device file immediately and
register itself to the driver by writing a message KVP_OP_REGISTER1 to the
file (which is handled by kvp_on_msg() -&gt;kvp_handle_handshake()) and
reading the file for the driver's response, which is handled by
hvt_op_read(), which calls hvt-&gt;on_read(), i.e. kvp_register_done().

2) the problem with kvp_register_done() is that it can cause the
channel callback to be called even before the channel is fully opened,
and when the channel callback is starting to run, util_probe()-&gt;
vmbus_open() may have not initialized the ringbuffer yet, so the
callback can hit the panic of NULL pointer dereference.

To reproduce the panic consistently, we can add a "ssleep(10)" for KVP in
__vmbus_open(), just before the first hv_ringbuffer_init(), and then we
unload and reload the driver hv_utils, and run the daemon manually within
the 10 seconds.

Fix the panic by reordering the steps in util_probe() so the char dev
entry used by the KVP or VSS daemon is not created until after
vmbus_open() has completed. This reordering prevents the race condition
from happening.

Reported-by: Dexuan Cui &lt;decui@microsoft.com&gt;
Fixes: e0fa3e5e7df6 ("Drivers: hv: utils: fix a race on userspace daemons registration")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Kelley &lt;mhklinux@outlook.com&gt;
Acked-by: Wei Liu &lt;wei.liu@kernel.org&gt;
Link: https://lore.kernel.org/r/20241106154247.2271-3-mhklinux@outlook.com
Signed-off-by: Wei Liu &lt;wei.liu@kernel.org&gt;
Message-ID: &lt;20241106154247.2271-3-mhklinux@outlook.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Drivers: hv: util: Don't force error code to ENODEV in util_probe()</title>
<updated>2024-12-09T18:44:14+00:00</updated>
<author>
<name>Michael Kelley</name>
<email>mhklinux@outlook.com</email>
</author>
<published>2024-11-06T15:42:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=96e052d1473843d644ceba2adf46d3d2180b8ca7'/>
<id>96e052d1473843d644ceba2adf46d3d2180b8ca7</id>
<content type='text'>
If the util_init function call in util_probe() returns an error code,
util_probe() always return ENODEV, and the error code from the util_init
function is lost. The error message output in the caller, vmbus_probe(),
doesn't show the real error code.

Fix this by just returning the error code from the util_init function.
There doesn't seem to be a reason to force ENODEV, as other errors
such as ENOMEM can already be returned from util_probe(). And the
code in call_driver_probe() implies that ENODEV should mean that a
matching driver wasn't found, which is not the case here.

Suggested-by: Dexuan Cui &lt;decui@microsoft.com&gt;
Signed-off-by: Michael Kelley &lt;mhklinux@outlook.com&gt;
Acked-by: Wei Liu &lt;wei.liu@kernel.org&gt;
Link: https://lore.kernel.org/r/20241106154247.2271-2-mhklinux@outlook.com
Signed-off-by: Wei Liu &lt;wei.liu@kernel.org&gt;
Message-ID: &lt;20241106154247.2271-2-mhklinux@outlook.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If the util_init function call in util_probe() returns an error code,
util_probe() always return ENODEV, and the error code from the util_init
function is lost. The error message output in the caller, vmbus_probe(),
doesn't show the real error code.

Fix this by just returning the error code from the util_init function.
There doesn't seem to be a reason to force ENODEV, as other errors
such as ENOMEM can already be returned from util_probe(). And the
code in call_driver_probe() implies that ENODEV should mean that a
matching driver wasn't found, which is not the case here.

Suggested-by: Dexuan Cui &lt;decui@microsoft.com&gt;
Signed-off-by: Michael Kelley &lt;mhklinux@outlook.com&gt;
Acked-by: Wei Liu &lt;wei.liu@kernel.org&gt;
Link: https://lore.kernel.org/r/20241106154247.2271-2-mhklinux@outlook.com
Signed-off-by: Wei Liu &lt;wei.liu@kernel.org&gt;
Message-ID: &lt;20241106154247.2271-2-mhklinux@outlook.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
