<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/drivers/pci, branch v6.1.2</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>Revert "PCI: Clear PCI_STATUS when setting up device"</title>
<updated>2022-12-31T12:33:05+00:00</updated>
<author>
<name>Bjorn Helgaas</name>
<email>bhelgaas@google.com</email>
</author>
<published>2022-11-07T21:31:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=cbb17d7087ce8ba60ae4a59e515960955e4c4adf'/>
<id>cbb17d7087ce8ba60ae4a59e515960955e4c4adf</id>
<content type='text'>
[ Upstream commit 44e985938e85503d0a69ec538e15fd33c1a4df05 ]

This reverts commit 6cd514e58f12b211d638dbf6f791fa18d854f09c.

Christophe Fergeau reported that 6cd514e58f12 ("PCI: Clear PCI_STATUS when
setting up device") causes boot failures when trying to start linux guests
with Apple's virtualization framework (for example using
https://developer.apple.com/documentation/virtualization/running_linux_in_a_virtual_machine?language=objc)

6cd514e58f12 only solved a cosmetic problem, so revert it to fix the boot
failures.

Link: https://bugzilla.redhat.com/show_bug.cgi?id=2137803
Signed-off-by: Bjorn Helgaas &lt;bhelgaas@google.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 44e985938e85503d0a69ec538e15fd33c1a4df05 ]

This reverts commit 6cd514e58f12b211d638dbf6f791fa18d854f09c.

Christophe Fergeau reported that 6cd514e58f12 ("PCI: Clear PCI_STATUS when
setting up device") causes boot failures when trying to start linux guests
with Apple's virtualization framework (for example using
https://developer.apple.com/documentation/virtualization/running_linux_in_a_virtual_machine?language=objc)

6cd514e58f12 only solved a cosmetic problem, so revert it to fix the boot
failures.

Link: https://bugzilla.redhat.com/show_bug.cgi?id=2137803
Signed-off-by: Bjorn Helgaas &lt;bhelgaas@google.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>PCI: vmd: Fix secondary bus reset for Intel bridges</title>
<updated>2022-12-31T12:32:36+00:00</updated>
<author>
<name>Francisco Munoz</name>
<email>francisco.munoz.ruiz@linux.intel.com</email>
</author>
<published>2022-12-06T00:16:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=38c1d5d2f8ae3b46355d76bc58e1ec7b5db1bb06'/>
<id>38c1d5d2f8ae3b46355d76bc58e1ec7b5db1bb06</id>
<content type='text'>
[ Upstream commit 0a584655ef89541dae4d48d2c523b1480ae80284 ]

The reset was never applied in the current implementation because Intel
Bridges owned by VMD are parentless. Internally, pci_reset_bus() applies
a reset to the parent of the PCI device supplied as argument, but in this
case it failed because there wasn't a parent.

In more detail, this change allows the VMD driver to enumerate NVMe devices
in pass-through configurations when guest reboots are performed. There was
an attempted to fix this, but later we discovered that the code inside
pci_reset_bus() wasn’t triggering secondary bus resets. Therefore, we
updated the parameters passed to it, and now NVMe SSDs attached to VMD
bridges are properly enumerated in VT-d pass-through scenarios.

Link: https://lore.kernel.org/r/20221206001637.4744-1-francisco.munoz.ruiz@linux.intel.com
Fixes: 6aab5622296b ("PCI: vmd: Clean up domain before enumeration")
Signed-off-by: Francisco Munoz &lt;francisco.munoz.ruiz@linux.intel.com&gt;
Signed-off-by: Lorenzo Pieralisi &lt;lpieralisi@kernel.org&gt;
Reviewed-by: Nirmal Patel &lt;nirmal.patel@linux.intel.com&gt;
Reviewed-by: Jonathan Derrick &lt;jonathan.derrick@linux.dev&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 0a584655ef89541dae4d48d2c523b1480ae80284 ]

The reset was never applied in the current implementation because Intel
Bridges owned by VMD are parentless. Internally, pci_reset_bus() applies
a reset to the parent of the PCI device supplied as argument, but in this
case it failed because there wasn't a parent.

In more detail, this change allows the VMD driver to enumerate NVMe devices
in pass-through configurations when guest reboots are performed. There was
an attempted to fix this, but later we discovered that the code inside
pci_reset_bus() wasn’t triggering secondary bus resets. Therefore, we
updated the parameters passed to it, and now NVMe SSDs attached to VMD
bridges are properly enumerated in VT-d pass-through scenarios.

Link: https://lore.kernel.org/r/20221206001637.4744-1-francisco.munoz.ruiz@linux.intel.com
Fixes: 6aab5622296b ("PCI: vmd: Clean up domain before enumeration")
Signed-off-by: Francisco Munoz &lt;francisco.munoz.ruiz@linux.intel.com&gt;
Signed-off-by: Lorenzo Pieralisi &lt;lpieralisi@kernel.org&gt;
Reviewed-by: Nirmal Patel &lt;nirmal.patel@linux.intel.com&gt;
Reviewed-by: Jonathan Derrick &lt;jonathan.derrick@linux.dev&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>PCI: endpoint: pci-epf-vntb: Fix call pci_epc_mem_free_addr() in error path</title>
<updated>2022-12-31T12:32:34+00:00</updated>
<author>
<name>Frank Li</name>
<email>frank.li@nxp.com</email>
</author>
<published>2022-11-02T14:10:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=21d8b6bd3730fc6af7ff0eb3c65a8364da310f66'/>
<id>21d8b6bd3730fc6af7ff0eb3c65a8364da310f66</id>
<content type='text'>
[ Upstream commit 0c031262d2ddfb938f9668d620d7ed674771646c ]

Replace pci_epc_mem_free_addr() with pci_epf_free_space() in the
error handle path to match pci_epf_alloc_space().

Link: https://lore.kernel.org/r/20221102141014.1025893-4-Frank.Li@nxp.com
Fixes: e35f56bb0330 ("PCI: endpoint: Support NTB transfer between RC and EP")
Signed-off-by: Frank Li &lt;frank.li@nxp.com&gt;
Signed-off-by: Lorenzo Pieralisi &lt;lpieralisi@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 0c031262d2ddfb938f9668d620d7ed674771646c ]

Replace pci_epc_mem_free_addr() with pci_epf_free_space() in the
error handle path to match pci_epf_alloc_space().

Link: https://lore.kernel.org/r/20221102141014.1025893-4-Frank.Li@nxp.com
Fixes: e35f56bb0330 ("PCI: endpoint: Support NTB transfer between RC and EP")
Signed-off-by: Frank Li &lt;frank.li@nxp.com&gt;
Signed-off-by: Lorenzo Pieralisi &lt;lpieralisi@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>PCI: Check for alloc failure in pci_request_irq()</title>
<updated>2022-12-31T12:32:33+00:00</updated>
<author>
<name>Zeng Heng</name>
<email>zengheng4@huawei.com</email>
</author>
<published>2022-11-21T02:00:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=76ca6756bf038bd6d0fa735673e4fc97d63622d0'/>
<id>76ca6756bf038bd6d0fa735673e4fc97d63622d0</id>
<content type='text'>
[ Upstream commit 2d9cd957d40c3ac491b358e7cff0515bb07a3a9c ]

When kvasprintf() fails to allocate memory, it returns a NULL pointer.
Return error from pci_request_irq() so we don't dereference it.

[bhelgaas: commit log]
Fixes: 704e8953d3e9 ("PCI/irq: Add pci_request_irq() and pci_free_irq() helpers")
Link: https://lore.kernel.org/r/20221121020029.3759444-1-zengheng4@huawei.com
Signed-off-by: Zeng Heng &lt;zengheng4@huawei.com&gt;
Signed-off-by: Bjorn Helgaas &lt;bhelgaas@google.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 2d9cd957d40c3ac491b358e7cff0515bb07a3a9c ]

When kvasprintf() fails to allocate memory, it returns a NULL pointer.
Return error from pci_request_irq() so we don't dereference it.

[bhelgaas: commit log]
Fixes: 704e8953d3e9 ("PCI/irq: Add pci_request_irq() and pci_free_irq() helpers")
Link: https://lore.kernel.org/r/20221121020029.3759444-1-zengheng4@huawei.com
Signed-off-by: Zeng Heng &lt;zengheng4@huawei.com&gt;
Signed-off-by: Bjorn Helgaas &lt;bhelgaas@google.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>PCI: imx6: Initialize PHY before deasserting core reset</title>
<updated>2022-12-31T12:32:32+00:00</updated>
<author>
<name>Sascha Hauer</name>
<email>s.hauer@pengutronix.de</email>
</author>
<published>2022-11-01T09:57:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=71bf3550b5303e04c4dcdba8ea0e8288b795b196'/>
<id>71bf3550b5303e04c4dcdba8ea0e8288b795b196</id>
<content type='text'>
[ Upstream commit ae6b9a65af480144da323436d90e149501ea8937 ]

When the PHY is the reference clock provider then it must be initialized
and powered on before the reset on the client is deasserted, otherwise
the link will never come up. The order was changed in cf236e0c0d59.
Restore the correct order to make the driver work again on boards where
the PHY provides the reference clock. This also changes the order for
boards where the Soc is the PHY reference clock divider, but this
shouldn't do any harm.

Link: https://lore.kernel.org/r/20221101095714.440001-1-s.hauer@pengutronix.de
Fixes: cf236e0c0d59 ("PCI: imx6: Do not hide PHY driver callbacks and refine the error handling")
Tested-by: Richard Zhu &lt;hongxing.zhu@nxp.com&gt;
Signed-off-by: Sascha Hauer &lt;s.hauer@pengutronix.de&gt;
Signed-off-by: Lorenzo Pieralisi &lt;lpieralisi@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit ae6b9a65af480144da323436d90e149501ea8937 ]

When the PHY is the reference clock provider then it must be initialized
and powered on before the reset on the client is deasserted, otherwise
the link will never come up. The order was changed in cf236e0c0d59.
Restore the correct order to make the driver work again on boards where
the PHY provides the reference clock. This also changes the order for
boards where the Soc is the PHY reference clock divider, but this
shouldn't do any harm.

Link: https://lore.kernel.org/r/20221101095714.440001-1-s.hauer@pengutronix.de
Fixes: cf236e0c0d59 ("PCI: imx6: Do not hide PHY driver callbacks and refine the error handling")
Tested-by: Richard Zhu &lt;hongxing.zhu@nxp.com&gt;
Signed-off-by: Sascha Hauer &lt;s.hauer@pengutronix.de&gt;
Signed-off-by: Lorenzo Pieralisi &lt;lpieralisi@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>PCI: vmd: Disable MSI remapping after suspend</title>
<updated>2022-12-31T12:32:32+00:00</updated>
<author>
<name>Nirmal Patel</name>
<email>nirmal.patel@linux.intel.com</email>
</author>
<published>2022-11-09T14:26:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=5d2e13358243410743a2b89187e28a972fb45d10'/>
<id>5d2e13358243410743a2b89187e28a972fb45d10</id>
<content type='text'>
[ Upstream commit d899aa668498c07ff217b666ae9712990306e682 ]

MSI remapping is disabled by VMD driver for Intel's Icelake and
newer systems in order to improve performance by setting
VMCONFIG_MSI_REMAP. By design VMCONFIG_MSI_REMAP register is cleared
by firmware during boot. The same register gets cleared when system
is put in S3 power state. VMD driver needs to set this register again
in order to avoid interrupt issues with devices behind VMD if MSI
remapping was disabled before.

Link: https://lore.kernel.org/r/20221109142652.450998-1-nirmal.patel@linux.intel.com
Fixes: ee81ee84f873 ("PCI: vmd: Disable MSI-X remapping when possible")
Signed-off-by: Nirmal Patel &lt;nirmal.patel@linux.intel.com&gt;
Signed-off-by: Lorenzo Pieralisi &lt;lpieralisi@kernel.org&gt;
Reviewed-by: Francisco Munoz &lt;francisco.munoz.ruiz@linux.intel.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit d899aa668498c07ff217b666ae9712990306e682 ]

MSI remapping is disabled by VMD driver for Intel's Icelake and
newer systems in order to improve performance by setting
VMCONFIG_MSI_REMAP. By design VMCONFIG_MSI_REMAP register is cleared
by firmware during boot. The same register gets cleared when system
is put in S3 power state. VMD driver needs to set this register again
in order to avoid interrupt issues with devices behind VMD if MSI
remapping was disabled before.

Link: https://lore.kernel.org/r/20221109142652.450998-1-nirmal.patel@linux.intel.com
Fixes: ee81ee84f873 ("PCI: vmd: Disable MSI-X remapping when possible")
Signed-off-by: Nirmal Patel &lt;nirmal.patel@linux.intel.com&gt;
Signed-off-by: Lorenzo Pieralisi &lt;lpieralisi@kernel.org&gt;
Reviewed-by: Francisco Munoz &lt;francisco.munoz.ruiz@linux.intel.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>PCI: pci-epf-test: Register notifier if only core_init_notifier is enabled</title>
<updated>2022-12-31T12:32:30+00:00</updated>
<author>
<name>Kunihiko Hayashi</name>
<email>hayashi.kunihiko@socionext.com</email>
</author>
<published>2022-08-25T09:01:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=0ea2ba0d5b4ff04905c8afabc147d0ad5b2e019f'/>
<id>0ea2ba0d5b4ff04905c8afabc147d0ad5b2e019f</id>
<content type='text'>
[ Upstream commit 6acd25cc98ce0c9ee4fefdaf44fc8bca534b26e5 ]

The pci_epf_test_notifier function should be installed also if only
core_init_notifier is enabled. Fix the current logic.

Link: https://lore.kernel.org/r/20220825090101.20474-1-hayashi.kunihiko@socionext.com
Fixes: 5e50ee27d4a5 ("PCI: pci-epf-test: Add support to defer core initialization")
Signed-off-by: Kunihiko Hayashi &lt;hayashi.kunihiko@socionext.com&gt;
Signed-off-by: Lorenzo Pieralisi &lt;lpieralisi@kernel.org&gt;
Acked-by: Om Prakash Singh &lt;omp@nvidia.com&gt;
Acked-by: Kishon Vijay Abraham I &lt;kishon@ti.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 6acd25cc98ce0c9ee4fefdaf44fc8bca534b26e5 ]

The pci_epf_test_notifier function should be installed also if only
core_init_notifier is enabled. Fix the current logic.

Link: https://lore.kernel.org/r/20220825090101.20474-1-hayashi.kunihiko@socionext.com
Fixes: 5e50ee27d4a5 ("PCI: pci-epf-test: Add support to defer core initialization")
Signed-off-by: Kunihiko Hayashi &lt;hayashi.kunihiko@socionext.com&gt;
Signed-off-by: Lorenzo Pieralisi &lt;lpieralisi@kernel.org&gt;
Acked-by: Om Prakash Singh &lt;omp@nvidia.com&gt;
Acked-by: Kishon Vijay Abraham I &lt;kishon@ti.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>PCI: dwc: Fix n_fts[] array overrun</title>
<updated>2022-12-31T12:32:30+00:00</updated>
<author>
<name>Vidya Sagar</name>
<email>vidyas@nvidia.com</email>
</author>
<published>2022-09-26T11:19:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b5c7878b034c4c37f8e9adcb9442dfaa65c37d9c'/>
<id>b5c7878b034c4c37f8e9adcb9442dfaa65c37d9c</id>
<content type='text'>
[ Upstream commit 66110361281b2f7da0c8bd51eaf1f152f4236035 ]

commit aeaa0bfe89654 ("PCI: dwc: Move N_FTS setup to common setup")
incorrectly uses pci-&gt;link_gen in deriving the index to the
n_fts[] array also introducing the issue of accessing beyond the
boundaries of array for greater than Gen-2 speeds. This change fixes
that issue.

Link: https://lore.kernel.org/r/20220926111923.22487-1-vidyas@nvidia.com
Fixes: aeaa0bfe8965 ("PCI: dwc: Move N_FTS setup to common setup")
Signed-off-by: Vidya Sagar &lt;vidyas@nvidia.com&gt;
Signed-off-by: Lorenzo Pieralisi &lt;lpieralisi@kernel.org&gt;
Reviewed-by: Rob Herring &lt;robh@kernel.org&gt;
Acked-by: Jingoo Han &lt;jingoohan1@gmail.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 66110361281b2f7da0c8bd51eaf1f152f4236035 ]

commit aeaa0bfe89654 ("PCI: dwc: Move N_FTS setup to common setup")
incorrectly uses pci-&gt;link_gen in deriving the index to the
n_fts[] array also introducing the issue of accessing beyond the
boundaries of array for greater than Gen-2 speeds. This change fixes
that issue.

Link: https://lore.kernel.org/r/20220926111923.22487-1-vidyas@nvidia.com
Fixes: aeaa0bfe8965 ("PCI: dwc: Move N_FTS setup to common setup")
Signed-off-by: Vidya Sagar &lt;vidyas@nvidia.com&gt;
Signed-off-by: Lorenzo Pieralisi &lt;lpieralisi@kernel.org&gt;
Reviewed-by: Rob Herring &lt;robh@kernel.org&gt;
Acked-by: Jingoo Han &lt;jingoohan1@gmail.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>PCI: mt7621: Add sentinel to quirks table</title>
<updated>2022-12-21T16:48:02+00:00</updated>
<author>
<name>John Thomson</name>
<email>git@johnthomson.fastmail.com.au</email>
</author>
<published>2022-12-05T20:46:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a4997bae1b5b012c8a6e2643e26578a7bc2cae36'/>
<id>a4997bae1b5b012c8a6e2643e26578a7bc2cae36</id>
<content type='text'>
commit 19098934f910b4d47cb30251dd39ffa57bef9523 upstream.

Current driver is missing a sentinel in the struct soc_device_attribute
array, which causes an oops when assessed by the
soc_device_match(mt7621_pcie_quirks_match) call.

This was only exposed once the CONFIG_SOC_MT7621 mt7621 soc_dev_attr
was fixed to register the SOC as a device, in:

commit 7c18b64bba3b ("mips: ralink: mt7621: do not use kzalloc too early")

Fix it by adding the required sentinel.

Link: https://lore.kernel.org/lkml/26ebbed1-0fe9-4af9-8466-65f841d0b382@app.fastmail.com
Link: https://lore.kernel.org/r/20221205204645.301301-1-git@johnthomson.fastmail.com.au
Fixes: b483b4e4d3f6 ("staging: mt7621-pci: add quirks for 'E2' revision using 'soc_device_attribute'")
Signed-off-by: John Thomson &lt;git@johnthomson.fastmail.com.au&gt;
Signed-off-by: Lorenzo Pieralisi &lt;lpieralisi@kernel.org&gt;
Acked-by: Sergio Paracuellos &lt;sergio.paracuellos@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 19098934f910b4d47cb30251dd39ffa57bef9523 upstream.

Current driver is missing a sentinel in the struct soc_device_attribute
array, which causes an oops when assessed by the
soc_device_match(mt7621_pcie_quirks_match) call.

This was only exposed once the CONFIG_SOC_MT7621 mt7621 soc_dev_attr
was fixed to register the SOC as a device, in:

commit 7c18b64bba3b ("mips: ralink: mt7621: do not use kzalloc too early")

Fix it by adding the required sentinel.

Link: https://lore.kernel.org/lkml/26ebbed1-0fe9-4af9-8466-65f841d0b382@app.fastmail.com
Link: https://lore.kernel.org/r/20221205204645.301301-1-git@johnthomson.fastmail.com.au
Fixes: b483b4e4d3f6 ("staging: mt7621-pci: add quirks for 'E2' revision using 'soc_device_attribute'")
Signed-off-by: John Thomson &lt;git@johnthomson.fastmail.com.au&gt;
Signed-off-by: Lorenzo Pieralisi &lt;lpieralisi@kernel.org&gt;
Acked-by: Sergio Paracuellos &lt;sergio.paracuellos@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>PCI: hv: Only reuse existing IRTE allocation for Multi-MSI</title>
<updated>2022-11-12T12:43:59+00:00</updated>
<author>
<name>Dexuan Cui</name>
<email>decui@microsoft.com</email>
</author>
<published>2022-11-04T22:29:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=c234ba8042920fa83635808dc5673f36869ca280'/>
<id>c234ba8042920fa83635808dc5673f36869ca280</id>
<content type='text'>
Jeffrey added Multi-MSI support to the pci-hyperv driver by the 4 patches:
08e61e861a0e ("PCI: hv: Fix multi-MSI to allow more than one MSI vector")
455880dfe292 ("PCI: hv: Fix hv_arch_irq_unmask() for multi-MSI")
b4b77778ecc5 ("PCI: hv: Reuse existing IRTE allocation in compose_msi_msg()")
a2bad844a67b ("PCI: hv: Fix interrupt mapping for multi-MSI")

It turns out that the third patch (b4b77778ecc5) causes a performance
regression because all the interrupts now happen on 1 physical CPU (or two
pCPUs, if one pCPU doesn't have enough vectors). When a guest has many PCI
devices, it may suffer from soft lockups if the workload is heavy, e.g.,
see https://lwn.net/ml/linux-kernel/20220804025104.15673-1-decui@microsoft.com/

Commit b4b77778ecc5 itself is good. The real issue is that the hypercall in
hv_irq_unmask() -&gt; hv_arch_irq_unmask() -&gt;
hv_do_hypercall(HVCALL_RETARGET_INTERRUPT...) only changes the target
virtual CPU rather than physical CPU; with b4b77778ecc5, the pCPU is
determined only once in hv_compose_msi_msg() where only vCPU0 is specified;
consequently the hypervisor only uses 1 target pCPU for all the interrupts.

Note: before b4b77778ecc5, the pCPU is determined twice, and when the pCPU
is determined the second time, the vCPU in the effective affinity mask is
used (i.e., it isn't always vCPU0), so the hypervisor chooses different
pCPU for each interrupt.

The hypercall will be fixed in future to update the pCPU as well, but
that will take quite a while, so let's restore the old behavior in
hv_compose_msi_msg(), i.e., don't reuse the existing IRTE allocation for
single-MSI and MSI-X; for multi-MSI, we choose the vCPU in a round-robin
manner for each PCI device, so the interrupts of different devices can
happen on different pCPUs, though the interrupts of each device happen on
some single pCPU.

The hypercall fix may not be backported to all old versions of Hyper-V, so
we want to have this guest side change forever (or at least till we're sure
the old affected versions of Hyper-V are no longer supported).

Fixes: b4b77778ecc5 ("PCI: hv: Reuse existing IRTE allocation in compose_msi_msg()")
Co-developed-by: Jeffrey Hugo &lt;quic_jhugo@quicinc.com&gt;
Signed-off-by: Jeffrey Hugo &lt;quic_jhugo@quicinc.com&gt;
Co-developed-by: Carl Vanderlip &lt;quic_carlv@quicinc.com&gt;
Signed-off-by: Carl Vanderlip &lt;quic_carlv@quicinc.com&gt;
Signed-off-by: Dexuan Cui &lt;decui@microsoft.com&gt;
Reviewed-by: Michael Kelley &lt;mikelley@microsoft.com&gt;
Link: https://lore.kernel.org/r/20221104222953.11356-1-decui@microsoft.com
Signed-off-by: Wei Liu &lt;wei.liu@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Jeffrey added Multi-MSI support to the pci-hyperv driver by the 4 patches:
08e61e861a0e ("PCI: hv: Fix multi-MSI to allow more than one MSI vector")
455880dfe292 ("PCI: hv: Fix hv_arch_irq_unmask() for multi-MSI")
b4b77778ecc5 ("PCI: hv: Reuse existing IRTE allocation in compose_msi_msg()")
a2bad844a67b ("PCI: hv: Fix interrupt mapping for multi-MSI")

It turns out that the third patch (b4b77778ecc5) causes a performance
regression because all the interrupts now happen on 1 physical CPU (or two
pCPUs, if one pCPU doesn't have enough vectors). When a guest has many PCI
devices, it may suffer from soft lockups if the workload is heavy, e.g.,
see https://lwn.net/ml/linux-kernel/20220804025104.15673-1-decui@microsoft.com/

Commit b4b77778ecc5 itself is good. The real issue is that the hypercall in
hv_irq_unmask() -&gt; hv_arch_irq_unmask() -&gt;
hv_do_hypercall(HVCALL_RETARGET_INTERRUPT...) only changes the target
virtual CPU rather than physical CPU; with b4b77778ecc5, the pCPU is
determined only once in hv_compose_msi_msg() where only vCPU0 is specified;
consequently the hypervisor only uses 1 target pCPU for all the interrupts.

Note: before b4b77778ecc5, the pCPU is determined twice, and when the pCPU
is determined the second time, the vCPU in the effective affinity mask is
used (i.e., it isn't always vCPU0), so the hypervisor chooses different
pCPU for each interrupt.

The hypercall will be fixed in future to update the pCPU as well, but
that will take quite a while, so let's restore the old behavior in
hv_compose_msi_msg(), i.e., don't reuse the existing IRTE allocation for
single-MSI and MSI-X; for multi-MSI, we choose the vCPU in a round-robin
manner for each PCI device, so the interrupts of different devices can
happen on different pCPUs, though the interrupts of each device happen on
some single pCPU.

The hypercall fix may not be backported to all old versions of Hyper-V, so
we want to have this guest side change forever (or at least till we're sure
the old affected versions of Hyper-V are no longer supported).

Fixes: b4b77778ecc5 ("PCI: hv: Reuse existing IRTE allocation in compose_msi_msg()")
Co-developed-by: Jeffrey Hugo &lt;quic_jhugo@quicinc.com&gt;
Signed-off-by: Jeffrey Hugo &lt;quic_jhugo@quicinc.com&gt;
Co-developed-by: Carl Vanderlip &lt;quic_carlv@quicinc.com&gt;
Signed-off-by: Carl Vanderlip &lt;quic_carlv@quicinc.com&gt;
Signed-off-by: Dexuan Cui &lt;decui@microsoft.com&gt;
Reviewed-by: Michael Kelley &lt;mikelley@microsoft.com&gt;
Link: https://lore.kernel.org/r/20221104222953.11356-1-decui@microsoft.com
Signed-off-by: Wei Liu &lt;wei.liu@kernel.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
