From 102c8b26b54e363f85c4c86099ca049a0a76bb58 Mon Sep 17 00:00:00 2001 From: Keith Busch Date: Tue, 17 Feb 2026 08:08:35 -0800 Subject: PCI: Allow all bus devices to use the same slot A PCIe hotplug slot applies to the entire secondary bus. Thus, pciehp only allocates a single hotplug_slot for the bridge to that bus. The existing PCI slot, though, would only match to functions on device 0, meaning any devices beyond that, e.g., ARI functions, are not matched to any slot even though they share it. A slot reset will break all the missing devices because the handling skips them. For example, ARI devices with more than 8 functions fail because their state is not properly handled, nor is the attached driver notified of the reset. In the best case, the device will appear unresponsive to the driver, resulting in unexpected errors. A worse possibility may panic the kernel if in-flight transactions trigger hardware reported errors like this real observation: vfio-pci 0000:01:00.0: resetting vfio-pci 0000:01:00.0: reset done {1}[Hardware Error]: Error 1, type: fatal {1}[Hardware Error]: section_type: PCIe error {1}[Hardware Error]: port_type: 0, PCIe end point {1}[Hardware Error]: version: 0.2 {1}[Hardware Error]: command: 0x0140, status: 0x0010 {1}[Hardware Error]: device_id: 0000:01:01.0 {1}[Hardware Error]: slot: 0 {1}[Hardware Error]: secondary_bus: 0x00 {1}[Hardware Error]: vendor_id: 0x1d9b, device_id: 0x0207 {1}[Hardware Error]: class_code: 020000 {1}[Hardware Error]: bridge: secondary_status: 0x0000, control: 0x0000 {1}[Hardware Error]: aer_cor_status: 0x00008000, aer_cor_mask: 0x00002000 {1}[Hardware Error]: aer_uncor_status: 0x00010000, aer_uncor_mask: 0x00100000 {1}[Hardware Error]: aer_uncor_severity: 0x006f6030 {1}[Hardware Error]: TLP Header: 0a412800 00192080 60000004 00000004 GHES: Fatal hardware error but panic disabled Kernel panic - not syncing: GHES: Fatal hardware error Allow a slot to be created to claim all devices on a bus, not just a matching device. This is done by introducing a sentinel value, named PCI_SLOT_ALL_DEVICES, which then has the PCI slot match to any device on the bus. This fixes slot resets for pciehp. Since 0xff already has special meaning, the chosen value for this new feature is 0xfe. This will not clash with any actual slot number since they are limited to 5 bits. Signed-off-by: Keith Busch Signed-off-by: Bjorn Helgaas Reviewed-by: Dan Williams Link: https://patch.msgid.link/20260217160836.2709885-3-kbusch@meta.com --- include/linux/pci.h | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/pci.h b/include/linux/pci.h index 1c270f1d5123..5ae2dfdb2d6f 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -72,12 +72,20 @@ /* return bus from PCI devid = ((u16)bus_number) << 8) | devfn */ #define PCI_BUS_NUM(x) (((x) >> 8) & 0xff) +/* + * PCI_SLOT_ALL_DEVICES indicates a slot that covers all devices on the bus. + * Used for PCIe hotplug where the physical slot is the entire secondary bus, + * and, if ARI Forwarding is enabled, functions may appear to be on multiple + * devices. + */ +#define PCI_SLOT_ALL_DEVICES 0xfe + /* pci_slot represents a physical slot */ struct pci_slot { struct pci_bus *bus; /* Bus this slot is on */ struct list_head list; /* Node in list of slots */ struct hotplug_slot *hotplug; /* Hotplug info (move here) */ - unsigned char number; /* PCI_SLOT(pci_dev->devfn) */ + unsigned char number; /* Device nr, or PCI_SLOT_ALL_DEVICES */ struct kobject kobj; }; -- cgit v1.2.3