linux-stable.git/drivers/pci, branch linux-5.0.y

PCI: Fix issue with "pci=disable_acs_redir" parameter being ignored

2019-05-25T16:22:26+00:00

[ Upstream commit d5bc73f34cc97c4b4b9202cc93182c2515076edf ]

In most cases, kmalloc() will not be available early in boot when
pci_setup() is called.  Thus, the kstrdup() call that was added to fix the
__initdata bug with the disable_acs_redir parameter usually returns NULL,
so the parameter is discarded and has no effect.

To fix this, store the string that's in initdata until an initcall function
can allocate the memory appropriately.  This way we don't need any
additional static memory.

Fixes: d2fd6e81912a ("PCI: Fix __initdata issue with "pci=disable_acs_redir" parameter")
Signed-off-by: Logan Gunthorpe 
Signed-off-by: Bjorn Helgaas 
Signed-off-by: Sasha Levin

PCI: Work around Pericom PCIe-to-PCI bridge Retrain Link erratum

2019-05-25T16:22:18+00:00

commit 4ec73791a64bab25cabf16a6067ee478692e506d upstream.

Due to an erratum in some Pericom PCIe-to-PCI bridges in reverse mode
(conventional PCI on primary side, PCIe on downstream side), the Retrain
Link bit needs to be cleared manually to allow the link training to
complete successfully.

If it is not cleared manually, the link training is continuously restarted
and no devices below the PCI-to-PCIe bridge can be accessed.  That means
drivers for devices below the bridge will be loaded but won't work and may
even crash because the driver is only reading 0xffff.

See the Pericom Errata Sheet PI7C9X111SLB_errata_rev1.2_102711.pdf for
details.  Devices known as affected so far are: PI7C9X110, PI7C9X111SL,
PI7C9X130.

Add a new flag, clear_retrain_link, in struct pci_dev.  Quirks for affected
devices set this bit.

Note that pcie_retrain_link() lives in aspm.c because that's currently the
only place we use it, but this erratum is not specific to ASPM, and we may
retrain links for other reasons in the future.

Signed-off-by: Stefan Mätje 
[bhelgaas: apply regardless of CONFIG_PCIEASPM]
Signed-off-by: Bjorn Helgaas 
Reviewed-by: Andy Shevchenko 
CC: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman

PCI: Factor out pcie_retrain_link() function

2019-05-25T16:22:18+00:00

commit 86fa6a344209d9414ea962b1f1ac6ade9dd7563a upstream.

Factor out pcie_retrain_link() to use for Pericom Retrain Link quirk.  No
functional change intended.

Signed-off-by: Stefan Mätje 
Signed-off-by: Bjorn Helgaas 
Reviewed-by: Andy Shevchenko 
CC: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman

PCI: rcar: Add the initialization of PCIe link in resume_noirq()

2019-05-25T16:22:17+00:00

commit be20bbcb0a8cb5597cc62b3e28d275919f3431df upstream.

Reestablish the PCIe link very early in the resume process in case it
went down to prevent PCI accesses from hanging the bus. Such accesses
can happen early in the PCI resume process, as early as the
SUSPEND_RESUME_NOIRQ step, thus the link must be reestablished in the
driver resume_noirq() callback.

Fixes: e015f88c368d ("PCI: rcar: Add support for R-Car H3 to pcie-rcar")
Signed-off-by: Kazufumi Ikeda 
Signed-off-by: Gaku Inami 
Signed-off-by: Marek Vasut 
[lorenzo.pieralisi@arm.com: reformatted commit log]
Signed-off-by: Lorenzo Pieralisi 
Reviewed-by: Simon Horman 
Reviewed-by: Geert Uytterhoeven 
Acked-by: Wolfram Sang 
Cc: stable@vger.kernel.org
Cc: Geert Uytterhoeven 
Cc: Phil Edworthy 
Cc: Simon Horman 
Cc: Wolfram Sang 
Cc: linux-renesas-soc@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman

PCI/AER: Change pci_aer_init() stub to return void

2019-05-25T16:22:17+00:00

commit 31f996efbd5a7825f4d30150469e9d110aea00e8 upstream.

Commit 60ed982a4e78 ("PCI/AER: Move internal declarations to
drivers/pci/pci.h") changed pci_aer_init() to return "void", but didn't
change the stub for when CONFIG_PCIEAER isn't enabled.  Change the stub to
match.

Fixes: 60ed982a4e78 ("PCI/AER: Move internal declarations to drivers/pci/pci.h")
Signed-off-by: Jisheng Zhang 
Signed-off-by: Bjorn Helgaas 
CC: stable@vger.kernel.org	# v4.19+
Signed-off-by: Greg Kroah-Hartman

PCI: Init PCIe feature bits for managed host bridge alloc

2019-05-25T16:22:17+00:00

commit 6302bf3ef78dd210b5ff4a922afcb7d8eff8a211 upstream.

Two functions allocate a host bridge: devm_pci_alloc_host_bridge() and
pci_alloc_host_bridge().  At the moment, only the unmanaged one initializes
the PCIe feature bits, which prevents from using features such as hotplug
or AER on some systems, when booting with device tree.  Make the
initialization code common.

Fixes: 02bfeb484230 ("PCI/portdrv: Simplify PCIe feature permission checking")
Signed-off-by: Jean-Philippe Brucker 
Signed-off-by: Bjorn Helgaas 
CC: stable@vger.kernel.org	# v4.17+
Signed-off-by: Greg Kroah-Hartman

PCI: Reset Lenovo ThinkPad P50 nvgpu at boot if necessary

2019-05-25T16:22:17+00:00

commit e0547c81bfcfad01cbbfa93a5e66bb98ab932f80 upstream.

On ThinkPad P50 SKUs with an Nvidia Quadro M1000M instead of the M2000M
variant, the BIOS does not always reset the secondary Nvidia GPU during
reboot if the laptop is configured in Hybrid Graphics mode.  The reason is
unknown, but the following steps and possibly a good bit of patience will
reproduce the issue:

  1. Boot up the laptop normally in Hybrid Graphics mode
  2. Make sure nouveau is loaded and that the GPU is awake
  3. Allow the Nvidia GPU to runtime suspend itself after being idle
  4. Reboot the machine, the more sudden the better (e.g. sysrq-b may help)
  5. If nouveau loads up properly, reboot the machine again and go back to
     step 2 until you reproduce the issue

This results in some very strange behavior: the GPU will be left in exactly
the same state it was in when the previously booted kernel started the
reboot.  This has all sorts of bad side effects: for starters, this
completely breaks nouveau starting with a mysterious EVO channel failure
that happens well before we've actually used the EVO channel for anything:

  nouveau 0000:01:00.0: disp: chid 0 mthd 0000 data 00000400 00001000 00000002

This causes a timeout trying to bring up the GR ctx:

  nouveau 0000:01:00.0: timeout
  WARNING: CPU: 0 PID: 12 at drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgf100.c:1547 gf100_grctx_generate+0x7b2/0x850 [nouveau]
  Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET82W (1.55 ) 12/18/2018
  Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper]
  ...
  nouveau 0000:01:00.0: gr: wait for idle timeout (en: 1, ctxsw: 0, busy: 1)
  nouveau 0000:01:00.0: gr: wait for idle timeout (en: 1, ctxsw: 0, busy: 1)
  nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at 0000000000008000 engine 00 [GR] client 15 [HUB/SCC_NB] reason c4 [] on channel -1 [0000000000 unknown]

The GPU never manages to recover.  Booting without loading nouveau causes
issues as well, since the GPU starts sending spurious interrupts that cause
other device's IRQs to get disabled by the kernel:

  irq 16: nobody cared (try booting with the "irqpoll" option)
  ...
  handlers:
  [<000000007faa9e99>] i801_isr [i2c_i801]
  Disabling IRQ #16
  ...
  serio: RMI4 PS/2 pass-through port at rmi4-00.fn03
  i801_smbus 0000:00:1f.4: Timeout waiting for interrupt!
  i801_smbus 0000:00:1f.4: Transaction timeout
  rmi4_f03 rmi4-00.fn03: rmi_f03_pt_write: Failed to write to F03 TX register (-110).
  i801_smbus 0000:00:1f.4: Timeout waiting for interrupt!
  i801_smbus 0000:00:1f.4: Transaction timeout
  rmi4_physical rmi4-00: rmi_driver_set_irq_bits: Failed to change enabled interrupts!

This causes the touchpad and sometimes other things to get disabled.

Since this happens without nouveau, we can't fix this problem from nouveau
itself.

Add a PCI quirk for the specific P50 variant of this GPU.  Make sure the
GPU is advertising NoReset- so we don't reset the GPU when the machine is
in Dedicated graphics mode (where the GPU being initialized by the BIOS is
normal and expected).  Map the GPU MMIO space and read the magic 0x2240c
register, which will have bit 1 set if the device was POSTed during a
previous boot.  Once we've confirmed all of this, reset the GPU and
re-disable it - bringing it back to a healthy state.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=203003
Link: https://lore.kernel.org/lkml/20190212220230.1568-1-lyude@redhat.com
Signed-off-by: Lyude Paul 
Signed-off-by: Bjorn Helgaas 
Cc: nouveau@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: Karol Herbst 
Cc: Ben Skeggs 
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman

PCI: Mark Atheros AR9462 to avoid bus reset

2019-05-25T16:22:17+00:00

commit 6afb7e26978da5e86e57e540fdce65c8b04f398a upstream.

When using PCI passthrough with this device, the host machine locks up
completely when starting the VM, requiring a hard reboot.  Add a quirk to
avoid bus resets on this device.

Fixes: c3e59ee4e766 ("PCI: Mark Atheros AR93xx to avoid bus reset")
Link: https://lore.kernel.org/linux-pci/20190107213248.3034-1-james.prestwood@linux.intel.com
Signed-off-by: James Prestwood 
Signed-off-by: Bjorn Helgaas 
CC: stable@vger.kernel.org	# v3.14+
Signed-off-by: Greg Kroah-Hartman

PCI: Mark AMD Stoney Radeon R7 GPU ATS as broken

2019-05-25T16:22:16+00:00

commit d28ca864c493637f3c957f4ed9348a94fca6de60 upstream.

ATS is broken on the Radeon R7 GPU (at least for Stoney Ridge based laptop)
and causes IOMMU stalls and system failure.  Disable ATS on these devices
to make them usable again with IOMMU enabled.

Thanks to Joerg Roedel  for help.

[bhelgaas: In the email thread mentioned below, Alex suspects the real
problem is in sbios or iommu, so it may affect only certain systems, and it
may affect other devices in those systems as well.  However, per Joerg we
lack the ability to debug further, so this quirk is the best we can do for
now.]

Link: https://bugzilla.kernel.org/show_bug.cgi?id=194521
Link: https://lore.kernel.org/lkml/20190408103725.30426-1-nickel@altlinux.org
Fixes: 9b44b0b09dec ("PCI: Mark AMD Stoney GPU ATS as broken")
Signed-off-by: Nikolai Kostrigin 
Signed-off-by: Bjorn Helgaas 
Acked-by: Joerg Roedel 
CC: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman

PCI: hv: Add pci_destroy_slot() in pci_devices_present_work(), if necessary

2019-05-16T17:40:30+00:00

commit 340d455699400f2c2c0f9b3f703ade3085cdb501 upstream.

When we hot-remove a device, usually the host sends us a PCI_EJECT message,
and a PCI_BUS_RELATIONS message with bus_rel->device_count == 0.

When we execute the quick hot-add/hot-remove test, the host may not send
us the PCI_EJECT message if the guest has not fully finished the
initialization by sending the PCI_RESOURCES_ASSIGNED* message to the
host, so it's potentially unsafe to only depend on the
pci_destroy_slot() in hv_eject_device_work() because the code path

create_root_hv_pci_bus()
 -> hv_pci_assign_slots()

is not called in this case. Note: in this case, the host still sends the
guest a PCI_BUS_RELATIONS message with bus_rel->device_count == 0.

In the quick hot-add/hot-remove test, we can have such a race before
the code path

pci_devices_present_work()
 -> new_pcichild_device()

adds the new device into the hbus->children list, we may have already
received the PCI_EJECT message, and since the tasklet handler

hv_pci_onchannelcallback()

may fail to find the "hpdev" by calling

get_pcichild_wslot(hbus, dev_message->wslot.slot)

hv_pci_eject_device() is not called; Later, by continuing execution

create_root_hv_pci_bus()
 -> hv_pci_assign_slots()

creates the slot and the PCI_BUS_RELATIONS message with
bus_rel->device_count == 0 removes the device from hbus->children, and
we end up being unable to remove the slot in

hv_pci_remove()
 -> hv_pci_remove_slots()

Remove the slot in pci_devices_present_work() when the device
is removed to address this race.

pci_devices_present_work() and hv_eject_device_work() run in the
singled-threaded hbus->wq, so there is not a double-remove issue for the
slot.

We cannot offload hv_pci_eject_device() from hv_pci_onchannelcallback()
to the workqueue, because we need the hv_pci_onchannelcallback()
synchronously call hv_pci_eject_device() to poll the channel
ringbuffer to work around the "hangs in hv_compose_msi_msg()" issue
fixed in commit de0aa7b2f97d ("PCI: hv: Fix 2 hang issues in
hv_compose_msi_msg()")

Fixes: a15f2c08c708 ("PCI: hv: support reporting serial number as slot information")
Signed-off-by: Dexuan Cui 
[lorenzo.pieralisi@arm.com: rewritten commit log]
Signed-off-by: Lorenzo Pieralisi 
Reviewed-by: Stephen Hemminger 
Reviewed-by:  Michael Kelley 
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman