linux-stable.git/drivers/pci, branch v5.8.2

irqdomain/treewide: Free firmware node after domain removal

2020-08-19T06:27:08+00:00

commit ec0160891e387f4771f953b888b1fe951398e5d9 upstream.

Commit 711419e504eb ("irqdomain: Add the missing assignment of
domain->fwnode for named fwnode") unintentionally caused a dangling pointer
page fault issue on firmware nodes that were freed after IRQ domain
allocation. Commit e3beca48a45b fixed that dangling pointer issue by only
freeing the firmware node after an IRQ domain allocation failure. That fix
no longer frees the firmware node immediately, but leaves the firmware node
allocated after the domain is removed.

The firmware node must be kept around through irq_domain_remove, but should be
freed afterwards.

Add the missing free operations after domain removal where appropriate.
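
The ordering described above can be illustrated with a small userspace model.
This is only a sketch: struct fw_node, struct domain and every function below
are hypothetical stand-ins, not the kernel's irqdomain API. It assumes that
removing the domain still dereferences the firmware node, which is why the
node is saved first and freed last.

```c
#include <stdlib.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel types. */
struct fw_node { int uses; };
struct domain { struct fw_node *fwnode; };

static int live_fw_nodes;	/* number of firmware nodes not yet freed */

static struct fw_node *fw_node_alloc(void)
{
	struct fw_node *fn = calloc(1, sizeof(*fn));

	if (fn)
		live_fw_nodes++;
	return fn;
}

static void fw_node_free(struct fw_node *fn)
{
	if (fn) {
		live_fw_nodes--;
		free(fn);
	}
}

static struct domain *domain_create(struct fw_node *fn)
{
	struct domain *d = calloc(1, sizeof(*d));

	if (d)
		d->fwnode = fn;
	return d;
}

/* Assumption: removal still touches the firmware node. */
static void domain_remove(struct domain *d)
{
	d->fwnode->uses++;
	free(d);
}

/* Returns the number of live firmware nodes after teardown, or -1. */
static int teardown_in_order(void)
{
	struct fw_node *fn = fw_node_alloc();
	struct domain *d;

	if (!fn)
		return -1;
	d = domain_create(fn);
	if (!d) {
		fw_node_free(fn);	/* allocation failure path */
		return -1;
	}
	fn = d->fwnode;		/* save the pointer before the domain goes away */
	domain_remove(d);	/* the node must still be valid here */
	fw_node_free(fn);	/* only now is it safe to free */
	return live_fw_nodes;
}
```

A return value of 0 from teardown_in_order() means the node was neither
leaked nor freed while the domain still referenced it.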

Fixes: e3beca48a45b ("irqdomain/treewide: Keep firmware node unconditionally allocated")
Signed-off-by: Jon Derrick 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Andy Shevchenko 
Acked-by: Bjorn Helgaas 	# drivers/pci
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1595363169-7157-1-git-send-email-jonathan.derrick@intel.com
Signed-off-by: Greg Kroah-Hartman

PCI: Release IVRS table in AMD ACS quirk

2020-08-19T06:26:49+00:00

[ Upstream commit 090688fa4e448284aaa16136372397d7d10814db ]

acpi_get_table() should be paired with acpi_put_table() to release the
table mapping if the mapped table is not used at runtime.

In pci_quirk_amd_sb_acs(), the IVRS table is used only to check whether the
AMD IOMMU is supported and is not used at runtime, so put the table after
using it.

Fixes: 15b100dfd1c9 ("PCI: Claim ACS support for AMD southbridge devices")
Link: https://lore.kernel.org/r/1595411068-15440-1-git-send-email-guohanjun@huawei.com
Signed-off-by: Hanjun Guo 
Signed-off-by: Bjorn Helgaas 
Signed-off-by: Sasha Levin

PCI: cadence: Fix updating Vendor ID and Subsystem Vendor ID register

2020-08-19T06:26:45+00:00

[ Upstream commit e3bca37d15dca118f2ef1f0a068bb6e07846ea20 ]

Commit 1b79c5284439 ("PCI: cadence: Add host driver for Cadence PCIe
controller") wrote directly to the PCI_VENDOR_ID register in order to
update the Vendor ID. However, PCI_VENDOR_ID in the root port configuration
space is a read-only register, and writing to it has no effect.
Use the local management register to configure the Vendor ID and Subsystem
Vendor ID.

Link: https://lore.kernel.org/r/20200722110317.4744-10-kishon@ti.com
Fixes: 1b79c5284439 ("PCI: cadence: Add host driver for Cadence PCIe controller")
Signed-off-by: Kishon Vijay Abraham I 
Signed-off-by: Lorenzo Pieralisi 
Reviewed-by: Rob Herring 
Signed-off-by: Sasha Levin

PCI: cadence: Fix cdns_pcie_{host|ep}_setup() error path

2020-08-19T06:26:45+00:00

[ Upstream commit 19abcd790b51b26d775e1170ba2ac086823cceeb ]

Commit bd22885aa188 ("PCI: cadence: Refactor driver to use as a core
library"), while refactoring the Cadence PCIe driver to be used as a
library, removed pm_runtime_get_sync() from cdns_pcie_ep_setup()
and cdns_pcie_host_setup() but failed to remove the corresponding
pm_runtime_put_sync() in the error path. Fix it here.

Link: https://lore.kernel.org/r/20200722110317.4744-3-kishon@ti.com
Fixes: bd22885aa188 ("PCI: cadence: Refactor driver to use as a core library")
Signed-off-by: Kishon Vijay Abraham I 
Signed-off-by: Lorenzo Pieralisi 
Reviewed-by: Rob Herring 
Signed-off-by: Sasha Levin

PCI: rcar: Fix runtime PM imbalance on error

2020-08-19T06:26:43+00:00

[ Upstream commit a68e06e729b1b06c50ee52917d6b825b43e7d269 ]

pm_runtime_get_sync() increments the runtime PM usage counter even
when the call returns an error code. Thus a corresponding decrement is
needed on the error handling path to keep the counter balanced.
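
The imbalance can be shown with a small userspace model. This is a sketch
only: struct device_model, model_get_sync() and model_put() are hypothetical
names that imitate the counting behavior described above and are not the
runtime PM API itself.

```c
#include <errno.h>

/* Hypothetical model of a device with a runtime PM usage counter. */
struct device_model {
	int usage_count;
	int resume_error;	/* 0 on success, negative errno on failure */
};

/* Like the behavior described above: the counter is incremented even
 * when an error code is returned. */
static int model_get_sync(struct device_model *dev)
{
	dev->usage_count++;
	return dev->resume_error;
}

static void model_put(struct device_model *dev)
{
	dev->usage_count--;
}

/* Error path without the decrement: leaves the counter at 1 on failure. */
static int probe_unbalanced(struct device_model *dev)
{
	int err = model_get_sync(dev);

	if (err < 0)
		return err;
	return 0;
}

/* Error path with the decrement: the counter returns to 0 on failure. */
static int probe_balanced(struct device_model *dev)
{
	int err = model_get_sync(dev);

	if (err < 0) {
		model_put(dev);
		return err;
	}
	return 0;
}
```

With resume_error set to -EIO, probe_unbalanced() leaves usage_count at 1,
while probe_balanced() leaves it at 0. On success both keep the reference.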

Link: https://lore.kernel.org/r/20200709064356.8800-1-dinghao.liu@zju.edu.cn
Fixes: 0df6150e7ceb ("PCI: rcar: Use runtime PM to control controller clock")
Signed-off-by: Dinghao Liu 
Signed-off-by: Lorenzo Pieralisi 
Reviewed-by: Yoshihiro Shimoda 
Signed-off-by: Sasha Levin

PCI: loongson: Use DECLARE_PCI_FIXUP_EARLY for bridge_class_quirk()

2020-08-19T06:26:39+00:00

[ Upstream commit 14110af606965ce07abe4d121c100241c2e73b86 ]

According to the datasheet of the Loongson LS7A bridge chip, older versions
of the LS7A PCIe port report a wrong PCI class value of 0x060000. The
correct value is 0x060400. The bug can be worked around in software with
"dev->class = PCI_CLASS_BRIDGE_PCI << 8;" and has been fixed in hardware in
the latest LS7A versions.

To stay compatible with the older hardware, use DECLARE_PCI_FIXUP_EARLY
instead of DECLARE_PCI_FIXUP_HEADER for bridge_class_quirk() so the fix is
applied as early as possible.

Otherwise, code in pci_setup_device() that depends on "dev->class", such as
"class = dev->class >> 8;" and "dev->transparent = ((dev->class & 0xff) ==
1);", may see the wrong value.
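
The arithmetic behind these expressions can be checked in userspace. The
sketch below copies the two class constants from include/linux/pci_ids.h.
The helper names fixed_class(), class_code() and is_transparent() are
hypothetical and exist only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in include/linux/pci_ids.h. */
#define PCI_CLASS_BRIDGE_HOST	0x0600
#define PCI_CLASS_BRIDGE_PCI	0x0604

/* The value the quirk assigns to dev->class (24 bits: class, subclass,
 * programming interface). */
static uint32_t fixed_class(void)
{
	return PCI_CLASS_BRIDGE_PCI << 8;
}

/* Mirrors "class = dev->class >> 8;". */
static uint16_t class_code(uint32_t dev_class)
{
	return dev_class >> 8;
}

/* Mirrors "dev->transparent = ((dev->class & 0xff) == 1);". */
static bool is_transparent(uint32_t dev_class)
{
	return (dev_class & 0xff) == 1;
}
```

Here class_code(0x060000) yields PCI_CLASS_BRIDGE_HOST, so an unfixed port
looks like a host bridge, while class_code(fixed_class()) yields
PCI_CLASS_BRIDGE_PCI.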

Link: https://lore.kernel.org/r/1595065176-460-1-git-send-email-yangtiezhu@loongson.cn
Fixes: 1f58cca5cf2b ("PCI: Add Loongson PCI Controller support")
Signed-off-by: Tiezhu Yang 
Signed-off-by: Lorenzo Pieralisi 
Signed-off-by: Sasha Levin

PCI/ASPM: Add missing newline in sysfs 'policy'

2020-08-19T06:26:37+00:00

[ Upstream commit 3167e3d340c092fd47924bc4d23117a3074ef9a9 ]

When the ASPM 'policy' parameter is read through sysfs, the output has no
trailing newline, so the shell prompt follows it on the same line, as shown
below.  Add a newline for easy reading.  Other sysfs attributes already
include a newline.

  [root@localhost ~]# cat /sys/module/pcie_aspm/parameters/policy
  [default] performance powersave powersupersave [root@localhost ~]#
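
A userspace sketch of the formatting after the fix follows. The function
format_policy() and the policy_names array are hypothetical. The sketch
assumes the newline is simply appended after the last entry and does not
claim to reproduce the kernel function.

```c
#include <stddef.h>
#include <stdio.h>

static const char *const policy_names[] = {
	"default", "performance", "powersave", "powersupersave",
};

/* Writes the policy list into buf with the active entry in brackets,
 * terminated by a newline. Returns the length written, or -1 if buf is
 * too small. */
static int format_policy(char *buf, size_t size, size_t active)
{
	size_t count = sizeof(policy_names) / sizeof(policy_names[0]);
	size_t len = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		const char *fmt = (i == active) ? "[%s] " : "%s ";
		int n = snprintf(buf + len, size - len, fmt, policy_names[i]);

		if (n < 0 || (size_t)n >= size - len)
			return -1;
		len += (size_t)n;
	}
	if (size - len < 2)
		return -1;
	buf[len++] = '\n';	/* the newline that was missing */
	buf[len] = '\0';
	return (int)len;
}
```

With the newline in place, the shell prompt starts on its own line after
the list of policies.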

Fixes: 7d715a6c1ae5 ("PCI: add PCI Express ASPM support")
Link: https://lore.kernel.org/r/1594972765-10404-1-git-send-email-wangxiongfeng2@huawei.com
Signed-off-by: Xiongfeng Wang 
Signed-off-by: Bjorn Helgaas 
Signed-off-by: Sasha Levin

PCI: Fix pci_cfg_wait queue locking problem

2020-08-19T06:26:32+00:00

[ Upstream commit 2a7e32d0547f41c5ce244f84cf5d6ca7fccee7eb ]

The pci_cfg_wait queue is used to prevent user-space config accesses to
devices while they are recovering from reset.

Previously we used these operations on pci_cfg_wait:

  __add_wait_queue(&pci_cfg_wait, ...)
  __remove_wait_queue(&pci_cfg_wait, ...)
  wake_up_all(&pci_cfg_wait)

The wake_up acquires the wait queue lock, but the add and remove do not.

Originally these were all protected by the pci_lock, but cdcb33f98244
("PCI: Avoid possible deadlock on pci_lock and p->pi_lock") moved
wake_up_all() outside pci_lock, so it could race with add/remove
operations, which caused occasional kernel panics, e.g., during vfio-pci
hotplug/unplug testing:

  Unable to handle kernel read from unreadable memory at virtual address ffff802dac469000

Resolve this by using wait_event() instead of __add_wait_queue() and
__remove_wait_queue().  The wait queue lock is held by both wait_event()
and wake_up_all(), so it provides mutual exclusion.
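
As a loose userspace analogy, the same idea can be sketched with POSIX
threads: the waiter and the waker both take the same lock, so queue changes
and wakeups cannot race. All names below are hypothetical. Unlike the
kernel, where the wait queue has its own lock separate from pci_lock, this
sketch uses one mutex for both the flag and the queue.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for pci_cfg_wait and the "blocked" state. */
static pthread_mutex_t wq_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t wq = PTHREAD_COND_INITIALIZER;
static bool cfg_blocked;

static void cfg_block(void)
{
	pthread_mutex_lock(&wq_lock);
	cfg_blocked = true;
	pthread_mutex_unlock(&wq_lock);
}

/* Waker: clears the flag and wakes every waiter under the lock. */
static void cfg_unblock(void)
{
	pthread_mutex_lock(&wq_lock);
	cfg_blocked = false;
	pthread_cond_broadcast(&wq);
	pthread_mutex_unlock(&wq_lock);
}

/* Waiter: re-checks the condition under the same lock, which is the
 * role wait_event() plays in the text above. */
static void cfg_wait(void)
{
	pthread_mutex_lock(&wq_lock);
	while (cfg_blocked)
		pthread_cond_wait(&wq, &wq_lock);
	pthread_mutex_unlock(&wq_lock);
}

static void *user_access(void *arg)
{
	cfg_wait();
	*(int *)arg = 1;	/* the config access would happen here */
	return NULL;
}

/* Returns 1 if the waiter ran to completion after the unblock. */
static int run_demo(void)
{
	pthread_t tid;
	int completed = 0;

	cfg_block();
	if (pthread_create(&tid, NULL, user_access, &completed) != 0)
		return -1;
	cfg_unblock();
	if (pthread_join(tid, NULL) != 0)
		return -1;
	return completed;
}
```

Whether the waiter runs before or after cfg_unblock(), it sees a consistent
flag because every access to it happens under wq_lock.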

Fixes: cdcb33f98244 ("PCI: Avoid possible deadlock on pci_lock and p->pi_lock")
Link: https://lore.kernel.org/linux-pci/79827f2f-9b43-4411-1376-b9063b67aee3@huawei.com/T/#u
Based-on: https://lore.kernel.org/linux-pci/20191210031527.40136-1-zhengxiang9@huawei.com/
Based-on-patch-by: Xiang Zheng 
Signed-off-by: Bjorn Helgaas 
Tested-by: Xiang Zheng 
Cc: Heyi Guo 
Cc: Biaoxiang Ye 
Signed-off-by: Sasha Levin

PCI: tegra: Revert tegra124 raw_violation_fixup

2020-08-11T13:48:11+00:00

commit e7b856dfcec6d3bf028adee8c65342d7035914a1 upstream.

As reported in https://bugzilla.kernel.org/206217, raw_violation_fixup
is causing more harm than good in some common use-cases.

This patch is a partial revert of commit:

191cd6fb5d2c ("PCI: tegra: Add SW fixup for RAW violations")

and fixes the following regression since then.

* Description:

When both the NIC and MMC are used, one can see the following messages:

  NETDEV WATCHDOG: enp1s0 (r8169): transmit queue 0 timed out

and

  pcieport 0000:00:02.0: AER: Uncorrected (Non-Fatal) error received: 0000:01:00.0
  r8169 0000:01:00.0: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
  r8169 0000:01:00.0: AER:   device [10ec:8168] error status/mask=00004000/00400000
  r8169 0000:01:00.0: AER:    [14] CmpltTO                (First)
  r8169 0000:01:00.0: AER: can't recover (no error_detected callback)
  pcieport 0000:00:02.0: AER: device recovery failed

After that, the ethernet NIC is not functional anymore even after
reloading the r8169 module. After a reboot, this is reproducible by
copying a large file over the NIC to the MMC.

For some reason this is not reproducible when files are copied to a tmpfs.

* Little background on the fixup, by Manikanta Maddireddy:
  "In the internal testing with dGPU on Tegra124, CmplTO is reported by
dGPU. This happened because FIFO queue in AFI(AXI to PCIe) module
get full by upstream posted writes. Back to back upstream writes
interleaved with infrequent reads, triggers RAW violation and CmpltTO.
This is fixed by reducing the posted write credits and by changing
updateFC timer frequency. These settings are fixed after stress test.

In the current case, RTL NIC is also reporting CmplTO. These settings
seems to be aggravating the issue instead of fixing it."

Link: https://lore.kernel.org/r/20200718100710.15398-1-kwizart@gmail.com
Fixes: 191cd6fb5d2c ("PCI: tegra: Add SW fixup for RAW violations")
Signed-off-by: Nicolas Chauvet 
Signed-off-by: Lorenzo Pieralisi 
Reviewed-by: Manikanta Maddireddy 
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman

Merge tag 'pci-v5.8-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

2020-07-30T19:01:42+00:00

Pull PCI fix from Bjorn Helgaas:
 "Disable ASPM on ASM1083/1085 PCIe-to-PCI bridge (Robert Hancock)"

* tag 'pci-v5.8-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI/ASPM: Disable ASPM on ASMedia ASM1083/1085 PCIe-to-PCI bridge