linux-stable.git/drivers/pci, branch v3.2.99

PCI/AER: Report non-fatal errors only to the affected endpoint

2018-02-13T18:32:08+00:00

commit 86acc790717fb60fb51ea3095084e331d8711c74 upstream.

Previously, if an non-fatal error was reported by an endpoint, we
called report_error_detected() for the endpoint, every sibling on the
bus, and their descendents.  If any of them did not implement the
.error_detected() method, do_recovery() failed, leaving all these
devices unrecovered.

For example, the system described in the bugzilla below has two devices:

  0000:74:02.0 [19e5:a230] SAS controller, driver has .error_detected()
  0000:74:03.0 [19e5:a235] SATA controller, driver lacks .error_detected()

When a device such as 74:02.0 reported a non-fatal error, do_recovery()
failed because 74:03.0 lacked an .error_detected() method.  But per PCIe
r3.1, sec 6.2.2.2.2, such an error does not compromise the Link and
does not affect 74:03.0:

  Non-fatal errors are uncorrectable errors which cause a particular
  transaction to be unreliable but the Link is otherwise fully functional.
  Isolating Non-fatal from Fatal errors provides Requester/Receiver logic
  in a device or system management software the opportunity to recover from
  the error without resetting the components on the Link and disturbing
  other transactions in progress.  Devices not associated with the
  transaction in error are not impacted by the error.

Report non-fatal errors only to the endpoint that reported them.  We really
want to check for AER_NONFATAL here, but the current code structure doesn't
allow that.  Looking for pci_channel_io_normal is the best we can do now.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=197055
Fixes: 6c2b374d7485 ("PCI-Express AER implemetation: AER core and aerdriver")
Signed-off-by: Gabriele Paoloni 
Signed-off-by: Dongdong Liu 
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas 
Signed-off-by: Ben Hutchings

PCI: shpchp: Enable bridge bus mastering if MSI is enabled

2017-11-26T13:51:01+00:00

commit 48b79a14505349a29b3e20f03619ada9b33c4b17 upstream.

An SHPC may generate MSIs to notify software about slot or controller
events (SHPC spec r1.0, sec 4.7).  A PCI device can only generate an MSI if
it has bus mastering enabled.

Enable bus mastering if the bridge contains an SHPC that uses MSI for event
notifications.

Signed-off-by: Aleksandr Bezzubikov 
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas 
Reviewed-by: Marcel Apfelbaum 
Acked-by: Michael S. Tsirkin 
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings

PCI/PM: Restore the status of PCI devices across hibernation

2017-10-12T14:27:13+00:00

commit e60514bd4485c0c7c5a7cf779b200ce0b95c70d6 upstream.

Currently we saw a lot of "No irq handler" errors during hibernation, which
caused the system hang finally:

  ata4.00: qc timeout (cmd 0xec)
  ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
  ata4.00: revalidation failed (errno=-5)
  ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
  do_IRQ: 31.151 No irq handler for vector

According to above logs, there is an interrupt triggered and it is
dispatched to CPU31 with a vector number 151, but there is no handler for
it, thus this IRQ will not get acked and will cause an IRQ flood which
kills the system.  To be more specific, the 31.151 is an interrupt from the
AHCI host controller.

After some investigation, the reason why this issue is triggered is because
the thaw_noirq() function does not restore the MSI/MSI-X settings across
hibernation.

The scenario is illustrated below:

  1. Before hibernation, IRQ 34 is the handler for the AHCI device, which
     is bound to CPU31.

  2. Hibernation starts, the AHCI device is put into low power state.

  3. All the nonboot CPUs are put offline, so IRQ 34 has to be migrated to
     the last alive one - CPU0.

  4. After the snapshot has been created, all the nonboot CPUs are brought
     up again; IRQ 34 remains bound to CPU0.

  5. AHCI devices are put into D0.

  6. The snapshot is written to the disk.

The issue is triggered in step 6.  The AHCI interrupt should be delivered
to CPU0, however it is delivered to the original CPU31 instead, which
causes the "No irq handler" issue.

Ying Huang has provided a clue that, in step 3 it is possible that writing
to the register might not take effect as the PCI devices have been
suspended.

In step 3, the IRQ 34 affinity should be modified from CPU31 to CPU0, but
in fact it is not.  In __pci_write_msi_msg(), if the device is already in
low power state, the low level MSI message entry will not be updated but
cached.  During the device restore process after a normal suspend/resume,
pci_restore_msi_state() writes the cached MSI back to the hardware.

But this is not the case for hibernation.  pci_restore_msi_state() is not
currently called in pci_pm_thaw_noirq(), although pci_save_state() has
saved the necessary PCI cached information in pci_pm_freeze_noirq().

Restore the PCI status for the device during hibernation.  Otherwise the
status might be lost across hibernation (for example, settings for MSI,
MSI-X, ATS, ACS, IOV, etc.), which might cause problems during hibernation.

Suggested-by: Ying Huang 
Suggested-by: Rafael J. Wysocki 
Signed-off-by: Chen Yu 
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas 
Reviewed-by: Rafael J. Wysocki 
Cc: Len Brown 
Cc: Dan Williams 
Cc: Rui Zhang 
Cc: Ying Huang 
Signed-off-by: Ben Hutchings

PCI: Correct PCI_STD_RESOURCE_END usage

2017-10-12T14:27:10+00:00

commit 2f686f1d9beee135de6d08caea707ec7bfc916d4 upstream.

PCI_STD_RESOURCE_END is (confusingly) the index of the last valid BAR, not
the *number* of BARs.  To iterate through all possible BARs, we need to
include PCI_STD_RESOURCE_END.

Fixes: 9fe373f9997b ("PCI: Increase IBM ipr SAS Crocodile BARs to at least system page size")
Signed-off-by: Bjorn Helgaas 
Signed-off-by: Ben Hutchings

PCI: Disable boot interrupt quirk for ASUS M2N-LR

2017-08-26T01:14:03+00:00

commit c4e649b09f55595e6df6da5465a5b3cfc93557c1 upstream.

The ASUS M2N-LR should not trigger boot interrupt quirks although it
carries an Intel 6702PXH.  On this board the boot interrupt quirks cause
incorrect IRQ assignments and should be disabled.

Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=43074
Tested-by: Solomon Peachy 
Signed-off-by: Stefan Assmann 
Signed-off-by: Bjorn Helgaas 
Signed-off-by: Ben Hutchings

PCI: Freeze PME scan before suspending devices

2017-08-26T01:14:01+00:00

commit ea00353f36b64375518662a8ad15e39218a1f324 upstream.

Laurent Pinchart reported that the Renesas R-Car H2 Lager board (r8a7790)
crashes during suspend tests.  Geert Uytterhoeven managed to reproduce the
issue on an M2-W Koelsch board (r8a7791):

  It occurs when the PME scan runs, once per second.  During PME scan, the
  PCI host bridge (rcar-pci) registers are accessed while its module clock
  has already been disabled, leading to the crash.

One reproducer is to configure s2ram to use "s2idle" instead of "deep"
suspend:

  # echo 0 > /sys/module/printk/parameters/console_suspend
  # echo s2idle > /sys/power/mem_sleep
  # echo mem > /sys/power/state

Another reproducer is to write either "platform" or "processors" to
/sys/power/pm_test.  It does not (or is less likely) to happen during full
system suspend ("core" or "none") because system suspend also disables
timers, and thus the workqueue handling PME scans no longer runs.  Geert
believes the issue may still happen in the small window between disabling
module clocks and disabling timers:

  # echo 0 > /sys/module/printk/parameters/console_suspend
  # echo platform > /sys/power/pm_test    # Or "processors"
  # echo mem > /sys/power/state

(Make sure CONFIG_PCI_RCAR_GEN2 and CONFIG_USB_OHCI_HCD_PCI are enabled.)

Rafael Wysocki agrees that PME scans should be suspended before the host
bridge registers become inaccessible.  To that end, queue the task on a
workqueue that gets frozen before devices suspend.

Rafael notes however that as a result, some wakeup events may be missed if
they are delivered via PME from a device without working IRQ (which hence
must be polled) and occur after the workqueue has been frozen.  If that
turns out to be an issue in practice, it may be possible to solve it by
calling pci_pme_list_scan() once directly from one of the host bridge's
pm_ops callbacks.

Stacktrace for posterity:

  PM: Syncing filesystems ... [   38.566237] done.
  PM: Preparing system for sleep (mem)
  Freezing user space processes ... [   38.579813] (elapsed 0.001 seconds) done.
  Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
  PM: Suspending system (mem)
  PM: suspend of devices complete after 152.456 msecs
  PM: late suspend of devices complete after 2.809 msecs
  PM: noirq suspend of devices complete after 29.863 msecs
  suspend debug: Waiting for 5 second(s).
  Unhandled fault: asynchronous external abort (0x1211) at 0x00000000
  pgd = c0003000
  [00000000] *pgd=80000040004003, *pmd=00000000
  Internal error: : 1211 [#1] SMP ARM
  Modules linked in:
  CPU: 1 PID: 20 Comm: kworker/1:1 Not tainted
  4.9.0-rc1-koelsch-00011-g68db9bc814362e7f #3383
  Hardware name: Generic R8A7791 (Flattened Device Tree)
  Workqueue: events pci_pme_list_scan
  task: eb56e140 task.stack: eb58e000
  PC is at pci_generic_config_read+0x64/0x6c
  LR is at rcar_pci_cfg_base+0x64/0x84
  pc : []    lr : []    psr: 600d0093
  sp : eb58fe98  ip : c041d750  fp : 00000008
  r10: c0e2283c  r9 : 00000000  r8 : 600d0013
  r7 : 00000008  r6 : eb58fed6  r5 : 00000002  r4 : eb58feb4
  r3 : 00000000  r2 : 00000044  r1 : 00000008  r0 : 00000000
  Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
  Control: 30c5387d  Table: 6a9f6c80  DAC: 55555555
  Process kworker/1:1 (pid: 20, stack limit = 0xeb58e210)
  Stack: (0xeb58fe98 to 0xeb590000)
  fe80:                                                       00000002 00000044
  fea0: eb6f5800 c041d9b0 eb58feb4 00000008 00000044 00000000 eb78a000 eb78a000
  fec0: 00000044 00000000 eb9aff00 c0424bf0 eb78a000 00000000 eb78a000 c0e22830
  fee0: ea8a6fc0 c0424c5c eaae79c0 c0424ce0 eb55f380 c0e22838 eb9a9800 c0235fbc
  ff00: eb55f380 c0e22838 eb55f380 eb9a9800 eb9a9800 eb58e000 eb9a9824 c0e02100
  ff20: eb55f398 c02366c4 eb56e140 eb5631c0 00000000 eb55f380 c023641c 00000000
  ff40: 00000000 00000000 00000000 c023a928 cd105598 00000000 40506a34 eb55f380
  ff60: 00000000 00000000 dead4ead ffffffff ffffffff eb58ff74 eb58ff74 00000000
  ff80: 00000000 dead4ead ffffffff ffffffff eb58ff90 eb58ff90 eb58ffac eb5631c0
  ffa0: c023a844 00000000 00000000 c0206d68 00000000 00000000 00000000 00000000
  ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  ffe0: 00000000 00000000 00000000 00000000 00000013 00000000 3a81336c 10ccd1dd
  [] (pci_generic_config_read) from []
  (pci_bus_read_config_word+0x58/0x80)
  [] (pci_bus_read_config_word) from []
  (pci_check_pme_status+0x34/0x78)
  [] (pci_check_pme_status) from [] (pci_pme_wakeup+0x28/0x54)
  [] (pci_pme_wakeup) from [] (pci_pme_list_scan+0x58/0xb4)
  [] (pci_pme_list_scan) from []
  (process_one_work+0x1bc/0x308)
  [] (process_one_work) from [] (worker_thread+0x2a8/0x3e0)
  [] (worker_thread) from [] (kthread+0xe4/0xfc)
  [] (kthread) from [] (ret_from_fork+0x14/0x2c)
  Code: ea000000 e5903000 f57ff04f e3a00000 (e5843000)
  ---[ end trace 667d43ba3aa9e589 ]---

Fixes: df17e62e5bff ("PCI: Add support for polling PME state on suspended legacy PCI devices")
Reported-and-tested-by: Laurent Pinchart 
Reported-and-tested-by: Geert Uytterhoeven 
Signed-off-by: Lukas Wunner 
Signed-off-by: Bjorn Helgaas 
Reviewed-by: Laurent Pinchart 
Acked-by: Rafael J. Wysocki 
Cc: Mika Westerberg 
Cc: Niklas Söderlund 
Cc: Simon Horman 
Cc: Yinghai Lu 
Cc: Matthew Garrett 
[bwh: Backported to 3.2: adjust context, indentation]
Signed-off-by: Ben Hutchings

PCI: Only allow WC mmap on prefetchable resources

2017-08-26T01:14:00+00:00

commit cef4d02305a06be581bb7f4353446717a1b319ec upstream.

The /proc/bus/pci mmap interface allows the user to specify whether they
want WC or not.  Don't let them do so on non-prefetchable BARs.

Signed-off-by: David Woodhouse 
Signed-off-by: Bjorn Helgaas 
Signed-off-by: Ben Hutchings

PCI: Fix another sanity check bug in /proc/pci mmap

2017-08-26T01:14:00+00:00

commit 17caf56731311c9596e7d38a70c88fcb6afa6a1b upstream.

Don't match MMIO maps with I/O BARs and vice versa.

Signed-off-by: David Woodhouse 
Signed-off-by: Bjorn Helgaas 
Signed-off-by: Ben Hutchings

PCI: Ignore write combining when mapping I/O port space

2017-08-26T01:14:00+00:00

commit 3a92c319c44a7bcee9f48dff9d97d001943b54c6 upstream.

PCI exposes files like /proc/bus/pci/00/00.0 in procfs.  These files
support operations like this:

  ioctl(fd, PCIIOC_MMAP_IS_IO);           # request I/O port space
  ioctl(fd, PCIIOC_WRITE_COMBINE, 1);     # request write-combining
  mmap(fd, ...)

Write combining is useful on PCI memory space, but I don't think it makes
sense on PCI I/O port space.

We *could* change proc_bus_pci_ioctl() to make it impossible to set
mmap_state == pci_mmap_io and write_combine at the same time, but that
would break the following sequence, which is currently legal:

  mmap(fd, ...)                           # default is I/O, non-combining
  ioctl(fd, PCIIOC_WRITE_COMBINE, 1);     # request write-combining
  ioctl(fd, PCIIOC_MMAP_IS_MEM);          # request memory space
  mmap(fd, ...)                           # get write-combining mapping

Ignore the write-combining flag when mapping I/O port space.

This patch should have no functional effect, based on this analysis of all
implementations of pci_mmap_page_range():

  - ia64 mips parisc sh unicore32 x86 do not support mapping of I/O port
    space at all.

  - arm cris microblaze mn10300 sparc xtensa support mapping of I/O port
    space, but ignore the write_combine argument to pci_mmap_page_range().

  - powerpc supports mapping of I/O port space and uses write_combine, and
    it disables write combining for I/O port space in
    __pci_mmap_set_pgprot().

This patch makes it possible to remove __pci_mmap_set_pgprot() from
powerpc, which simplifies that path.

Signed-off-by: Bjorn Helgaas 
Signed-off-by: Ben Hutchings

PCI: Fix pci_mmap_fits() for HAVE_PCI_RESOURCE_TO_USER platforms

2017-08-26T01:14:00+00:00

commit 6bccc7f426abd640f08d8c75fb22f99483f201b4 upstream.

In the PCI_MMAP_PROCFS case when the address being passed by the user is a
'user visible' resource address based on the bus window, and not the actual
contents of the resource, that's what we need to be checking it against.

Signed-off-by: David Woodhouse 
Signed-off-by: Bjorn Helgaas 
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings