linux-stable.git/kernel/irq/affinity.c, branch v5.0.4

genirq/affinity: Add is_managed to struct irq_affinity_desc

2018-12-19T10:32:08+00:00

Devices which use managed interrupts usually have two classes of
interrupts:

  - Interrupts for multiple device queues
  - Interrupts for general device management

Currently both classes are treated the same way, i.e. as managed
interrupts. The general interrupts get the default affinity mask assigned
while the device queue interrupts are spread out over the possible CPUs.

Treating the general interrupts as managed is both a limitation and under
certain circumstances a bug. Assume the following situation:

 default_irq_affinity = 4..7

So if CPUs 4-7 are offlined, then the core code will shut down the device
management interrupts because the last CPU in their affinity mask went
offline.
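A minimal userspace sketch of the scenario above follows. It assumes CPU masks are plain bitmaps, where bit n is CPU n. The enum and `on_cpu_offline()` are hypothetical names, not kernel APIs, and the function models only the outcome described here. If a managed interrupt has no online CPU left in its mask, it is shut down. A non-managed interrupt would be moved elsewhere instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: bit n of a mask stands for CPU n. */
typedef unsigned int cpu_bits;

enum offline_outcome {
	STAYS_IN_MASK,	/* some CPU in the affinity mask is still online */
	MIGRATED,	/* non-managed: moved to another online CPU */
	SHUT_DOWN,	/* managed: no online CPU left in its mask */
};

/*
 * Outcome for one interrupt once only the CPUs in 'online' remain
 * online. Simplified sketch, not the kernel implementation.
 */
static enum offline_outcome on_cpu_offline(cpu_bits affinity,
					   cpu_bits online, bool managed)
{
	assert(online != 0);	/* at least one CPU must stay online */

	if (affinity & online)
		return STAYS_IN_MASK;
	return managed ? SHUT_DOWN : MIGRATED;
}
```

Take default_irq_affinity = 4..7 (mask 0xF0) with only CPUs 0-3 left online (0x0F). The managed case returns SHUT_DOWN, which is the bug described above.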

It's also a limitation because it's desired to allow manual placement of
the general device interrupts for various reasons. If they are marked
managed then the interrupt affinity setting from both user and kernel space
is disabled. That limitation was reported by Kashyap and Sumit.

Expand struct irq_affinity_desc with a new bit 'is_managed' which is set
for truly managed interrupts (queue interrupts) and cleared for the general
device interrupts.
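The expanded descriptor can be sketched in userspace as below. The sketch assumes an unsigned long bitmap stands in for the kernel's struct cpumask. The helper mark_queue_vectors_managed() is hypothetical. It treats the leading pre_vectors and trailing post_vectors entries as the general device interrupts, following the pre_vectors/post_vectors fields of the kernel's struct irq_affinity.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in: an unsigned long bitmap replaces struct cpumask. */
struct irq_affinity_desc {
	unsigned long mask;
	unsigned int is_managed : 1;
};

/*
 * Hypothetical helper: only the queue vectors between the leading
 * 'pre_vectors' and the trailing 'post_vectors' entries are managed.
 * The general device interrupts keep is_managed == 0, so their affinity
 * remains adjustable.
 */
static void mark_queue_vectors_managed(struct irq_affinity_desc *descs,
				       size_t nvecs, size_t pre_vectors,
				       size_t post_vectors)
{
	assert(pre_vectors + post_vectors <= nvecs);

	for (size_t i = 0; i < nvecs; i++)
		descs[i].is_managed = i >= pre_vectors &&
				      i < nvecs - post_vectors;
}
```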

[ tglx: Simplify code and massage changelog ]

Reported-by: Kashyap Desai 
Reported-by: Sumit Saxena 
Signed-off-by: Dou Liyang 
Signed-off-by: Thomas Gleixner 
Cc: linux-pci@vger.kernel.org
Cc: shivasharan.srikanteshwara@broadcom.com
Cc: ming.lei@redhat.com
Cc: hch@lst.de
Cc: bhelgaas@google.com
Cc: douliyang1@huawei.com
Link: https://lkml.kernel.org/r/20181204155122.6327-3-douliyangs@gmail.com

genirq/core: Introduce struct irq_affinity_desc

2018-12-19T10:32:08+00:00

The interrupt affinity management uses straight cpumask pointers to convey
the automatically assigned affinity masks for managed interrupts. The core
interrupt descriptor allocation also decides based on the pointer being
non-NULL whether an interrupt is managed or not.

Devices which use managed interrupts usually have two classes of
interrupts:

  - Interrupts for multiple device queues
  - Interrupts for general device management

Currently both classes are treated the same way, i.e. as managed
interrupts. The general interrupts get the default affinity mask assigned
while the device queue interrupts are spread out over the possible CPUs.

Treating the general interrupts as managed is both a limitation and under
certain circumstances a bug. Assume the following situation:

 default_irq_affinity = 4..7

So if CPUs 4-7 are offlined, then the core code will shut down the device
management interrupts because the last CPU in their affinity mask went
offline.

It's also a limitation because it's desired to allow manual placement of
the general device interrupts for various reasons. If they are marked
managed then the interrupt affinity setting from both user and kernel space
is disabled.

To remedy that situation it's required to convey more information than the
cpumasks through various interfaces related to interrupt descriptor
allocation.

Instead of adding yet another argument, create a new data structure
'irq_affinity_desc' which for now just contains the cpumask. This struct
can be expanded to convey auxiliary information in the next step.
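The preparatory step can be sketched in userspace as below, assuming a bitmap stand-in for struct cpumask. struct irq_desc_model and alloc_descs_model() are hypothetical. They show that, with no functional change, the managed decision still depends only on the affinity pointer being non-NULL. The mask itself is now read through the new struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bitmap stand-in for the kernel's struct cpumask (assumption). */
struct cpumask_bits {
	unsigned long bits;
};

/* Preparatory form: the new struct only wraps the mask for now. */
struct irq_affinity_desc {
	struct cpumask_bits mask;
};

/* Hypothetical per-interrupt result of descriptor allocation. */
struct irq_desc_model {
	unsigned long affinity;
	bool managed;
};

static void alloc_descs_model(struct irq_desc_model *out, unsigned int cnt,
			      const struct irq_affinity_desc *affinity,
			      unsigned long default_affinity)
{
	for (unsigned int i = 0; i < cnt; i++) {
		/* Unchanged rule: a non-NULL affinity array means managed. */
		out[i].managed = affinity != NULL;
		out[i].affinity = affinity ? affinity[i].mask.bits
					   : default_affinity;
	}
}
```

At this stage, every entry of a passed-in array is still treated as managed, including the general device interrupts. The is_managed bit is meant to close exactly that information gap.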

No functional change, just preparatory work.

[ tglx: Simplified logic and clarified changelog ]

Suggested-by: Thomas Gleixner 
Suggested-by: Bjorn Helgaas 
Signed-off-by: Dou Liyang 
Signed-off-by: Thomas Gleixner 
Cc: linux-pci@vger.kernel.org
Cc: kashyap.desai@broadcom.com
Cc: shivasharan.srikanteshwara@broadcom.com
Cc: sumit.saxena@broadcom.com
Cc: ming.lei@redhat.com
Cc: hch@lst.de
Cc: douliyang1@huawei.com
Link: https://lkml.kernel.org/r/20181204155122.6327-2-douliyangs@gmail.com

genirq/affinity: Remove excess indentation

2018-12-19T10:32:07+00:00

Plus other coding style issues which stood out while staring at that code.

Signed-off-by: Thomas Gleixner

genirq/affinity: Add support for allocating interrupt sets

2018-11-05T11:16:27+00:00

A driver may have a need to allocate multiple sets of MSI/MSI-X interrupts,
and have them appropriately affinitized.

Add support for defining a number of sets in the irq_affinity structure, of
varying sizes, and get each set affinitized correctly across the machine.
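A simplified sketch of per-set spreading follows. It assumes a flat topology with no NUMA nodes and uses bitmap masks. The parameters pre_vectors, sets and nr_sets are assumptions for this sketch, echoing the idea of sizing sets in the irq_affinity structure. spread_sets() itself is hypothetical. Each set is spread over all CPUs on its own, starting at its own first vector, so every set covers the whole machine.

```c
#include <assert.h>

/*
 * Spread CPUs 0..ncpus-1 over each set independently. Set i occupies
 * sets[i] consecutive vectors after the previous set; the first
 * 'pre_vectors' vectors are left alone. Within a set, CPUs are handed
 * out in contiguous, nearly equal groups.
 */
static void spread_sets(unsigned long *masks, int pre_vectors,
			const int *sets, int nr_sets, int ncpus)
{
	int curvec = pre_vectors;

	assert(ncpus > 0 && ncpus <= (int)(8 * sizeof(unsigned long)));
	for (int i = 0; i < nr_sets; i++) {
		/* Assumption for this sketch: no set is larger than ncpus. */
		assert(sets[i] > 0 && sets[i] <= ncpus);
		for (int cpu = 0; cpu < ncpus; cpu++)
			masks[curvec + cpu * sets[i] / ncpus] |= 1UL << cpu;
		curvec += sets[i];
	}
}
```

Take one pre vector, sets of sizes 2 and 4, and 4 CPUs. The first set gets CPUs 0-1 and 2-3. The second set gets one CPU per vector.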

[ tglx: Minor changelog tweaks ]

Signed-off-by: Jens Axboe 
Signed-off-by: Ming Lei 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Ming Lei 
Reviewed-by: Keith Busch 
Reviewed-by: Sagi Grimberg 
Cc: linux-block@vger.kernel.org
Link: https://lkml.kernel.org/r/20181102145951.31979-5-ming.lei@redhat.com

genirq/affinity: Pass first vector to __irq_build_affinity_masks()

2018-11-05T11:16:26+00:00

No functional change.

Prepares for support of allocating and affinitizing sets of interrupts, in
which each set of interrupts needs a full two stage spreading. The first
vector argument is necessary for this so the affinitizing starts from the
first vector of each set.
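Why the first vector matters can be shown with a simplified sketch. It assumes a one-CPU-per-step round robin instead of the real NUMA-aware grouping. next_vec() and spread_from() are hypothetical names. When spreading runs past the end of a set, it wraps back to that set's first vector, not to the start of the whole vector space, so vectors of other sets are never touched.

```c
#include <assert.h>

/* Next vector inside [firstvec, firstvec + numvecs), wrapping to firstvec. */
static int next_vec(int curvec, int firstvec, int numvecs)
{
	if (++curvec >= firstvec + numvecs)
		curvec = firstvec;
	return curvec;
}

/*
 * Give each CPU whose bit is set in 'cpus' (CPUs 0..ncpus-1) to one
 * vector, starting at 'startvec' and staying inside the set that begins
 * at 'firstvec' and spans 'numvecs' vectors.
 */
static void spread_from(unsigned long *masks, int startvec, int firstvec,
			int numvecs, unsigned long cpus, int ncpus)
{
	int curvec = startvec;

	assert(numvecs > 0);
	assert(startvec >= firstvec && startvec < firstvec + numvecs);
	for (int cpu = 0; cpu < ncpus; cpu++) {
		if (!(cpus & (1UL << cpu)))
			continue;
		masks[curvec] |= 1UL << cpu;
		curvec = next_vec(curvec, firstvec, numvecs);
	}
}
```

Take a set occupying vectors 3-5, with spreading started at vector 4. Four CPUs land on vectors 4, 5, 3 and 4. Vectors 0-2 stay empty.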

[ tglx: Minor changelog tweaks ]

Signed-off-by: Ming Lei 
Signed-off-by: Thomas Gleixner 
Cc: Jens Axboe 
Cc: linux-block@vger.kernel.org
Cc: Hannes Reinecke 
Cc: Keith Busch 
Cc: Sagi Grimberg 
Link: https://lkml.kernel.org/r/20181102145951.31979-4-ming.lei@redhat.com

genirq/affinity: Move two stage affinity spreading into a helper function

2018-11-05T11:16:26+00:00

No functional change. Prepares for supporting allocating and affinitizing
interrupt sets.

[ tglx: Minor changelog tweaks ]

Signed-off-by: Ming Lei 
Signed-off-by: Thomas Gleixner 
Cc: Jens Axboe 
Cc: linux-block@vger.kernel.org
Cc: Hannes Reinecke 
Cc: Keith Busch 
Cc: Sagi Grimberg 
Link: https://lkml.kernel.org/r/20181102145951.31979-3-ming.lei@redhat.com

genirq/affinity: Spread IRQs to all available NUMA nodes

2018-11-05T11:16:26+00:00

If the number of NUMA nodes exceeds the number of MSI/MSI-X interrupts
which are allocated for a device, the interrupt affinity spreading code
fails to spread them across all nodes.

The reason is that the spreading code starts from node 0 and continues up
to the number of interrupts requested for allocation. This leaves the nodes
past the last interrupt unused.

This results in interrupt concentration on the first nodes, which violates
the assumption of the block layer that all nodes are covered evenly. As a
consequence the NUMA nodes above the number of interrupts are all assigned
to hardware queue 0 and therefore NUMA node 0, which results in bad
performance and has CPU hotplug implications, because queue 0 gets shut
down when the last CPU of node 0 is offlined.

Go over all NUMA nodes and assign them round-robin to all requested
interrupts to solve this.
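A minimal sketch of the fix follows. It assumes node_cpus[n] holds the CPU bitmap of NUMA node n and that there are at least as many nodes as vectors. Both functions are hypothetical simplifications. The first reproduces the old behavior of stopping after numvecs nodes. The second walks every node and wraps around the vectors.

```c
#include <assert.h>

/* Old behavior: nodes past index numvecs - 1 never get a vector. */
static void spread_nodes_old(unsigned long *masks, int numvecs,
			     const unsigned long *node_cpus, int nr_nodes)
{
	assert(numvecs > 0 && nr_nodes >= numvecs);
	for (int n = 0; n < numvecs; n++)
		masks[n] |= node_cpus[n];
}

/* Fixed behavior: every node is assigned, round-robin over the vectors. */
static void spread_nodes_round_robin(unsigned long *masks, int numvecs,
				     const unsigned long *node_cpus,
				     int nr_nodes)
{
	int curvec = 0;

	assert(numvecs > 0 && nr_nodes >= numvecs);
	for (int n = 0; n < nr_nodes; n++) {
		masks[curvec] |= node_cpus[n];
		if (++curvec == numvecs)
			curvec = 0;
	}
}
```

Take 4 nodes with 2 CPUs each and 2 vectors. The old walk covers only nodes 0 and 1. The round-robin walk gives vector 0 nodes 0 and 2, and vector 1 nodes 1 and 3.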

[ tglx: Massaged changelog ]

Signed-off-by: Long Li 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Ming Lei 
Cc: Michael Kelley 
Link: https://lkml.kernel.org/r/20181102180248.13583-1-longli@linuxonhyperv.com

genirq/affinity: Spread irq vectors among present CPUs as far as possible

2018-04-06T10:19:51+00:00

Commit 84676c1f21 ("genirq/affinity: assign vectors to all possible CPUs")
tried to spread the interrupts across all possible CPUs to make sure that
in case of physical hotplug (e.g. virtualization) the CPUs which get
plugged in after the device was initialized are targeted by a hardware
queue and the corresponding interrupt.

This has a downside in cases where the ACPI tables claim that there are
more possible CPUs than present CPUs and the number of interrupts to spread
out is smaller than the number of possible CPUs. These bogus ACPI tables
are unfortunately not uncommon.

In such a case the vector spreading algorithm assigns interrupts to CPUs
which can never be utilized and as a consequence these interrupts are
unused instead of being mapped to present CPUs. As a result the performance
of the device is suboptimal.

To fix this, spread the interrupt vectors in two stages:

 1) Spread as many interrupts as possible among the present CPUs

 2) Spread the remaining vectors among non present CPUs
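The two stages can be sketched in userspace as below, assuming a single NUMA node and 8-bit CPU bitmaps. spread_stage() and spread_two_stage() are hypothetical. The real code groups CPUs per NUMA node, so on a multi-node machine its ordering can differ from this flat model. The second stage starts at the vector after the last one used by the first stage, so non-present CPUs fill otherwise empty vectors first.

```c
#include <assert.h>

#define NR_CPUS 8

/*
 * One spreading stage: distribute the CPUs set in 'cpus' as evenly as
 * possible over 'numvecs' vectors, starting at 'startvec' and wrapping
 * to vector 0. Returns how many vectors received CPUs.
 */
static unsigned int spread_stage(unsigned int cpus, unsigned int *masks,
				 unsigned int numvecs, unsigned int startvec)
{
	unsigned int ncpus = 0, cpu, v, vec = startvec;

	assert(numvecs > 0 && startvec < numvecs);
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpus & (1u << cpu))
			ncpus++;
	if (ncpus == 0)
		return 0;

	unsigned int vecs = ncpus < numvecs ? ncpus : numvecs;
	unsigned int per_vec = ncpus / vecs;
	unsigned int extra = ncpus % vecs; /* first 'extra' vectors get one more */

	cpu = 0;
	for (v = 0; v < vecs; v++) {
		unsigned int take = per_vec + (v < extra);

		for (; take > 0; cpu++) {
			if (cpus & (1u << cpu)) {
				masks[vec] |= 1u << cpu;
				take--;
			}
		}
		vec = (vec + 1) % numvecs;
	}
	return vecs;
}

/* Stage 1 on present CPUs, stage 2 on possible but non-present CPUs. */
static void spread_two_stage(unsigned int present, unsigned int possible,
			     unsigned int *masks, unsigned int numvecs)
{
	unsigned int nr_present = spread_stage(present, masks, numvecs, 0);
	/* Continue with the next unused vector, or reuse from vector 0. */
	unsigned int start = nr_present >= numvecs ? 0 : nr_present;

	spread_stage(possible & ~present, masks, numvecs, start);
}
```

With all 8 CPUs present and 4 vectors, the sketch yields 0,1 / 2,3 / 4,5 / 6,7. With only CPUs 0-3 present, this flat model yields 0,4 / 1,5 / 2,6 / 3,7.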

On an 8 core system, where CPUs 0-3 are present and CPUs 4-7 are not present,
for a device with 4 queues the resulting interrupt affinity is:

  1) Before 84676c1f21 ("genirq/affinity: assign vectors to all possible CPUs")
	irq 39, cpu list 0
	irq 40, cpu list 1
	irq 41, cpu list 2
	irq 42, cpu list 3

  2) With 84676c1f21 ("genirq/affinity: assign vectors to all possible CPUs")
	irq 39, cpu list 0-2
	irq 40, cpu list 3-4,6
	irq 41, cpu list 5
	irq 42, cpu list 7

  3) With the refined vector spread applied:
	irq 39, cpu list 0,4
	irq 40, cpu list 1,6
	irq 41, cpu list 2,5
	irq 42, cpu list 3,7

On an 8 core system, where all CPUs are present, the resulting interrupt
affinity for the 4 queues is:

	irq 39, cpu list 0,1
	irq 40, cpu list 2,3
	irq 41, cpu list 4,5
	irq 42, cpu list 6,7

This is independent of the number of CPUs which are online at the point of
initialization because in such a system the offline CPUs can be easily
onlined afterwards, while non-present CPUs need to be plugged physically
or virtually which requires external interaction.

The downside of this approach is that in case of physical hotplug the
interrupt vector spreading might be suboptimal when CPUs 4-7 are physically
plugged. Suboptimal from a NUMA point of view and due to the single target
nature of interrupt affinities the later plugged CPUs might not be targeted
by interrupts at all.

However, physical hotplug systems are not the common case, while the broken
ACPI table disease is widespread. So it's preferred to have as many
interrupts as possible utilized at the point where the device is
initialized.

Block multi-queue devices like NVMe create a hardware queue per possible
CPU, so the goal of commit 84676c1f21 to assign one interrupt vector per
possible CPU is still achieved even with physical/virtual hotplug.
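The sketch below turns per-vector masks into a CPU-to-queue map. It is written in the spirit of what block multi-queue drivers derive from interrupt affinity, and the helper name build_cpu_to_queue() is hypothetical. The input is the refined spread listed above for 8 CPUs and 4 queues, encoded as bitmaps (cpu list 0,4 becomes 0x11, and so on). With that input, every possible CPU, present or not, ends up with a queue.

```c
#include <assert.h>

#define NR_CPUS 8

/*
 * Map each CPU to the first queue whose interrupt affinity mask contains
 * it. CPUs not covered by any mask get -1. Returns the number of such
 * unmapped CPUs.
 */
static int build_cpu_to_queue(const unsigned int *masks, int nr_queues,
			      int *map)
{
	int unmapped = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		map[cpu] = -1;
		for (int q = 0; q < nr_queues; q++) {
			if (masks[q] & (1u << cpu)) {
				map[cpu] = q;
				break;
			}
		}
		if (map[cpu] < 0)
			unmapped++;
	}
	return unmapped;
}
```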

[ tglx: Changed from online to present CPUs for the first spreading stage,
  	renamed variables for readability sake, added comments and massaged
  	changelog ]

Reported-by: Laurence Oberman 
Signed-off-by: Ming Lei 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Christoph Hellwig 
Cc: Jens Axboe 
Cc: linux-block@vger.kernel.org
Cc: Christoph Hellwig 
Link: https://lkml.kernel.org/r/20180308105358.1506-5-ming.lei@redhat.com

genirq/affinity: Allow irq spreading from a given starting point

2018-04-06T10:19:51+00:00

To support two stage irq vector spreading, it's required to add a starting
point to the spreading function. No functional change, just preparatory
work for the actual two stage change.

[ tglx: Renamed variables, tidied up the code and massaged changelog ]

Signed-off-by: Ming Lei 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Christoph Hellwig 
Cc: Jens Axboe 
Cc: linux-block@vger.kernel.org
Cc: Laurence Oberman 
Cc: Christoph Hellwig 
Link: https://lkml.kernel.org/r/20180308105358.1506-4-ming.lei@redhat.com

genirq/affinity: Move actual irq vector spreading into a helper function

2018-04-06T10:19:51+00:00

No functional change, just prepare for converting to 2-stage irq vector
spreading.

Signed-off-by: Ming Lei 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Christoph Hellwig 
Cc: Jens Axboe 
Cc: linux-block@vger.kernel.org
Cc: Laurence Oberman 
Cc: Christoph Hellwig 
Link: https://lkml.kernel.org/r/20180308105358.1506-3-ming.lei@redhat.com