linux-stable.git/drivers/xen, branch v6.1.78

xen/gntdev: Fix the abuse of underlying struct page in DMA-buf import

2024-02-05T20:12:58+00:00

[ Upstream commit 2d2db7d40254d5fb53b11ebd703cd1ed0c5de7a1 ]

DO NOT access the underlying struct page of an sg table exported
by DMA-buf in dmabuf_imp_to_refs(), this is not allowed.
Please see drivers/dma-buf/dma-buf.c:mangle_sg_table() for details.

Fortunately, here (for special Xen device) we can avoid using
pages and calculate gfns directly from dma addresses provided by
the sg table.

Suggested-by: Daniel Vetter 
Signed-off-by: Oleksandr Tyshchenko 
Acked-by: Christian König 
Reviewed-by: Stefano Stabellini 
Acked-by: Daniel Vetter 
Link: https://lore.kernel.org/r/20240107103426.2038075-1-olekstysh@gmail.com
Signed-off-by: Juergen Gross 
Signed-off-by: Sasha Levin

xen: simplify evtchn_do_upcall() call maze

2023-12-08T07:51:20+00:00

[ Upstream commit 37510dd566bdbff31a769cde2fa6654bccdb8b24 ]

There are several functions involved for performing the functionality
of evtchn_do_upcall():

- __xen_evtchn_do_upcall() doing the real work
- xen_hvm_evtchn_do_upcall() just being a wrapper for
  __xen_evtchn_do_upcall(), exposed for external callers
- xen_evtchn_do_upcall() calling __xen_evtchn_do_upcall(), too, but
  without any user

Simplify this maze by:

- removing the unused xen_evtchn_do_upcall()
- removing xen_hvm_evtchn_do_upcall() as the only left caller of
  __xen_evtchn_do_upcall(), while renaming __xen_evtchn_do_upcall() to
  xen_evtchn_do_upcall()

Signed-off-by: Juergen Gross 
Reviewed-by: Boris Ostrovsky 
Reviewed-by: Thomas Gleixner 
Signed-off-by: Juergen Gross 
Stable-dep-of: db2832309a82 ("x86/xen: fix percpu vcpu_info allocation")
Signed-off-by: Sasha Levin

xen: Allow platform PCI interrupt to be shared

2023-12-08T07:51:20+00:00

[ Upstream commit 3e8cd711c3da6c3d724076048038cd666bdbb2b5 ]

When we don't use the per-CPU vector callback, we ask Xen to deliver event
channel interrupts as INTx on the PCI platform device. As such, it can be
shared with INTx on other PCI devices.

Set IRQF_SHARED, and make it return IRQ_HANDLED or IRQ_NONE according to
whether the evtchn_upcall_pending flag was actually set. Now I can share
the interrupt:

 11:         82          0   IO-APIC  11-fasteoi   xen-platform-pci, ens4

Drop the IRQF_TRIGGER_RISING. It has no effect when the IRQ is shared,
and besides, the only effect it was having even beforehand was to trigger
a debug message in both I/OAPIC and legacy PIC cases:

[    0.915441] genirq: No set_type function for IRQ 11 (IO-APIC)
[    0.951939] genirq: No set_type function for IRQ 11 (XT-PIC)

Signed-off-by: David Woodhouse 
Reviewed-by: Juergen Gross 
Link: https://lore.kernel.org/r/f9a29a68d05668a3636dd09acd94d970269eaec6.camel@infradead.org
Signed-off-by: Juergen Gross 
Stable-dep-of: db2832309a82 ("x86/xen: fix percpu vcpu_info allocation")
Signed-off-by: Sasha Levin

swiotlb-xen: provide the "max_mapping_size" method

2023-12-03T06:32:11+00:00

commit bff2a2d453a1b683378b4508b86b84389f551a00 upstream.

There's a bug that when using the XEN hypervisor with bios with large
multi-page bio vectors on NVMe, the kernel deadlocks [1].

The deadlocks are caused by inability to map a large bio vector -
dma_map_sgtable always returns an error, this gets propagated to the block
layer as BLK_STS_RESOURCE and the block layer retries the request
indefinitely.

XEN uses the swiotlb framework to map discontiguous pages into contiguous
runs that are submitted to the PCIe device. The swiotlb framework has a
limitation on the length of a mapping - this needs to be announced with
the max_mapping_size method to make sure that the hardware drivers do not
create larger mappings.

Without max_mapping_size, the NVMe block driver would create large
mappings that overrun the maximum mapping size.

Reported-by: Marek Marczykowski-Górecki 
Link: https://lore.kernel.org/stable/ZTNH0qtmint%2FzLJZ@mail-itl/ [1]
Tested-by: Marek Marczykowski-Górecki 
Suggested-by: Christoph Hellwig 
Cc: stable@vger.kernel.org
Signed-off-by: Keith Busch 
Signed-off-by: Mikulas Patocka 
Acked-by: Stefano Stabellini 
Reviewed-by: Christoph Hellwig 
Link: https://lore.kernel.org/r/151bef41-e817-aea9-675-a35fdac4ed@redhat.com
Signed-off-by: Juergen Gross 
Signed-off-by: Greg Kroah-Hartman

xen/events: fix delayed eoi list handling

2023-11-28T17:07:05+00:00

[ Upstream commit 47d970204054f859f35a2237baa75c2d84fcf436 ]

When delaying eoi handling of events, the related elements are queued
into the percpu lateeoi list. In case the list isn't empty, the
elements should be sorted by the time when eoi handling is to happen.

Unfortunately a new element will never be queued at the start of the
list, even if it has a handling time lower than all other list
elements.

Fix that by handling that case the same way as for an empty list.

Fixes: e99502f76271 ("xen/events: defer eoi in case of excessive number of events")
Reported-by: Jan Beulich 
Signed-off-by: Juergen Gross 
Reviewed-by: Oleksandr Tyshchenko 
Signed-off-by: Juergen Gross 
Signed-off-by: Sasha Levin

xen-pciback: Consider INTx disabled when MSI/MSI-X is enabled

2023-11-20T10:52:01+00:00

[ Upstream commit 2c269f42d0f382743ab230308b836ffe5ae9b2ae ]

Linux enables MSI-X before disabling INTx, but keeps MSI-X masked until
the table is filled. Then it disables INTx just before clearing MASKALL
bit. Currently this approach is rejected by xen-pciback.
According to the PCIe spec, device cannot use INTx when MSI/MSI-X is
enabled (in other words: enabling MSI/MSI-X implicitly disables INTx).

Change the logic to consider INTx disabled if MSI/MSI-X is enabled. This
applies to three places:
 - checking currently enabled interrupts type,
 - transition to MSI/MSI-X - where INTx would be implicitly disabled,
 - clearing INTx disable bit - which can be allowed even if MSI/MSI-X is
   enabled, as device should consider INTx disabled anyway in that case

Fixes: 5e29500eba2a ("xen-pciback: Allow setting PCI_MSIX_FLAGS_MASKALL too")
Signed-off-by: Marek Marczykowski-Górecki 
Acked-by: Juergen Gross 
Link: https://lore.kernel.org/r/20231016131348.1734721-1-marmarek@invisiblethingslab.com
Signed-off-by: Juergen Gross 
Signed-off-by: Sasha Levin

xenbus: fix error exit in xenbus_init()

2023-11-20T10:52:01+00:00

[ Upstream commit 44961b81a9e9059b5c0443643915386db7035227 ]

In case an error occurs in xenbus_init(), xen_store_domain_type should
be set to XS_UNKNOWN.

Fix one instance where this action is missing.

Fixes: 5b3353949e89 ("xen: add support for initializing xenstore later as HVM domain")
Reported-by: kernel test robot 
Reported-by: Dan Carpenter 
Link: https://lore.kernel.org/r/202304200845.w7m4kXZr-lkp@intel.com/
Signed-off-by: Juergen Gross 
Reviewed-by: Oleksandr Tyshchenko 
Link: https://lore.kernel.org/r/20230822091138.4765-1-jgross@suse.com
Signed-off-by: Juergen Gross 
Signed-off-by: Sasha Levin

xen/events: replace evtchn_rwlock with RCU

2023-10-10T20:00:46+00:00

commit 87797fad6cce28ec9be3c13f031776ff4f104cfc upstream.

In unprivileged Xen guests event handling can cause a deadlock with
Xen console handling. The evtchn_rwlock and the hvc_lock are taken in
opposite sequence in __hvc_poll() and in Xen console IRQ handling.
Normally this is no problem, as the evtchn_rwlock is taken as a reader
in both paths, but as soon as an event channel is being closed, the
lock will be taken as a writer, which will cause read_lock() to block:

CPU0                     CPU1                CPU2
(IRQ handling)           (__hvc_poll())      (closing event channel)

read_lock(evtchn_rwlock)
                         spin_lock(hvc_lock)
                                             write_lock(evtchn_rwlock)
                                                 [blocks]
spin_lock(hvc_lock)
    [blocks]
                        read_lock(evtchn_rwlock)
                            [blocks due to writer waiting,
                             and not in_interrupt()]

This issue can be avoided by replacing evtchn_rwlock with RCU in
xen_free_irq(). Note that RCU is used only to delay freeing of the
irq_info memory. There is no RCU based dereferencing or replacement of
pointers involved.

In order to avoid potential races between removing the irq_info
reference and handling of interrupts, set the irq_info pointer to NULL
only when freeing its memory. The IRQ itself must be freed at that
time, too, as otherwise the same IRQ number could be allocated again
before handling of the old instance would have been finished.

This is XSA-441 / CVE-2023-34324.

Fixes: 54c9de89895e ("xen/events: add a new "late EOI" evtchn framework")
Reported-by: Marek Marczykowski-Górecki 
Signed-off-by: Juergen Gross 
Reviewed-by: Julien Grall 
Signed-off-by: Greg Kroah-Hartman

xen: speed up grant-table reclaim

2023-08-03T08:24:14+00:00

commit c04e9894846c663f3278a414f34416e6e45bbe68 upstream.

When a grant entry is still in use by the remote domain, Linux must put
it on a deferred list.  Normally, this list is very short, because
the PV network and block protocols expect the backend to unmap the grant
first.  However, Qubes OS's GUI protocol is subject to the constraints
of the X Window System, and as such winds up with the frontend unmapping
the window first.  As a result, the list can grow very large, resulting
in a massive memory leak and eventual VM freeze.

To partially solve this problem, make the number of entries that the VM
will attempt to free at each iteration tunable.  The default is still
10, but it can be overridden via a module parameter.

This is Cc: stable because (when combined with appropriate userspace
changes) it fixes a severe performance and stability problem for Qubes
OS users.

Cc: stable@vger.kernel.org
Signed-off-by: Demi Marie Obenour 
Reviewed-by: Juergen Gross 
Link: https://lore.kernel.org/r/20230726165354.1252-1-demi@invisiblethingslab.com
Signed-off-by: Juergen Gross 
Signed-off-by: Greg Kroah-Hartman

xenbus: check xen_domain in xenbus_probe_initcall

2023-08-03T08:24:05+00:00

[ Upstream commit 0d8f7cc8057890db08c54fe610d8a94af59da082 ]

The same way we already do in xenbus_init.
Fixes the following warning:

[  352.175563] Trying to free already-free IRQ 0
[  352.177355] WARNING: CPU: 1 PID: 88 at kernel/irq/manage.c:1893 free_irq+0xbf/0x350
[...]
[  352.213951] Call Trace:
[  352.214390]  
[  352.214717]  ? __warn+0x81/0x170
[  352.215436]  ? free_irq+0xbf/0x350
[  352.215906]  ? report_bug+0x10b/0x200
[  352.216408]  ? prb_read_valid+0x17/0x20
[  352.216926]  ? handle_bug+0x44/0x80
[  352.217409]  ? exc_invalid_op+0x13/0x60
[  352.217932]  ? asm_exc_invalid_op+0x16/0x20
[  352.218497]  ? free_irq+0xbf/0x350
[  352.218979]  ? __pfx_xenbus_probe_thread+0x10/0x10
[  352.219600]  xenbus_probe+0x7a/0x80
[  352.221030]  xenbus_probe_thread+0x76/0xc0

Fixes: 5b3353949e89 ("xen: add support for initializing xenstore later as HVM domain")
Signed-off-by: Stefano Stabellini 
Tested-by: Petr Mladek 
Reviewed-by: Oleksandr Tyshchenko 

Link: https://lore.kernel.org/r/alpine.DEB.2.22.394.2307211609140.3118466@ubuntu-linux-20-04-desktop
Signed-off-by: Juergen Gross 
Signed-off-by: Sasha Levin