| Age | Commit message (Collapse) | Author |
|
struct vfio_cdx_device carries three fields that track whether MSI has
been configured: vdev->cdx_irqs (the allocated vector array), vdev->
msi_count (the array length), and vdev->config_msi (a boolean flag).
The three are set together when vfio_cdx_msi_enable() succeeds and
cleared together by vfio_cdx_msi_disable(). However, the error paths
in vfio_cdx_msi_enable() free the cdx_irqs allocation on failure
without resetting the pointer, leaving it stale and skewed from the
other two fields until the next enable call overwrites it.
Clear vdev->cdx_irqs to NULL alongside the kfree() in both error paths
so the pointer consistently reflects the configured state. With that
invariant restored and access to the MSI state serialized by
cdx_irqs_lock, vdev->config_msi is fully redundant with
(vdev->cdx_irqs != NULL). Drop the config_msi field and switch all
readers to test cdx_irqs directly.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>
Acked-by: Nikhil Agarwal <nikhil.agarwal@amd.com>
Link: https://lore.kernel.org/r/20260417202800.88287-4-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
vfio_cdx_set_msi_trigger() reads vdev->config_msi and operates on the
vdev->cdx_irqs array based on its value, but provides no serialization
against concurrent VFIO_DEVICE_SET_IRQS ioctls. Two callers can race
such that one observes config_msi as set while another clears it and
frees cdx_irqs via vfio_cdx_msi_disable(), resulting in a use-after-free
of the cdx_irqs array.
Add a cdx_irqs_lock mutex to struct vfio_cdx_device and acquire it in
vfio_cdx_set_msi_trigger(), which is the single chokepoint through
which all updates to config_msi, cdx_irqs, and msi_count flow, covering
both the ioctl path and the close-device cleanup path. This keeps the
test of config_msi atomic with the subsequent enable, disable, or
trigger operations.
Drop the pre-call !cdx_irqs test from vfio_cdx_irqs_cleanup() as part
of this change: the optimization it provided is redundant with the
!config_msi early-return inside vfio_cdx_msi_disable(), and leaving the
test in place would be an unsynchronized read of state the new lock is
meant to protect.
Fixes: 848e447e000c ("vfio/cdx: add interrupt support")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>
Acked-by: Nikhil Agarwal <nikhil.agarwal@amd.com>
Link: https://lore.kernel.org/r/20260417202800.88287-3-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
Add validation to ensure MSI is configured before accessing cdx_irqs
array in vfio_cdx_set_msi_trigger(). Without this check, userspace
can trigger a NULL pointer dereference by calling VFIO_DEVICE_SET_IRQS
with VFIO_IRQ_SET_DATA_BOOL or VFIO_IRQ_SET_DATA_NONE flags before
ever setting up interrupts via VFIO_IRQ_SET_DATA_EVENTFD.
The vfio_cdx_msi_enable() function allocates the cdx_irqs array and
sets config_msi to 1 only when called through the EVENTFD path. The
trigger loop (for DATA_BOOL/DATA_NONE) assumed this had already been
done, but there was no enforcement of this call ordering.
This matches the protection used in the PCI VFIO driver where
vfio_pci_set_msi_trigger() checks irq_is() before the trigger loop.
Fixes: 848e447e000c ("vfio/cdx: add interrupt support")
Cc: stable@vger.kernel.org
Signed-off-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com>
Acked-by: Nipun Gupta <nipun.gupta@amd.com>
Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>
Acked-by: Nikhil Agarwal <nikhil.agarwal@amd.com>
Link: https://lore.kernel.org/r/20260417202800.88287-2-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
The class_create() call has been deprecated in favor of class_register()
as the driver core now allows for a struct class to be in read-only
memory. Replace vfio->device_class with a const struct class and drop
the class_create() call.
Compile tested with both CONFIG_VFIO_DEVICE_CDEV on and off (and
CONFIG_VFIO on); found no errors/warns in dmesg.
Link: https://lore.kernel.org/all/2023040244-duffel-pushpin-f738@gregkh/
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl>
[Remove unused vfio_cdev_init() args]
Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>
Link: https://lore.kernel.org/r/20260417152814.18026-1-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
Convert the bar_mutex acquisition in virtiovf_issue_legacy_rw_cmd()
to use guard(), eliminating the out label and goto-based error paths
in favor of direct returns.
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>
Reviewed-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20260414200625.3601509-5-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
Convert migf->lock acquisitions in virtiovf_disable_fd() and
virtiovf_save_read() to use guard(). In virtiovf_save_read() this
eliminates the out_unlock label and multiple goto paths by allowing
direct returns, and removes the need for the done variable to double
as an error carrier.
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>
Reviewed-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20260414200625.3601509-4-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
Convert list_lock mutex acquisitions to use guard() and scoped_guard()
where the lock scope aligns with the function or block scope. This
simplifies virtiovf_get_data_buff_from_pos() by replacing goto-based
unwinding with direct returns.
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>
Reviewed-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20260414200625.3601509-3-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
The list_lock spinlock with IRQ disabling was copied from the mlx5
vfio-pci variant driver, where it is justified by a hardirq async
command completion callback that accesses the protected lists. The
virtio driver has no such interrupt context usage; all list_lock
acquisitions occur in process context via file read/write operations
or state transitions under state_mutex.
Convert list_lock to a mutex to be consistent with peer vfio-pci
variant drivers (hisilicon, pds, qat, xe) which all use mutexes for
equivalent migration data protection. This also fixes a mismatched
spin_lock()/spin_unlock_irq() pair in virtiovf_read_device_context_chunk()
that could incorrectly enable interrupts.
Reported-by: Jinhui Guo <guojinhui.liam@bytedance.com>
Closes: https://lore.kernel.org/all/20260413073603.30538-1-guojinhui.liam@bytedance.com
Fixes: 0bbc82e4ec79 ("vfio/virtio: Add support for the basic live migration functionality")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>
Reviewed-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20260414200625.3601509-2-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
On device shutdown, make vfio_pci_core_close_device() call
vfio_pci_dma_buf_cleanup() before the function is disabled via
vfio_pci_core_disable(). This ensures that all access via DMABUFs is
revoked before the function's BARs become inaccessible.
This fixes an issue where, if the function is disabled first, a tiny
window exists in which the function's MSE is cleared and yet BARs
could still be accessed via the DMABUF. The resources would also be
freed and up for grabs by a different driver.
Fixes: 5d74781ebc86c ("vfio/pci: Add dma-buf export support for MMIO regions")
Signed-off-by: Matt Evans <mattev@meta.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20260415181752.1027604-1-mattev@meta.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
Pull VFIO updates from Alex Williamson:
- Update QAT vfio-pci variant driver for Gen 5, 420xx devices (Vijay
Sundar Selvamani, Suman Kumar Chakraborty, Giovanni Cabiddu)
- Fix vfio selftest MMIO DMA mapping selftest (Alex Mastro)
- Conversions to const struct class in support of class_create()
deprecation (Jori Koolstra)
- Improve selftest compiler compatibility by avoiding initializer on
variable-length array (Manish Honap)
- Define new uAPI for drivers supporting migration to advise user-
space of new initial data for reducing target startup latency.
Implemented for mlx5 vfio-pci variant driver (Yishai Hadas)
- Enable vfio selftests on aarch64, not just cross-compiles reporting
arm64 (Ted Logan)
- Update vfio selftest driver support to include additional DSA devices
(Yi Lai)
- Unconditionally include debugfs root pointer in vfio device struct,
avoiding a build failure seen in hisi_acc variant driver without
debugfs otherwise (Arnd Bergmann)
- Add support for the s390 ISM (Internal Shared Memory) device via a
new variant driver. The device is unique in the size of its BAR space
(256TiB) and lack of mmap support (Julian Ruess)
- Enforce that vfio-pci drivers implement a name in their ops structure
for use in sequestering SR-IOV VFs (Alex Williamson)
- Prune leftover group notifier code (Paolo Bonzini)
- Fix Xe vfio-pci variant driver to avoid migration support as a
dependency in the reset path and missing release call (Michał
Winiarski)
* tag 'vfio-v7.1-rc1' of https://github.com/awilliam/linux-vfio: (23 commits)
vfio/xe: Add a missing vfio_pci_core_release_dev()
vfio/xe: Reorganize the init to decouple migration from reset
vfio: remove dead notifier code
vfio/pci: Require vfio_device_ops.name
MAINTAINERS: add VFIO ISM PCI DRIVER section
vfio/ism: Implement vfio_pci driver for ISM devices
vfio/pci: Rename vfio_config_do_rw() to vfio_pci_config_rw_single() and export it
vfio: unhide vdev->debug_root
vfio/qat: add support for Intel QAT 420xx VFs
vfio: selftests: Support DMR and GNR-D DSA devices
vfio: selftests: Build tests on aarch64
vfio/mlx5: Add REINIT support to VFIO_MIG_GET_PRECOPY_INFO
vfio/mlx5: consider inflight SAVE during PRE_COPY
net/mlx5: Add IFC bits for migration state
vfio: Adapt drivers to use the core helper vfio_check_precopy_ioctl
vfio: Add support for VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2
vfio: Define uAPI for re-init initial bytes during the PRE_COPY phase
vfio: selftests: Fix VLA initialisation in vfio_pci_irq_set()
vfio: uapi: fix comment typo
vfio: mdev: replace mtty_dev->vd_class with a const struct class
...
|
|
Pull drm updates from Dave Airlie:
"Highlights:
- new DRM RAS infrastructure using netlink
- amdgpu: enable DC on CIK APUs, and more IP enablement, and more
user queue work
- xe: purgeable BO support, and new hw enablement
- dma-buf : add revocable operations
Full summary:
mm:
- two-pass MMU interval notifiers
- add gpu active/reclaim per-node stat counters
math:
- provide __KERNEL_DIV_ROUND_CLOSEST() in UAPI
- implement DIV_ROUND_CLOSEST() with __KERNEL_DIV_ROUND_CLOSEST()
rust:
- shared tag with driver-core: register macro and io infra
- core: rework DMA coherent API
- core: add interop::list to interop with C linked lists
- core: add more num::Bounded operations
- core: enable generic_arg_infer and add EMSGSIZE
- workqueue: add ARef<T> support for work and delayed work
- add GPU buddy allocator abstraction
- add DRM shmem GEM helper abstraction
- allow drm:::Device to dispatch work and delayed work items
to driver private data
- add dma_resv_lock helper and raw accessors
core:
- introduce DRM RAS infrastructure over netlink
- add connector panel_type property
- fourcc: add ARM interleaved 64k modifier
- colorop: add destroy helper
- suballoc: split into alloc and init helpers
- mode: provide DRM_ARGB_GET*() macros for reading color components
edid:
- provide drm_output_color_Format
dma-buf:
- provide revoke mechanism for shared buffers
- rename move_notify to invalidate_mappings
- always enable move_notify
- protect dma_fence_ops with RCU and improve locking
- clean pages with helpers
atomic:
- allocate drm_private_state via callback
- helper: use system_percpu_wq
buddy:
- make buddy allocator available to gpu level
- add kernel-doc for buddy allocator
- improve aligned allocation
ttm:
- fix fence signalling
- improve tests and docs
- improve handling of gfp_retry_mayfail
- use per-node stat counters to track memory allocations
- port pool to use list_lru
- drop NUMA specific pools
- make pool shrinker numa aware
- track allocated pages per numa node
coreboot:
- cleanup coreboot framebuffer support
sched:
- fix race condition in drm_sched_fini
pagemap:
- enable THP support
- pass pagemap_addr by reference
gem-shmem:
- Track page accessed/dirty status across mmap/vmap
gpusvm:
- reenable device to device migration
- fix unbalanced unclock
bridge:
- anx7625: Support USB-C plus DT bindings
- connector: Fix EDID detection
- dw-hdmi-qp: Support Vendor-Specfic and SDP Infoframes; improve
others
- fsl-ldb: Fix visual artifacts plus related DT property
'enable-termination-resistor'
- imx8qxp-pixel-link: Improve bridge reference handling
- lt9611: Support Port-B-only input plus DT bindings
- tda998x: Support DRM_BRIDGE_ATTACH_NO_CONNECTOR; Clean up
- Support TH1520 HDMI plus DT bindings
- waveshare-dsi: Fix register and attach; Support 1..4 DSI lanes plus
DT bindings
- anx7625: Fix USB Type-C handling
- cdns-mhdp8546-core: Handle HDCP state in bridge atomic_check
- Support Lontium LT8713SX DP MST bridge plus DT bindings
- analogix_dp: Use DP helpers for link training
panel:
- panel-jdi-lt070me05000: Use mipi-dsi multi functions
- panel-edp: Support Add AUO B116XAT04.1 (HW: 1A); Support CMN
N116BCL-EAK (C2); Support FriendlyELEC plus DT changes
- panel-edp: Fix timings for BOE NV140WUM-N64
- ilitek-ili9882t: Allow GPIO calls to sleep
- jadard: Support TAIGUAN XTI05101-01A
- lxd: Support LXD M9189A plus DT bindings
- mantix: Fix pixel clock; Clean up
- motorola: Support Motorola Atrix 4G and Droid X2 plus DT bindings
- novatek: Support Novatek/Tianma NT37700F plus DT bindings
- simple: Support EDT ET057023UDBA plus DT bindings; Support Powertip
PH800480T032-ZHC19 plus DT bindings; Support Waveshare 13.3"
- novatek-nt36672a: Use mipi_dsi_*_multi() functions
- panel-edp: Support BOE NV153WUM-N42, CMN N153JCA-ELK, CSW
MNF307QS3-2
- support Himax HX83121A plus DT bindings
- support JuTouch JT070TM041 plus DT bindings
- support Samsung S6E8FC0 plus DT bindings
- himax-hx83102c: support Samsung S6E8FC0 plus DT bindings; support
backlight
- ili9806e: support Rocktech RK050HR345-CT106A plus DT bindings
- simple: support Tianma TM050RDH03 plus DT bindings
amdgpu:
- enable DC by default on CIK APUs
- userq fence ioctl param size fixes
- set panel_type to OLED for eDP
- refactor DC i2c code
- FAMS2 update
- rework ttm handling to allow multiple engines
- DC DCE 6.x cleanup
- DC support for NUTMEG/TRAVIS DP bridge
- DCN 4.2 support
- GC12 idle power fix for compute
- use struct drm_edid in non-DC code
- enable NV12/P010 support on primary planes
- support newer IP discovery tables
- VCN/JPEG 5.0.2 support
- GC/MES 12.1 updates
- USERQ fixes
- add DC idle state manager
- eDP DSC seamless boot
amdkfd:
- GC 12.1 updates
- non 4K page fixes
xe:
- basic Xe3p_LPG and NVL-P enabling patches
- allow VM_BIND decompress support
- add purgeable buffer object support
- add xe_vm_get_property_ioctl
- restrict multi-lrc to VCS/VECS engines
- allow disabling VM overcommit in fault mode
- dGPU memory optimizations
- Workaround cleanups and simplification
- Allow VFs VRAM quote changes using sysfs
- convert GT stats to per-cpu counters
- pagefault refactors
- enable multi-queue on xe3p_xpc
- disable DCC on PTL
- make MMIO communication more robust
- disable D3Cold for BMG on specific platforms
- vfio: improve FLR sync for Xe VFIO
i915/display:
- C10/C20/LT PHY PLL divider verification
- use trans push mechanism to generate PSR frame change on LNL+
- refactor DP DSC slice config
- VGA decode refactoring
- refactor DPT, gen2-4 overlay, masked field register macro helpers
- refactor stolen memory allocation decisions
- prepare for UHBR DP tunnels
- refactor LT PHY PLL to use DPLL framework
- implement register polling/waiting in display code
- add shared stepping header between i915 and display
i915:
- fix potential overflow of shmem scatterlist length
nouveau:
- provide Z cull info to userspace
- initial GA100 support
- shutdown on PCI device shutdown
nova-core:
- harden GSP command queue
- add support for large RPCs
- simplify GSP sequencer and message handling
- refactor falcon firmware handling
- convert to new register macro
- conver to new DMA coherent API
- use checked arithmetic
- add debugfs support for gsp-rm log buffers
- fix aux device registration for multi-GPU
msm:
- CI:
- Uprev mesa
- Restore CI jobs for Qualcomm APQ8016 and APQ8096 devices
- Core:
- Switched to of_get_available_child_by_name()
- DPU:
- Fixes for DSC panels
- Fixed brownout because of the frequency / OPP mismatch
- Quad pipe preparation (not enabled yet)
- Switched to virtual planes by default
- Dropped VBIF_NRT support
- Added support for Eliza platform
- Reworked alpha handling
- Switched to correct CWB definitions on Eliza
- Dropped dummy INTF_0 on MSM8953
- Corrected INTFs related to DP-MST
- DP:
- Removed debug prints looking into PHY internals
- DSI:
- Fixes for DSC panels
- RGB101010 support
- Support for SC8280XP
- Moved PHY bindings from display/ to phy/
- GPU:
- Preemption support for x2-85 and a840
- IFPC support for a840
- SKU detection support for x2-85 and a840
- Expose AQE support (VK ray-pipeline)
- Avoid locking in VM_BIND fence signaling path
- Fix to avoid reclaim in GPU snapshot path
- Disallow foreign mapping of _NO_SHARE BOs
- HDMI:
- Fixed infoframes programming
- MDP5:
- Dropped support for MSM8974v1
- Dropped now unused code for MSM8974 v1 and SDM660 / MSM8998
panthor:
- add tracepoints for power and IRQs
- fix fence handling
- extend timestamp query with flags
- support various sources for timestamp queries
tyr:
- fix names and model/versions
rockchip:
- vop2: use drm logging function
- rk3576 displayport support
- support CRTC background color
atmel-hlcdc:
- support sana5d65 LCD controller
tilcdc:
- use DT bindings schema
- use managed DRM interfaces
- support DRM_BRIDGE_ATTACH_NO_CONNECTOR
verisilicon:
- support DC8200 + DT bindings
virtgpu:
- support PRIME import with 3D enabled
komeda:
- fix integer overflow in AFBC checks
mcde:
- improve bridge handling
gma500:
- use drm client buffer for fbdev framebuffer
amdxdna:
- add sensors ioctls
- provide NPU power estimate
- support column utilization sensor
- allow forcing DMA through IOMMU IOVA
- support per-BO mem usage queries
- refactor GEM implementation
ivpu:
- update boot API to v3.29.4
- limit per-user number of doorbells/contexts
- perform engine reset on TDR error
loongson:
- replace custom code with drm_gem_ttm_dumb_map_offset()
imx:
- support planes behind the primary plane
- fix bus-format selection
vkms:
- support CRTC background color
v3d:
- improve handling of struct v3d_stats
komeda:
- support Arm China Linlon D6 plus DT bindings
imagination:
- improve power-off sequence
- support context-reset notification from firmware
mediatek:
- mtk_dsi: enable hs clock during pre-enable
- Remove all conflicting aperture devices during probe
- Add support for mt8167 display blocks"
* tag 'drm-next-2026-04-15' of https://gitlab.freedesktop.org/drm/kernel: (1735 commits)
drm/ttm/tests: Remove checks from ttm_pool_free_no_dma_alloc
drm/ttm/tests: fix lru_count ASSERT
drm/vram: remove DRM_VRAM_MM_FILE_OPERATIONS from docs
drm/fb-helper: Fix a locking bug in an error path
dma-fence: correct kernel-doc function parameter @flags
ttm/pool: track allocated_pages per numa node.
ttm/pool: make pool shrinker NUMA aware (v2)
ttm/pool: drop numa specific pools
ttm/pool: port to list_lru. (v2)
drm/ttm: use gpu mm stats to track gpu memory allocations. (v4)
mm: add gpu active/reclaim per-node stat counters (v2)
gpu: nova-core: fix missing colon in SEC2 boot debug message
gpu: nova-core: vbios: use from_le_bytes() for PCI ROM header parsing
gpu: nova-core: bitfield: fix broken Default implementation
gpu: nova-core: falcon: pad firmware DMA object size to required block alignment
gpu: nova-core: gsp: fix undefined behavior in command queue code
drm/shmem_helper: Make sure PMD entries get the writeable upgrade
accel/ivpu: Trigger recovery on TDR with OS scheduling
drm/msm: Use of_get_available_child_by_name()
dt-bindings: display/msm: move DSI PHY bindings to phy/ subdir
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Madhavan Srinivasan:
- powerpc support for huge pfnmaps
- Cleanups to use masked user access
- Rework pnv_ioda_pick_m64_pe() to use better bitmap API
- Convert powerpc to AUDIT_ARCH_COMPAT_GENERIC
- Backup region offset update to eflcorehdr
- Fixes for wii/ps3 platform
- Implement JIT support for private stack in powerpc
- Implement JIT support for fsession in powerpc64 trampoline
- Add support for instruction array and indirect jump in powerpc
- Misc selftest fixes and cleanups
Thanks to Abhishek Dubey, Aditya Gupta, Alex Williamson, Amit Machhiwal,
Andrew Donnellan, Bartosz Golaszewski, Cédric Le Goater, Chen Ni,
Christophe Leroy (CS GROUP), Hari Bathini, J. Neuschäfer, Mukesh Kumar
Chaurasiya (IBM), Nam Cao, Nilay Shroff, Pavithra Prakash, Randy Dunlap,
Ritesh Harjani (IBM), Shrikanth Hegde, Sourabh Jain, Vaibhav Jain,
Venkat Rao Bagalkote, and Yury Norov (NVIDIA)
* tag 'powerpc-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (47 commits)
mailmap: Add entry for Andrew Donnellan
powerpc32/bpf: fix loading fsession func metadata using PPC_LI32
selftest/bpf: Enable gotox tests for powerpc64
powerpc64/bpf: Add support for indirect jump
selftest/bpf: Enable instruction array test for powerpc
powerpc/bpf: Add support for instruction array
powerpc32/bpf: Add fsession support
powerpc64/bpf: Implement fsession support
selftests/bpf: Enable private stack tests for powerpc64
powerpc64/bpf: Implement JIT support for private stack
powerpc: pci-ioda: Optimize pnv_ioda_pick_m64_pe()
powerpc: pci-ioda: use bitmap_alloc() in pnv_ioda_pick_m64_pe()
powerpc/net: Inline checksum wrappers and convert to scoped user access
powerpc/sstep: Convert to scoped user access
powerpc/align: Convert emulate_spe() to scoped user access
powerpc/ptrace: Convert gpr32_set_common_user() to scoped user access
powerpc/futex: Use masked user access
powerpc/audit: Convert powerpc to AUDIT_ARCH_COMPAT_GENERIC
cpuidle: powerpc: avoid double clear when breaking snooze
powerpc/ps3: spu.c: fix enum and Return kernel-doc warnings
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core
Pull driver core updates from Danilo Krummrich:
"debugfs:
- Fix NULL pointer dereference in debugfs_create_str()
- Fix misplaced EXPORT_SYMBOL_GPL for debugfs_create_str()
- Fix soundwire debugfs NULL pointer dereference from uninitialized
firmware_file
device property:
- Make fwnode flags modifications thread safe; widen the field to
unsigned long and use set_bit() / clear_bit() based accessors
- Document how to check for the property presence
devres:
- Separate struct devres_node from its "subclasses" (struct devres,
struct devres_group); give struct devres_node its own release and
free callbacks for per-type dispatch
- Introduce struct devres_action for devres actions, avoiding the
ARCH_DMA_MINALIGN alignment overhead of struct devres
- Export struct devres_node and its init/add/remove/dbginfo
primitives for use by Rust Devres<T>
- Fix missing node debug info in devm_krealloc()
- Use guard(spinlock_irqsave) where applicable; consolidate unlock
paths in devres_release_group()
driver_override:
- Convert PCI, WMI, vdpa, s390/cio, s390/ap, and fsl-mc to the
generic driver_override infrastructure, replacing per-bus
driver_override strings, sysfs attributes, and match logic; fixes a
potential UAF from unsynchronized access to driver_override in bus
match() callbacks
- Simplify __device_set_driver_override() logic
kernfs:
- Send IN_DELETE_SELF and IN_IGNORED inotify events on kernfs file
and directory removal
- Add corresponding selftests for memcg
platform:
- Allow attaching software nodes when creating platform devices via a
new 'swnode' field in struct platform_device_info
- Add kerneldoc for struct platform_device_info
software node:
- Move software node initialization from postcore_initcall() to
driver_init(), making it available early in the boot process
- Move kernel_kobj initialization (ksysfs_init) earlier to support
the above
- Remove software_node_exit(); dead code in a built-in unit
SoC:
- Introduce of_machine_read_compatible() and of_machine_read_model()
OF helpers and export soc_attr_read_machine() to replace direct
accesses to of_root from SoC drivers; also enables
CONFIG_COMPILE_TEST coverage for these drivers
sysfs:
- Constify attribute group array pointers to
'const struct attribute_group *const *' in sysfs functions,
device_add_groups() / device_remove_groups(), and struct class
Rust:
- Devres:
- Embed struct devres_node directly in Devres<T> instead of going
through devm_add_action(), avoiding the extra allocation and the
unnecessary ARCH_DMA_MINALIGN alignment
- I/O:
- Turn IoCapable from a marker trait into a functional trait
carrying the raw I/O accessor implementation (io_read /
io_write), providing working defaults for the per-type Io
methods
- Add RelaxedMmio wrapper type, making relaxed accessors usable in
code generic over the Io trait
- Remove overloaded per-type Io methods and per-backend macros
from Mmio and PCI ConfigSpace
- I/O (Register):
- Add IoLoc trait and generic read/write/update methods to the Io
trait, making I/O operations parameterizable by typed locations
- Add register! macro for defining hardware register types with
typed bitfield accessors backed by Bounded values; supports
direct, relative, and array register addressing
- Add write_reg() / try_write_reg() and LocatedRegister trait
- Update PCI sample driver to demonstrate the register! macro
Example:
```
register! {
/// UART control register.
CTRL(u32) @ 0x18 {
/// Receiver enable.
19:19 rx_enable => bool;
/// Parity configuration.
14:13 parity ?=> Parity;
}
/// FIFO watermark and counter register.
WATER(u32) @ 0x2c {
/// Number of datawords in the receive FIFO.
26:24 rx_count;
/// RX interrupt threshold.
17:16 rx_water;
}
}
impl WATER {
fn rx_above_watermark(&self) -> bool {
self.rx_count() > self.rx_water()
}
}
fn init(bar: &pci::Bar<BAR0_SIZE>) {
let water = WATER::zeroed()
.with_const_rx_water::<1>(); // > 3 would not compile
bar.write_reg(water);
let ctrl = CTRL::zeroed()
.with_parity(Parity::Even)
.with_rx_enable(true);
bar.write_reg(ctrl);
}
fn handle_rx(bar: &pci::Bar<BAR0_SIZE>) {
if bar.read(WATER).rx_above_watermark() {
// drain the FIFO
}
}
fn set_parity(bar: &pci::Bar<BAR0_SIZE>, parity: Parity) {
bar.update(CTRL, |r| r.with_parity(parity));
}
```
- IRQ:
- Move 'static bounds from where clauses to trait declarations for
IRQ handler traits
- Misc:
- Enable the generic_arg_infer Rust feature
- Extend Bounded with shift operations, single-bit bool
conversion, and const get()
Misc:
- Make deferred_probe_timeout default a Kconfig option
- Drop auxiliary_dev_pm_ops; the PM core falls back to driver PM
callbacks when no bus type PM ops are set
- Add conditional guard support for device_lock()
- Add ksysfs.c to the DRIVER CORE MAINTAINERS entry
- Fix kernel-doc warnings in base.h
- Fix stale reference to memory_block_add_nid() in documentation"
* tag 'driver-core-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: (67 commits)
bus: fsl-mc: use generic driver_override infrastructure
s390/ap: use generic driver_override infrastructure
s390/cio: use generic driver_override infrastructure
vdpa: use generic driver_override infrastructure
platform/wmi: use generic driver_override infrastructure
PCI: use generic driver_override infrastructure
driver core: make software nodes available earlier
software node: remove software_node_exit()
kernel: ksysfs: initialize kernel_kobj earlier
MAINTAINERS: add ksysfs.c to the DRIVER CORE entry
drivers/base/memory: fix stale reference to memory_block_add_nid()
device property: Document how to check for the property presence
soundwire: debugfs: initialize firmware_file to empty string
debugfs: fix placement of EXPORT_SYMBOL_GPL for debugfs_create_str()
debugfs: check for NULL pointer in debugfs_create_str()
driver core: Make deferred_probe_timeout default a Kconfig option
driver core: simplify __device_set_driver_override() clearing logic
driver core: auxiliary bus: Drop auxiliary_dev_pm_ops
device property: Make modifications of fwnode "flags" thread safe
rust: devres: embed struct devres_node directly
...
|
|
The driver is implementing its own .release(), which means that it needs
to call vfio_pci_core_release_dev().
Add the missing call.
Fixes: 1f5556ec8b9ef ("vfio/xe: Add device specific vfio_pci driver variant for Intel graphics")
Reported-by: Niklas Schnelle <schnelle@linux.ibm.com>
Closes: https://lore.kernel.org/kvm/408e262c507e8fd628a71e39904fedd99fa0ee8e.camel@linux.ibm.com/
Cc: stable@vger.kernel.org
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20260410224948.900550-2-michal.winiarski@intel.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
Attempting to issue reset on VF devices that don't support migration
leads to the following:
BUG: unable to handle page fault for address: 00000000000011f8
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 2 UID: 0 PID: 7443 Comm: xe_sriov_flr Tainted: G S U 7.0.0-rc1-lgci-xe-xe-4588-cec43d5c2696af219-nodebug+ #1 PREEMPT(lazy)
Tainted: [S]=CPU_OUT_OF_SPEC, [U]=USER
Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-P DDR4 RVP, BIOS RPLPFWI1.R00.4035.A00.2301200723 01/20/2023
RIP: 0010:xe_sriov_vfio_wait_flr_done+0xc/0x80 [xe]
Code: ff c3 cc cc cc cc 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 54 53 <83> bf f8 11 00 00 02 75 61 41 89 f4 85 f6 74 52 48 8b 47 08 48 89
RSP: 0018:ffffc9000f7c39b8 EFLAGS: 00010202
RAX: ffffffffa04d8660 RBX: ffff88813e3e4000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffc9000f7c39c8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff888101a48800
R13: ffff88813e3e4150 R14: ffff888130d0d008 R15: ffff88813e3e40d0
FS: 00007877d3d0d940(0000) GS:ffff88890b6d3000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000011f8 CR3: 000000015a762000 CR4: 0000000000f52ef0
PKRU: 55555554
Call Trace:
<TASK>
xe_vfio_pci_reset_done+0x49/0x120 [xe_vfio_pci]
pci_dev_restore+0x3b/0x80
pci_reset_function+0x109/0x140
reset_store+0x5c/0xb0
dev_attr_store+0x17/0x40
sysfs_kf_write+0x72/0x90
kernfs_fop_write_iter+0x161/0x1f0
vfs_write+0x261/0x440
ksys_write+0x69/0xf0
__x64_sys_write+0x19/0x30
x64_sys_call+0x259/0x26e0
do_syscall_64+0xcb/0x1500
? __fput+0x1a2/0x2d0
? fput_close_sync+0x3d/0xa0
? __x64_sys_close+0x3e/0x90
? x64_sys_call+0x1b7c/0x26e0
? do_syscall_64+0x109/0x1500
? __task_pid_nr_ns+0x68/0x100
? __do_sys_getpid+0x1d/0x30
? x64_sys_call+0x10b5/0x26e0
? do_syscall_64+0x109/0x1500
? putname+0x41/0x90
? do_faccessat+0x1e8/0x300
? __x64_sys_access+0x1c/0x30
? x64_sys_call+0x1822/0x26e0
? do_syscall_64+0x109/0x1500
? tick_program_event+0x43/0xa0
? hrtimer_interrupt+0x126/0x260
? irqentry_exit+0xb2/0x710
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7877d5f1c5a4
Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d a5 ea 0e 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89
RSP: 002b:00007fff48e5f908 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007877d5f1c5a4
RDX: 0000000000000001 RSI: 00007877d621b0c9 RDI: 0000000000000009
RBP: 0000000000000001 R08: 00005fb49113b010 R09: 0000000000000007
R10: 0000000000000000 R11: 0000000000000202 R12: 00007877d621b0c9
R13: 0000000000000009 R14: 00007fff48e5fac0 R15: 00007fff48e5fac0
</TASK>
This is caused by the fact that some of the xe_vfio_pci_core_device
members needed for handling reset are only initialized as part of
migration init.
Fix the problem by reorganizing the code to decouple VF init from
migration init.
Fixes: 1f5556ec8b9ef ("vfio/xe: Add device specific vfio_pci driver variant for Intel graphics")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/7352
Cc: stable@vger.kernel.org
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20260410224948.900550-1-michal.winiarski@intel.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
group->notifier is dead code. VFIO initializes it and checks it for
emptiness on teardown, but nobody ever registers on it or triggers it.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Anthony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20260407175934.1602711-1-pbonzini@redhat.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
When a driver is probed through __driver_attach(), the bus' match()
callback is called without the device lock held, thus accessing the
driver_override field without a lock, which can cause a UAF.
Fix this by using the driver-core driver_override infrastructure taking
care of proper locking internally.
Note that calling match() from __driver_attach() without the device lock
held is intentional. [1]
Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Acked-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Acked-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Link: https://lore.kernel.org/driver-core/DGRGTIRHA62X.3RY09D9SOK77P@kernel.org/ [1]
Reported-by: Gui-Dong Han <hanguidong02@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220789
Fixes: 1f86a00c1159 ("bus/fsl-mc: add support for 'driver_override' in the mc-bus")
Link: https://patch.msgid.link/20260324005919.2408620-3-dakr@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
When a driver is probed through __driver_attach(), the bus' match()
callback is called without the device lock held, thus accessing the
driver_override field without a lock, which can cause a UAF.
Fix this by using the driver-core driver_override infrastructure taking
care of proper locking internally.
Note that calling match() from __driver_attach() without the device lock
held is intentional. [1]
Link: https://lore.kernel.org/driver-core/DGRGTIRHA62X.3RY09D9SOK77P@kernel.org/ [1]
Reported-by: Gui-Dong Han <hanguidong02@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220789
Fixes: 782a985d7af2 ("PCI: Introduce new device binding path using pci_dev.driver_override")
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Alex Williamson <alex@shazbot.org>
Tested-by: Gui-Dong Han <hanguidong02@gmail.com>
Reviewed-by: Gui-Dong Han <hanguidong02@gmail.com>
Link: https://patch.msgid.link/20260324005919.2408620-6-dakr@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
vfio-pci-core code makes use of the vfio_device_ops.name field in order
to set a default driver_override for VFs created on a user-owned PF.
This avoids default driver matching, which might otherwise bind those
VFs to native drivers.
The mechanism for this currently uses kasprintf(), which will set
driver_override to the literal "(null)" if name is NULL. This is
effective in sequestering the device, but presents a challenging debug
situation to differentiate driver_override being set to "(null)" versus
being NULL and interpreted as "(null)" via the sysfs show attribute.
There's also a tree-wide effort to convert to generic driver_override
support, where passing NULL will generate an error, resulting in a
WARN_ON without setting any driver_override.
All drivers making use of vfio-pci-core already set a driver name,
therefore by requiring this behavior, all of these corner cases are
rendered moot. This is expected to have no impact on current
in-kernel drivers.
Suggested-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>
Reviewed-by: Danilo Krummrich <dakr@kernel.org>
Link: https://lore.kernel.org/r/20260331202443.2598404-1-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
Add a vfio_pci variant driver for the s390-specific Internal Shared
Memory (ISM) devices used for inter-VM communication.
This enables the development of vfio-pci-based user space drivers for
ISM devices.
On s390, kernel primitives such as ioread() and iowrite() are switched
over from function-handle-based PCI load/stores instructions to PCI
memory-I/O (MIO) loads/stores when these are available and not
explicitly disabled. Since these instructions cannot be used with ISM
devices, ensure that classic function-handle-based PCI instructions are
used instead.
The driver is still required even when MIO instructions are disabled, as
the ISM device relies on the PCI store block (PCISTB) instruction to
perform write operations.
Stores are not fragmented, therefore one ioctl corresponds to exactly
one PCISTB instruction. User space must ensure to not write more than
4096 bytes at once to an ISM BAR which is the maximum payload of the
PCISTB instruction.
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Julian Ruess <julianr@linux.ibm.com>
Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
Link: https://lore.kernel.org/r/20260325-vfio_pci_ism-v8-2-ddc504cde914@linux.ibm.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
export it
A follow-up patch adds a new variant driver for s390 ISM devices. Since
this device uses a 256 TiB BAR 0 that is never mapped, the variant
driver needs its own ISM_VFIO_PCI_OFFSET_MASK. To minimally mirror the
functionality of vfio_pci_config_rw() with such a custom mask, export
vfio_config_do_rw(). To better distinguish the now exported function
from vfio_pci_config_rw(), rename it to vfio_pci_config_rw_single()
emphasizing that it does a single config space read or write.
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Julian Ruess <julianr@linux.ibm.com>
Link: https://lore.kernel.org/r/20260325-vfio_pci_ism-v8-1-ddc504cde914@linux.ibm.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
Linux 7.0-rc6
Requested by a few people on irc to resolve conflicts in other tress.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Architectures like PowerPC uses runtime defined values for
PMD_ORDER/PUD_ORDER. This is because it can use either RADIX or HASH MMU
at runtime using kernel cmdline. So the pXd_index_size is not known at
compile time. Without this fix, when we add huge pfn support on powerpc
in the next patch, vfio_pci_core driver compilation can fail with the
following errors.
CC [M] drivers/vfio/vfio_main.o
CC [M] drivers/vfio/group.o
CC [M] drivers/vfio/container.o
CC [M] drivers/vfio/virqfd.o
CC [M] drivers/vfio/vfio_iommu_spapr_tce.o
CC [M] drivers/vfio/pci/vfio_pci_core.o
CC [M] drivers/vfio/pci/vfio_pci_intrs.o
CC [M] drivers/vfio/pci/vfio_pci_rdwr.o
CC [M] drivers/vfio/pci/vfio_pci_config.o
CC [M] drivers/vfio/pci/vfio_pci.o
AR kernel/built-in.a
../drivers/vfio/pci/vfio_pci_core.c: In function ‘vfio_pci_vmf_insert_pfn’:
../drivers/vfio/pci/vfio_pci_core.c:1678:9: error: case label does not reduce to an integer constant
1678 | case PMD_ORDER:
| ^~~~
../drivers/vfio/pci/vfio_pci_core.c:1682:9: error: case label does not reduce to an integer constant
1682 | case PUD_ORDER:
| ^~~~
make[6]: *** [../scripts/Makefile.build:289: drivers/vfio/pci/vfio_pci_core.o] Error 1
make[6]: *** Waiting for unfinished jobs....
make[5]: *** [../scripts/Makefile.build:546: drivers/vfio/pci] Error 2
make[5]: *** Waiting for unfinished jobs....
make[4]: *** [../scripts/Makefile.build:546: drivers/vfio] Error 2
make[3]: *** [../scripts/Makefile.build:546: drivers] Error 2
Fixes: f9e54c3a2f5b7 ("vfio/pci: implement huge_fault support")
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Reviewed-by: Alex Williamson <alex@shazbot.org>
Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/b155e19993ee1f5584c72050192eb468b31c5029.1773058761.git.ritesh.list@gmail.com
|
|
The error path through vfio_pci_core_feature_dma_buf() ignores its
own advice to only use dma_buf_put() after dma_buf_export(), instead
falling through the entire unwind chain. In the unlikely event that
we encounter file descriptor exhaustion, this can result in an
unbalanced refcount on the vfio device and double free of allocated
objects.
Avoid this by moving the "put" directly into the error path and return
the errno rather than entering the unwind chain.
Reported-by: Renato Marziano <renato@marziano.top>
Fixes: 5d74781ebc86 ("vfio/pci: Add dma-buf export support for MMIO regions")
Cc: stable@vger.kernel.org
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>
Link: https://lore.kernel.org/r/20260323215659.2108191-3-alex.williamson@nvidia.com
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
Hook into the PCI error handler reset_prepare() callback to notify
the PF about an upcoming VF FLR before reset_done() is executed.
This enables early FLR_PREPARE signaling and ensures that the PF is
aware of the reset before the completion wait begins.
Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Alex Williamson <alex@shazbot.org>
Link: https://patch.msgid.link/20260309152449.910636-3-piotr.piorkowski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
|
|
Extend the qat_vfio_pci variant driver to support QAT 420xx (GEN 5)
Virtual Functions (VFs).
Add the relevant VF device ID to the probe table.
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Link: https://lore.kernel.org/r/20260320213622.88549-2-giovanni.cabiddu@intel.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
When userspace opts into VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2, the
driver may report the VFIO_PRECOPY_INFO_REINIT output flag in response
to the VFIO_MIG_GET_PRECOPY_INFO ioctl, along with a new initial_bytes
value.
The presence of the VFIO_PRECOPY_INFO_REINIT flag indicates to the
caller that new initial data is available in the migration stream.
If the firmware reports a new initial-data chunk, any previously dirty
bytes in memory are treated as initial bytes, since the caller must read
both sets before reaching the end of the initial-data region.
In this case, the driver issues a new SAVE command to fetch the data and
prepare it for a subsequent read() from userspace.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20260317161753.18964-7-yishaih@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
Consider an inflight SAVE operation during the PRE_COPY phase, so the
caller will wait when no data is currently available but is expected
to arrive.
This enables a follow-up patch to avoid returning -ENOMSG while a new
*initial_bytes* chunk is still pending from an asynchronous SAVE command
issued by the VFIO_MIG_GET_PRECOPY_INFO ioctl.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20260317161753.18964-6-yishaih@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
Introduce a core helper function for VFIO_MIG_GET_PRECOPY_INFO and adapt
all drivers to use it.
It centralizes the common code and ensures that output flags are cleared
on entry, in case user opts in to VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2.
This preventing any unintended echoing of userspace data back to
userspace.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20260317161753.18964-4-yishaih@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
Currently, existing VFIO_MIG_GET_PRECOPY_INFO implementations don't
assign info.flags before copy_to_user().
Because they copy the struct in from userspace first, this effectively
echoes userspace-provided flags back as output, preventing the field
from being used to report new reliable data from the drivers.
Add support for a new device feature named
VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2.
On SET, enables the v2 pre_copy_info behaviour, where the
vfio_precopy_info.flags is a valid output field.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20260317161753.18964-3-yishaih@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
The class_create() call has been deprecated in favor of class_register()
as the driver core now allows for a struct class to be in read-only
memory. Replace vfio->class with a const struct class and drop the
class_create() call.
Compile tested and found no errors/warns in dmesg after enabling
VFIO_GROUP.
Link: https://lore.kernel.org/all/2023040244-duffel-pushpin-f738@gregkh/
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl>
Link: https://lore.kernel.org/r/20260306190628.259203-1-jkoolstra@xs4all.nl
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
Currently, the QAT VFIO PCI driver can only be configured when the 4xxx
QAT driver (CRYPTO_DEV_QAT_4XXX) is enabled. This is too restrictive as
the VFIO driver also supports VFs from the 420xx and 6xxx device
families, which share a compatible migration interface.
Extends the Kconfig dependencies to allow configuration when any of the
supported QAT device families (4xxx, 420xx, or 6xxx) are enabled.
Signed-off-by: Vijay Sundar Selvamani <vijay.sundar.selvamani@intel.com>
Signed-off-by: Suman Kumar Chakraborty <suman.kumar.chakraborty@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Link: https://lore.kernel.org/r/20260213091403.72338-1-suman.kumar.chakraborty@intel.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
Till now VFIO has rejected pinned importers, largely to avoid being used
with the RDMA pinned importer that cannot handle a move_notify() to revoke
access.
Using dma_buf_attach_revocable() it can tell the difference between pinned
importers that support the flow described in dma_buf_invalidate_mappings()
and those that don't.
Thus permit compatible pinned importers.
This is one of two items IOMMUFD requires to remove its private interface
to VFIO's dma-buf.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Alex Williamson <alex@shazbot.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20260131-dmabuf-revoke-v7-7-463d956bd527@nvidia.com
|
|
dma-buf invalidation is handled asynchronously by the hardware, so VFIO
must wait until all affected objects have been fully invalidated.
In addition, the dma-buf exporter is expecting that all importers unmap any
buffers they previously mapped.
Fixes: 5d74781ebc86 ("vfio/pci: Add dma-buf export support for MMIO regions")
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Alex Williamson <alex@shazbot.org>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20260131-dmabuf-revoke-v7-4-463d956bd527@nvidia.com
|
|
Let's merge 7.0-rc1 to start the new drm-misc-next window
Signed-off-by: Maxime Ripard <mripard@kernel.org>
|
|
This converts some of the visually simpler cases that have been split
over multiple lines. I only did the ones that are easy to verify the
resulting diff by having just that final GFP_KERNEL argument on the next
line.
Somebody should probably do a proper coccinelle script for this, but for
me the trivial script actually resulted in an assertion failure in the
middle of the script. I probably had made it a bit _too_ trivial.
So after fighting that far a while I decided to just do some of the
syntactically simpler cases with variations of the previous 'sed'
scripts.
The more syntactically complex multi-line cases would mostly really want
whitespace cleanup anyway.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This was done entirely with mindless brute force, using
git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'
to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.
Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.
For the same reason the 'flex' versions will be done as a separate
conversion.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:
Single allocations: kmalloc(sizeof(TYPE), ...)
are replaced with: kmalloc_obj(TYPE, ...)
Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with: kmalloc_objs(TYPE, COUNT, ...)
Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...)
(where TYPE may also be *VAR)
The resulting allocations no longer return "void *", instead returning
"TYPE *".
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
Pull VFIO updates from Alex Williamson:
"A small cycle with the bulk in selftests and reintroducing poison
handling in the nvgrace-gpu driver. The rest are fixes, cleanups, and
some dmabuf structure consolidation.
- Update outdated mdev comment referencing the renamed
mdev_type_add() function (Julia Lawall)
- Introduce selftest support for IOMMU mapping of PCI MMIO BARs (Alex
Mastro)
- Relax selftest assertion relative to differences in huge page
handling between legacy (v1) TYPE1 IOMMU mapping behavior and the
compatibility mode supported by IOMMUFD (David Matlack)
- Reintroduce memory poison handling support for non-struct-page-
backed memory in the nvgrace-gpu variant driver (Ankit Agrawal)
- Replace dma_buf_phys_vec with phys_vec to avoid duplicate structure
and semantics (Leon Romanovsky)
- Add missing upstream bridge locking across PCI function reset,
resolving an assertion failure when secondary bus reset is used to
provide that reset (Anthony Pighin)
- Fixes to hisi_acc vfio-pci variant driver to resolve corner case
issues related to resets, repeated migration, and error injection
scenarios (Longfang Liu, Weili Qian)
- Restrict vfio selftest builds to arm64 and x86_64, resolving
compiler warnings on 32-bit archs (Ted Logan)
- Un-deprecate the fsl-mc vfio bus driver as a new maintainer has
stepped up (Ioana Ciornei)"
* tag 'vfio-v7.0-rc1' of https://github.com/awilliam/linux-vfio:
vfio/fsl-mc: add myself as maintainer
vfio: selftests: only build tests on arm64 and x86_64
hisi_acc_vfio_pci: fix the queue parameter anomaly issue
hisi_acc_vfio_pci: resolve duplicate migration states
hisi_acc_vfio_pci: update status after RAS error
hisi_acc_vfio_pci: fix VF reset timeout issue
vfio/pci: Lock upstream bridge for vfio_pci_core_disable()
types: reuse common phys_vec type instead of DMABUF open‑coded variant
vfio/nvgrace-gpu: register device memory for poison handling
mm: add stubs for PFNMAP memory failure registration functions
vfio: selftests: Drop IOMMU mapping size assertions for VFIO_TYPE1_IOMMU
vfio: selftests: Add vfio_dma_mapping_mmio_test
vfio: selftests: Align BAR mmaps for efficient IOMMU mapping
vfio: selftests: Centralize IOMMU mode name definitions
vfio/mdev: update outdated comment
|
|
Add myself as maintainer of the vfio/fsl-mc driver. The driver is still
highly in use on Layerscape DPAA2 SoCs.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://lore.kernel.org/r/20260204100913.3197966-1-ioana.ciornei@nxp.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
Backmerging to get bug fixes from v6.19-rc7.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
|
|
When the number of QPs initialized by the device, as read via vft, is zero,
it indicates either an abnormal device configuration or an abnormal read
result.
Returning 0 directly in this case would allow the live migration operation
to complete successfully, leading to incorrect parameter configuration after
migration and preventing the service from recovering normal functionality.
Therefore, in such situations, an error should be returned to roll back the
live migration operation.
Signed-off-by: Longfang Liu <liulongfang@huawei.com>
Link: https://lore.kernel.org/r/20260122020205.2884497-5-liulongfang@huawei.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
In special scenarios involving duplicate migrations, after the
first migration is completed, if the original VF device is used
again and then migrated to another destination, the state indicating
data migration completion for the VF device is not reset.
This results in the second migration to the destination being skipped
without performing data migration.
After the modification, it ensures that a complete data migration
is performed after the subsequent migration.
Signed-off-by: Longfang Liu <liulongfang@huawei.com>
Link: https://lore.kernel.org/r/20260122020205.2884497-4-liulongfang@huawei.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
After a RAS error occurs on the accelerator device, the accelerator
device will be reset. The live migration state will be abnormal
after reset, and the original state needs to be restored during
the reset process.
Therefore, reset processing needs to be performed in a live
migration scenario.
Signed-off-by: Longfang Liu <liulongfang@huawei.com>
Link: https://lore.kernel.org/r/20260122020205.2884497-3-liulongfang@huawei.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
If device error occurs during live migration, qemu will
reset the VF. At this time, VF reset and device reset are performed
simultaneously. The VF reset will timeout. Therefore, the QM_RESETTING
flag is used to ensure that VF reset and device reset are performed
serially.
Fixes: b0eed085903e ("hisi_acc_vfio_pci: Add support for VFIO live migration")
Signed-off-by: Weili Qian <qianweili@huawei.com>
Link: https://lore.kernel.org/r/20260122020205.2884497-2-liulongfang@huawei.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
Along with renaming the .move_notify() callback, rename the corresponding
dma-buf core function. This makes the expected behavior clear to exporters
calling this function.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20260124-dmabuf-revoke-v5-2-f98fca917e96@nvidia.com
Signed-off-by: Christian König <christian.koenig@amd.com>
|
|
Some pinned importers, such as non-ODP RDMA ones, cannot invalidate their
mappings and therefore must be prevented from attaching to this exporter.
Fixes: 5d74781ebc86 ("vfio/pci: Add dma-buf export support for MMIO regions")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20260121-vfio-add-pin-v1-1-4e04916b17f1@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
The commit 7e89efc6e9e4 ("Lock upstream bridge for pci_reset_function()")
added locking of the upstream bridge to the reset function. To catch
paths that are not properly locked, the commit 920f6468924f ("Warn on
missing cfg_access_lock during secondary bus reset") added a warning
if the PCI configuration space was not locked during a secondary bus reset
request.
When a VFIO PCI device is released from userspace ownership, an attempt
to reset the PCI device function may be made. If so, and the upstream bridge
is not locked, the release request results in a warning:
pcieport 0000:00:00.0: unlocked secondary bus reset via:
pci_reset_bus_function+0x188/0x1b8
Add missing upstream bridge locking to vfio_pci_core_disable().
Fixes: 7e89efc6e9e4 ("PCI: Lock upstream bridge for pci_reset_function()")
Signed-off-by: Anthony Pighin <anthony.pighin@nokia.com>
Link: https://lore.kernel.org/r/BN0PR08MB695171D3AB759C65B6438B5D838DA@BN0PR08MB6951.namprd08.prod.outlook.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
* Reuse common phys_vec, phase out dma_buf_phys_vec
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
After commit fcf463b92a08 ("types: move phys_vec definition to common header"),
we can use the shared phys_vec type instead of the DMABUF‑specific
dma_buf_phys_vec, which duplicated the same structure and semantics.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20260107-convert-to-pvec-v1-1-6e3ab8079708@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|