| Age | Commit message (Collapse) | Author |
|
We had extended the sysreg masking infrastructure to more general
registers, instead of restricting it to VNCR-backed registers, since
commit a0162020095e ("KVM: arm64: Extend masking facility to arbitrary
registers"). Fix kvm_get_sysreg_res0() to reflect this fact.
Note that we're sure that we only deal with FGT registers in
kvm_get_sysreg_res0(), the
if (sr < __VNCR_START__)
is actually a never false, which should probably be removed later.
Fixes: 69c19e047dfe ("KVM: arm64: Add TCR2_EL2 to the sysreg arrays")
Signed-off-by: Zenghui Yu (Huawei) <zenghui.yu@linux.dev>
Link: https://patch.msgid.link/20260121101631.41037-1-zenghui.yu@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
|
|
fix typo "definitons" ==> "definitions" in DFHv1 Register Offset comment.
Signed-off-by: Mukesh Kumar Sisodiya <mukeshkumarsisodiya1981@gmail.com>
Link: https://lore.kernel.org/r/20251226150408.2802-1-mukeshkumarsisodiya1981@gmail.com
Reviewed-by: Xu Yilun <yilun.xu@intel.com>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
|
|
Add offloading for a link aggregation group supported by the YT921x
switches.
Signed-off-by: David Yang <mmyangfl@gmail.com>
Link: https://patch.msgid.link/20260117162116.1063043-1-mmyangfl@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into soc/dt
New boards: Orange Pi CM5 module + Baseboard, Radxa CM5 module + IO-board.
PCIe-slot-overlay for rk3576-evb1
New peripherals: some of the video decoders on rk3576 and rk3588
Enabled peripherals: many RK3588-NPUs and a lot of other peripherals on
a plethora of boards.
* tag 'v6.20-rockchip-dts64-1' of https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip: (40 commits)
arm64: dts: rockchip: Add the vdpu383 Video Decoder on rk3576
arm64: dts: rockchip: Add the vdpu381 Video Decoders on RK3588
arm64: dts: rockchip: Add rk3588s-orangepi-cm5-base device tree
dt-bindings: arm: rockchip: Add Orange Pi CM5 Base
arm64: dts: rockchip: Enable second HDMI output on CM3588
arm64: dts: rockchip: Add HDMI to Gameforce Ace
arm64: dts: rockchip: Enable analog sound on RK3576 EVB1
arm64: dts: rockchip: Enable HDMI sound on RK3576 EVB1
arm64: dts: rockchip: Enable HDMI sound on Luckfox Core3576
arm64: dts: rockchip: Enable HDMI sound on FriendlyElec NanoPi M5
arm64: dts: rockchip: Use a readable audio card name on NanoPi M5
arm64: dts: rockchip: enable NPU on rk3588-jaguar
arm64: dts: rockchip: enable NPU on rk3588-tiger
dt-bindings: arm: rockchip: fix description for Radxa CM5
dt-bindings: arm: rockchip: fix description for Radxa CM3I
arm64: dts: rockchip: Add missing everest,es8388 supplies to rk3399-roc-pc-plus
arm64: dts: rockchip: Enable PCIe for ArmSoM Sige1
arm64: dts: rockchip: Enable the NPU on Turing RK1
arm64: dts: rockchip: Enable the NPU on FriendlyElec CM3588
arm64: dts: rockchip: Enable the NPU on NanoPC T6/T6-LTS
...
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
|
|
We currently only populate the FGT masks if the underlying HW does
support FEAT_FGT. However, with the addition of the RES1 support for
system registers, this results in a lot of noise at boot time, as
reported by Nathan.
That's because even if FGT isn't supported, we still check for the
attribution of the bits to particular features, and not keeping the
masks up-to-date leads to (fairly harmess) warnings.
Given that we want these checks to be enforced even if the HW doesn't
support FGT, enable the generation of FGT masks unconditionally (this
is rather cheap anyway). Only the storage of the FGT configuration is
avoided, which will save a tiny bit of memory on these machines.
Reported-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Fixes: c259d763e6b09 ("KVM: arm64: Account for RES1 bits in DECLARE_FEAT_MAP() and co")
Link: https://lore.kernel.org/r/20260120211558.GA834868@ax162
Link: https://patch.msgid.link/20260122085153.535538-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into soc/dt
HEVC decoder node for RK3288.
* tag 'v6.20-rockchip-dts32-1' of https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
ARM: dts: rockchip: Add vdec node for RK3288
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
|
|
CONFIG_DEVICE_PRIVATE is a prerequisite for DRM_XE_GPUSVM.
Explicitly select it so that DRM_XE_GPUSVM is not unintentionally
left out from distro configs not explicitly enabling
CONFIG_DEVICE_PRIVATE.
v2:
- Select also CONFIG_ZONE_DEVICE since it's needed by
CONFIG_DEVICE_PRIVATE.
v3:
- Depend on CONFIG_ZONE_DEVICE rather than selecting it.
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: <dri-devel@lists.freedesktop.org>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patch.msgid.link/20260121091048.41371-3-thomas.hellstrom@linux.intel.com
|
|
CONFIG_DEVICE_PRIVATE is not selected by default by some distros,
for example Fedora, and that leads to a regression in the xe driver
since userptr support gets compiled out.
It turns out that DRM_GPUSVM, which is needed for xe userptr support
compiles also without CONFIG_DEVICE_PRIVATE, but doesn't compile
without CONFIG_ZONE_DEVICE.
Exclude the drm_pagemap files from compilation with !CONFIG_ZONE_DEVICE,
and remove the CONFIG_DEVICE_PRIVATE dependency from CONFIG_DRM_GPUSVM and
the xe driver's selection of it, re-enabling xe userptr for those configs.
v2:
- Don't compile the drm_pagemap files unless CONFIG_ZONE_DEVICE is set.
- Adjust the drm_pagemap.h header accordingly.
Fixes: 9e9787414882 ("drm/xe/userptr: replace xe_hmm with gpusvm")
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: dri-devel@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v6.18+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patch.msgid.link/20260121091048.41371-2-thomas.hellstrom@linux.intel.com
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into soc/dt
Armv8 Juno/Vexpress updates for v7.0
This contains a small set of DT updates:
1. Align DTS node naming with established coding style by replacing underscores
with hyphens in node names. This is a safe change and does not affect ABI.
2. Add support for the CMN PMU on the Arm Morello platform, exposing the
CMN-Skeena (CMN-600 r3p1–compatible) PMU via the standard CMN-600 binding.
This enables PMU access on real Morello SDP hardware, where the registers
are functional.
* tag 'juno-updates-7.0' of https://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
arm64: dts: arm: Use hyphen in node names
arm64: dts: morello: Add CMN PMU
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
|
|
The Intel VT-d PASID table entry is 512 bits (64 bytes). When replacing
an active PASID entry (e.g., during domain replacement), the current
implementation calculates a new entry on the stack and copies it to the
table using a single structure assignment.
struct pasid_entry *pte, new_pte;
pte = intel_pasid_get_entry(dev, pasid);
pasid_pte_config_first_level(iommu, &new_pte, ...);
*pte = new_pte;
Because the hardware may fetch the 512-bit PASID entry in multiple
128-bit chunks, updating the entire entry while it is active (Present
bit set) risks a "torn" read. In this scenario, the IOMMU hardware
could observe an inconsistent state — partially new data and partially
old data — leading to unpredictable behavior or spurious faults.
Fix this by removing the unsafe "replace" helpers and following the
"clear-then-update" flow, which ensures the Present bit is cleared and
the required invalidation handshake is completed before the new
configuration is applied.
Fixes: 7543ee63e811 ("iommu/vt-d: Add pasid replace helpers")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20260120061816.2132558-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
|
|
When tearing down a context entry, the current implementation zeros the
entire 128-bit entry using multiple 64-bit writes. This creates a window
where the hardware can fetch a "torn" entry — where some fields are
already zeroed while the 'Present' bit is still set — leading to
unpredictable behavior or spurious faults.
While x86 provides strong write ordering, the compiler may reorder writes
to the two 64-bit halves of the context entry. Even without compiler
reordering, the hardware fetch is not guaranteed to be atomic with
respect to multiple CPU writes.
Align with the "Guidance to Software for Invalidations" in the VT-d spec
(Section 6.5.3.3) by implementing the recommended ownership handshake:
1. Clear only the 'Present' (P) bit of the context entry first to
signal the transition of ownership from hardware to software.
2. Use dma_wmb() to ensure the cleared bit is visible to the IOMMU.
3. Perform the required cache and context-cache invalidation to ensure
hardware no longer has cached references to the entry.
4. Fully zero out the entry only after the invalidation is complete.
Also, add a dma_wmb() to context_set_present() to ensure the entry
is fully initialized before the 'Present' bit becomes visible.
Fixes: ba39592764ed2 ("Intel IOMMU: Intel IOMMU driver")
Reported-by: Dmytro Maluka <dmaluka@chromium.org>
Closes: https://lore.kernel.org/all/aTG7gc7I5wExai3S@google.com/
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Dmytro Maluka <dmaluka@chromium.org>
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20260120061816.2132558-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
|
|
The Intel VT-d Scalable Mode PASID table entry consists of 512 bits (64
bytes). When tearing down an entry, the current implementation zeros the
entire 64-byte structure immediately using multiple 64-bit writes.
Since the IOMMU hardware may fetch these 64 bytes using multiple
internal transactions (e.g., four 128-bit bursts), updating or zeroing
the entire entry while it is active (P=1) risks a "torn" read. If a
hardware fetch occurs simultaneously with the CPU zeroing the entry, the
hardware could observe an inconsistent state, leading to unpredictable
behavior or spurious faults.
Follow the "Guidance to Software for Invalidations" in the VT-d spec
(Section 6.5.3.3) by implementing the recommended ownership handshake:
1. Clear only the 'Present' (P) bit of the PASID entry.
2. Use a dma_wmb() to ensure the cleared bit is visible to hardware
before proceeding.
3. Execute the required invalidation sequence (PASID cache, IOTLB, and
Device-TLB flush) to ensure the hardware has released all cached
references.
4. Only after the flushes are complete, zero out the remaining fields
of the PASID entry.
Also, add a dma_wmb() in pasid_set_present() to ensure that all other
fields of the PASID entry are visible to the hardware before the Present
bit is set.
Fixes: 0bbeb01a4faf ("iommu/vt-d: Manage scalalble mode PASID tables")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Dmytro Maluka <dmaluka@chromium.org>
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20260120061816.2132558-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
|
|
Besides the paging domains that use FS, SVM and Nested domains need to
use piotlb invalidation descriptor as well.
Fixes: b33125296b50 ("iommu/vt-d: Create unique domain ops for each stage")
Cc: stable@vger.kernel.org
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20251223065824.6164-1-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
|
|
When writing the address of a freshly allocated zero-initialized PASID
table to a PASID directory entry, do that after the CPU cache flush for
this PASID table, not before it, to avoid the time window when this
PASID table may be already used by non-coherent IOMMU hardware while
its contents in RAM is still some random old data, not zero-initialized.
Fixes: 194b3348bdbb ("iommu/vt-d: Fix PASID directory pointer coherency")
Signed-off-by: Dmytro Maluka <dmaluka@chromium.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20251221123508.37495-1-dmaluka@chromium.org
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
|
|
Commit 4fc82cd907ac ("iommu/vt-d: Don't issue ATS Invalidation
request when device is disconnected") relies on
pci_dev_is_disconnected() to skip ATS invalidation for
safely-removed devices, but it does not cover link-down caused
by faults, which can still hard-lock the system.
For example, if a VM fails to connect to the PCIe device,
"virsh destroy" is executed to release resources and isolate
the fault, but a hard-lockup occurs while releasing the group fd.
Call Trace:
qi_submit_sync
qi_flush_dev_iotlb
intel_pasid_tear_down_entry
device_block_translation
blocking_domain_attach_dev
__iommu_attach_device
__iommu_device_set_domain
__iommu_group_set_domain_internal
iommu_detach_group
vfio_iommu_type1_detach_group
vfio_group_detach_container
vfio_group_fops_release
__fput
Although pci_device_is_present() is slower than
pci_dev_is_disconnected(), it still takes only ~70 µs on a
ConnectX-5 (8 GT/s, x2) and becomes even faster as PCIe speed
and width increase.
Besides, devtlb_invalidation_with_pasid() is called only in the
paths below, which are far less frequent than memory map/unmap.
1. mm-struct release
2. {attach,release}_dev
3. set/remove PASID
4. dirty-tracking setup
The gain in system stability far outweighs the negligible cost
of using pci_device_is_present() instead of pci_dev_is_disconnected()
to decide when to skip ATS invalidation, especially under GDR
high-load conditions.
Fixes: 4fc82cd907ac ("iommu/vt-d: Don't issue ATS Invalidation request when device is disconnected")
Cc: stable@vger.kernel.org
Signed-off-by: Jinhui Guo <guojinhui.liam@bytedance.com>
Link: https://lore.kernel.org/r/20251211035946.2071-3-guojinhui.liam@bytedance.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
|
|
scalable mode
PCIe endpoints with ATS enabled and passed through to userspace
(e.g., QEMU, DPDK) can hard-lock the host when their link drops,
either by surprise removal or by a link fault.
Commit 4fc82cd907ac ("iommu/vt-d: Don't issue ATS Invalidation
request when device is disconnected") adds pci_dev_is_disconnected()
to devtlb_invalidation_with_pasid() so ATS invalidation is skipped
only when the device is being safely removed, but it applies only
when Intel IOMMU scalable mode is enabled.
With scalable mode disabled or unsupported, a system hard-lock
occurs when a PCIe endpoint's link drops because the Intel IOMMU
waits indefinitely for an ATS invalidation that cannot complete.
Call Trace:
qi_submit_sync
qi_flush_dev_iotlb
__context_flush_dev_iotlb.part.0
domain_context_clear_one_cb
pci_for_each_dma_alias
device_block_translation
blocking_domain_attach_dev
iommu_deinit_device
__iommu_group_remove_device
iommu_release_device
iommu_bus_notifier
blocking_notifier_call_chain
bus_notify
device_del
pci_remove_bus_device
pci_stop_and_remove_bus_device
pciehp_unconfigure_device
pciehp_disable_slot
pciehp_handle_presence_or_link_change
pciehp_ist
Commit 81e921fd3216 ("iommu/vt-d: Fix NULL domain on device release")
adds intel_pasid_teardown_sm_context() to intel_iommu_release_device(),
which calls qi_flush_dev_iotlb() and can also hard-lock the system
when a PCIe endpoint's link drops.
Call Trace:
qi_submit_sync
qi_flush_dev_iotlb
__context_flush_dev_iotlb.part.0
intel_context_flush_no_pasid
device_pasid_table_teardown
pci_pasid_table_teardown
pci_for_each_dma_alias
intel_pasid_teardown_sm_context
intel_iommu_release_device
iommu_deinit_device
__iommu_group_remove_device
iommu_release_device
iommu_bus_notifier
blocking_notifier_call_chain
bus_notify
device_del
pci_remove_bus_device
pci_stop_and_remove_bus_device
pciehp_unconfigure_device
pciehp_disable_slot
pciehp_handle_presence_or_link_change
pciehp_ist
Sometimes the endpoint loses connection without a link-down event
(e.g., due to a link fault); killing the process (virsh destroy)
then hard-locks the host.
Call Trace:
qi_submit_sync
qi_flush_dev_iotlb
__context_flush_dev_iotlb.part.0
domain_context_clear_one_cb
pci_for_each_dma_alias
device_block_translation
blocking_domain_attach_dev
__iommu_attach_device
__iommu_device_set_domain
__iommu_group_set_domain_internal
iommu_detach_group
vfio_iommu_type1_detach_group
vfio_group_detach_container
vfio_group_fops_release
__fput
pci_dev_is_disconnected() only covers safe-removal paths;
pci_device_is_present() tests accessibility by reading
vendor/device IDs and internally calls pci_dev_is_disconnected().
On a ConnectX-5 (8 GT/s, x2) this costs ~70 µs.
Since __context_flush_dev_iotlb() is only called on
{attach,release}_dev paths (not hot), add pci_device_is_present()
there to skip inaccessible devices and avoid the hard-lock.
Fixes: 37764b952e1b ("iommu/vt-d: Global devTLB flush when present context entry changed")
Fixes: 81e921fd3216 ("iommu/vt-d: Fix NULL domain on device release")
Cc: stable@vger.kernel.org
Signed-off-by: Jinhui Guo <guojinhui.liam@bytedance.com>
Link: https://lore.kernel.org/r/20251211035946.2071-2-guojinhui.liam@bytedance.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
|
|
The Rust kernel code should be kept `rustdoc`-clean [1].
Our custom `srctree` link checker in the `rustdoc` target reports:
warning: srctree/ link to include/io-pgtable.h does not exist
Thus fix it.
Link: https://rust-for-linux.com/contributing#submit-checklist-addendum [1]
Fixes: 2e2f6b0ef855 ("rust: iommu: add io_pgtable abstraction")
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
|
|
The Rust kernel code should be kept `rustfmt`-clean [1].
Thus run the `rustfmt` target to fix the formatting issue.
Link: https://rust-for-linux.com/contributing#submit-checklist-addendum [1]
Fixes: 2e2f6b0ef855 ("rust: iommu: add io_pgtable abstraction")
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
|
|
The uvc_alloc_urb_buffer() function implicitly depended on the
stream->urb_size field, which was set by its caller,
uvc_alloc_urb_buffers(). This implicit data flow makes the code harder
to follow.
More importantly, stream->urb_size was updated within the allocation
loop before the allocation was confirmed to be successful. If the
allocation failed, the stream object would be left with a urb_size that
doesn't correspond to valid, allocated URB buffers.
Refactor uvc_alloc_urb_buffer() to accept the buffer size as an explicit
argument. This makes the function's dependencies clear and improves the
robustness of the error handling path. The stream->urb_size is now set only
after a complete and successful allocation.
This is a pure refactoring and introduces no functional changes.
Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Tested-by: Itay Chamiel <itay.chamiel@q.ai>
Link: https://patch.msgid.link/20260114-uvc-alloc-urb-v1-2-cedf3fb66711@chromium.org
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
|
|
If a frame has size of less or equal than one packet size
uvc_alloc_urb_buffers() is unable to allocate memory for it due to a
off-by-one error.
Fix the off-by-one-error and now that we are at it, make sure that
stream->urb_size has always a valid value when we return from the
function, even when an error happens.
Fixes: efdc8a9585ce ("V4L/DVB (10295): uvcvideo: Retry URB buffers allocation when the system is low on memory.")
Reported-by: Itay Chamiel <itay.chamiel@q.ai>
Closes: https://lore.kernel.org/linux-media/CANiDSCsSoZf2LsCCoWAUbCg6tJT-ypXR1B85aa6rAdMVYr2iBQ@mail.gmail.com/T/#t
Co-developed-by: Itay Chamiel <itay.chamiel@q.ai>
Signed-off-by: Itay Chamiel <itay.chamiel@q.ai>
Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Tested-by: Itay Chamiel <itay.chamiel@q.ai>
Link: https://patch.msgid.link/20260114-uvc-alloc-urb-v1-1-cedf3fb66711@chromium.org
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
|
|
Return buffers if streaming fails to start due to uvc_pm_get() error.
This bug may be responsible for a warning I got running
while :; do yavta -c3 /dev/video0; done
on an xHCI controller which failed under this workload.
I had no luck reproducing this warning again to confirm.
xhci_hcd 0000:09:00.0: HC died; cleaning up
usb 13-2: USB disconnect, device number 2
WARNING: CPU: 2 PID: 29386 at drivers/media/common/videobuf2/videobuf2-core.c:1803 vb2_start_streaming+0xac/0x120
Fixes: 7dd56c47784a ("media: uvcvideo: Remove stream->is_streaming field")
Cc: stable@vger.kernel.org
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Reviewed-by: Ricardo Ribalda <ribalda@chromium.org>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Link: https://patch.msgid.link/20251015133642.3dede646.michal.pecio@gmail.com
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
|
|
Some devices, such as the Grandstream GUV3100 and the LSK Meeting Eye
for Business & Home, exhibit entity ID collisions between units and
streaming output terminals.
The UVC specification requires unit and terminal IDs to be unique, and
uses the ID to reference entities:
- In control requests, to identify the target entity
- In the UVC units and terminals descriptors' bSourceID field, to
identify source entities
- In the UVC input header descriptor's bTerminalLink, to identify the
terminal associated with a streaming interface
Entity ID collisions break accessing controls and make the graph
description in the UVC descriptors ambiguous. However, collisions where
one of the entities is a streaming output terminal and the other entity
is not a streaming terminal are less severe. Streaming output terminals
have no controls, and, as they are the final entity in pipelines, they
are never referenced in descriptors as source entities. They are
referenced by ID only from innput header descriptors, which by
definition only reference streaming terminals.
For these reasons, we can work around the collision by giving streaming
output terminals their own ID namespace. Do so by setting bit
UVC_TERM_OUTPUT (15) in the uvc_entity.id field, which is normally never
set as the ID is a 8-bit value.
This ID change doesn't affect the entity name in the media controller
graph as the name isn't constructed from the ID, so there should not be
any impact on the uAPI.
Although this change handles some ID collisions automagically, keep
printing an error in uvc_alloc_new_entity() when a camera has invalid
descriptors. Hopefully this message will help vendors fix their invalid
descriptors.
This new method of handling ID collisions includes a revert of commit
758dbc756aad ("media: uvcvideo: Use heuristic to find stream entity")
that attempted to fix the problem urgently due to regression reports.
Suggested-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Tested-by: Lili Orosz <lily@floofy.city>
Co-developed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Link: https://patch.msgid.link/20251113210400.28618-1-laurent.pinchart@ideasonboard.com
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv fixes from Wei Liu:
- Fix ARM64 port of the MSHV driver (Anirudh Rayabharam)
- Fix huge page handling in the MSHV driver (Stanislav Kinsburskii)
- Minor fixes to driver code (Julia Lawall, Michael Kelley)
* tag 'hyperv-fixes-signed-20260121' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
mshv: handle gpa intercepts for arm64
mshv: add definitions for arm64 gpa intercepts
mshv: Add __user attribute to argument passed to access_ok()
mshv: Store the result of vfs_poll in a variable of type __poll_t
mshv: Align huge page stride with guest mapping
Drivers: hv: Always do Hyper-V panic notification in hv_kmsg_dump()
Drivers: hv: vmbus: fix typo in function name reference
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf-tools fix from Namhyung Kim:
"A minor fix for error handling in the event parser"
* tag 'perf-tools-fixes-for-v6.19-2026-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
perf parse-events: Fix evsel allocation failure
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Florian Westphal says:
====================
Subject: netfilter: updates for net-next
1) Speed up nftables transactions after earlier transaction failed.
Due to a (harmeless) bug we remained in slow paranoia mode until
a successful transaction completes.
2) Allow generic tracker to resolve clashes, this avoids very rare
packet drops. From Yuto Hamaguchi.
3) Increase the cleanup budget to 64 entries in nf_conncount to reap
more entries in one go, from Fernando Fernandez Mancera.
4) Allow icmp trackers to resolve clashes, this avoids very rare
initial packet drop with test cases that have high-frequency pings.
After this all trackers except tcp and sctp allow clash resolution.
5) Disentangle netfilter headers, don't include nftables/xtables headers
in subsystems that are unrelated.
6) Don't rely on implicit includes coming from nf_conntrack_proto_gre.h.
7) Allow nfnetlink_queue nfq instance struct to get accounted via memcg,
from Scott Mitchell.
8) Reject bogus xt target/match data upfront via netlink policiy in
nft_compat interface rather than relying on x_tables API to do it.
9) Fix nf_conncount breakage when trying to limit loopback flows via
prerouting rule, from Fernando Fernandez Mancera.
This is a recent breakage but not seen as urgent enough to rush this
via net tree at this late stage in development cycle.
10) Fix a possible off-by-one when parsing tcp option in xtables tcpmss
match. Also handled via -next due to late stage in development
cycle.
* tag 'nf-next-26-01-20' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: xt_tcpmss: check remaining length before reading optlen
netfilter: nf_conncount: fix tracking of connections from localhost
netfilter: nft_compat: add more restrictions on netlink attributes
netfilter: nfnetlink_queue: nfqnl_instance GFP_ATOMIC -> GFP_KERNEL_ACCOUNT allocation
netfilter: nf_conntrack: don't rely on implicit includes
netfilter: don't include xt and nftables.h in unrelated subsystems
netfilter: nf_conntrack: enable icmp clash support
netfilter: nf_conncount: increase the connection clean up limit to 64
netfilter: nf_conntrack: Add allow_clash to generic protocol handler
netfilter: nf_tables: reset table validation state on abort
====================
Link: https://patch.msgid.link/20260120191803.22208-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This commit adds error handling and rollback logic to
rvu_mbox_handler_attach_resources() to properly clean up partially
attached resources when rvu_attach_block() fails.
Fixes: 746ea74241fa0 ("octeontx2-af: Add RVU block LF provisioning support")
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Link: https://patch.msgid.link/20260121033934.1900761-1-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
It turns out that 2500Base-X actually works fine with in-band status on
MediaTek's LynxI PCS -- I wrongly concluded it didn't because it is
broken in all the copper SFP modules and GPON sticks I used for testing.
Hence report LINK_INBAND_ENABLE also for 2500Base-X mode.
This reverts most of commit a003c38d9bbb ("net: pcs: pcs-mtk-lynxi:
correctly report in-band status capabilities").
The removal of the QSGMII interface mode was correct and is left
untouched.
Link: https://github.com/openwrt/openwrt/issues/21436
Fixes: a003c38d9bbb ("net: pcs: pcs-mtk-lynxi: correctly report in-band status capabilities")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/b1cf26157b63fee838be09ae810497fb22fd8104.1768961746.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix the following:
BUG: KCSAN: data-race in rxrpc_peer_keepalive_worker / rxrpc_send_data_packet
which is reporting an issue with the reads and writes to ->last_tx_at in:
conn->peer->last_tx_at = ktime_get_seconds();
and:
keepalive_at = peer->last_tx_at + RXRPC_KEEPALIVE_TIME;
The lockless accesses to these to values aren't actually a problem as the
read only needs an approximate time of last transmission for the purposes
of deciding whether or not the transmission of a keepalive packet is
warranted yet.
Also, as ->last_tx_at is a 64-bit value, tearing can occur on a 32-bit
arch.
Fix both of these by switching to an unsigned int for ->last_tx_at and only
storing the LSW of the time64_t. It can then be reconstructed at need
provided no more than 68 years has elapsed since the last transmission.
Fixes: ace45bec6d77 ("rxrpc: Fix firewall route keepalive")
Reported-by: syzbot+6182afad5045e6703b3d@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/r/695e7cfb.050a0220.1c677c.036b.GAE@google.com/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org
Link: https://patch.msgid.link/1107124.1768903985@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2026-01-20 (ice, idpf)
For ice:
Cody Haas breaks dependency of needing both RSS key and LUT for
ice_get_rxfh() as ethtool ioctls do not always supply both.
Paul fixes issues related to devlink reload; adding missing deinit HW
call and moving hwmon exit function to the proper call chain.
For idpf:
Mina Almasry moves a register read call into the time sandwich to ensure
the register is properly flushed.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
idpf: read lower clock bits inside the time sandwich
ice: fix devlink reload call trace
ice: add missing ice_deinit_hw() in devlink reinit path
ice: Fix persistent failure in ice_get_rxfh
====================
Link: https://patch.msgid.link/20260120224430.410377-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Prior to the blamed commit, the bridge_num range was from
0 to ds->max_num_bridges - 1. After the commit, it is from
1 to ds->max_num_bridges.
So this check:
if (bridge_num >= max)
return 0;
must be updated to:
if (bridge_num > max)
return 0;
in order to allow the last bridge_num value (==max) to be used.
This is easiest visible when a driver sets ds->max_num_bridges=1.
The observed behaviour is that even the first created bridge triggers
the netlink extack "Range of offloadable bridges exceeded" warning, and
is handled in software rather than being offloaded.
Fixes: 3f9bb0301d50 ("net: dsa: make dp->bridge_num one-based")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20260120211039.3228999-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Vladimir Oltean says:
====================
Phylink link callback replay helpers for SJA1105 and XPCS
The sja1105 is reducing its direct interaction with the XPCS.
The changes presented here are an older simplification idea, broken out
of a previous patch set to allow for more thorough review.
====================
Link: https://patch.msgid.link/20260119121954.1624535-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
sja1105_set_port_config()
Commit a18891b55703 ("net: dsa: sja1105: simplify static configuration
reload") split sja1105_mac_link_up() -> sja1105_adjust_port_config()
into two separate:
- sja1105_set_port_speed()
- sja1105_set_port_config()
in order to pick up the second sja1105_set_port_config() and reuse it
for the sja1105_static_config_reload() procedure which involves saving
and restoring MAC and PCS settings.
Now that these settings are restored by phylink itself, the driver no
longer needs to call its own sja1105_set_port_config(), and the
splitting is unnatural. Merge the functions back, which is to say that
the only supported internal code path is to submit the MAC Configuration
Table entry to hardware after phylink has dictated what we should set it
to.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20260119121954.1624535-5-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
sja1105_static_config_reload() changes major settings in the switch and
it requires a reset. A use case is to change things like Qdiscs (but see
sja1105_reset_reasons[] for full list) while PTP synchronization is
running, and the servo loop must not exit the locked state (s2).
Therefore, stopping and restarting the phylink instances of all ports is
not desirable, because that also stops the phylib state machine, and
retriggers a seconds-long auto-negotiation process that breaks PTP.
Thus, saving and restoring the link management settings is handled
privately by the driver.
The method got progressively more complex as SGMII support got added,
because this is handled through the xpcs phylink_pcs component, to which
we don't have unfettered access. Nonetheless, the switch reset line is
hardwired to also reset the XPCS, creating a situation where it loses
state and needs to be reprogrammed at a moment in time outside phylink's
control.
Although commits 907476c66d73 ("net: dsa: sja1105: call PCS
config/link_up via pcs_ops structure") and 41bf58314b17 ("net: dsa:
sja1105: use phylink_pcs internally") made the sja1105 <-> xpcs
interaction slightly prettier, we still depend heavily on the PCS being
"XPCS-like", because to back up its settings, we read the MII_BMCR
register, through a mdiobus_c45_read() operation, breaking all layering
separation.
With the existence of phylink link callback replay helpers, we can do
away with all this custom code and become even more PCS-agnostic, even
though the reset domain is tightly coupled.
This creates the unique opportunity to simplify away even more code than
just the xpcs handling from sja1105_static_config_reload().
The sja1105_set_port_config() method is also invoked from
sja1105_mac_link_up(). And since that is now called directly by
phylink - we can just remove it from sja1105_static_config_reload().
This makes it possible to re-merge sja1105_set_port_speed() and
sja1105_set_port_config() in a later change.
Note that my only setups with sja1105 where the xpcs is used is with the
xpcs on the CPU-facing port (fixed-link). Thus, I cannot test xpcs + PHY.
But the replay procedure walks through all ports, and I did test a
regular RGMII user port + a PHY.
ptp4l[54.552]: master offset 5 s2 freq -931 path delay 764
ptp4l[55.551]: master offset 22 s2 freq -913 path delay 764
ptp4l[56.551]: master offset 13 s2 freq -915 path delay 765
ptp4l[57.552]: master offset 5 s2 freq -919 path delay 765
ptp4l[58.553]: master offset 13 s2 freq -910 path delay 765
ptp4l[59.553]: master offset 13 s2 freq -906 path delay 765
ptp4l[60.553]: master offset 6 s2 freq -909 path delay 765
ptp4l[61.553]: master offset 6 s2 freq -907 path delay 765
ptp4l[62.553]: master offset 6 s2 freq -906 path delay 765
ptp4l[63.553]: master offset 14 s2 freq -896 path delay 765
$ ip link set br0 type bridge vlan_filtering 1
[ 63.983283] sja1105 spi2.0 sw0p0: Link is Down
[ 63.991913] sja1105 spi2.0: Link is Down
[ 64.009784] sja1105 spi2.0: Reset switch and programmed static config. Reason: VLAN filtering
[ 64.020217] sja1105 spi2.0 sw0p0: Link is Up - 1Gbps/Full - flow control off
[ 64.030683] sja1105 spi2.0: Link is Up - 1Gbps/Full - flow control off
ptp4l[64.554]: master offset 7397 s2 freq +6491 path delay 765
ptp4l[65.554]: master offset 38 s2 freq +1352 path delay 765
ptp4l[66.554]: master offset -2225 s2 freq -900 path delay 764
ptp4l[67.555]: master offset -2226 s2 freq -1569 path delay 765
ptp4l[68.555]: master offset -1553 s2 freq -1563 path delay 765
ptp4l[69.555]: master offset -865 s2 freq -1341 path delay 765
ptp4l[70.555]: master offset -401 s2 freq -1137 path delay 765
ptp4l[71.556]: master offset -145 s2 freq -1001 path delay 765
ptp4l[72.558]: master offset -26 s2 freq -926 path delay 765
ptp4l[73.557]: master offset 30 s2 freq -877 path delay 765
ptp4l[74.557]: master offset 47 s2 freq -851 path delay 765
ptp4l[75.557]: master offset 29 s2 freq -855 path delay 765
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20260119121954.1624535-4-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Some drivers of MAC + tightly integrated PCS (example: SJA1105 + XPCS
covered by same reset domain) need to perform resets at runtime.
The reset is triggered by the MAC driver, and it needs to restore its
and the PCS' registers, all invisible to phylink.
However, there is a desire to simplify the API through which the MAC and
the PCS interact, so this becomes challenging.
Phylink holds all the necessary state to help with this operation, and
can offer two helpers which walk the MAC and PCS drivers again through
the callbacks required during a destructive reset operation. The
procedure is as follows:
Before reset, MAC driver calls phylink_replay_link_begin():
- Triggers phylink mac_link_down() and pcs_link_down() methods
After reset, MAC driver calls phylink_replay_link_end():
- Triggers phylink mac_config() -> pcs_config() -> mac_link_up() ->
pcs_link_up() methods.
MAC and PCS registers are restored with no other custom driver code.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20260119121954.1624535-3-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This is a trivial change with no functional effect which replaces the
pattern:
if (a) {
if (b) {
do_stuff();
}
}
with:
if (a && b) {
do_stuff();
};
The purpose is to reduce the delta of a subsequent functional change.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20260119121954.1624535-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Vladimir Oltean says:
====================
PHY polarity inversion via generic device tree properties
Using the "rx-polarity" and "tx-polarity" device tree properties
introduced in linux-phy and merged into net-next in
commit 96a2d53f2478 ("Merge tag 'phy_common_properties' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy")
we convert here two existing networking use cases - the EN8811H Ethernet
PHY and the Mediatek LynxI PCS.
Original cover letter:
Polarity inversion (described in patch 4/10) is a feature with at least
4 potential new users waiting for a generic description:
- Horatiu Vultur with the lan966x SerDes
- Daniel Golle with the MaxLinear GSW1xx switches
- Bjørn Mork with the AN8811HB Ethernet PHY
- Me with a custom SJA1105 board, switch which uses the DesignWare XPCS
I became interested in exploring the problem space because I was averse
to the idea of adding vendor-specific device tree properties to describe
a common need.
This set contains an implementation of a generic feature that should
cater to all known needs that were identified during my documentation
phase.
Apart from what is converted here, we also have the following, which I
did not touch:
- "st,px_rx_pol_inv" - its binding is a .txt file and I don't have time
for such a large detour to convert it to dtschema.
- "st,pcie-tx-pol-inv" and "st,sata-tx-pol-inv" - these are defined in a
.txt schema but are not implemented in any driver. My verdict would be
"delete the properties" but again, I would prefer not introducing such
dependency to this series.
====================
Link: https://patch.msgid.link/20260119091220.1493761-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Prefer the new "rx-polarity" and "tx-polarity" properties, which in this
case have the advantage that polarity inversion can be specified per
direction (and per protocol, although this isn't useful here).
We use the vendor specific ones as fallback if the standard description
doesn't exist.
Daniel, referring to the Mediatek SDK, clarifies that the combined
SGMII_PN_SWAP_TX_RX register field should be split like this: bit 0 is
TX and bit 1 is RX:
https://lore.kernel.org/linux-phy/aSW--slbJWpXK0nv@makrotopia.org/
Suggested-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20260119091220.1493761-6-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The Mediatek LynxI PCS is used from the MT7530 DSA driver (where it does
not have an OF presence) and from mtk_eth_soc, where it does
(Documentation/devicetree/bindings/net/pcs/mediatek,sgmiisys.yaml
informs of a combined clock provider + SGMII PCS "SGMIISYS" syscon
block).
Currently, mtk_eth_soc parses the SGMIISYS OF node for the
"mediatek,pnswap" property and sets a bit in the "flags" argument of
mtk_pcs_lynxi_create() if set.
I'd like to deprecate "mediatek,pnswap" in favour of a property which
takes the current phy-mode into consideration. But this is only known at
mtk_pcs_lynxi_config() time, and not known at mtk_pcs_lynxi_create(),
when the SGMIISYS OF node is parsed.
To achieve that, we must pass the OF node of the PCS, if it exists, to
mtk_pcs_lynxi_create(), and let the PCS take a reference on it and
handle property parsing whenever it wants.
Use the fwnode API which is more general than OF (in case we ever need
to describe the PCS using some other format). This API should be NULL
tolerant, so add no particular tests for the mt7530 case.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20260119091220.1493761-5-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Reference the common PHY properties, and update the example to use them.
Note that a PCS subnode exists, and it seems a better container of the
polarity description than the SGMIISYS node that hosts "mediatek,pnswap".
So use that.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20260119091220.1493761-4-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Prefer the new "rx-polarity" and "tx-polarity" properties, and use the
vendor specific ones as fallback if the standard description doesn't
exist.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20260119091220.1493761-3-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
"airoha,pnswap-tx"
Reference the common PHY properties, and update the example to use them.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20260119091220.1493761-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We need to apply the tx_chan_offset to the netfilter cfg channel or the
output channel will be incorrect for asp-3.0 and newer.
Fixes: e9f31435ee7d ("net: bcmasp: Add support for asp-v3.0")
Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20260120192339.2031648-1-justin.chen@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Eric Dumazet says:
====================
gro: inline tcp6_gro_{receive,complete}
On some platforms, GRO stack is too deep and causes cpu stalls.
Decreasing call depths by one shows a 1.5 % gain on Zen2 cpus.
(32 RX queues, 100Gbit NIC, RFS enabled, tcp_rr with 128 threads and 10,000 flows)
We can go further by inlining ipv6_gro_{receive,complete}
and take care of IPv4 if there is interest.
Note: two temporary __always_inline will be replaced with
inline_for_performance when/if available.
Cumulative size increase for this series (of 3):
$ scripts/bloat-o-meter -t vmlinux.0 vmlinux.3
add/remove: 2/2 grow/shrink: 5/1 up/down: 1572/-471 (1101)
Function old new delta
ipv6_gro_receive 1069 1846 +777
ipv6_gro_complete 433 733 +300
tcp6_check_fraglist_gro - 272 +272
tcp6_gro_complete 227 306 +79
tcp4_gro_complete 325 397 +72
ipv6_offload_init 218 274 +56
__pfx_tcp6_check_fraglist_gro - 16 +16
__pfx___skb_incr_checksum_unnecessary 32 - -32
__skb_incr_checksum_unnecessary 186 - -186
tcp6_gro_receive 959 706 -253
Total: Before=22592724, After=22593825, chg +0.00%
====================
Link: https://patch.msgid.link/20260120164903.1912995-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Remove one function call from GRO stack for native IPv6 + TCP packets.
$ scripts/bloat-o-meter -t vmlinux.2 vmlinux.3
add/remove: 0/0 grow/shrink: 1/1 up/down: 298/-5 (293)
Function old new delta
ipv6_gro_complete 435 733 +298
tcp6_gro_complete 311 306 -5
Total: Before=22593532, After=22593825, chg +0.00%
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260120164903.1912995-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
FDO/LTO are unable to inline tcp6_gro_receive() from ipv6_gro_receive()
Make sure tcp6_check_fraglist_gro() is only called only when needed,
so that compiler can leave it out-of-line.
$ scripts/bloat-o-meter -t vmlinux.1 vmlinux.2
add/remove: 2/0 grow/shrink: 3/1 up/down: 1123/-253 (870)
Function old new delta
ipv6_gro_receive 1069 1846 +777
tcp6_check_fraglist_gro - 272 +272
ipv6_offload_init 218 274 +56
__pfx_tcp6_check_fraglist_gro - 16 +16
ipv6_gro_complete 433 435 +2
tcp6_gro_receive 959 706 -253
Total: Before=22592662, After=22593532, chg +0.00%
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260120164903.1912995-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
clang does not inline this helper in GRO fast path.
We can save space and cpu cycles.
$ scripts/bloat-o-meter -t vmlinux.0 vmlinux.1
add/remove: 0/2 grow/shrink: 2/0 up/down: 156/-218 (-62)
Function old new delta
tcp6_gro_complete 227 311 +84
tcp4_gro_complete 325 397 +72
__pfx___skb_incr_checksum_unnecessary 32 - -32
__skb_incr_checksum_unnecessary 186 - -186
Total: Before=22592724, After=22592662, chg -0.00%
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260120164903.1912995-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
After 3cbf4ffba5ee ("net: plumb network namespace into __skb_flow_dissect")
we have to provide a net pointer to __skb_flow_dissect(),
either via skb->dev, skb->sk, or a user provided pointer.
In the following case, syzbot was able to cook a bare skb.
WARNING: net/core/flow_dissector.c:1131 at __skb_flow_dissect+0xb57/0x68b0 net/core/flow_dissector.c:1131, CPU#1: syz.2.1418/11053
Call Trace:
<TASK>
bond_flow_dissect drivers/net/bonding/bond_main.c:4093 [inline]
__bond_xmit_hash+0x2d7/0xba0 drivers/net/bonding/bond_main.c:4157
bond_xmit_hash_xdp drivers/net/bonding/bond_main.c:4208 [inline]
bond_xdp_xmit_3ad_xor_slave_get drivers/net/bonding/bond_main.c:5139 [inline]
bond_xdp_get_xmit_slave+0x1fd/0x710 drivers/net/bonding/bond_main.c:5515
xdp_master_redirect+0x13f/0x2c0 net/core/filter.c:4388
bpf_prog_run_xdp include/net/xdp.h:700 [inline]
bpf_test_run+0x6b2/0x7d0 net/bpf/test_run.c:421
bpf_prog_test_run_xdp+0x795/0x10e0 net/bpf/test_run.c:1390
bpf_prog_test_run+0x2c7/0x340 kernel/bpf/syscall.c:4703
__sys_bpf+0x562/0x860 kernel/bpf/syscall.c:6182
__do_sys_bpf kernel/bpf/syscall.c:6274 [inline]
__se_sys_bpf kernel/bpf/syscall.c:6272 [inline]
__x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:6272
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xec/0xf80 arch/x86/entry/syscall_64.c:94
Fixes: 58deb77cc52d ("bonding: balance ICMP echoes in layer3+4 mode")
Reported-by: syzbot+c46409299c70a221415e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/696faa23.050a0220.4cb9c.001f.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Matteo Croce <mcroce@redhat.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260120161744.1893263-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Both send_mcast4() and send_mcast6() use sleep 2 to wait for the tunnel
connection between the gateway and the relay, and for the listener
socket to be created in the LISTENER namespace.
However, tests sometimes fail because packets are sent before the
connection is fully established.
Increase the waiting time to make the tests more reliable, and use
wait_local_port_listen() to explicitly wait for the listener socket.
Fixes: c08e8baea78e ("selftests: add amt interface selftest script")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Link: https://patch.msgid.link/20260120133930.863845-1-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We can change tcp_rsk() and inet_rsk() to propagate their argument
const qualifier thanks to container_of_const().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260120125353.1470456-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When the parameter pmac_id_valid argument of be_cmd_get_mac_from_list() is
set to false, the driver may request the PMAC_ID from the firmware of the
network card, and this function will store that PMAC_ID at the provided
address pmac_id. This is the contract of this function.
However, there is a location within the driver where both
pmac_id_valid == false and pmac_id == NULL are being passed. This could
result in dereferencing a NULL pointer.
To resolve this issue, it is necessary to pass the address of a stub
variable to the function.
Fixes: 95046b927a54 ("be2net: refactor MAC-addr setup code")
Signed-off-by: Andrey Vatoropin <a.vatoropin@crpt.ru>
Link: https://patch.msgid.link/20260120113734.20193-1-a.vatoropin@crpt.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|