author     Jakub Kicinski <kuba@kernel.org> 2026-05-06 19:03:39 -0700
committer  Jakub Kicinski <kuba@kernel.org> 2026-05-06 19:03:39 -0700
commit     b6de642b9c24f5f857732e5d696fe47e43b1729a
tree       f6e7c2022614010f32b33d6d6ae0c0025ee3a617 /include/linux/platform_data
parent     c978803c5d4dc0501efcdb32df055b35bd81ba27
parent     25933aca10121a436ac7540be2650b35d9909d3f
Merge branch 'net-mlx5-improve-representor-lifecycle-and-late-ib-representor-loading'
Tariq Toukan says:

====================
net/mlx5: Improve representor lifecycle and late IB representor loading

This series addresses two problems that have been present for years, and fixes one representor reload error-unwind case exposed while making the reload path reusable.

First, there is no coordination between E-Switch reconfiguration and representor registration. The E-Switch can be mid-way through a mode change or VF count update while mlx5_ib walks in and registers or unregisters representors. Nothing stops them. The race window is small and there is no field report, but it is clearly wrong.

Second, loading mlx5_ib while the device is already in switchdev mode does not bring up the IB representors. mlx5_eswitch_register_vport_reps() only stores callbacks; nobody triggers the actual load after registration.

The series fixes the registration race with a per-E-Switch representor mutex. The lock is introduced first, then LAG shared-FDB and multiport E-Switch transitions are adjusted so auxiliary device rescans and IB representor reloads do not hold ldev->lock while taking the representor lock. This keeps the intermediate commits bisectable before the stricter E-Switch serialization and lock assertions are enabled.

After the LAG ordering is fixed, all E-Switch reconfiguration paths that create, destroy, load, or unload representors take the representor mutex. esw_mode_change() deliberately drops the mutex around mlx5_rescan_drivers_locked(), because auxiliary probe and remove paths re-enter mlx5_eswitch_register_vport_reps() and mlx5_eswitch_unregister_vport_reps() on the same thread.

The shared-FDB peer IB registration path can hold one E-Switch representor mutex and then register peer representor ops on another E-Switch. The series annotates that case as nested locking so lockdep can distinguish it from recursive locking on the same E-Switch.
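To make the esw_mode_change() lock handling concrete, here is a minimal userspace model, assuming POSIX threads stand in for the kernel mutex. It is not mlx5 code: `struct esw_model`, `esw_model_mode_change()`, `esw_model_rescan_drivers()` and `esw_model_register_reps()` are hypothetical names. Kernel mutexes are not recursive, so the model uses an error-checking pthread mutex, which makes a same-thread relock return `EDEADLK` instead of hanging. That shows why the representor mutex has to be dropped around a rescan whose probe path re-enters registration on the same thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model of one E-Switch; not mlx5 code. */
struct esw_model {
	pthread_mutex_t rep_lock; /* stands in for the per-E-Switch representor mutex */
	int mode;                 /* 0 = legacy, 1 = switchdev (illustrative) */
	int reps_registered;      /* set by the re-entrant registration path */
};

static void esw_model_init(struct esw_model *esw)
{
	pthread_mutexattr_t attr;

	pthread_mutexattr_init(&attr);
	/* ERRORCHECK: a same-thread relock fails with EDEADLK instead of hanging. */
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&esw->rep_lock, &attr);
	pthread_mutexattr_destroy(&attr);
	esw->mode = 0;
	esw->reps_registered = 0;
}

/* Models the registration call made from an auxiliary driver's probe. */
static int esw_model_register_reps(struct esw_model *esw)
{
	int err = pthread_mutex_lock(&esw->rep_lock);

	if (err)
		return err; /* EDEADLK if this thread already holds rep_lock */
	esw->reps_registered = 1;
	pthread_mutex_unlock(&esw->rep_lock);
	return 0;
}

/* Models a driver rescan whose probe path re-enters registration on this thread. */
static int esw_model_rescan_drivers(struct esw_model *esw)
{
	return esw_model_register_reps(esw);
}

/*
 * Models a mode change. With drop_for_rescan set, rep_lock is released
 * around the rescan, mirroring what the merge message describes.
 */
static int esw_model_mode_change(struct esw_model *esw, int new_mode,
				 bool drop_for_rescan)
{
	int err;

	assert(new_mode == 0 || new_mode == 1);
	pthread_mutex_lock(&esw->rep_lock);
	esw->mode = new_mode; /* reconfiguration runs under the lock */

	if (drop_for_rescan)
		pthread_mutex_unlock(&esw->rep_lock);
	err = esw_model_rescan_drivers(esw);
	if (drop_for_rescan)
		pthread_mutex_lock(&esw->rep_lock);

	/* State may have changed while the lock was dropped; real code must revalidate. */
	pthread_mutex_unlock(&esw->rep_lock);
	return err;
}
```

With `drop_for_rescan` false, the nested registration fails with `EDEADLK` in this model (in the kernel it would simply deadlock). The trade-off of dropping the lock is a window in which another thread can reconfigure the E-Switch, so anything done after the relock should not rely on state observed before the rescan.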
For the missing IB representors, mlx5_eswitch_register_vport_reps() queues a work item that acquires the devlink lock and loads all relevant representors. This is the change that actually fixes the long-standing bug.

The reload path also learns to track which representor types were loaded by the current attempt, so an error does not unload representors that were already active before the retry.

Patch 1 is cleanup. LAG and MPESW had the same representor reload sequence duplicated in several places and the copies had started to drift. This consolidates them into one helper.

Patch 2 lets E-Switch workqueue callers choose GFP allocation flags.

Patch 3 adds the per-E-Switch representor lifecycle lock and helper APIs.

Patch 4 adjusts the LAG shared-FDB and multiport E-Switch transitions so auxiliary device rescans and IB representor reloads run without ldev->lock held while taking the representor lock.

Patch 5 protects the E-Switch reconfiguration, representor registration and peer IB representor paths with the representor lock.

Patch 6 fixes representor load error unwind so only representor types loaded by the current attempt are unloaded on failure.

Patch 7 moves the representor load triggered by mlx5_eswitch_register_vport_reps() onto the work queue. This is the patch that fixes IB representors not coming up when mlx5_ib is loaded while the device is already in switchdev mode.
====================

Link: https://patch.msgid.link/20260503202726.266415-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
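The deferred load and the narrower error unwind described in the merge message can be modeled together. The sketch below is single-threaded and hypothetical: `struct esw_reps`, `reps_register()`, `reps_run_work()`, `reps_load_all()` and the two-entry `enum rep_type` are illustrative names, not the mlx5 API, and the real work item would run on a workqueue and take the devlink lock. Registration queues a load instead of only storing callbacks, and the load keeps a `loaded_now` mask so a failure unloads only what the current attempt brought up.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative representor types for this model only. */
enum rep_type { REP_ETH, REP_IB, NUM_REP_TYPES };

#define REP_BIT(t) (1u << (t))

/* Hypothetical single-threaded model of one E-Switch's representor state. */
struct esw_reps {
	unsigned int registered; /* types whose callbacks have been stored */
	unsigned int loaded;     /* types currently loaded */
	unsigned int fail_on;    /* test hook: types whose load fails */
	bool work_pending;       /* models a queued work item */
};

static int rep_load(struct esw_reps *esw, enum rep_type t)
{
	assert(t < NUM_REP_TYPES);
	if (esw->fail_on & REP_BIT(t))
		return -1;
	esw->loaded |= REP_BIT(t);
	return 0;
}

static void rep_unload(struct esw_reps *esw, enum rep_type t)
{
	esw->loaded &= ~REP_BIT(t);
}

/*
 * Load every registered type that is not already loaded. On failure,
 * unload only the types loaded by this attempt, so types that were
 * active before the attempt stay up.
 */
static int reps_load_all(struct esw_reps *esw)
{
	unsigned int loaded_now = 0;
	int err = 0;

	for (int t = 0; t < NUM_REP_TYPES; t++) {
		if (!(esw->registered & REP_BIT(t)) || (esw->loaded & REP_BIT(t)))
			continue;
		err = rep_load(esw, t);
		if (err)
			goto unwind;
		loaded_now |= REP_BIT(t);
	}
	return 0;

unwind:
	/* Reverse order, touching only what this attempt loaded. */
	for (int t = NUM_REP_TYPES - 1; t >= 0; t--)
		if (loaded_now & REP_BIT(t))
			rep_unload(esw, t);
	return err;
}

/* Registration stores the callbacks and also queues the load. */
static void reps_register(struct esw_reps *esw, enum rep_type t)
{
	esw->registered |= REP_BIT(t);
	esw->work_pending = true;
}

/* Models the queued work item; the real one would also take the devlink lock. */
static int reps_run_work(struct esw_reps *esw)
{
	if (!esw->work_pending)
		return 0;
	esw->work_pending = false;
	return reps_load_all(esw);
}
```

For example, if the Ethernet type is already loaded and a later IB load fails, the unwind leaves the Ethernet type in place because it is not in `loaded_now`; if both types are loaded in the same failing attempt, the Ethernet type is unloaded again.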
Diffstat (limited to 'include/linux/platform_data')
0 files changed, 0 insertions, 0 deletions