linux.git/include/linux/mlx5, branch v6.19

net/mlx5: Move SF dev table notifier registration outside the PF devlink lock

2025-11-20T04:32:28+00:00

This completes the previous patches by moving notifier registration for
SF dev tables outside the devlink locked critical section in
mlx5_init_one() / mlx5_uninit_one() and into the mlx5_mdev_init() /
mlx5_mdev_uninit() functions.

This is only done for non-SFs, since SFs do not have a SF HW table
themselves.

After this patch, notifiers can grab the PF devlink lock (soon to be
necessary) without creating a locking cycle.

Signed-off-by: Cosmin Ratiu 
Reviewed-by: Carolina Jubran 
Signed-off-by: Tariq Toukan 
Link: https://patch.msgid.link/1763325940-1231508-7-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski

net/mlx5: Move the SF table notifiers outside the devlink lock

2025-11-20T04:32:28+00:00

Move the SF table notifiers registration/unregistration outside of
mlx5_init_one() / mlx5_uninit_one() and into the mlx5_mdev_init() /
mlx5_mdev_uninit() functions.

This is only done for non-SFs, since SFs do not have a SF table
themselves and thus don't need notifiers.

Signed-off-by: Cosmin Ratiu 
Reviewed-by: Carolina Jubran 
Signed-off-by: Tariq Toukan 
Link: https://patch.msgid.link/1763325940-1231508-6-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski

net/mlx5: Move the SF HW table notifier outside the devlink lock

2025-11-20T04:32:27+00:00

Move the SF HW table notifier registration/unregistration outside of
mlx5_init_one() / mlx5_uninit_one() and into the mlx5_mdev_init() /
mlx5_mdev_uninit() functions.

This is only done for non-SFs, since SFs do not have a SF HW table
themselves.

Signed-off-by: Cosmin Ratiu 
Reviewed-by: Carolina Jubran 
Signed-off-by: Tariq Toukan 
Link: https://patch.msgid.link/1763325940-1231508-5-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski

net/mlx5: Move the vhca event notifier outside of the devlink lock

2025-11-20T04:32:27+00:00

The vhca event notifier consists of an atomic notifier for vhca state
changes (used for SF events), multiple workqueues and a blocking
notifier chain for delivering the vhca state change events for further
processing.

This patch moves the vhca notifier head outside of mlx5_init_one() /
mlx5_uninit_one() and into the mlx5_mdev_init() / mlx5_mdev_uninit()
functions.

This allows called notifiers to grab the PF devlink lock which was
previously impossible because it would create a circular lock
dependency.

mlx5_vhca_event_stop() is now called earlier in the cleanup phase and
flushes the workqueues to ensure that after the call, there are no
pending events. This simplifies the cleanup flow for vhca event
consumers.

Signed-off-by: Cosmin Ratiu 
Reviewed-by: Carolina Jubran 
Signed-off-by: Tariq Toukan 
Link: https://patch.msgid.link/1763325940-1231508-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski

net/mlx5: Move the esw mode notifier chain outside the devlink lock

2025-11-20T04:32:27+00:00

The esw mode change notifier chain is initialized/cleaned up in
mlx5_init_one() / mlx5_uninit_one() with the devlink lock held.

Move the notifier head from the eswitch struct into mlx5_priv directly,
and initialize it outside the critical section. This will allow notifier
registration to happen earlier in the init procedure in subsequent
patches.

Signed-off-by: Cosmin Ratiu 
Reviewed-by: Carolina Jubran 
Signed-off-by: Tariq Toukan 
Link: https://patch.msgid.link/1763325940-1231508-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski

net/mlx5: Abort new commands if all command slots are stalled

2025-11-19T02:53:34+00:00

In case of a FW issue, FW might be not responding to FW commands,
causing kernel lockout for a long period of time, e.g. rtnl_lock held
while ethtool is trying to collect stats waiting for FW to respond to
multiple commands, when all of them will timeout.

While there's no immediate indication of the FW lockout, we can safely
assume that something is wrong when all command slots are busy and in
a timeout state and no FW completion was received on any of them.

In such case, start immediately failing new commands.

Signed-off-by: Saeed Mahameed 
Reviewed-by: Moshe Shemesh 
Signed-off-by: Tariq Toukan 
Link: https://patch.msgid.link/1763415729-1238421-5-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski

Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

2025-11-15T02:55:38+00:00

Tariq Toukan says:

====================
mlx5-next updates 2025-11-13

The following pull-request contains common mlx5 updates

* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: Expose definition for 1600Gbps link mode
  net/mlx5: fs, set non default device per namespace
  net/mlx5: fs, Add other_eswitch support for steering tables
  net/mlx5: Add OTHER_ESWITCH HW capabilities
  net/mlx5: Add direct ST mode support for RDMA
  PCI/TPH: Expose pcie_tph_get_st_table_loc()
  {rdma,net}/mlx5: Query vports mac address from device
====================

Link: https://patch.msgid.link/1763027252-1168760-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

2025-11-13T20:35:38+00:00

Cross-merge networking fixes after downstream PR (net-6.18-rc6).

No conflicts, adjacent changes in:

drivers/net/phy/micrel.c
  96a9178a29a6 ("net: phy: micrel: lan8814 fix reset of the QSGMII interface")
  61b7ade9ba8c ("net: phy: micrel: Add support for non PTP SKUs for lan8814")

and a trivial one in tools/testing/selftests/drivers/net/Makefile.

Signed-off-by: Jakub Kicinski

net/mlx5: Expose definition for 1600Gbps link mode

2025-11-12T08:35:14+00:00

This patch exposes new link mode for 1600Gbps, utilizing 8 lanes at
200Gbps per lane.

Co-developed-by: Yael Chemla 
Reviewed-by: Shahar Shitrit 
Signed-off-by: Tariq Toukan 
Link: https://patch.msgid.link/1762863888-1092798-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Leon Romanovsky

mlx5: Fix default values in create CQ

2025-11-11T14:12:18+00:00

Currently, CQs without a completion function are assigned the
mlx5_add_cq_to_tasklet function by default. This is problematic since
only user CQs created through the mlx5_ib driver are intended to use
this function.

Additionally, all CQs that will use doorbells instead of polling for
completions must call mlx5_cq_arm. However, the default CQ creation flow
leaves a valid value in the CQ's arm_db field, allowing FW to send
interrupts to polling-only CQs in certain corner cases.

These two factors would allow a polling-only kernel CQ to be triggered
by an EQ interrupt and call a completion function intended only for user
CQs, causing a null pointer exception.

Some areas in the driver have prevented this issue with one-off fixes
but did not address the root cause.

This patch fixes the described issue by adding defaults to the create CQ
flow. It adds a default dummy completion function to protect against
null pointer exceptions, and it sets an invalid command sequence number
by default in kernel CQs to prevent the FW from sending an interrupt to
the CQ until it is armed. User CQs are responsible for their own
initialization values.

Callers of mlx5_core_create_cq are responsible for changing the
completion function and arming the CQ per their needs.

Fixes: cdd04f4d4d71 ("net/mlx5: Add support to create SQ and CQ for ASO")
Signed-off-by: Akiva Goldberger 
Reviewed-by: Moshe Shemesh 
Signed-off-by: Tariq Toukan 
Acked-by: Leon Romanovsky 
Link: https://patch.msgid.link/1762681743-1084694-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni