linux-stable.git/drivers/infiniband, branch v6.18

mlx5: Fix default values in create CQ

2025-11-11T14:12:18+00:00

Currently, CQs without a completion function are assigned the
mlx5_add_cq_to_tasklet function by default. This is problematic since
only user CQs created through the mlx5_ib driver are intended to use
this function.

Additionally, all CQs that will use doorbells instead of polling for
completions must call mlx5_cq_arm. However, the default CQ creation flow
leaves a valid value in the CQ's arm_db field, allowing FW to send
interrupts to polling-only CQs in certain corner cases.

These two factors would allow a polling-only kernel CQ to be triggered
by an EQ interrupt and call a completion function intended only for user
CQs, causing a null pointer exception.

Some areas in the driver have prevented this issue with one-off fixes
but did not address the root cause.

This patch fixes the described issue by adding defaults to the create CQ
flow. It adds a default dummy completion function to protect against
null pointer exceptions, and it sets an invalid command sequence number
by default in kernel CQs to prevent the FW from sending an interrupt to
the CQ until it is armed. User CQs are responsible for their own
initialization values.

Callers of mlx5_core_create_cq are responsible for changing the
completion function and arming the CQ per their needs.

Fixes: cdd04f4d4d71 ("net/mlx5: Add support to create SQ and CQ for ASO")
Signed-off-by: Akiva Goldberger 
Reviewed-by: Moshe Shemesh 
Signed-off-by: Tariq Toukan 
Acked-by: Leon Romanovsky 
Link: https://patch.msgid.link/1762681743-1084694-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni

RDMA/irdma: Fix vf_id size to u16 to avoid overflow

2025-11-02T11:46:01+00:00

Correctly size the vf_id to u16 to avoid overflow.

Signed-off-by: Jay Bhat 
Signed-off-by: Tatyana Nikolova 
Link: https://patch.msgid.link/20251031021726.1003-6-tatyana.e.nikolova@intel.com
Signed-off-by: Leon Romanovsky

RDMA/hns: Remove an extra blank line

2025-10-27T09:44:00+00:00

Remove an extra blank line.

Signed-off-by: Guofeng Yue 
Signed-off-by: Junxian Huang 
Link: https://patch.msgid.link/20251016114051.1963197-5-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky

RDMA/hns: Fix wrong WQE data when QP wraps around

2025-10-27T09:44:00+00:00

When QP wraps around, WQE data from the previous use at the same
position still remains as driver does not clear it. The WQE field
layout differs across different opcodes, causing that the fields
that are not explicitly assigned for the current opcode retain
stale values, and are issued to HW by mistake. Such fields are as
follows:

* MSG_START_SGE_IDX field in ATOMIC WQE
* BLOCK_SIZE and ZBVA fields in FRMR WQE
* DirectWQE fields when DirectWQE not used

For ATOMIC WQE, always set the latest sge index in MSG_START_SGE_IDX
as required by HW.

For FRMR WQE and DirectWQE, clear only those unassigned fields
instead of the entire WQE to avoid performance penalty.

Fixes: 68a997c5d28c ("RDMA/hns: Add FRMR support for hip08")
Signed-off-by: Junxian Huang 
Link: https://patch.msgid.link/20251016114051.1963197-4-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky

RDMA/hns: Fix the modification of max_send_sge

2025-10-27T09:44:00+00:00

The actual sge number may exceed the value specified in init_attr->cap
when HW needs extra sge to enable inline feature. Since these extra
sges are not expected by ULP, return the user-specified value to ULP
instead of the expanded sge number.

Fixes: 0c5e259b06a8 ("RDMA/hns: Fix incorrect sge nums calculation")
Signed-off-by: wenglianfa 
Signed-off-by: Junxian Huang 
Link: https://patch.msgid.link/20251016114051.1963197-3-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky

RDMA/hns: Fix recv CQ and QP cache affinity

2025-10-27T09:44:00+00:00

Currently driver enforces affinity between QP cache and send CQ
cache, which helps improve the performance of sending, but doesn't
set affinity with recv CQ cache, resulting in suboptimal performance
of receiving.

Use one CQ bank per context to ensure the affinity among QP, send CQ
and recv CQ. For kernel ULP, CQ bank is fixed to 0.

Fixes: 9e03dbea2b06 ("RDMA/hns: Fix CQ and QP cache affinity")
Signed-off-by: Chengchang Tang 
Signed-off-by: Junxian Huang 
Link: https://patch.msgid.link/20251016114051.1963197-2-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky

RDMA/uverbs: Fix umem release in UVERBS_METHOD_CQ_CREATE

2025-10-19T11:31:25+00:00

In `UVERBS_METHOD_CQ_CREATE`, umem should be released if anything goes
wrong. Currently, if `create_cq_umem` fails, umem would not be
released or referenced, causing a possible leak.

In this patch, we release umem at `UVERBS_METHOD_CQ_CREATE`, the driver
should not release umem if it returns an error code.

Fixes: 1a40c362ae26 ("RDMA/uverbs: Add a common way to create CQ with umem")
Signed-off-by: Shuhao Fu 
Link: https://patch.msgid.link/aOh1le4YqtYwj-hH@osx.local
Signed-off-by: Leon Romanovsky

RDMA/irdma: Set irdma_cq cq_num field during CQ create

2025-10-19T11:02:11+00:00

The driver maintains a CQ table that is used to ensure that a CQ is
still valid when processing CQ related AEs. When a CQ is destroyed,
the table entry is cleared, using irdma_cq.cq_num as the index. This
field was never being set, so it was just always clearing out entry
0.

Additionally, the cq_num field size was increased to accommodate HW
supporting more than 64K CQs.

Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs")
Signed-off-by: Jacob Moroni 
Link: https://patch.msgid.link/20250923142439.943930-1-jmoroni@google.com
Acked-by: Tatyana Nikolova 
Signed-off-by: Leon Romanovsky

RDMA/irdma: Fix SD index calculation

2025-10-19T11:01:28+00:00

In some cases, it is possible for pble_rsrc->next_fpm_addr to be
larger than u32, so remove the u32 cast to avoid unintentional
truncation.

This fixes the following error that can be observed when registering
massive memory regions:

[  447.227494] (NULL ib_device): cqp opcode = 0x1f maj_err_code = 0xffff min_err_code = 0x800c
[  447.227505] (NULL ib_device): [Update PE SDs Cmd Error][op_code=21] status=-5 waiting=1 completion_err=1 maj=0xffff min=0x800c

Fixes: e8c4dbc2fcac ("RDMA/irdma: Add PBLE resource manager")
Signed-off-by: Jacob Moroni 
Link: https://patch.msgid.link/20250923190850.1022773-1-jmoroni@google.com
Acked-by: Tatyana Nikolova 
Signed-off-by: Leon Romanovsky

RDMA/bnxt_re: Fix a potential memory leak in destroy_gsi_sqp

2025-10-19T10:47:55+00:00

The current error handling path in bnxt_re_destroy_gsi_sqp() could lead
to a resource leak. When bnxt_qplib_destroy_qp() fails, the function
jumps to the 'fail' label and returns immediately, skipping the call
to bnxt_qplib_free_qp_res().

Continue the resource teardown even if bnxt_qplib_destroy_qp() fails,
which aligns with the driver's general error handling strategy and
prevents the potential leak.

Fixes: 8dae419f9ec73 ("RDMA/bnxt_re: Refactor queue pair creation code")
Signed-off-by: YanLong Dai 
Link: https://patch.msgid.link/20250924061444.11288-1-daiyanlong@kylinos.cn
Signed-off-by: Leon Romanovsky