<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/drivers/infiniband, branch v6.18</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>mlx5: Fix default values in create CQ</title>
<updated>2025-11-11T14:12:18+00:00</updated>
<author>
<name>Akiva Goldberger</name>
<email>agoldberger@nvidia.com</email>
</author>
<published>2025-11-09T09:49:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e5eba42f01340f73888dfe560be2806057c25913'/>
<id>e5eba42f01340f73888dfe560be2806057c25913</id>
<content type='text'>
Currently, CQs without a completion function are assigned the
mlx5_add_cq_to_tasklet function by default. This is problematic since
only user CQs created through the mlx5_ib driver are intended to use
this function.

Additionally, all CQs that will use doorbells instead of polling for
completions must call mlx5_cq_arm. However, the default CQ creation flow
leaves a valid value in the CQ's arm_db field, allowing FW to send
interrupts to polling-only CQs in certain corner cases.

These two factors would allow a polling-only kernel CQ to be triggered
by an EQ interrupt and call a completion function intended only for user
CQs, causing a null pointer exception.

Some areas in the driver have prevented this issue with one-off fixes
but did not address the root cause.

This patch fixes the described issue by adding defaults to the create CQ
flow. It adds a default dummy completion function to protect against
null pointer exceptions, and it sets an invalid command sequence number
by default in kernel CQs to prevent the FW from sending an interrupt to
the CQ until it is armed. User CQs are responsible for their own
initialization values.

Callers of mlx5_core_create_cq are responsible for changing the
completion function and arming the CQ per their needs.

Fixes: cdd04f4d4d71 ("net/mlx5: Add support to create SQ and CQ for ASO")
Signed-off-by: Akiva Goldberger &lt;agoldberger@nvidia.com&gt;
Reviewed-by: Moshe Shemesh &lt;moshe@nvidia.com&gt;
Signed-off-by: Tariq Toukan &lt;tariqt@nvidia.com&gt;
Acked-by: Leon Romanovsky &lt;leon@kernel.org&gt;
Link: https://patch.msgid.link/1762681743-1084694-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently, CQs without a completion function are assigned the
mlx5_add_cq_to_tasklet function by default. This is problematic since
only user CQs created through the mlx5_ib driver are intended to use
this function.

Additionally, all CQs that will use doorbells instead of polling for
completions must call mlx5_cq_arm. However, the default CQ creation flow
leaves a valid value in the CQ's arm_db field, allowing FW to send
interrupts to polling-only CQs in certain corner cases.

These two factors would allow a polling-only kernel CQ to be triggered
by an EQ interrupt and call a completion function intended only for user
CQs, causing a null pointer exception.

Some areas in the driver have prevented this issue with one-off fixes
but did not address the root cause.

This patch fixes the described issue by adding defaults to the create CQ
flow. It adds a default dummy completion function to protect against
null pointer exceptions, and it sets an invalid command sequence number
by default in kernel CQs to prevent the FW from sending an interrupt to
the CQ until it is armed. User CQs are responsible for their own
initialization values.

Callers of mlx5_core_create_cq are responsible for changing the
completion function and arming the CQ per their needs.

Fixes: cdd04f4d4d71 ("net/mlx5: Add support to create SQ and CQ for ASO")
Signed-off-by: Akiva Goldberger &lt;agoldberger@nvidia.com&gt;
Reviewed-by: Moshe Shemesh &lt;moshe@nvidia.com&gt;
Signed-off-by: Tariq Toukan &lt;tariqt@nvidia.com&gt;
Acked-by: Leon Romanovsky &lt;leon@kernel.org&gt;
Link: https://patch.msgid.link/1762681743-1084694-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>RDMA/irdma: Fix vf_id size to u16 to avoid overflow</title>
<updated>2025-11-02T11:46:01+00:00</updated>
<author>
<name>Jay Bhat</name>
<email>jay.bhat@intel.com</email>
</author>
<published>2025-10-31T02:17:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=320258783765316d2baae99c26e461ee634054fe'/>
<id>320258783765316d2baae99c26e461ee634054fe</id>
<content type='text'>
Correctly size the vf_id to u16 to avoid overflow.

Signed-off-by: Jay Bhat &lt;jay.bhat@intel.com&gt;
Signed-off-by: Tatyana Nikolova &lt;tatyana.e.nikolova@intel.com&gt;
Link: https://patch.msgid.link/20251031021726.1003-6-tatyana.e.nikolova@intel.com
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Correctly size the vf_id to u16 to avoid overflow.

Signed-off-by: Jay Bhat &lt;jay.bhat@intel.com&gt;
Signed-off-by: Tatyana Nikolova &lt;tatyana.e.nikolova@intel.com&gt;
Link: https://patch.msgid.link/20251031021726.1003-6-tatyana.e.nikolova@intel.com
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>RDMA/hns: Remove an extra blank line</title>
<updated>2025-10-27T09:44:00+00:00</updated>
<author>
<name>Guofeng Yue</name>
<email>yueguofeng@h-partners.com</email>
</author>
<published>2025-10-16T11:40:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b8c9aab4c738e5e9814915768ac6c184fe36ab93'/>
<id>b8c9aab4c738e5e9814915768ac6c184fe36ab93</id>
<content type='text'>
Remove an extra blank line.

Signed-off-by: Guofeng Yue &lt;yueguofeng@h-partners.com&gt;
Signed-off-by: Junxian Huang &lt;huangjunxian6@hisilicon.com&gt;
Link: https://patch.msgid.link/20251016114051.1963197-5-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Remove an extra blank line.

Signed-off-by: Guofeng Yue &lt;yueguofeng@h-partners.com&gt;
Signed-off-by: Junxian Huang &lt;huangjunxian6@hisilicon.com&gt;
Link: https://patch.msgid.link/20251016114051.1963197-5-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>RDMA/hns: Fix wrong WQE data when QP wraps around</title>
<updated>2025-10-27T09:44:00+00:00</updated>
<author>
<name>Junxian Huang</name>
<email>huangjunxian6@hisilicon.com</email>
</author>
<published>2025-10-16T11:40:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=fe9622011f955e35ba84d3af7b2f2fed31cf8ca1'/>
<id>fe9622011f955e35ba84d3af7b2f2fed31cf8ca1</id>
<content type='text'>
When QP wraps around, WQE data from the previous use at the same
position still remains as driver does not clear it. The WQE field
layout differs across different opcodes, causing that the fields
that are not explicitly assigned for the current opcode retain
stale values, and are issued to HW by mistake. Such fields are as
follows:

* MSG_START_SGE_IDX field in ATOMIC WQE
* BLOCK_SIZE and ZBVA fields in FRMR WQE
* DirectWQE fields when DirectWQE not used

For ATOMIC WQE, always set the latest sge index in MSG_START_SGE_IDX
as required by HW.

For FRMR WQE and DirectWQE, clear only those unassigned fields
instead of the entire WQE to avoid performance penalty.

Fixes: 68a997c5d28c ("RDMA/hns: Add FRMR support for hip08")
Signed-off-by: Junxian Huang &lt;huangjunxian6@hisilicon.com&gt;
Link: https://patch.msgid.link/20251016114051.1963197-4-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When QP wraps around, WQE data from the previous use at the same
position still remains as driver does not clear it. The WQE field
layout differs across different opcodes, causing that the fields
that are not explicitly assigned for the current opcode retain
stale values, and are issued to HW by mistake. Such fields are as
follows:

* MSG_START_SGE_IDX field in ATOMIC WQE
* BLOCK_SIZE and ZBVA fields in FRMR WQE
* DirectWQE fields when DirectWQE not used

For ATOMIC WQE, always set the latest sge index in MSG_START_SGE_IDX
as required by HW.

For FRMR WQE and DirectWQE, clear only those unassigned fields
instead of the entire WQE to avoid performance penalty.

Fixes: 68a997c5d28c ("RDMA/hns: Add FRMR support for hip08")
Signed-off-by: Junxian Huang &lt;huangjunxian6@hisilicon.com&gt;
Link: https://patch.msgid.link/20251016114051.1963197-4-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>RDMA/hns: Fix the modification of max_send_sge</title>
<updated>2025-10-27T09:44:00+00:00</updated>
<author>
<name>wenglianfa</name>
<email>wenglianfa@huawei.com</email>
</author>
<published>2025-10-16T11:40:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f5a7cbea5411668d429eb4ffe96c4063fe8dac9e'/>
<id>f5a7cbea5411668d429eb4ffe96c4063fe8dac9e</id>
<content type='text'>
The actual sge number may exceed the value specified in init_attr-&gt;cap
when HW needs extra sge to enable inline feature. Since these extra
sges are not expected by ULP, return the user-specified value to ULP
instead of the expanded sge number.

Fixes: 0c5e259b06a8 ("RDMA/hns: Fix incorrect sge nums calculation")
Signed-off-by: wenglianfa &lt;wenglianfa@huawei.com&gt;
Signed-off-by: Junxian Huang &lt;huangjunxian6@hisilicon.com&gt;
Link: https://patch.msgid.link/20251016114051.1963197-3-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The actual sge number may exceed the value specified in init_attr-&gt;cap
when HW needs extra sge to enable inline feature. Since these extra
sges are not expected by ULP, return the user-specified value to ULP
instead of the expanded sge number.

Fixes: 0c5e259b06a8 ("RDMA/hns: Fix incorrect sge nums calculation")
Signed-off-by: wenglianfa &lt;wenglianfa@huawei.com&gt;
Signed-off-by: Junxian Huang &lt;huangjunxian6@hisilicon.com&gt;
Link: https://patch.msgid.link/20251016114051.1963197-3-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>RDMA/hns: Fix recv CQ and QP cache affinity</title>
<updated>2025-10-27T09:44:00+00:00</updated>
<author>
<name>Chengchang Tang</name>
<email>tangchengchang@huawei.com</email>
</author>
<published>2025-10-16T11:40:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=c4b67b514af8c2d73c64b36e0cd99e9b26b9ac82'/>
<id>c4b67b514af8c2d73c64b36e0cd99e9b26b9ac82</id>
<content type='text'>
Currently driver enforces affinity between QP cache and send CQ
cache, which helps improve the performance of sending, but doesn't
set affinity with recv CQ cache, resulting in suboptimal performance
of receiving.

Use one CQ bank per context to ensure the affinity among QP, send CQ
and recv CQ. For kernel ULP, CQ bank is fixed to 0.

Fixes: 9e03dbea2b06 ("RDMA/hns: Fix CQ and QP cache affinity")
Signed-off-by: Chengchang Tang &lt;tangchengchang@huawei.com&gt;
Signed-off-by: Junxian Huang &lt;huangjunxian6@hisilicon.com&gt;
Link: https://patch.msgid.link/20251016114051.1963197-2-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently driver enforces affinity between QP cache and send CQ
cache, which helps improve the performance of sending, but doesn't
set affinity with recv CQ cache, resulting in suboptimal performance
of receiving.

Use one CQ bank per context to ensure the affinity among QP, send CQ
and recv CQ. For kernel ULP, CQ bank is fixed to 0.

Fixes: 9e03dbea2b06 ("RDMA/hns: Fix CQ and QP cache affinity")
Signed-off-by: Chengchang Tang &lt;tangchengchang@huawei.com&gt;
Signed-off-by: Junxian Huang &lt;huangjunxian6@hisilicon.com&gt;
Link: https://patch.msgid.link/20251016114051.1963197-2-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>RDMA/uverbs: Fix umem release in UVERBS_METHOD_CQ_CREATE</title>
<updated>2025-10-19T11:31:25+00:00</updated>
<author>
<name>Shuhao Fu</name>
<email>sfual@cse.ust.hk</email>
</author>
<published>2025-10-10T02:55:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d8713158faad0fd4418cb2f4e432c3876ad53a1f'/>
<id>d8713158faad0fd4418cb2f4e432c3876ad53a1f</id>
<content type='text'>
In `UVERBS_METHOD_CQ_CREATE`, umem should be released if anything goes
wrong. Currently, if `create_cq_umem` fails, umem would not be
released or referenced, causing a possible leak.

In this patch, we release umem at `UVERBS_METHOD_CQ_CREATE`, the driver
should not release umem if it returns an error code.

Fixes: 1a40c362ae26 ("RDMA/uverbs: Add a common way to create CQ with umem")
Signed-off-by: Shuhao Fu &lt;sfual@cse.ust.hk&gt;
Link: https://patch.msgid.link/aOh1le4YqtYwj-hH@osx.local
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In `UVERBS_METHOD_CQ_CREATE`, umem should be released if anything goes
wrong. Currently, if `create_cq_umem` fails, umem would not be
released or referenced, causing a possible leak.

In this patch, we release umem at `UVERBS_METHOD_CQ_CREATE`, the driver
should not release umem if it returns an error code.

Fixes: 1a40c362ae26 ("RDMA/uverbs: Add a common way to create CQ with umem")
Signed-off-by: Shuhao Fu &lt;sfual@cse.ust.hk&gt;
Link: https://patch.msgid.link/aOh1le4YqtYwj-hH@osx.local
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>RDMA/irdma: Set irdma_cq cq_num field during CQ create</title>
<updated>2025-10-19T11:02:11+00:00</updated>
<author>
<name>Jacob Moroni</name>
<email>jmoroni@google.com</email>
</author>
<published>2025-09-23T14:24:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=5575b7646b94c0afb0f4c0d86e00e13cf3397a62'/>
<id>5575b7646b94c0afb0f4c0d86e00e13cf3397a62</id>
<content type='text'>
The driver maintains a CQ table that is used to ensure that a CQ is
still valid when processing CQ related AEs. When a CQ is destroyed,
the table entry is cleared, using irdma_cq.cq_num as the index. This
field was never being set, so it was just always clearing out entry
0.

Additionally, the cq_num field size was increased to accommodate HW
supporting more than 64K CQs.

Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs")
Signed-off-by: Jacob Moroni &lt;jmoroni@google.com&gt;
Link: https://patch.msgid.link/20250923142439.943930-1-jmoroni@google.com
Acked-by: Tatyana Nikolova &lt;tatyana.e.nikolova@intel.com&gt;
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The driver maintains a CQ table that is used to ensure that a CQ is
still valid when processing CQ related AEs. When a CQ is destroyed,
the table entry is cleared, using irdma_cq.cq_num as the index. This
field was never being set, so it was just always clearing out entry
0.

Additionally, the cq_num field size was increased to accommodate HW
supporting more than 64K CQs.

Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs")
Signed-off-by: Jacob Moroni &lt;jmoroni@google.com&gt;
Link: https://patch.msgid.link/20250923142439.943930-1-jmoroni@google.com
Acked-by: Tatyana Nikolova &lt;tatyana.e.nikolova@intel.com&gt;
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>RDMA/irdma: Fix SD index calculation</title>
<updated>2025-10-19T11:01:28+00:00</updated>
<author>
<name>Jacob Moroni</name>
<email>jmoroni@google.com</email>
</author>
<published>2025-09-23T19:08:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=8d158f47f1f33d8747e80c3afbea5aa337e59d41'/>
<id>8d158f47f1f33d8747e80c3afbea5aa337e59d41</id>
<content type='text'>
In some cases, it is possible for pble_rsrc-&gt;next_fpm_addr to be
larger than u32, so remove the u32 cast to avoid unintentional
truncation.

This fixes the following error that can be observed when registering
massive memory regions:

[  447.227494] (NULL ib_device): cqp opcode = 0x1f maj_err_code = 0xffff min_err_code = 0x800c
[  447.227505] (NULL ib_device): [Update PE SDs Cmd Error][op_code=21] status=-5 waiting=1 completion_err=1 maj=0xffff min=0x800c

Fixes: e8c4dbc2fcac ("RDMA/irdma: Add PBLE resource manager")
Signed-off-by: Jacob Moroni &lt;jmoroni@google.com&gt;
Link: https://patch.msgid.link/20250923190850.1022773-1-jmoroni@google.com
Acked-by: Tatyana Nikolova &lt;tatyana.e.nikolova@intel.com&gt;
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In some cases, it is possible for pble_rsrc-&gt;next_fpm_addr to be
larger than u32, so remove the u32 cast to avoid unintentional
truncation.

This fixes the following error that can be observed when registering
massive memory regions:

[  447.227494] (NULL ib_device): cqp opcode = 0x1f maj_err_code = 0xffff min_err_code = 0x800c
[  447.227505] (NULL ib_device): [Update PE SDs Cmd Error][op_code=21] status=-5 waiting=1 completion_err=1 maj=0xffff min=0x800c

Fixes: e8c4dbc2fcac ("RDMA/irdma: Add PBLE resource manager")
Signed-off-by: Jacob Moroni &lt;jmoroni@google.com&gt;
Link: https://patch.msgid.link/20250923190850.1022773-1-jmoroni@google.com
Acked-by: Tatyana Nikolova &lt;tatyana.e.nikolova@intel.com&gt;
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>RDMA/bnxt_re: Fix a potential memory leak in destroy_gsi_sqp</title>
<updated>2025-10-19T10:47:55+00:00</updated>
<author>
<name>YanLong Dai</name>
<email>daiyanlong@kylinos.cn</email>
</author>
<published>2025-09-24T06:14:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=88de89f184661ebb946804a5abdf2bdec7f0a7ab'/>
<id>88de89f184661ebb946804a5abdf2bdec7f0a7ab</id>
<content type='text'>
The current error handling path in bnxt_re_destroy_gsi_sqp() could lead
to a resource leak. When bnxt_qplib_destroy_qp() fails, the function
jumps to the 'fail' label and returns immediately, skipping the call
to bnxt_qplib_free_qp_res().

Continue the resource teardown even if bnxt_qplib_destroy_qp() fails,
which aligns with the driver's general error handling strategy and
prevents the potential leak.

Fixes: 8dae419f9ec73 ("RDMA/bnxt_re: Refactor queue pair creation code")
Signed-off-by: YanLong Dai &lt;daiyanlong@kylinos.cn&gt;
Link: https://patch.msgid.link/20250924061444.11288-1-daiyanlong@kylinos.cn
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The current error handling path in bnxt_re_destroy_gsi_sqp() could lead
to a resource leak. When bnxt_qplib_destroy_qp() fails, the function
jumps to the 'fail' label and returns immediately, skipping the call
to bnxt_qplib_free_qp_res().

Continue the resource teardown even if bnxt_qplib_destroy_qp() fails,
which aligns with the driver's general error handling strategy and
prevents the potential leak.

Fixes: 8dae419f9ec73 ("RDMA/bnxt_re: Refactor queue pair creation code")
Signed-off-by: YanLong Dai &lt;daiyanlong@kylinos.cn&gt;
Link: https://patch.msgid.link/20250924061444.11288-1-daiyanlong@kylinos.cn
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
