linux-stable.git/drivers/infiniband, branch v6.6.78

RDMA/rxe: Fix the warning "__rxe_cleanup+0x12c/0x170 [rdma_rxe]"

2025-02-08T08:52:22+00:00

[ Upstream commit edc4ef0e0154096d6c0cf5e06af6fc330dbad9d1 ]

The Call Trace is as below:
"
  
  ? show_regs.cold+0x1a/0x1f
  ? __rxe_cleanup+0x12c/0x170 [rdma_rxe]
  ? __warn+0x84/0xd0
  ? __rxe_cleanup+0x12c/0x170 [rdma_rxe]
  ? report_bug+0x105/0x180
  ? handle_bug+0x46/0x80
  ? exc_invalid_op+0x19/0x70
  ? asm_exc_invalid_op+0x1b/0x20
  ? __rxe_cleanup+0x12c/0x170 [rdma_rxe]
  ? __rxe_cleanup+0x124/0x170 [rdma_rxe]
  rxe_destroy_qp.cold+0x24/0x29 [rdma_rxe]
  ib_destroy_qp_user+0x118/0x190 [ib_core]
  rdma_destroy_qp.cold+0x43/0x5e [rdma_cm]
  rtrs_cq_qp_destroy.cold+0x1d/0x2b [rtrs_core]
  rtrs_srv_close_work.cold+0x1b/0x31 [rtrs_server]
  process_one_work+0x21d/0x3f0
  worker_thread+0x4a/0x3c0
  ? process_one_work+0x3f0/0x3f0
  kthread+0xf0/0x120
  ? kthread_complete_and_exit+0x20/0x20
  ret_from_fork+0x22/0x30
  
"
When too many rdma resources are allocated, rxe needs more time to
handle these rdma resources. Sometimes with the current timeout, rxe
can not release the rdma resources correctly.

Compared with other rdma drivers, a bigger timeout is used.

Fixes: 215d0a755e1b ("RDMA/rxe: Stop lookup of partially built objects")
Signed-off-by: Zhu Yanjun 
Link: https://patch.msgid.link/20250110160927.55014-1-yanjun.zhu@linux.dev
Tested-by: Joe Klein 
Signed-off-by: Leon Romanovsky 
Signed-off-by: Sasha Levin

RDMA/mlx5: Fix indirect mkey ODP page count

2025-02-08T08:52:22+00:00

[ Upstream commit 235f238402194a78ac5fb882a46717eac817e5d1 ]

Restrict the check for the number of pages handled during an ODP page
fault to direct mkeys.
Perform the check right after handling the page fault and don't
propagate the number of handled pages to callers.

Indirect mkeys and their associated direct mkeys can have different
start addresses. As a result, the calculation of the number of pages to
handle for an indirect mkey may not match the actual page fault
handling done on the direct mkey.

For example:
A 4K sized page fault on a KSM mkey that has a start address that is not
aligned to a page will result a calculation that assumes the number of
pages required to handle are 2.
While the underlying MTT might be aligned will require fetching only a
single page.
Thus, do the calculation and compare number of pages handled only per
direct mkey.

Fixes: db570d7deafb ("IB/mlx5: Add ODP support to MW")
Signed-off-by: Michael Guralnik 
Reviewed-by: Artemy Kovalyov 
Link: https://patch.msgid.link/86c483d9e75ce8fe14e9ff85b62df72b779f8ab1.1736187990.git.leon@kernel.org
Signed-off-by: Leon Romanovsky 
Signed-off-by: Sasha Levin

RDMA/bnxt_re: Fix to drop reference to the mmap entry in case of error

2025-02-08T08:52:19+00:00

[ Upstream commit c84f0f4f49d81645f49c3269fdcc3b84ce61e795 ]

In the error handling path of bnxt_re_mmap(), driver should invoke
rdma_user_mmap_entry_put() to free the reference of mmap entry in case
the error happens after rdma_user_mmap_entry_get was called.

Fixes: ea2224857882 ("RDMA/bnxt_re: Update alloc_page uapi for pacing")
Reviewed-by: Saravanan Vajravel 
Reviewed-by: Kashyap Desai 
Signed-off-by: Kalesh AP 
Signed-off-by: Selvin Xavier 
Link: https://patch.msgid.link/20250104061519.2540178-1-kalesh-anakkur.purayil@broadcom.com
Signed-off-by: Leon Romanovsky 
Signed-off-by: Sasha Levin

RDMA/srp: Fix error handling in srp_add_port

2025-02-08T08:52:14+00:00

[ Upstream commit a3cbf68c69611188cd304229e346bffdabfd4277 ]

As comment of device_add() says, if device_add() succeeds, you should
call device_del() when you want to get rid of it. If device_add() has
not succeeded, use only put_device() to drop the reference count.

Add a put_device() call before returning from the function to decrement
reference count for cleanup.

Found by code review.

Fixes: c8e4c2397655 ("RDMA/srp: Rework the srp_add_port() error path")
Signed-off-by: Ma Ke 
Link: https://patch.msgid.link/20241217075538.2909996-1-make_ruc2021@163.com
Signed-off-by: Bart Van Assche 
Signed-off-by: Leon Romanovsky 
Signed-off-by: Sasha Levin

RDMA/rxe: Fix mismatched max_msg_sz

2025-02-08T08:52:13+00:00

[ Upstream commit db03b70969aab4ef111a3369cfd90ea4da3a6aa0 ]

User mode queries max_msg_sz as 0x800000 by command 'ibv_devinfo -v',
however ibv_post_send/ibv_post_recv has a limit of 2^31. Fix this
mismatched information.

Signed-off-by: zhenwei pi 
Fixes: f605f26ea196 ("RDMA/rxe: Protect QP state with qp->state_lock")
Fixes: 5bf944f24129 ("RDMA/rxe: Add error messages")
Link: https://patch.msgid.link/20241216121953.765331-1-pizhenwei@bytedance.com
Review-by: Zhu Yanjun 
Signed-off-by: Leon Romanovsky 
Signed-off-by: Sasha Levin

RDMA/rxe: Improve newline in printing messages

2025-02-08T08:52:13+00:00

[ Upstream commit 6482718086bf69f9616d0b86d03514b17afaeb08 ]

Previously rxe_{dbg,info,err}() macros are appended built-in newline,
but some users will add redundant newline sometimes. So remove the
built-in newline for these macros.

In terms of rxe_{dbg,info,err}_xxx() macros, because they don't have
built-in newline, append newline when using them.

CC: Daisuke Matsuda 
Reviewed-by: Daisuke Matsuda 
Acked-by: Zhu Yanjun 
Signed-off-by: Li Zhijian 
Link: https://lore.kernel.org/r/20240109083253.3629967-1-lizhijian@fujitsu.com
Signed-off-by: Leon Romanovsky 
Stable-dep-of: db03b70969aa ("RDMA/rxe: Fix mismatched max_msg_sz")
Signed-off-by: Sasha Levin

rdma/cxgb4: Prevent potential integer overflow on 32bit

2025-02-08T08:52:11+00:00

[ Upstream commit bd96a3935e89486304461a21752f824fc25e0f0b ]

The "gl->tot_len" variable is controlled by the user.  It comes from
process_responses().  On 32bit systems, the "gl->tot_len + sizeof(struct
cpl_pass_accept_req) + sizeof(struct rss_header)" addition could have an
integer wrapping bug.  Use size_add() to prevent this.

Fixes: 1cab775c3e75 ("RDMA/cxgb4: Fix LE hash collision bug for passive open connection")
Link: https://patch.msgid.link/r/86b404e1-4a75-4a35-a34e-e3054fa554c7@stanley.mountain
Signed-off-by: Dan Carpenter 
Signed-off-by: Jason Gunthorpe 
Signed-off-by: Sasha Levin

RDMA/mlx4: Avoid false error about access to uninitialized gids array

2025-02-08T08:52:11+00:00

[ Upstream commit 1f53d88cbb0dcc7df235bf6611ae632b254fccd8 ]

Smatch generates the following false error report:
drivers/infiniband/hw/mlx4/main.c:393 mlx4_ib_del_gid() error: uninitialized symbol 'gids'.

Traditionally, we are not changing kernel code and asking people to fix
the tools. However in this case, the fix can be done by simply rearranging
the code to be more clear.

Fixes: e26be1bfef81 ("IB/mlx4: Implement ib_device callbacks")
Link: https://patch.msgid.link/6a3a1577463da16962463fcf62883a87506e9b62.1733233426.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky 
Signed-off-by: Sasha Levin

RDMA/bnxt_re: Avoid CPU lockups due fifo occupancy check loop

2025-02-01T17:37:52+00:00

commit 8be3e5b0c96beeefe9d5486b96575d104d3e7d17 upstream.

Driver waits indefinitely for the fifo occupancy to go below a threshold
as soon as the pacing interrupt is received. This can cause soft lockup on
one of the processors, if the rate of DB is very high.

Add a loop count for FPGA and exit the __wait_for_fifo_occupancy_below_th
if the loop is taking more time. Pacing will be continuing until the
occupancy is below the threshold. This is ensured by the checks in
bnxt_re_pacing_timer_exp and further scheduling the work for pacing based
on the fifo occupancy.

Fixes: 2ad4e6303a6d ("RDMA/bnxt_re: Implement doorbell pacing algorithm")
Link: https://patch.msgid.link/r/1728373302-19530-7-git-send-email-selvin.xavier@broadcom.com
Reviewed-by: Kalesh AP 
Reviewed-by: Chandramohan Akula 
Signed-off-by: Selvin Xavier 
Signed-off-by: Jason Gunthorpe 
[ Add the declaration of variable pacing_data to make it work on 6.6.y ]
Signed-off-by: Alva Lan 
Signed-off-by: Greg Kroah-Hartman

RDMA/bnxt_re: Fix to export port num to ib_query_qp

2025-01-23T16:21:14+00:00

[ Upstream commit 34db8ec931b84d1426423f263b1927539e73b397 ]

Current driver implementation doesn't populate the port_num
field in query_qp. Adding the code to convert internal firmware
port id to ibv defined port number and export it.

Reviewed-by: Saravanan Vajravel 
Reviewed-by: Kalesh AP 
Signed-off-by: Hongguang Gao 
Signed-off-by: Selvin Xavier 
Link: https://patch.msgid.link/20241211083931.968831-5-kalesh-anakkur.purayil@broadcom.com
Signed-off-by: Leon Romanovsky 
Signed-off-by: Sasha Levin