linux-stable.git/drivers/infiniband/ulp, branch v3.2.85

IB/ipoib: Don't allow MC joins during light MC flush

2016-11-20T01:01:41+00:00

commit 344bacca8cd811809fc33a249f2738ab757d327f upstream.

This fix solves a race between light flush and on-the-fly joins.
Light flush neither sets the device to down nor unsets the IPOIB_OPER_UP
flag. This means that if a MC join is in progress while flushing
and the QP was attached to the BC MGID, we can have a mismatch when
re-attaching the QP to the BC MGID.

The light flush would set the broadcast group to NULL causing an on
the fly join to rejoin and reattach to the BC MCG as well as adding
the BC MGID to the multicast list. The flush process would later on
remove the BC MGID and detach it from the QP. On the next flush
the BC MGID is present in the multicast list but not found when trying
to detach it because of the previous double attach and single detach.
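
The race window described above can be modelled in user space. This is a
minimal sketch, not driver code: struct dev_model, try_join() and
light_flush() are hypothetical names, and the approach shown (hiding the
operational flag for the duration of the flush so that a racing join is
refused) is an assumption about how such a window can be closed, since the
message itself does not spell out the mechanism.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names come from the driver. */
struct dev_model {
	bool oper_up;   /* stands in for the IPOIB_OPER_UP flag */
	int bc_attach;  /* number of times the QP is attached to the BC MGID */
};

/* On-the-fly join: only proceeds while the device is marked operational. */
static bool try_join(struct dev_model *d)
{
	if (!d->oper_up)
		return false;
	d->bc_attach++;
	return true;
}

/*
 * Light flush: clear the operational flag while detaching, then restore
 * its previous value. racing_join simulates a join arriving mid-flush.
 */
static void light_flush(struct dev_model *d,
			bool (*racing_join)(struct dev_model *))
{
	bool was_up = d->oper_up;

	d->oper_up = false;
	if (racing_join != NULL)
		(void)racing_join(d);   /* refused because oper_up is clear */
	if (d->bc_attach > 0)
		d->bc_attach--;         /* single detach */
	assert(d->bc_attach == 0);      /* attach and detach stay balanced */
	d->oper_up = was_up;
}
```

With the flag left set during the flush, the simulated join would bump
bc_attach to 2 and the single detach would leave a stale attachment behind,
which is the imbalance the message describes.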

[18332.714265] ------------[ cut here ]------------
[18332.717775] WARNING: CPU: 6 PID: 3767 at drivers/infiniband/core/verbs.c:280 ib_dealloc_pd+0xff/0x120 [ib_core]
...
[18332.775198] Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011
[18332.779411]  0000000000000000 ffff8800b50dfbb0 ffffffff813fed47 0000000000000000
[18332.784960]  0000000000000000 ffff8800b50dfbf0 ffffffff8109add1 0000011832f58300
[18332.790547]  ffff880226a596c0 ffff880032482000 ffff880032482830 ffff880226a59280
[18332.796199] Call Trace:
[18332.798015]  [] dump_stack+0x63/0x8c
[18332.801831]  [] __warn+0xd1/0xf0
[18332.805403]  [] warn_slowpath_null+0x1d/0x20
[18332.809706]  [] ib_dealloc_pd+0xff/0x120 [ib_core]
[18332.814384]  [] ipoib_transport_dev_cleanup+0xfc/0x1d0 [ib_ipoib]
[18332.820031]  [] ipoib_ib_dev_cleanup+0x98/0x110 [ib_ipoib]
[18332.825220]  [] ipoib_dev_cleanup+0x2d8/0x550 [ib_ipoib]
[18332.830290]  [] ipoib_uninit+0x2f/0x40 [ib_ipoib]
[18332.834911]  [] rollback_registered_many+0x1aa/0x2c0
[18332.839741]  [] rollback_registered+0x31/0x40
[18332.844091]  [] unregister_netdevice_queue+0x48/0x80
[18332.848880]  [] ipoib_vlan_delete+0x1fb/0x290 [ib_ipoib]
[18332.853848]  [] delete_child+0x7d/0xf0 [ib_ipoib]
[18332.858474]  [] dev_attr_store+0x18/0x30
[18332.862510]  [] sysfs_kf_write+0x3a/0x50
[18332.866349]  [] kernfs_fop_write+0x120/0x170
[18332.870471]  [] __vfs_write+0x28/0xe0
[18332.874152]  [] ? percpu_down_read+0x1f/0x50
[18332.878274]  [] vfs_write+0xa2/0x1a0
[18332.881896]  [] SyS_write+0x46/0xa0
[18332.885632]  [] do_syscall_64+0x57/0xb0
[18332.889709]  [] entry_SYSCALL64_slow_path+0x25/0x25
[18332.894727] ---[ end trace 09ebbe31f831ef17 ]---

Fixes: ee1e2c82c245 ("IPoIB: Refresh paths instead of flushing them on SM change events")
Signed-off-by: Alex Vesker 
Signed-off-by: Leon Romanovsky 
Signed-off-by: Doug Ledford 
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings

IB/ipoib: Fix memory corruption in ipoib cm mode connect flow

2016-11-20T01:01:36+00:00

commit 546481c2816ea3c061ee9d5658eb48070f69212e upstream.

When a new CM connection is being requested, the ipoib driver copies data
from the path pointer in the CM/tx object. The path object might be
invalid at that point, and memory corruption will happen later when
the CM driver tries to use that data.

The following scenario demonstrates it:
	neigh_add_path --> ipoib_cm_create_tx -->
	queue_work (pointer to path is in the cm/tx struct)
	#while the work is still in the queue,
	#the port goes down and causes the ipoib_flush_paths:
	ipoib_flush_paths --> path_free --> kfree(path)
	#at this point the work scheduled starts.
	ipoib_cm_tx_start --> copy from the (invalid)path pointer:
	(memcpy(&pathrec, &p->path->pathrec, sizeof pathrec);)
	 -> memory corruption.

To fix that, the driver now starts the CM/tx connection only if that
specific path exists in the general paths database.
This check is protected with the relevant locks, and uses the gid from
the neigh member in the CM/tx object which is valid according to the ref
count that was taken by the CM/tx.
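
The lookup-before-use idea can be sketched as follows. This is a
single-threaded user-space model under stated assumptions: path_db,
path_find() and cm_tx_start_model() are hypothetical names, the path
database is a small array rather than the driver's real data structure, and
the locking that the message mentions is only indicated by a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define GID_LEN   16
#define MAX_PATHS 4

/* Hypothetical stand-ins for the driver's path record and path entry. */
struct pathrec_model {
	unsigned char dgid[GID_LEN];
	int mtu;
};

struct path_model {
	bool in_use;
	struct pathrec_model rec;
};

static struct path_model path_db[MAX_PATHS];

/* Look the path up by GID instead of trusting a cached pointer. */
static struct path_model *path_find(const unsigned char *gid)
{
	for (size_t i = 0; i < MAX_PATHS; i++)
		if (path_db[i].in_use &&
		    memcmp(path_db[i].rec.dgid, gid, GID_LEN) == 0)
			return &path_db[i];
	return NULL;
}

/*
 * Start the connection only if the path still exists. In a real driver
 * the lookup and the copy would both happen under the relevant locks.
 */
static bool cm_tx_start_model(const unsigned char *neigh_gid,
			      struct pathrec_model *out)
{
	struct path_model *p = path_find(neigh_gid);

	if (p == NULL)
		return false;   /* path was flushed: do not touch it */
	memcpy(out, &p->rec, sizeof(*out));
	return true;
}
```

The key design choice is that the queued work carries a key (the GID) that
remains valid, rather than a pointer whose target may have been freed.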

Fixes: 839fcaba35 ('IPoIB: Connected mode experimental support')
Signed-off-by: Erez Shitrit 
Signed-off-by: Leon Romanovsky 
Signed-off-by: Doug Ledford 
[bwh: Backported to 3.2: s/neigh->daddr/neigh->neighbour->ha/]
Signed-off-by: Ben Hutchings

IB/srp: Fix a sporadic crash triggered by cable pulling

2014-07-11T12:33:35+00:00

commit 024ca90151f5e4296d30f72c13ff9a075e23c9ec upstream.

Prevent the loops that iterate over the request ring from encountering
a pointer to a SCSI command in req->scmnd that is no longer associated
with that request. If the function srp_unmap_data() is invoked twice
for a SCSI command that is not in flight then that would cause
ib_fmr_pool_unmap() to be invoked with an invalid pointer as argument,
resulting in a kernel oops.

Reported-by: Sagi Grimberg 
Reference: http://thread.gmane.org/gmane.linux.drivers.rdma/19068/focus=19069
Signed-off-by: Bart Van Assche 
Reviewed-by: Sagi Grimberg 
Signed-off-by: Roland Dreier 
Signed-off-by: Ben Hutchings

IPoIB: Fix send lockup due to missed TX completion

2013-04-10T02:20:02+00:00

commit 1ee9e2aa7b31427303466776f455d43e5e3c9275 upstream.

Commit f0dc117abdfa ("IPoIB: Fix TX queue lockup with mixed UD/CM
traffic") attempts to solve an issue where unprocessed UD send
completions can deadlock the netdev.

The patch doesn't fully resolve the issue because if more than half
the tx_outstanding's were UD and all of the destinations are RC
reachable, arming the CQ doesn't solve the issue.

This patch passes the IB_CQ_REPORT_MISSED_EVENTS flag to
ib_req_notify_cq().  If the return value is greater than 0, the UD send
CQ completion callback is called directly to re-arm the send completion
timer.
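
The control flow can be sketched with a user-space model. All names here
(struct cq_model, cq_req_notify(), send_comp_handler(), arm_send_cq()) are
hypothetical; the model only mimics the documented behaviour that a
notification request made with the missed-events flag returns a positive
value when completions may already be pending.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a send CQ and its completion timer. */
struct cq_model {
	int pending;        /* completions already sitting in the CQ */
	bool armed;         /* a notification has been requested */
	int timer_rearmed;  /* how often the completion timer was re-armed */
};

/* Arm the CQ; report a positive value if completions may have been missed. */
static int cq_req_notify(struct cq_model *cq, bool report_missed)
{
	cq->armed = true;
	return (report_missed && cq->pending > 0) ? 1 : 0;
}

/* Stand-in for the UD send completion callback. */
static void send_comp_handler(struct cq_model *cq)
{
	cq->timer_rearmed++;
}

/* Arm the send CQ and run the callback directly if events were missed. */
static int arm_send_cq(struct cq_model *cq)
{
	int rc = cq_req_notify(cq, true);

	if (rc > 0)
		send_comp_handler(cq);
	return rc;
}
```

Without the direct call, completions that arrived before the CQ was armed
would never generate an event, and the queue would stay stopped.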

This issue is seen in very large parallel filesystem deployments
and the patch has been shown to correct the issue.

Reviewed-by: Dean Luick 
Signed-off-by: Mike Marciniszyn 
Signed-off-by: Roland Dreier 
Signed-off-by: Ben Hutchings

IB/srp: Avoid having aborted requests hang

2012-10-17T02:49:19+00:00

commit d8536670916a685df116b5c2cb256573fd25e4e3 upstream.

We need to call scsi_done() for commands after we abort them.

Signed-off-by: Bart Van Assche 
Acked-by: David Dillow 
Signed-off-by: Roland Dreier 
Signed-off-by: Ben Hutchings

IB/srp: Fix use-after-free in srp_reset_req()

2012-10-17T02:49:18+00:00

commit 9b796d06d5d1b1e85ae2316a283ea11dd739ef96 upstream.

srp_free_req() uses the scsi_cmnd structure contents to unmap
buffers, so we must invoke srp_free_req() before we release
ownership of that structure.

Signed-off-by: Bart Van Assche 
Acked-by: David Dillow 
Signed-off-by: Roland Dreier 
Signed-off-by: Ben Hutchings

IPoIB: Fix use-after-free of multicast object

2012-10-17T02:49:17+00:00

commit bea1e22df494a729978e7f2c54f7bda328f74bc3 upstream.

Fix a crash in ipoib_mcast_join_task().  (with help from Or Gerlitz)

Commit c8c2afe360b7 ("IPoIB: Use rtnl lock/unlock when changing device
flags") added a call to rtnl_lock() in ipoib_mcast_join_task(), which
is run from the ipoib_workqueue, and hence the workqueue can't be
flushed from the context of ipoib_stop().

In the current code, ipoib_stop() (which doesn't flush the workqueue)
calls ipoib_mcast_dev_flush(), which goes and deletes all the
multicast entries.  This takes place without any synchronization with
a possible running instance of ipoib_mcast_join_task() for the same
ipoib device, leading to a crash due to NULL pointer dereference.

Fix this by making sure that the workqueue is flushed before
ipoib_mcast_dev_flush() is called.  To make that possible, we move the
RTNL-lock wrapped code to ipoib_mcast_join_finish().

Signed-off-by: Patrick McHardy 
Signed-off-by: Roland Dreier 
Signed-off-by: Ben Hutchings

IB/srp: Fix a race condition

2012-09-12T02:37:04+00:00

commit 220329916c72ee3d54ae7262b215a050f04a18fc upstream.

Avoid a crash caused by the scmnd->scsi_done(scmnd) call in
srp_process_rsp() being invoked with scsi_done == NULL.  This can
happen if a reply is received during or after a command abort.
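
One common way to make sure a command is completed exactly once is to have
every completion path claim the request first. The sketch below is a
single-threaded user-space model under that assumption; claim_req() and
complete_once() are hypothetical names, and the lock that a driver would
hold around the claim is only noted in a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a SCSI command and an SRP request. */
struct cmd_model {
	int done_calls;   /* how many times the command was completed */
};

struct req_model {
	struct cmd_model *scmnd;   /* NULL once the request has been claimed */
};

/* Take ownership of the command; a driver would do this under a lock. */
static struct cmd_model *claim_req(struct req_model *req)
{
	struct cmd_model *c = req->scmnd;

	req->scmnd = NULL;
	return c;
}

/* Called by both the response path and the abort path. */
static void complete_once(struct req_model *req)
{
	struct cmd_model *c = claim_req(req);

	if (c != NULL)
		c->done_calls++;   /* only the winner completes the command */
}
```

Whichever path loses the claim sees NULL and returns without touching the
command, so a late reply after an abort becomes a no-op.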

Reported-by: Joseph Glanville 
Reference: http://marc.info/?l=linux-rdma&m=134314367801595
Acked-by: David Dillow 
Signed-off-by: Bart Van Assche 
Signed-off-by: Roland Dreier 
Signed-off-by: Ben Hutchings

IB/iser: Post initial receive buffers before sending the final login request

2012-04-02T16:52:36+00:00

commit 89e984e2c2cd14f77ccb26c47726ac7f13b70ae8 upstream.

An iser target may send iscsi NO-OP PDUs as soon as it marks the iSER
iSCSI session as fully operative.  This means that there is a window
where there are no posted receive buffers on the initiator side, so
it's possible for the iSER RC connection to break because of RNR NAK /
retry errors.  To fix this, rely on the flags bits in the login
request to have FFP (0x3) in the lower nibble as a marker for the
final login request, and post an initial chunk of receive buffers
before sending that login request instead of after getting the login
response.
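
The flag test and the reordering can be sketched as follows. This is a
user-space model with hypothetical names (FFP_STAGE, is_final_login_req(),
send_login_model()). It assumes that the stage value occupies the two
lowest bits of the flags byte, which is consistent with the 0x3 value given
above but is not stated explicitly in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FFP_STAGE 0x3   /* value taken from the commit message */

/* Assumption: the next-stage field sits in the two lowest flag bits. */
static bool is_final_login_req(uint8_t flags)
{
	return (flags & FFP_STAGE) == FFP_STAGE;
}

/* Hypothetical connection state: number of posted receive buffers. */
struct conn_model {
	int rx_posted;
};

/* Post the initial receive buffers before the final login request is sent. */
static void send_login_model(struct conn_model *c, uint8_t flags,
			     int initial_bufs)
{
	assert(initial_bufs > 0);
	if (is_final_login_req(flags) && c->rx_posted == 0)
		c->rx_posted = initial_bufs;
	/* ... the login request itself would be sent here ... */
}
```

Because the buffers are in place before the request leaves the initiator,
the target cannot send a PDU into an empty receive queue.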

Signed-off-by: Or Gerlitz 
Signed-off-by: Roland Dreier 
Signed-off-by: Greg Kroah-Hartman

IPoIB: Stop lying about hard_header_len and use skb->cb to stash LL addresses

2012-03-01T00:31:03+00:00

[ Upstream commit 936d7de3d736e0737542641269436f4b5968e9ef ]

Commit a0417fa3a18a ("net: Make qdisc_skb_cb upper size bound
explicit.") made it possible for a netdev driver to use skb->cb
between its header_ops.create method and its .ndo_start_xmit
method.  Use this in ipoib_hard_header() to stash away the LL address
(GID + QPN), instead of the "ipoib_pseudoheader" hack.  This allows
IPoIB to stop lying about its hard_header_len, which will let us fix
the L2 check for GRO.
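
The stashing technique can be sketched in user space. The structures below
are hypothetical simplifications: struct skb_model only carries the 48-byte
control buffer, and struct ll_cb holds just the 20-byte link-layer address
(4 bytes of flags and QPN plus a 16-byte GID), leaving out any qdisc data
that shares the buffer in the kernel.

```c
#include <assert.h>
#include <string.h>

#define CB_SIZE     48   /* size of the skb control buffer */
#define LL_ADDR_LEN 20   /* IPoIB link-layer address: QPN word + GID */

/* Hypothetical, heavily reduced stand-in for struct sk_buff. */
struct skb_model {
	unsigned char cb[CB_SIZE];
};

/* Private layout overlaid on the control buffer. */
struct ll_cb {
	unsigned char hwaddr[LL_ADDR_LEN];
};

_Static_assert(sizeof(struct ll_cb) <= CB_SIZE, "ll_cb must fit in cb");

/* Header-creation step: stash the destination address in the cb area. */
static void hard_header_model(struct skb_model *skb,
			      const unsigned char *daddr)
{
	struct ll_cb *cb = (struct ll_cb *)skb->cb;

	memcpy(cb->hwaddr, daddr, LL_ADDR_LEN);
}

/* Transmit step: read the address back without touching packet data. */
static const unsigned char *xmit_model(const struct skb_model *skb)
{
	return ((const struct ll_cb *)skb->cb)->hwaddr;
}
```

Since the address travels in the control buffer instead of in front of the
payload, the advertised header length no longer has to account for it.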

Signed-off-by: Roland Dreier 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman