<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/drivers/infiniband/ulp, branch v3.9.2</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>IPoIB: Fix send lockup due to missed TX completion</title>
<updated>2013-03-23T01:01:04+00:00</updated>
<author>
<name>Mike Marciniszyn</name>
<email>mike.marciniszyn@intel.com</email>
</author>
<published>2013-02-26T15:46:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=1ee9e2aa7b31427303466776f455d43e5e3c9275'/>
<id>1ee9e2aa7b31427303466776f455d43e5e3c9275</id>
<content type='text'>
Commit f0dc117abdfa ("IPoIB: Fix TX queue lockup with mixed UD/CM
traffic") attempts to solve an issue where unprocessed UD send
completions can deadlock the netdev.

The patch doesn't fully resolve the issue because if more than half
the tx_outstanding's were UD and all of the destinations are RC
reachable, arming the CQ doesn't solve the issue.

This patch uses the IB_CQ_REPORT_MISSED_EVENTS on the
ib_req_notify_cq().  If the rc is above 0, the UD send cq completion
callback is called directly to re-arm the send completion timer.

This issue is seen in very large parallel filesystem deployments
and the patch has been shown to correct the issue.

Cc: &lt;stable@vger.kernel.org&gt;
Reviewed-by: Dean Luick &lt;dean.luick@intel.com&gt;
Signed-off-by: Mike Marciniszyn &lt;mike.marciniszyn@intel.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit f0dc117abdfa ("IPoIB: Fix TX queue lockup with mixed UD/CM
traffic") attempts to solve an issue where unprocessed UD send
completions can deadlock the netdev.

The patch doesn't fully resolve the issue because if more than half
the tx_outstanding's were UD and all of the destinations are RC
reachable, arming the CQ doesn't solve the issue.

This patch uses the IB_CQ_REPORT_MISSED_EVENTS on the
ib_req_notify_cq().  If the rc is above 0, the UD send cq completion
callback is called directly to re-arm the send completion timer.

This issue is seen in very large parallel filesystem deployments
and the patch has been shown to correct the issue.

Cc: &lt;stable@vger.kernel.org&gt;
Reviewed-by: Dean Luick &lt;dean.luick@intel.com&gt;
Signed-off-by: Mike Marciniszyn &lt;mike.marciniszyn@intel.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branches 'core', 'cxgb4', 'ipoib', 'iser', 'misc', 'mlx4', 'qib' and 'srp' into for-next</title>
<updated>2013-02-26T17:17:56+00:00</updated>
<author>
<name>Roland Dreier</name>
<email>roland@purestorage.com</email>
</author>
<published>2013-02-26T17:17:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=ef4e359d9b9e2dc022f79840fd207796b524a893'/>
<id>ef4e359d9b9e2dc022f79840fd207796b524a893</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>IPoIB: Free ipoib neigh on path record failure so path rec queries are retried</title>
<updated>2013-02-25T17:42:15+00:00</updated>
<author>
<name>Roland Dreier</name>
<email>roland@purestorage.com</email>
</author>
<published>2013-02-25T17:42:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f72dd56690aba26fc87fc64e98dd4cc66f27122c'/>
<id>f72dd56690aba26fc87fc64e98dd4cc66f27122c</id>
<content type='text'>
If IPoIB fails to look up a path record (eg if it tries during an SM
failover when one SM is dead but the new one hasn't taken over yet), the
driver ends up with a neighbour structure but no address handle (AH).
There's no mechanism to recover from this: any further packets sent to
this destination will be silently dumped in ipoib_start_xmit().

Fix this by freeing the neighbour structures when a path rec query
fails, so that the next packet queued to be sent will trigger a new path
record query.

Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If IPoIB fails to look up a path record (eg if it tries during an SM
failover when one SM is dead but the new one hasn't taken over yet), the
driver ends up with a neighbour structure but no address handle (AH).
There's no mechanism to recover from this: any further packets sent to
this destination will be silently dumped in ipoib_start_xmit().

Fix this by freeing the neighbour structures when a path rec query
fails, so that the next packet queued to be sent will trigger a new path
record query.

Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>IB/srp: Fail I/O requests if the transport is offline</title>
<updated>2013-02-25T17:31:14+00:00</updated>
<author>
<name>Bart Van Assche</name>
<email>bvanassche@acm.org</email>
</author>
<published>2013-02-21T17:20:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=2ce19e72f4d570c87e025ee6fca4eae699a8b712'/>
<id>2ce19e72f4d570c87e025ee6fca4eae699a8b712</id>
<content type='text'>
If an SRP target is no longer reachable and srp_reset_host() fails to
reconnect then ib_srp will invoke scsi_remove_host().  That function
will invoke __scsi_remove_device() for each LUN.  And that last
function will change the device state from SDEV_TRANSPORT_OFFLINE into
SDEV_CANCEL.  Certain user space software, e.g. older versions of
multipathd, continue queueing I/O to SCSI devices that are in the
SDEV_CANCEL state.

If these I/O requests are submitted as SG_IO that means that the
REQ_PREEMPT flag will be set and hence that these requests will be
passed to srp_queuecommand().  These requests will time out.  If new
requests are queued fast enough from user space these active requests
will prevent __scsi_remove_device() to finish.

Avoid this by failing I/O requests in the SDEV_CANCEL state if the
transport is offline.  Introduce a new variable to keep track of the
transport state instead of failing requests if (!target-&gt;connected ||
target-&gt;qp_in_error), so that the SCSI error handler has a chance to
retry commands after a transport layer failure occurred.

Signed-off-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Cc: &lt;stable@vger.kernel.org&gt; # 3.8
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If an SRP target is no longer reachable and srp_reset_host() fails to
reconnect then ib_srp will invoke scsi_remove_host().  That function
will invoke __scsi_remove_device() for each LUN.  And that last
function will change the device state from SDEV_TRANSPORT_OFFLINE into
SDEV_CANCEL.  Certain user space software, e.g. older versions of
multipathd, continue queueing I/O to SCSI devices that are in the
SDEV_CANCEL state.

If these I/O requests are submitted as SG_IO that means that the
REQ_PREEMPT flag will be set and hence that these requests will be
passed to srp_queuecommand().  These requests will time out.  If new
requests are queued fast enough from user space these active requests
will prevent __scsi_remove_device() to finish.

Avoid this by failing I/O requests in the SDEV_CANCEL state if the
transport is offline.  Introduce a new variable to keep track of the
transport state instead of failing requests if (!target-&gt;connected ||
target-&gt;qp_in_error), so that the SCSI error handler has a chance to
retry commands after a transport layer failure occurred.

Signed-off-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Cc: &lt;stable@vger.kernel.org&gt; # 3.8
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>IB/srp: Avoid endless SCSI error handling loop</title>
<updated>2013-02-25T17:31:14+00:00</updated>
<author>
<name>Bart Van Assche</name>
<email>bvanassche@acm.org</email>
</author>
<published>2013-02-21T17:19:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=c7c4e7ff8047e43c45628b85ac200582e9404c39'/>
<id>c7c4e7ff8047e43c45628b85ac200582e9404c39</id>
<content type='text'>
If a SCSI command times out it is passed to the SCSI error
handler. The SCSI error handler will try to abort the commands that
timed out.  If aborting fails, a device reset will be attempted.  If
the device reset also fails a host reset will be attempted.  If the
host reset also fails the whole procedure will be repeated.

srp_abort() and srp_reset_device() fail for a QP in the error state.
srp_reset_host() fails after host removal has started.  Hence if the
SCSI error handler gets invoked after host removal has started and
with the QP in the error state an endless loop will be triggered.

Modify the SCSI error handling functions in ib_srp as follows:
- Abort SCSI commands properly even if the QP is in the error state.
- Make srp_reset_host() reset SCSI requests even after host removal
  has already started or if reconnecting fails.

Signed-off-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Acked-by: David Dillow &lt;dave@thedillows.org&gt;
Cc: &lt;stable@vger.kernel.org&gt; # 3.8
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If a SCSI command times out it is passed to the SCSI error
handler. The SCSI error handler will try to abort the commands that
timed out.  If aborting fails, a device reset will be attempted.  If
the device reset also fails a host reset will be attempted.  If the
host reset also fails the whole procedure will be repeated.

srp_abort() and srp_reset_device() fail for a QP in the error state.
srp_reset_host() fails after host removal has started.  Hence if the
SCSI error handler gets invoked after host removal has started and
with the QP in the error state an endless loop will be triggered.

Modify the SCSI error handling functions in ib_srp as follows:
- Abort SCSI commands properly even if the QP is in the error state.
- Make srp_reset_host() reset SCSI requests even after host removal
  has already started or if reconnecting fails.

Signed-off-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Acked-by: David Dillow &lt;dave@thedillows.org&gt;
Cc: &lt;stable@vger.kernel.org&gt; # 3.8
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>IB/srp: Avoid sending a task management function needlessly</title>
<updated>2013-02-25T17:31:14+00:00</updated>
<author>
<name>Bart Van Assche</name>
<email>bvanassche@acm.org</email>
</author>
<published>2013-02-21T17:18:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=3780d1f08856f692116bcf026e4acf1c521df1c7'/>
<id>3780d1f08856f692116bcf026e4acf1c521df1c7</id>
<content type='text'>
Do not send a task management function if sending will fail anyway
because either there is no RDMA/RC connection or the QP is in the
error state.

Signed-off-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Acked-by: David Dillow &lt;dave@thedillows.org&gt;
Cc: &lt;stable@vger.kernel.org&gt; # 3.8
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Do not send a task management function if sending will fail anyway
because either there is no RDMA/RC connection or the QP is in the
error state.

Signed-off-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Acked-by: David Dillow &lt;dave@thedillows.org&gt;
Cc: &lt;stable@vger.kernel.org&gt; # 3.8
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>IB/srp: Track connection state properly</title>
<updated>2013-02-25T17:31:14+00:00</updated>
<author>
<name>Bart Van Assche</name>
<email>bvanassche@acm.org</email>
</author>
<published>2013-02-21T17:16:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e1b2f13aba9ff714d23ecd4a950e744ee7ad72e1'/>
<id>e1b2f13aba9ff714d23ecd4a950e744ee7ad72e1</id>
<content type='text'>
Remove an assignment that incorrectly overwrites the connection state
update by srp_connect_target().

Signed-off-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Acked-by: David Dillow &lt;dave@thedillows.org&gt;
Cc: &lt;stable@vger.kernel.org&gt; # 3.8
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Remove an assignment that incorrectly overwrites the connection state
update by srp_connect_target().

Signed-off-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Acked-by: David Dillow &lt;dave@thedillows.org&gt;
Cc: &lt;stable@vger.kernel.org&gt; # 3.8
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>IB/iser: Enable iser when FMRs are not supported</title>
<updated>2013-02-22T08:22:30+00:00</updated>
<author>
<name>Or Gerlitz</name>
<email>ogerlitz@mellanox.com</email>
</author>
<published>2013-02-21T14:50:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=5525d210fd55952262f30ba45b2acceb4a6a50e9'/>
<id>5525d210fd55952262f30ba45b2acceb4a6a50e9</id>
<content type='text'>
Reuse the "SG unaligned for FMR" driver flow to make the initiator
functional when running over driver instance which doesn't support
FMRs, such as a mlx4 virtual function.

Signed-off-by: Or Gerlitz &lt;ogerlitz@mellanox.com&gt;
Signed-off-by: Alex Tabachnik &lt;alext@mellanox.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Reuse the "SG unaligned for FMR" driver flow to make the initiator
functional when running over driver instance which doesn't support
FMRs, such as a mlx4 virtual function.

Signed-off-by: Or Gerlitz &lt;ogerlitz@mellanox.com&gt;
Signed-off-by: Alex Tabachnik &lt;alext@mellanox.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>IB/iser: Avoid error prints on EAGAIN registration failures</title>
<updated>2013-02-22T08:22:29+00:00</updated>
<author>
<name>Or Gerlitz</name>
<email>ogerlitz@mellanox.com</email>
</author>
<published>2013-02-21T14:50:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=819a087316a63ef7d60e7b816d18c4e29a05861a'/>
<id>819a087316a63ef7d60e7b816d18c4e29a05861a</id>
<content type='text'>
Under IO/CPU stress its possible that the FMR pool might not have a
free FMR mapping element for iSER to use because of incomplete
background unmapping processing.  In that case we get -EAGAIN and the
IO is pushed back to the SCSI layer which soon retries it.  No need to
be so verbose about that.

Signed-off-by: Or Gerlitz &lt;ogerlitz@mellanox.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Under IO/CPU stress its possible that the FMR pool might not have a
free FMR mapping element for iSER to use because of incomplete
background unmapping processing.  In that case we get -EAGAIN and the
IO is pushed back to the SCSI layer which soon retries it.  No need to
be so verbose about that.

Signed-off-by: Or Gerlitz &lt;ogerlitz@mellanox.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>IB/iser: Use proper define for the commands per LUN value advertised to SCSI ML</title>
<updated>2013-02-22T08:22:29+00:00</updated>
<author>
<name>Or Gerlitz</name>
<email>ogerlitz@mellanox.com</email>
</author>
<published>2013-02-21T14:50:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b96e4abaaeda62b5292026a9deaacb4066431ac7'/>
<id>b96e4abaaeda62b5292026a9deaacb4066431ac7</id>
<content type='text'>
ISER_DEF_CMD_PER_LUN was meant to be ISCSI_DEF_XMIT_CMDS_MAX, not plain 128

Signed-off-by: Roi Dayan &lt;roid@mellanox.com&gt;
Signed-off-by: Or Gerlitz &lt;ogerlitz@mellanox.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
ISER_DEF_CMD_PER_LUN was meant to be ISCSI_DEF_XMIT_CMDS_MAX, not plain 128

Signed-off-by: Roi Dayan &lt;roid@mellanox.com&gt;
Signed-off-by: Or Gerlitz &lt;ogerlitz@mellanox.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
