<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/drivers/infiniband, branch linux-2.6.38.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>IB/cm: Bump reference count on cm_id before invoking callback</title>
<updated>2011-03-23T20:03:37+00:00</updated>
<author>
<name>Sean Hefty</name>
<email>sean.hefty@intel.com</email>
</author>
<published>2011-02-23T16:17:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=6fee8c663b4c7253b4ee68ae2680c8c4cb25a986'/>
<id>6fee8c663b4c7253b4ee68ae2680c8c4cb25a986</id>
<content type='text'>
commit 29963437a48475036353b95ab142bf199adb909e upstream.

When processing a SIDR REQ, the ib_cm allocates a new cm_id.  The
refcount of the cm_id is initialized to 1.  However, cm_process_work
will decrement the refcount after invoking all callbacks.  The result
is that the cm_id will end up with refcount set to 0 by the end of the
sidr req handler.

If a user tries to destroy the cm_id, the destruction will proceed,
under the incorrect assumption that no other threads are referencing
the cm_id.  This can lead to a crash when the cm callback thread tries
to access the cm_id.

This problem was noticed as part of a larger investigation with kernel
crashes in the rdma_cm when running on a real time OS.

Signed-off-by: Sean Hefty &lt;sean.hefty@intel.com&gt;
Acked-by: Doug Ledford &lt;dledford@redhat.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 29963437a48475036353b95ab142bf199adb909e upstream.

When processing a SIDR REQ, the ib_cm allocates a new cm_id.  The
refcount of the cm_id is initialized to 1.  However, cm_process_work
will decrement the refcount after invoking all callbacks.  The result
is that the cm_id will end up with refcount set to 0 by the end of the
sidr req handler.

If a user tries to destroy the cm_id, the destruction will proceed,
under the incorrect assumption that no other threads are referencing
the cm_id.  This can lead to a crash when the cm callback thread tries
to access the cm_id.

This problem was noticed as part of a larger investigation with kernel
crashes in the rdma_cm when running on a real time OS.

Signed-off-by: Sean Hefty &lt;sean.hefty@intel.com&gt;
Acked-by: Doug Ledford &lt;dledford@redhat.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>RDMA/cma: Fix crash in request handlers</title>
<updated>2011-03-23T20:03:35+00:00</updated>
<author>
<name>Sean Hefty</name>
<email>sean.hefty@intel.com</email>
</author>
<published>2011-02-23T16:11:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=4d94af00d818c4bfe9ca1f051ab25d9e6b5720c0'/>
<id>4d94af00d818c4bfe9ca1f051ab25d9e6b5720c0</id>
<content type='text'>
commit 25ae21a10112875763c18b385624df713a288a05 upstream.

Doug Ledford and Red Hat reported a crash when running the rdma_cm on
a real-time OS.  The crash has the following call trace:

    cm_process_work
       cma_req_handler
          cma_disable_callback
          rdma_create_id
             kzalloc
             init_completion
          cma_get_net_info
          cma_save_net_info
          cma_any_addr
             cma_zero_addr
          rdma_translate_ip
             rdma_copy_addr
          cma_acquire_dev
             rdma_addr_get_sgid
             ib_find_cached_gid
             cma_attach_to_dev
          ucma_event_handler
             kzalloc
             ib_copy_ah_attr_to_user
          cma_comp

[ preempted ]

    cma_write
        copy_from_user
        ucma_destroy_id
           copy_from_user
           _ucma_find_context
           ucma_put_ctx
           ucma_free_ctx
              rdma_destroy_id
                 cma_exch
                 cma_cancel_operation
                 rdma_node_get_transport

        rt_mutex_slowunlock
        bad_area_nosemaphore
        oops_enter

They were able to reproduce the crash multiple times with the
following details:

    Crash seems to always happen on the:
            mutex_unlock(&amp;conn_id-&gt;handler_mutex);
    as conn_id looks to have been freed during this code path.

An examination of the code shows that a race exists in the request
handlers.  When a new connection request is received, the rdma_cm
allocates a new connection identifier.  This identifier has a single
reference count on it.  If a user calls rdma_destroy_id() from another
thread after receiving a callback, rdma_destroy_id will proceed to
destroy the id and free the associated memory.  However, the request
handlers may still be in the process of running.  When control returns
to the request handlers, they can attempt to access the newly created
identifiers.

Fix this by holding a reference on the newly created rdma_cm_id until
the request handler is through accessing it.

Signed-off-by: Sean Hefty &lt;sean.hefty@intel.com&gt;
Acked-by: Doug Ledford &lt;dledford@redhat.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 25ae21a10112875763c18b385624df713a288a05 upstream.

Doug Ledford and Red Hat reported a crash when running the rdma_cm on
a real-time OS.  The crash has the following call trace:

    cm_process_work
       cma_req_handler
          cma_disable_callback
          rdma_create_id
             kzalloc
             init_completion
          cma_get_net_info
          cma_save_net_info
          cma_any_addr
             cma_zero_addr
          rdma_translate_ip
             rdma_copy_addr
          cma_acquire_dev
             rdma_addr_get_sgid
             ib_find_cached_gid
             cma_attach_to_dev
          ucma_event_handler
             kzalloc
             ib_copy_ah_attr_to_user
          cma_comp

[ preempted ]

    cma_write
        copy_from_user
        ucma_destroy_id
           copy_from_user
           _ucma_find_context
           ucma_put_ctx
           ucma_free_ctx
              rdma_destroy_id
                 cma_exch
                 cma_cancel_operation
                 rdma_node_get_transport

        rt_mutex_slowunlock
        bad_area_nosemaphore
        oops_enter

They were able to reproduce the crash multiple times with the
following details:

    Crash seems to always happen on the:
            mutex_unlock(&amp;conn_id-&gt;handler_mutex);
    as conn_id looks to have been freed during this code path.

An examination of the code shows that a race exists in the request
handlers.  When a new connection request is received, the rdma_cm
allocates a new connection identifier.  This identifier has a single
reference count on it.  If a user calls rdma_destroy_id() from another
thread after receiving a callback, rdma_destroy_id will proceed to
destroy the id and free the associated memory.  However, the request
handlers may still be in the process of running.  When control returns
to the request handlers, they can attempt to access the newly created
identifiers.

Fix this by holding a reference on the newly created rdma_cm_id until
the request handler is through accessing it.

Signed-off-by: Sean Hefty &lt;sean.hefty@intel.com&gt;
Acked-by: Doug Ledford &lt;dledford@redhat.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branches 'nes' and 'qib' into for-next</title>
<updated>2011-02-17T22:04:59+00:00</updated>
<author>
<name>Roland Dreier</name>
<email>roland@purestorage.com</email>
</author>
<published>2011-02-17T22:04:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=814b0a61204d24f9fba6f7c575e6450d15ce2cf1'/>
<id>814b0a61204d24f9fba6f7c575e6450d15ce2cf1</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>IB/qib: Prevent double completions after a timeout or RNR error</title>
<updated>2011-02-17T22:04:50+00:00</updated>
<author>
<name>Mike Marciniszyn</name>
<email>mike.marciniszyn@qlogic.com</email>
</author>
<published>2011-02-16T15:48:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=c0af2c057d7ce3f0b260f9380d187a82bb5cab28'/>
<id>c0af2c057d7ce3f0b260f9380d187a82bb5cab28</id>
<content type='text'>
There is a double completion associated with error handling for RC QPs.

The sequence is:

 - The do_rc_ack() routine fields an RNR nack and there are 0
   rnr_retries configured on the QP.
 - qib_error_qp() stops the pending timer
 - qib_rc_send_complete() is called from sdma_complete()
 - qib_rc_send_complete() starts the timer because the msb of the psn
   just completed says an ack is needed.
 - a bunch of flushes occur as ipoib posts WQEs to an error'ed QP
 - rc_timeout() calls qib_restart_rc()
 - qib_restart_rc() calls qib_send_complete() with a
   IB_WC_RETRY_EXC_ERR on a wqe that has already been completed in the
   past

The fix avoids starting the timer since another packet will never
arrive.

Signed-off-by: Mike Marciniszyn &lt;mike.marciniszyn@qlogic.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There is a double completion associated with error handling for RC QPs.

The sequence is:

 - The do_rc_ack() routine fields an RNR nack and there are 0
   rnr_retries configured on the QP.
 - qib_error_qp() stops the pending timer
 - qib_rc_send_complete() is called from sdma_complete()
 - qib_rc_send_complete() starts the timer because the msb of the psn
   just completed says an ack is needed.
 - a bunch of flushes occur as ipoib posts WQEs to an error'ed QP
 - rc_timeout() calls qib_restart_rc()
 - qib_restart_rc() calls qib_send_complete() with a
   IB_WC_RETRY_EXC_ERR on a wqe that has already been completed in the
   past

The fix avoids starting the timer since another packet will never
arrive.

Signed-off-by: Mike Marciniszyn &lt;mike.marciniszyn@qlogic.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>IB/qib: Fix double add_timer()</title>
<updated>2011-02-10T19:24:08+00:00</updated>
<author>
<name>Mike Marciniszyn</name>
<email>mike.marciniszyn@qlogic.com</email>
</author>
<published>2011-02-10T14:11:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=414ed90cee32486c50f91b28990443e0dc21c868'/>
<id>414ed90cee32486c50f91b28990443e0dc21c868</id>
<content type='text'>
The following panic BUG_ON occurs during qib testing:

    Kernel BUG at include/linux/timer.h:82

    RIP  [&lt;ffffffff881f7109&gt;] :ib_qib:start_timer+0x73/0x89
     RSP &lt;ffffffff80425bd0&gt;
     &lt;0&gt;Kernel panic - not syncing: Fatal exception
     &lt;0&gt;Dumping qib trace buffer from panic
    qib_set_lid INFO: IB0:1 got a lid: 0xf8
    Done dumping qib trace buffer
    BUG: warning at kernel/panic.c:137/panic() (Tainted: G

The flaw is due to a missing state test when processing responses that
results in an add_timer() call when the same timer is already queued.
This code was executing in parallel with a QP destroy on another CPU
that had changed the state to reset, but the missing test caused to
response handling code to run on into the panic.

Signed-off-by: Mike Marciniszyn &lt;mike.marciniszyn@qlogic.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The following panic BUG_ON occurs during qib testing:

    Kernel BUG at include/linux/timer.h:82

    RIP  [&lt;ffffffff881f7109&gt;] :ib_qib:start_timer+0x73/0x89
     RSP &lt;ffffffff80425bd0&gt;
     &lt;0&gt;Kernel panic - not syncing: Fatal exception
     &lt;0&gt;Dumping qib trace buffer from panic
    qib_set_lid INFO: IB0:1 got a lid: 0xf8
    Done dumping qib trace buffer
    BUG: warning at kernel/panic.c:137/panic() (Tainted: G

The flaw is due to a missing state test when processing responses that
results in an add_timer() call when the same timer is already queued.
This code was executing in parallel with a QP destroy on another CPU
that had changed the state to reset, but the missing test caused to
response handling code to run on into the panic.

Signed-off-by: Mike Marciniszyn &lt;mike.marciniszyn@qlogic.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>RDMA/nes: Don't generate async events for unregistered devices</title>
<updated>2011-02-03T23:55:26+00:00</updated>
<author>
<name>Maciej Sosnowski</name>
<email>maciej.sosnowski@intel.com</email>
</author>
<published>2011-02-03T23:55:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=25a54a6bb87dc966f6a3fc1f2ac6e88db1f5614c'/>
<id>25a54a6bb87dc966f6a3fc1f2ac6e88db1f5614c</id>
<content type='text'>
nes_port_ibevent() should not be called when the nes RDMA device is not
registered with the RDMA core.  Add missing checks of of_device_registered flag.

Signed-off-by: Maciej Sosnowski &lt;maciej.sosnowski@intel.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
nes_port_ibevent() should not be called when the nes RDMA device is not
registered with the RDMA core.  Add missing checks of of_device_registered flag.

Signed-off-by: Maciej Sosnowski &lt;maciej.sosnowski@intel.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband</title>
<updated>2011-02-03T19:19:26+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2011-02-03T19:19:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=9118626a30f8a3f58674623bebd3c34961e558af'/>
<id>9118626a30f8a3f58674623bebd3c34961e558af</id>
<content type='text'>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  RDMA: Update missed conversion of flush_scheduled_work()
  RDMA/ucma: Copy iWARP route information on queries
  RDMA/amso1100: Fix compile warnings
  RDMA/cxgb4: Set the correct device physical function for iWARP connections
  RDMA/cxgb4: Limit MAXBURST EQ context field to 256B
  IB/qib: Hold link for TX SERDES settings
  mlx4_core: Add ConnectX-3 device IDs
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  RDMA: Update missed conversion of flush_scheduled_work()
  RDMA/ucma: Copy iWARP route information on queries
  RDMA/amso1100: Fix compile warnings
  RDMA/cxgb4: Set the correct device physical function for iWARP connections
  RDMA/cxgb4: Limit MAXBURST EQ context field to 256B
  IB/qib: Hold link for TX SERDES settings
  mlx4_core: Add ConnectX-3 device IDs
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branches 'amso1100', 'cma', 'cxgb4', 'misc', 'mlx4' and 'qib' into for-next</title>
<updated>2011-01-30T04:45:04+00:00</updated>
<author>
<name>Roland Dreier</name>
<email>roland@purestorage.com</email>
</author>
<published>2011-01-30T04:45:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e51c7b1ab05d4a6fe4d153b2769290d37e077479'/>
<id>e51c7b1ab05d4a6fe4d153b2769290d37e077479</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>RDMA: Update missed conversion of flush_scheduled_work()</title>
<updated>2011-01-29T00:39:08+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2011-01-24T11:06:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=96e61fa55e86fbd10c526567aacae5b484c4b63c'/>
<id>96e61fa55e86fbd10c526567aacae5b484c4b63c</id>
<content type='text'>
Commit f06267104dd9 ("RDMA: Update workqueue usage") introduced ib_wq
and removed the use of flush_scheduled_work(); however, during the merge
process one chunk was lost in ib_sa_remove_one().  Fix it.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit f06267104dd9 ("RDMA: Update workqueue usage") introduced ib_wq
and removed the use of flush_scheduled_work(); however, during the merge
process one chunk was lost in ib_sa_remove_one().  Fix it.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>RDMA/ucma: Copy iWARP route information on queries</title>
<updated>2011-01-29T00:34:05+00:00</updated>
<author>
<name>Steve Wise</name>
<email>swise@opengridcomputing.com</email>
</author>
<published>2011-01-21T03:40:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e86f8b06f5fa884a84c22b2dcd0df37ed0a75827'/>
<id>e86f8b06f5fa884a84c22b2dcd0df37ed0a75827</id>
<content type='text'>
For iWARP rdma_cm ids, the "route" information is the L2 src and
next hop addresses.

Signed-off-by: Steve Wise &lt;swise@opengridcomputing.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
For iWARP rdma_cm ids, the "route" information is the L2 src and
next hop addresses.

Signed-off-by: Steve Wise &lt;swise@opengridcomputing.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
