<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/drivers/block/nbd.c, branch v5.7</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>nbd: requeue command if the soecket is changed</title>
<updated>2020-03-12T14:01:24+00:00</updated>
<author>
<name>Hou Pu</name>
<email>houpu.main@gmail.com</email>
</author>
<published>2020-02-28T06:40:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=2c272542baee2c3b9e8e3a260db81227ccefe8b5'/>
<id>2c272542baee2c3b9e8e3a260db81227ccefe8b5</id>
<content type='text'>
In commit 2da22da5734 (nbd: fix zero cmd timeout handling v2),
it is allowed to reset timer when it fires if tag_set.timeout
is set to zero. If the server is shutdown and a new socket
is reconfigured, the request should be requeued to be processed by
new server instead of waiting for response from the old one.

Reviewed-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Signed-off-by: Hou Pu &lt;houpu@bytedance.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In commit 2da22da5734 (nbd: fix zero cmd timeout handling v2),
it is allowed to reset timer when it fires if tag_set.timeout
is set to zero. If the server is shutdown and a new socket
is reconfigured, the request should be requeued to be processed by
new server instead of waiting for response from the old one.

Reviewed-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Signed-off-by: Hou Pu &lt;houpu@bytedance.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nbd: enable replace socket if only one connection is configured</title>
<updated>2020-03-12T13:58:39+00:00</updated>
<author>
<name>Hou Pu</name>
<email>houpu.main@gmail.com</email>
</author>
<published>2020-02-28T06:40:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d970958b2d24e9a40b685ad82bf26a291d6f1c25'/>
<id>d970958b2d24e9a40b685ad82bf26a291d6f1c25</id>
<content type='text'>
Nbd server with multiple connections could be upgraded since
560bc4b (nbd: handle dead connections). But if only one conncection
is configured, after we take down nbd server, all inflight IO
would finally timeout and return error. We could requeue them
like what we do with multiple connections and wait for new socket
in submit path.

Reviewed-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Signed-off-by: Hou Pu &lt;houpu@bytedance.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Nbd server with multiple connections could be upgraded since
560bc4b (nbd: handle dead connections). But if only one conncection
is configured, after we take down nbd server, all inflight IO
would finally timeout and return error. We could requeue them
like what we do with multiple connections and wait for new socket
in submit path.

Reviewed-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Signed-off-by: Hou Pu &lt;houpu@bytedance.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nbd: add a flush_workqueue in nbd_start_device</title>
<updated>2020-01-30T04:05:53+00:00</updated>
<author>
<name>Sun Ke</name>
<email>sunke32@huawei.com</email>
</author>
<published>2020-01-22T03:18:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=5c0dd228b5fc30a3b732c7ae2657e0161ec7ed80'/>
<id>5c0dd228b5fc30a3b732c7ae2657e0161ec7ed80</id>
<content type='text'>
When kzalloc fail, may cause trying to destroy the
workqueue from inside the workqueue.

If num_connections is m (2 &lt; m), and NO.1 ~ NO.n
(1 &lt; n &lt; m) kzalloc are successful. The NO.(n + 1)
failed. Then, nbd_start_device will return ENOMEM
to nbd_start_device_ioctl, and nbd_start_device_ioctl
will return immediately without running flush_workqueue.
However, we still have n recv threads. If nbd_release
run first, recv threads may have to drop the last
config_refs and try to destroy the workqueue from
inside the workqueue.

To fix it, add a flush_workqueue in nbd_start_device.

Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs")
Signed-off-by: Sun Ke &lt;sunke32@huawei.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When kzalloc fail, may cause trying to destroy the
workqueue from inside the workqueue.

If num_connections is m (2 &lt; m), and NO.1 ~ NO.n
(1 &lt; n &lt; m) kzalloc are successful. The NO.(n + 1)
failed. Then, nbd_start_device will return ENOMEM
to nbd_start_device_ioctl, and nbd_start_device_ioctl
will return immediately without running flush_workqueue.
However, we still have n recv threads. If nbd_release
run first, recv threads may have to drop the last
config_refs and try to destroy the workqueue from
inside the workqueue.

To fix it, add a flush_workqueue in nbd_start_device.

Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs")
Signed-off-by: Sun Ke &lt;sunke32@huawei.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nbd: fix shutdown and recv work deadlock v2</title>
<updated>2019-12-16T23:12:47+00:00</updated>
<author>
<name>Mike Christie</name>
<email>mchristi@redhat.com</email>
</author>
<published>2019-12-08T22:51:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=1c05839aa973cfae8c3db964a21f9c0eef8fcc21'/>
<id>1c05839aa973cfae8c3db964a21f9c0eef8fcc21</id>
<content type='text'>
This fixes a regression added with:

commit e9e006f5fcf2bab59149cb38a48a4817c1b538b4
Author: Mike Christie &lt;mchristi@redhat.com&gt;
Date:   Sun Aug 4 14:10:06 2019 -0500

    nbd: fix max number of supported devs

where we can deadlock during device shutdown. The problem occurs if
the recv_work's nbd_config_put occurs after nbd_start_device_ioctl has
returned and the userspace app has droppped its reference via closing
the device and running nbd_release. The recv_work nbd_config_put call
would then drop the refcount to zero and try to destroy the config which
would try to do destroy_workqueue from the recv work.

This patch just has nbd_start_device_ioctl do a flush_workqueue when it
wakes so we know after the ioctl returns running works have exited. This
also fixes a possible race where we could try to reuse the device while
old recv_works are still running.

Cc: stable@vger.kernel.org
Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs")
Signed-off-by: Mike Christie &lt;mchristi@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This fixes a regression added with:

commit e9e006f5fcf2bab59149cb38a48a4817c1b538b4
Author: Mike Christie &lt;mchristi@redhat.com&gt;
Date:   Sun Aug 4 14:10:06 2019 -0500

    nbd: fix max number of supported devs

where we can deadlock during device shutdown. The problem occurs if
the recv_work's nbd_config_put occurs after nbd_start_device_ioctl has
returned and the userspace app has droppped its reference via closing
the device and running nbd_release. The recv_work nbd_config_put call
would then drop the refcount to zero and try to destroy the config which
would try to do destroy_workqueue from the recv work.

This patch just has nbd_start_device_ioctl do a flush_workqueue when it
wakes so we know after the ioctl returns running works have exited. This
also fixes a possible race where we could try to reuse the device while
old recv_works are still running.

Cc: stable@vger.kernel.org
Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs")
Signed-off-by: Mike Christie &lt;mchristi@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'for-5.5/drivers-post-20191122' of git://git.kernel.dk/linux-block</title>
<updated>2019-11-25T19:18:03+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2019-11-25T19:18:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=323264eefba1ea288d5962c0a9e23ebd62107ca8'/>
<id>323264eefba1ea288d5962c0a9e23ebd62107ca8</id>
<content type='text'>
Pull additional block driver updates from Jens Axboe:
 "Here's another block driver update, done to avoid conflicts with the
  zoned changes coming next.

  This contains:

   - Prepare SCSI sd for zone open/close/finish support

   - Small NVMe pull request
        - hwmon support (Akinobu)
        - add new co-maintainer (Christoph)
        - work-around for a discard issue on non-conformant drives
          (Eduard)

   - Small nbd leak fix"

* tag 'for-5.5/drivers-post-20191122' of git://git.kernel.dk/linux-block:
  nbd: prevent memory leak
  nvme: hwmon: add quirk to avoid changing temperature threshold
  nvme: hwmon: provide temperature min and max values for each sensor
  nvmet: add another maintainer
  nvme: Discard workaround for non-conformant devices
  nvme: Add hardware monitoring support
  scsi: sd_zbc: add zone open, close, and finish support
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull additional block driver updates from Jens Axboe:
 "Here's another block driver update, done to avoid conflicts with the
  zoned changes coming next.

  This contains:

   - Prepare SCSI sd for zone open/close/finish support

   - Small NVMe pull request
        - hwmon support (Akinobu)
        - add new co-maintainer (Christoph)
        - work-around for a discard issue on non-conformant drives
          (Eduard)

   - Small nbd leak fix"

* tag 'for-5.5/drivers-post-20191122' of git://git.kernel.dk/linux-block:
  nbd: prevent memory leak
  nvme: hwmon: add quirk to avoid changing temperature threshold
  nvme: hwmon: provide temperature min and max values for each sensor
  nvmet: add another maintainer
  nvme: Discard workaround for non-conformant devices
  nvme: Add hardware monitoring support
  scsi: sd_zbc: add zone open, close, and finish support
</pre>
</div>
</content>
</entry>
<entry>
<title>nbd: prevent memory leak</title>
<updated>2019-11-21T18:33:41+00:00</updated>
<author>
<name>Navid Emamdoost</name>
<email>navid.emamdoost@gmail.com</email>
</author>
<published>2019-09-23T20:09:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=03bf73c315edca28f47451913177e14cd040a216'/>
<id>03bf73c315edca28f47451913177e14cd040a216</id>
<content type='text'>
In nbd_add_socket when krealloc succeeds, if nsock's allocation fail the
reallocted memory is leak. The correct behaviour should be assigning the
reallocted memory to config-&gt;socks right after success.

Reviewed-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Signed-off-by: Navid Emamdoost &lt;navid.emamdoost@gmail.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In nbd_add_socket when krealloc succeeds, if nsock's allocation fail the
reallocted memory is leak. The correct behaviour should be assigning the
reallocted memory to config-&gt;socks right after success.

Reviewed-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Signed-off-by: Navid Emamdoost &lt;navid.emamdoost@gmail.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nbd:fix memory leak in nbd_get_socket()</title>
<updated>2019-11-19T16:23:26+00:00</updated>
<author>
<name>Sun Ke</name>
<email>sunke32@huawei.com</email>
</author>
<published>2019-11-19T06:09:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=dff10bbea4be47bdb615b036c834a275b7c68133'/>
<id>dff10bbea4be47bdb615b036c834a275b7c68133</id>
<content type='text'>
Before returning NULL, put the sock first.

Cc: stable@vger.kernel.org
Fixes: cf1b2326b734 ("nbd: verify socket is supported during setup")
Reviewed-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Reviewed-by: Mike Christie &lt;mchristi@redhat.com&gt;
Signed-off-by: Sun Ke &lt;sunke32@huawei.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Before returning NULL, put the sock first.

Cc: stable@vger.kernel.org
Fixes: cf1b2326b734 ("nbd: verify socket is supported during setup")
Reviewed-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Reviewed-by: Mike Christie &lt;mchristi@redhat.com&gt;
Signed-off-by: Sun Ke &lt;sunke32@huawei.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nbd: verify socket is supported during setup</title>
<updated>2019-10-25T20:37:21+00:00</updated>
<author>
<name>Mike Christie</name>
<email>mchristi@redhat.com</email>
</author>
<published>2019-10-17T21:27:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=cf1b2326b734896734c6e167e41766f9cee7686a'/>
<id>cf1b2326b734896734c6e167e41766f9cee7686a</id>
<content type='text'>
nbd requires socket families to support the shutdown method so the nbd
recv workqueue can be woken up from its sock_recvmsg call. If the socket
does not support the callout we will leave recv works running or get hangs
later when the device or module is removed.

This adds a check during socket connection/reconnection to make sure the
socket being passed in supports the needed callout.

Reported-by: syzbot+24c12fa8d218ed26011a@syzkaller.appspotmail.com
Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs")
Tested-by: Richard W.M. Jones &lt;rjones@redhat.com&gt;
Signed-off-by: Mike Christie &lt;mchristi@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
nbd requires socket families to support the shutdown method so the nbd
recv workqueue can be woken up from its sock_recvmsg call. If the socket
does not support the callout we will leave recv works running or get hangs
later when the device or module is removed.

This adds a check during socket connection/reconnection to make sure the
socket being passed in supports the needed callout.

Reported-by: syzbot+24c12fa8d218ed26011a@syzkaller.appspotmail.com
Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs")
Tested-by: Richard W.M. Jones &lt;rjones@redhat.com&gt;
Signed-off-by: Mike Christie &lt;mchristi@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nbd: handle racing with error'ed out commands</title>
<updated>2019-10-25T20:20:03+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>josef@toxicpanda.com</email>
</author>
<published>2019-10-21T19:56:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=7ce23e8e0a9cd38338fc8316ac5772666b565ca9'/>
<id>7ce23e8e0a9cd38338fc8316ac5772666b565ca9</id>
<content type='text'>
We hit the following warning in production

print_req_error: I/O error, dev nbd0, sector 7213934408 flags 80700
------------[ cut here ]------------
refcount_t: underflow; use-after-free.
WARNING: CPU: 25 PID: 32407 at lib/refcount.c:190 refcount_sub_and_test_checked+0x53/0x60
Workqueue: knbd-recv recv_work [nbd]
RIP: 0010:refcount_sub_and_test_checked+0x53/0x60
Call Trace:
 blk_mq_free_request+0xb7/0xf0
 blk_mq_complete_request+0x62/0xf0
 recv_work+0x29/0xa1 [nbd]
 process_one_work+0x1f5/0x3f0
 worker_thread+0x2d/0x3d0
 ? rescuer_thread+0x340/0x340
 kthread+0x111/0x130
 ? kthread_create_on_node+0x60/0x60
 ret_from_fork+0x1f/0x30
---[ end trace b079c3c67f98bb7c ]---

This was preceded by us timing out everything and shutting down the
sockets for the device.  The problem is we had a request in the queue at
the same time, so we completed the request twice.  This can actually
happen in a lot of cases, we fail to get a ref on our config, we only
have one connection and just error out the command, etc.

Fix this by checking cmd-&gt;status in nbd_read_stat.  We only change this
under the cmd-&gt;lock, so we are safe to check this here and see if we've
already error'ed this command out, which would indicate that we've
completed it as well.

Reviewed-by: Mike Christie &lt;mchristi@redhat.com&gt;
Signed-off-by: Josef Bacik &lt;josef@toxicpanda.com&gt;

Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We hit the following warning in production

print_req_error: I/O error, dev nbd0, sector 7213934408 flags 80700
------------[ cut here ]------------
refcount_t: underflow; use-after-free.
WARNING: CPU: 25 PID: 32407 at lib/refcount.c:190 refcount_sub_and_test_checked+0x53/0x60
Workqueue: knbd-recv recv_work [nbd]
RIP: 0010:refcount_sub_and_test_checked+0x53/0x60
Call Trace:
 blk_mq_free_request+0xb7/0xf0
 blk_mq_complete_request+0x62/0xf0
 recv_work+0x29/0xa1 [nbd]
 process_one_work+0x1f5/0x3f0
 worker_thread+0x2d/0x3d0
 ? rescuer_thread+0x340/0x340
 kthread+0x111/0x130
 ? kthread_create_on_node+0x60/0x60
 ret_from_fork+0x1f/0x30
---[ end trace b079c3c67f98bb7c ]---

This was preceded by us timing out everything and shutting down the
sockets for the device.  The problem is we had a request in the queue at
the same time, so we completed the request twice.  This can actually
happen in a lot of cases, we fail to get a ref on our config, we only
have one connection and just error out the command, etc.

Fix this by checking cmd-&gt;status in nbd_read_stat.  We only change this
under the cmd-&gt;lock, so we are safe to check this here and see if we've
already error'ed this command out, which would indicate that we've
completed it as well.

Reviewed-by: Mike Christie &lt;mchristi@redhat.com&gt;
Signed-off-by: Josef Bacik &lt;josef@toxicpanda.com&gt;

Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nbd: protect cmd-&gt;status with cmd-&gt;lock</title>
<updated>2019-10-25T20:20:01+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>josef@toxicpanda.com</email>
</author>
<published>2019-10-21T19:56:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=de6346ecbc8f5591ebd6c44ac164e8b8671d71d7'/>
<id>de6346ecbc8f5591ebd6c44ac164e8b8671d71d7</id>
<content type='text'>
We already do this for the most part, except in timeout and clear_req.
For the timeout case we take the lock after we grab a ref on the config,
but that isn't really necessary because we're safe to touch the cmd at
this point, so just move the order around.

For the clear_req cause this is initiated by the user, so again is safe.

Reviewed-by: Mike Christie &lt;mchristi@redhat.com&gt;
Signed-off-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We already do this for the most part, except in timeout and clear_req.
For the timeout case we take the lock after we grab a ref on the config,
but that isn't really necessary because we're safe to touch the cmd at
this point, so just move the order around.

For the clear_req cause this is initiated by the user, so again is safe.

Reviewed-by: Mike Christie &lt;mchristi@redhat.com&gt;
Signed-off-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
</feed>
