linux-stable.git/drivers/block/rbd.c, branch v4.6

rbd: report unsupported features to syslog

2016-04-28T08:07:43+00:00

... instead of just returning an error.

Signed-off-by: Ilya Dryomov 
Reviewed-by: Josh Durgin

rbd: fix rbd map vs notify races

2016-04-28T08:07:22+00:00

A while ago, commit 9875201e1049 ("rbd: fix use-after free of
rbd_dev->disk") fixed rbd unmap vs notify race by introducing
an exported wrapper for flushing notifies and sticking it into
do_rbd_remove().

A similar problem exists on the rbd map path, though: the watch is
registered in rbd_dev_image_probe(), while the disk is set up quite
a few steps later, in rbd_dev_device_setup().  Nothing prevents
a notify from coming in and crashing on a NULL rbd_dev->disk:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
    Call Trace:
     [] rbd_watch_cb+0x34/0x180 [rbd]
     [] do_event_work+0x40/0xb0 [libceph]
     [] process_one_work+0x17b/0x470
     [] worker_thread+0x11b/0x400
     [] ? rescuer_thread+0x400/0x400
     [] kthread+0xcf/0xe0
     [] ? finish_task_switch+0x53/0x170
     [] ? kthread_create_on_node+0x140/0x140
     [] ret_from_fork+0x58/0x90
     [] ? kthread_create_on_node+0x140/0x140
    RIP  [] rbd_dev_refresh+0xfa/0x180 [rbd]

If an error occurs during rbd map, we have to error out, potentially
tearing down a watch.  Just like on rbd unmap, notifies have to be
flushed, otherwise rbd_watch_cb() may end up trying to read in the
image header after rbd_dev_image_release() has run:

    Assertion failure in rbd_dev_header_info() at line 4722:

     rbd_assert(rbd_image_format_valid(rbd_dev->image_format));

    Call Trace:
     [] ? rbd_parent_request_create+0x150/0x150
     [] rbd_dev_refresh+0x59/0x390
     [] rbd_watch_cb+0x69/0x290
     [] do_event_work+0x10f/0x1c0
     [] process_one_work+0x689/0x1a80
     [] ? process_one_work+0x5e7/0x1a80
     [] ? finish_task_switch+0x225/0x640
     [] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
     [] worker_thread+0xd9/0x1320
     [] ? process_one_work+0x1a80/0x1a80
     [] kthread+0x21d/0x2e0
     [] ? kthread_stop+0x550/0x550
     [] ret_from_fork+0x22/0x40
     [] ? kthread_stop+0x550/0x550
    RIP  [] rbd_dev_header_info+0xa19/0x1e30

To fix this, a) check if RBD_DEV_FLAG_EXISTS is set before calling
revalidate_disk(), b) move the ceph_osdc_flush_notifies() call into
rbd_dev_header_unwatch_sync() to cover rbd map error paths and c) turn
header read-in into a critical section.  The last of these also happens
to take care of the rbd map foo@bar vs rbd snap rm foo@bar race.

Fixes: http://tracker.ceph.com/issues/15490

Signed-off-by: Ilya Dryomov 
Reviewed-by: Josh Durgin
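
Point a) of the fix above can be illustrated with a small userspace
sketch.  It is not the driver code: struct rbd_dev_sketch, the "exists"
field (standing in for RBD_DEV_FLAG_EXISTS) and all function names are
hypothetical, and it assumes the flag is only set once the disk has been
set up.  It shows a notify arriving before the disk exists being handled
without touching the NULL disk pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a gendisk; counts revalidations. */
struct disk {
    int revalidations;
};

/* Hypothetical, heavily reduced model of an rbd device. */
struct rbd_dev_sketch {
    bool exists;        /* stands in for RBD_DEV_FLAG_EXISTS */
    struct disk *disk;  /* NULL until device setup has run */
    int header_reads;
};

/* Re-read the header; only touch the disk if the device is set up. */
static void refresh(struct rbd_dev_sketch *dev)
{
    dev->header_reads++;
    if (dev->exists) {
        assert(dev->disk != NULL);
        dev->disk->revalidations++;
    }
}

/* A notify may arrive as soon as the watch is registered. */
static void watch_cb(struct rbd_dev_sketch *dev)
{
    refresh(dev);
}

/* Assumption: the flag is set together with the disk pointer. */
static void device_setup(struct rbd_dev_sketch *dev, struct disk *d)
{
    dev->disk = d;
    dev->exists = true;
}
```

Points b) and c) concern flushing pending notifies and locking around
header read-in, which this single-threaded sketch does not model.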

rbd: use GFP_NOIO consistently for request allocations

2016-04-05T20:11:37+00:00

As of 5a60e87603c4c533492c515b7f62578189b03c9c, RBD object request
allocations are made via rbd_obj_request_create() with GFP_NOIO.
However, subsequent OSD request allocations in rbd_osd_req_create*()
use GFP_ATOMIC.

With heavy page cache usage (e.g. OSDs running on the same host as the
krbd client), order-1 GFP_ATOMIC allocations in rbd_osd_req_create()
have been observed to fail, where direct reclaim would have allowed
GFP_NOIO allocations to succeed.

Cc: stable@vger.kernel.org # 3.18+
Suggested-by: Vlastimil Babka 
Suggested-by: Neil Brown 
Signed-off-by: David Disseldorp 
Signed-off-by: Ilya Dryomov

rbd: use KMEM_CACHE macro

2016-03-25T17:51:56+00:00

Use KMEM_CACHE() instead of kmem_cache_create() to simplify the code.

Signed-off-by: Geliang Tang 
Signed-off-by: Ilya Dryomov
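
To illustrate what the macro saves, here is a userspace sketch.  The
macro body is modeled on the kernel's KMEM_CACHE() but uses C11 _Alignof
instead of __alignof__; kmem_cache_create() is a mock that only records
its arguments, and the struct rbd_img_request layout is made up for the
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Mock cache descriptor (hypothetical; not the kernel's struct). */
struct kmem_cache {
    const char *name;
    size_t size;
    size_t align;
    unsigned long flags;
};

/* Mock of kmem_cache_create(): records the arguments, nothing more. */
static struct kmem_cache *kmem_cache_create(const char *name, size_t size,
                                            size_t align,
                                            unsigned long flags,
                                            void (*ctor)(void *))
{
    struct kmem_cache *c;

    (void)ctor;
    assert(name != NULL && size > 0);
    c = malloc(sizeof(*c));
    if (!c)
        return NULL;
    c->name = name;
    c->size = size;
    c->align = align;
    c->flags = flags;
    return c;
}

/* Name, size and alignment are all derived from the struct tag. */
#define KMEM_CACHE(__struct, __flags)                          \
    kmem_cache_create(#__struct, sizeof(struct __struct),      \
                      _Alignof(struct __struct), (__flags), NULL)

/* Made-up layout, for illustration only. */
struct rbd_img_request {
    unsigned long long offset;
    unsigned long long length;
    int result;
};
```

With this, KMEM_CACHE(rbd_img_request, 0) replaces a five-argument call
that would otherwise repeat the struct name three times.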

libceph: enable large, variable-sized OSD requests

2016-03-25T17:51:43+00:00

Turn r_ops into a flexible array member to enable large OSD requests
consisting of up to 16 ops.  The use case is scattered writeback in
cephfs and, as far as the kernel client is concerned, 16 is just
a made-up number.

r_ops had size 3 for copyup+hint+write, but copyup is really a special
case - it can only happen once.  ceph_osd_request_cache is therefore
stuffed with num_ops=2 requests; anything bigger than that is allocated
with kmalloc().  req_mempool is backed by ceph_osd_request_cache, which
means either num_ops=1 or num_ops=2 for use_mempool=true - all existing
users (ceph_writepages_start(), ceph_osdc_writepages()) are fine with
that.

Signed-off-by: Ilya Dryomov
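
The flexible array member technique described above can be sketched in
userspace C as follows.  The struct and function names (osd_req_op,
osd_request, osd_request_alloc) and the MAX_OPS constant are
hypothetical simplifications, not the libceph definitions, and calloc()
stands in for both the slab cache and kmalloc().

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_OPS 16  /* hypothetical name for the 16-op limit */

/* Reduced stand-in for a single OSD op. */
struct osd_req_op {
    unsigned short op;
    int rval;
};

/* Request with a variable number of ops stored inline at the end. */
struct osd_request {
    unsigned int num_ops;
    struct osd_req_op ops[];  /* flexible array member */
};

/* Allocate header and ops in one block sized for num_ops entries. */
static struct osd_request *osd_request_alloc(unsigned int num_ops)
{
    struct osd_request *req;

    assert(num_ops > 0 && num_ops <= MAX_OPS);
    req = calloc(1, sizeof(*req) + num_ops * sizeof(req->ops[0]));
    if (!req)
        return NULL;
    req->num_ops = num_ops;
    return req;
}
```

Because the ops live in the same allocation as the header, a fixed-size
cache can only serve requests up to the op count it was created for,
which is why larger requests need a separately sized allocation.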

libceph: move r_reply_op_{len,result} into struct ceph_osd_req_op

2016-03-25T17:51:42+00:00

This avoids defining a large array of r_reply_op_{len,result} in
struct ceph_osd_request.

Signed-off-by: Yan, Zheng 
Signed-off-by: Ilya Dryomov

rbd: delete an unnecessary check before rbd_dev_destroy()

2016-01-21T18:36:07+00:00

The rbd_dev_destroy() function returns immediately if its argument is
NULL, so the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring 
Signed-off-by: Ilya Dryomov
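
The pattern behind this cleanup, shown as a userspace sketch with
hypothetical names (struct dev, dev_create(), dev_destroy(), live_devs):
when the destructor itself tolerates NULL, a guard at the call site is
redundant.

```c
#include <assert.h>
#include <stdlib.h>

struct dev {
    int id;
};

static int live_devs;  /* number of currently allocated devices */

static struct dev *dev_create(int id)
{
    struct dev *d = malloc(sizeof(*d));

    if (!d)
        return NULL;
    d->id = id;
    live_devs++;
    return d;
}

/* NULL-safe destructor: callers need no "if (d)" around the call. */
static void dev_destroy(struct dev *d)
{
    if (!d)
        return;
    assert(live_devs > 0);
    live_devs--;
    free(d);
}
```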

rbd: don't put snap_context twice in rbd_queue_workfn()

2015-12-04T13:29:18+00:00

Commit 4e752f0ab0e8 ("rbd: access snapshot context and mapping size
safely") moved ceph_get_snap_context() out of rbd_img_request_create()
and into rbd_queue_workfn(), adding a ceph_put_snap_context() to the
error path in rbd_queue_workfn().  However, rbd_img_request_create()
consumes a ref on snapc, so calling ceph_put_snap_context() after
a successful rbd_img_request_create() leads to an extra put.  Fix it.

Cc: stable@vger.kernel.org # 3.18+
Signed-off-by: Ilya Dryomov 
Reviewed-by: Josh Durgin
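
The ownership rule involved can be sketched in userspace C.  All names
here are hypothetical simplifications of the driver's functions, and
the assumption is the one stated above: a successful create consumes
the caller's reference, a failed one does not.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct snap_context {
    int refs;
};

static struct snap_context *snapc_get(struct snap_context *sc)
{
    sc->refs++;
    return sc;
}

static void snapc_put(struct snap_context *sc)
{
    assert(sc->refs > 0);
    sc->refs--;
}

struct img_request {
    struct snap_context *snapc;
};

/* On success the request takes over the caller's snapc reference. */
static struct img_request *img_request_create(struct snap_context *sc,
                                              int simulate_failure)
{
    struct img_request *req;

    if (simulate_failure)
        return NULL;  /* reference stays with the caller */
    req = malloc(sizeof(*req));
    if (!req)
        return NULL;
    req->snapc = sc;  /* no extra get: the reference is consumed */
    return req;
}

static void img_request_destroy(struct img_request *req)
{
    snapc_put(req->snapc);  /* drops the consumed reference */
    free(req);
}

static int queue_work(struct snap_context *sc, int simulate_failure)
{
    struct img_request *req;

    snapc_get(sc);
    req = img_request_create(sc, simulate_failure);
    if (!req) {
        snapc_put(sc);  /* we still own the reference here */
        return -ENOMEM;
    }
    /* From here on, error paths must not put snapc again. */
    img_request_destroy(req);
    return 0;
}
```

Calling snapc_put() on an error path after a successful create, in
addition to destroying the request, would drop the count one too many
times.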

rbd: remove duplicate calls to rbd_dev_mapping_clear()

2015-11-02T22:36:48+00:00

Commit d1cf5788450e ("rbd: set mapping info earlier") defined
rbd_dev_mapping_clear(), but, just a few days after, commit
f35a4dee14c3 ("rbd: set the mapping size and features later") moved
rbd_dev_mapping_set() calls and added another rbd_dev_mapping_clear()
call instead of moving the old one.  Around the same time, another
duplicate was introduced in rbd_dev_device_release() - kill both.

Signed-off-by: Ilya Dryomov

rbd: set device_type::release instead of device::release

2015-11-02T22:36:48+00:00

No point in providing an empty device_type::release callback and then
setting device::release for each rbd_dev dynamically.

Signed-off-by: Ilya Dryomov
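
A reduced userspace sketch of the lookup order that makes this work:
the per-device callback is tried first and the device type's callback
is the fallback.  The structs are cut down to the two fields that
matter, and the real driver core has further fallbacks that are omitted
here; rbd_dev_release() and the "released" counter are placeholders.

```c
#include <assert.h>
#include <stddef.h>

struct device;

struct device_type {
    const char *name;
    void (*release)(struct device *dev);
};

struct device {
    const struct device_type *type;
    void (*release)(struct device *dev);
};

/* Per-device release wins; otherwise fall back to the type's. */
static void device_release(struct device *dev)
{
    assert(dev != NULL);
    if (dev->release)
        dev->release(dev);
    else if (dev->type && dev->type->release)
        dev->type->release(dev);
}

static int released;  /* counts release callbacks, for the example */

static void rbd_dev_release(struct device *dev)
{
    (void)dev;
    released++;
}

/* Set once, statically, instead of per device at runtime. */
static const struct device_type rbd_device_type = {
    .name = "rbd",
    .release = rbd_dev_release,
};
```

A device that points at rbd_device_type and leaves its own release
pointer NULL still gets rbd_dev_release() called.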