linux.git/drivers/block, branch v4.4-rc4

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client

2015-12-04T20:46:07+00:00

Pull Ceph fix from Sage Weil:
 "This addresses a refcounting bug that leads to a use-after-free"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  rbd: don't put snap_context twice in rbd_queue_workfn()

rbd: don't put snap_context twice in rbd_queue_workfn()

2015-12-04T13:29:18+00:00

Commit 4e752f0ab0e8 ("rbd: access snapshot context and mapping size
safely") moved ceph_get_snap_context() out of rbd_img_request_create()
and into rbd_queue_workfn(), adding a ceph_put_snap_context() to the
error path in rbd_queue_workfn().  However, rbd_img_request_create()
consumes a ref on snapc, so calling ceph_put_snap_context() after
a successful rbd_img_request_create() leads to an extra put.  Fix it.

Cc: stable@vger.kernel.org # 3.18+
Signed-off-by: Ilya Dryomov 
Reviewed-by: Josh Durgin

null_blk: change type of completion_nsec to unsigned long

2015-12-01T17:52:12+00:00

This commit at least doubles the maximum value for
completion_nsec. This helps in special cases where one wants/needs to
emulate an extremely slow I/O (for example to spot bugs).

Signed-off-by: Paolo Valente 
Signed-off-by: Arianna Avanzini 
Signed-off-by: Jens Axboe

null_blk: guarantee device restart in all irq modes

2015-12-01T17:52:10+00:00

In single-queue (block layer) mode,the function null_rq_prep_fn stops
the device if alloc_cmd fails. Then, once stopped, the device must be
restarted on the next command completion, so that the request(s) for
which alloc_cmd failed can be requeued. Otherwise the device hangs.

Unfortunately, device restart is currently performed only for delayed
completions, i.e., in irqmode==2. This fact causes hangs, for the
above reasons, with the other irqmodes in combination with single-queue
block layer.

This commits addresses this issue by making sure that, if stopped, the
device is properly restarted for all irqmodes on completions.

Signed-off-by: Paolo Valente 
Signed-off-by: Arianna AVanzini 
Signed-off-by: Jens Axboe

null_blk: set a separate timer for each command

2015-12-01T17:52:08+00:00

For the Timer IRQ mode (i.e., when command completions are delayed),
there is one timer for each CPU. Each of these timers
. has a completion queue associated with it, containing all the
  command completions to be executed when the timer fires;
. is set, and a new completion-to-execute is inserted into its
  completion queue, every time the dispatch code for a new command
  happens to be executed on the CPU related to the timer.

This implies that, if the dispatch of a new command happens to be
executed on a CPU whose timer has already been set, but has not yet
fired, then the timer is set again, to the completion time of the
newly arrived command. When the timer eventually fires, all its queued
completions are executed.

This way of handling delayed command completions entails the following
problem: if more than one command completion is inserted into the
queue of a timer before the timer fires, then the expiration time for
the timer is moved forward every time each of these completions is
enqueued. As a consequence, only the last completion enqueued enjoys a
correct execution time, while all previous completions are unjustly
delayed until the last completion is executed (and at that time they
are executed all together).

Specifically, if all the above completions are enqueued almost at the
same time, then the problem is negligible. On the opposite end, if
every completion is enqueued a while after the previous completion was
enqueued (in the extreme case, it is enqueued only right before the
timer would have expired), then every enqueued completion, except for
the last one, experiences an inflated delay, proportional to the number
of completions enqueued after it. In the end, commands, and thus I/O
requests, may be completed at an arbitrarily lower rate than the
desired one.

This commit addresses this issue by replacing per-CPU timers with
per-command timers, i.e., by associating an individual timer with each
command.

Signed-off-by: Paolo Valente 
Signed-off-by: Arianna Avanzini 
Signed-off-by: Jens Axboe

mtip32xx: use formatting capability of kthread_create_on_node

2015-11-20T15:46:50+00:00

kthread_create_on_node takes format+args, so there's no need to do the
pretty-printing in advance. Moreover, "mtip_svc_thd_99" (including its
'\0') only just fits in 16 bytes, so if index could ever go above 99
we'd have a stack buffer overflow.

Signed-off-by: Rasmus Villemoes 
Reviewed-by: Jeff Moyer 
Signed-off-by: Jens Axboe

null_blk: do not del gendisk with lightnvm

2015-11-19T22:15:56+00:00

The gendisk structure has not been initialized when using lightnvm.
Make sure to not delete it upon exit. Also make sure that we use the
appropriate disk_name at unregistration.

Signed-off-by: Matias Bjørling 
Signed-off-by: Jens Axboe

null_blk: use device addressing mode

2015-11-19T22:15:54+00:00

The linear addressing mode was removed in 7386af2. Make null_blk instead
expose the ppa format geometry and support the generic addressing mode.

Signed-off-by: Matias Bjørling 
Signed-off-by: Jens Axboe

null_blk: use ppa_cache pool

2015-11-19T22:15:53+00:00

Instead of using a page pool, we can save memory by only allocating room
for 64 entries for the ppa command. Introduce a ppa_cache to allocate only
the required memory for the ppa list.

Signed-off-by: Matias Bjørling 
Signed-off-by: Jens Axboe

null_blk: register as a LightNVM device

2015-11-16T22:22:28+00:00

Add support for registering as a LightNVM device. This allows us to
evaluate the performance of the LightNVM subsystem.

In /drivers/Makefile, LightNVM is moved above block device drivers
to make sure that the LightNVM media managers have been initialized
before drivers under /drivers/block are initialized.

Signed-off-by: Matias Bjørling 
Fix by Jens Axboe to remove unneeded slab cache and the following
memory leak.
Signed-off-by: Jens Axboe