linux-stable.git/block, branch linux-4.18.y

SCSI: fix queue cleanup race before queue initialization is done

2018-11-21T08:22:04+00:00

commit 8dc765d438f1e42b3e8227b3b09fad7d73f4ec9a upstream.

c2856ae2f315d ("blk-mq: quiesce queue before freeing queue") has
already fixed this race, however the implied synchronize_rcu()
in blk_mq_quiesce_queue() can slow down LUN probe a lot, so caused
performance regression.

Then 1311326cf4755c7 ("blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()")
tried to quiesce queue for avoiding unnecessary synchronize_rcu()
only when queue initialization is done, because it is usual to see
lots of inexistent LUNs which need to be probed.

However, turns out it isn't safe to quiesce queue only when queue
initialization is done. Because when one SCSI command is completed,
the user of sending command can be waken up immediately, then the
scsi device may be removed, meantime the run queue in scsi_end_request()
is still in-progress, so kernel panic can be caused.

In Red Hat QE lab, there are several reports about this kind of kernel
panic triggered during kernel booting.

This patch tries to address the issue by grabing one queue usage
counter during freeing one request and the following run queue.

Fixes: 1311326cf4755c7 ("blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()")
Cc: Andrew Jones 
Cc: Bart Van Assche 
Cc: linux-scsi@vger.kernel.org
Cc: Martin K. Petersen 
Cc: Christoph Hellwig 
Cc: James E.J. Bottomley 
Cc: stable 
Cc: jianchao.wang 
Signed-off-by: Ming Lei 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman

block, bfq: correctly charge and reset entity service in all cases

2018-11-13T19:12:24+00:00

[ Upstream commit cbeb869a3d1110450186b738199963c5e68c2a71 ]

BFQ schedules entities (which represent either per-process queues or
groups of queues) as a function of their timestamps. In particular, as
a function of their (virtual) finish times. The finish time of an
entity is computed as a function of the budget assigned to the entity,
assuming, tentatively, that the entity, once in service, will receive
an amount of service equal to its budget. Then, when the entity is
expired because it finishes to be served, this finish time is updated
as a function of the actual service received by the entity. This
allows the entity to be correctly charged with only the service
received, and then to be correctly re-scheduled.

Yet an entity may receive service also while not being the entity in
service (in the scheduling environment of its parent entity), for
several reasons. If the entity remains with no backlog while receiving
this 'unofficial' service, then it is expired. Also on such an
expiration, the finish time of the entity should be updated to account
for only the service actually received by the entity. Unfortunately,
such an update is not performed for an entity expiring without being
the entity in service.

In a similar vein, the service counter of the entity in service is
reset when the entity is expired, to be ready to be used for next
service cycle. This reset too should be performed also in case an
entity is expired because it remains empty after receiving service
while not being the entity in service. But in this case the reset is
not performed.

This commit performs the above update of the finish time and reset of
the service received, also for an entity expiring while not being the
entity in service.

Signed-off-by: Paolo Valente 
Signed-off-by: Jens Axboe 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

block: make sure writesame bio is aligned with logical block size

2018-11-13T19:12:12+00:00

commit 34ffec60b27aa81d04e274e71e4c6ef740f75fc7 upstream.

Obviously the created writesame bio has to be aligned with logical block
size, and use bio_allowed_max_sectors() to retrieve this number.

Cc: stable@vger.kernel.org
Cc: Mike Snitzer 
Cc: Christoph Hellwig 
Cc: Xiao Ni 
Cc: Mariusz Dabrowski 
Fixes: b49a0871be31a745b2ef ("block: remove split code in blkdev_issue_{discard,write_same}")
Tested-by: Rui Salvaterra 
Signed-off-by: Ming Lei 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman

block: make sure discard bio is aligned with logical block size

2018-11-13T19:12:12+00:00

commit 1adfc5e4136f5967d591c399aff95b3b035f16b7 upstream.

Obviously the created discard bio has to be aligned with logical block size.

This patch introduces the helper of bio_allowed_max_sectors() for
this purpose.

Cc: stable@vger.kernel.org
Cc: Mike Snitzer 
Cc: Christoph Hellwig 
Cc: Xiao Ni 
Cc: Mariusz Dabrowski 
Fixes: 744889b7cbb56a6 ("block: don't deal with discard limit in blkdev_issue_discard()")
Fixes: a22c4d7e34402cc ("block: re-add discard_granularity and alignment checks")
Reported-by: Rui Salvaterra 
Tested-by: Rui Salvaterra 
Signed-off-by: Ming Lei 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman

block: don't deal with discard limit in blkdev_issue_discard()

2018-11-13T19:12:11+00:00

commit 744889b7cbb56a64f957e65ade7cb65fe3f35714 upstream.

blk_queue_split() does respect this limit via bio splitting, so no
need to do that in blkdev_issue_discard(), then we can align to
normal bio submit(bio_add_page() & submit_bio()).

More importantly, this patch fixes one issue introduced in a22c4d7e34402cc
("block: re-add discard_granularity and alignment checks"), in which
zero discard bio may be generated in case of zero alignment.

Fixes: a22c4d7e34402ccdf3 ("block: re-add discard_granularity and alignment checks")
Cc: stable@vger.kernel.org
Cc: Ming Lin 
Cc: Mike Snitzer 
Cc: Christoph Hellwig 
Cc: Xiao Ni 
Tested-by: Mariusz Dabrowski 
Signed-off-by: Ming Lei 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman

block: setup bounce bio_sets properly

2018-11-13T19:12:11+00:00

commit 52990a5fb0c991ecafebdab43138b5ed41376852 upstream.

We're only setting up the bounce bio sets if we happen
to need bouncing for regular HIGHMEM, not if we only need
it for ISA devices.

Protect the ISA bounce setup with a mutex, since it's
being invoked from driver init functions and can thus be
called in parallel.

Cc: stable@vger.kernel.org
Reported-by: Ondrej Zary 
Tested-by: Ondrej Zary 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman

blk-mq: I/O and timer unplugs are inverted in blktrace

2018-10-13T07:33:09+00:00

commit 587562d0c7cd6861f4f90a2eb811cccb1a376f5f upstream.

trace_block_unplug() takes true for explicit unplugs and false for
implicit unplugs.  schedule() unplugs are implicit and should be
reported as timer unplugs.  While correct in the legacy code, this has
been inverted in blk-mq since 4.11.

Cc: stable@vger.kernel.org
Fixes: bd166ef183c2 ("blk-mq-sched: add framework for MQ capable IO schedulers")
Reviewed-by: Omar Sandoval 
Signed-off-by: Ilya Dryomov 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman

Revert "blk-throttle: fix race between blkcg_bio_issue_check() and cgroup_rmdir()"

2018-10-10T06:55:54+00:00

[ Upstream commit 6b06546206868f723f2061d703a3c3c378dcbf4c ]

This reverts commit 4c6994806f708559c2812b73501406e21ae5dcd0.

Destroying blkgs is tricky because of the nature of the relationship. A
blkg should go away when either a blkcg or a request_queue goes away.
However, blkg's pin the blkcg to ensure they remain valid. To break this
cycle, when a blkcg is offlined, blkgs put back their css ref. This
eventually lets css_free() get called which frees the blkcg.

The above commit (4c6994806f70) breaks this order of events by trying to
destroy blkgs in css_free(). As the blkgs still hold references to the
blkcg, css_free() is never called.

The race between blkcg_bio_issue_check() and cgroup_rmdir() will be
addressed in the following patch by delaying destruction of a blkg until
all writeback associated with the blkcg has been finished.

Fixes: 4c6994806f70 ("blk-throttle: fix race between blkcg_bio_issue_check() and cgroup_rmdir()")
Reviewed-by: Josef Bacik 
Signed-off-by: Dennis Zhou 
Cc: Jiufei Xue 
Cc: Joseph Qi 
Cc: Tejun Heo 
Cc: Jens Axboe 
Signed-off-by: Jens Axboe 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

block: fix deadline elevator drain for zoned block devices

2018-10-03T23:59:17+00:00

commit 854f31ccdd7964c9c2e68da234a3a8aedb51cf6b upstream.

When the deadline scheduler is used with a zoned block device, writes
to a zone will be dispatched one at a time. This causes the warning
message:

deadline: forced dispatching is broken (nr_sorted=X), please report this

to be displayed when switching to another elevator with the legacy I/O
path while write requests to a zone are being retained in the scheduler
queue.

Prevent this message from being displayed when executing
elv_drain_elevator() for a zoned block device. __blk_drain_queue() will
loop until all writes are dispatched and completed, resulting in the
desired elevator queue drain without extensive modifications to the
deadline code itself to handle forced-dispatch calls.

Signed-off-by: Damien Le Moal 
Fixes: 8dc8146f9c92 ("deadline-iosched: Introduce zone locking support")
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman

blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()

2018-09-26T06:39:43+00:00

[ Upstream commit 1311326cf4755c7ffefd20f576144ecf46d9906b ]

SCSI probing may synchronously create and destroy a lot of request_queues
for non-existent devices. Any synchronize_rcu() in queue creation or
destroy path may introduce long latency during booting, see detailed
description in comment of blk_register_queue().

This patch removes one synchronize_rcu() inside blk_cleanup_queue()
for this case, commit c2856ae2f315d75(blk-mq: quiesce queue before freeing queue)
needs synchronize_rcu() for implementing blk_mq_quiesce_queue(), but
when queue isn't initialized, it isn't necessary to do that since
only pass-through requests are involved, no original issue in
scsi_execute() at all.

Without this patch and previous one, it may take more 20+ seconds for
virtio-scsi to complete disk probe. With the two patches, the time becomes
less than 100ms.

Fixes: c2856ae2f315d75 ("blk-mq: quiesce queue before freeing queue")
Reported-by: Andrew Jones 
Cc: Omar Sandoval 
Cc: Bart Van Assche 
Cc: linux-scsi@vger.kernel.org
Cc: "Martin K. Petersen" 
Cc: Christoph Hellwig 
Tested-by: Andrew Jones 
Signed-off-by: Ming Lei 
Signed-off-by: Jens Axboe 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman