<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/block, branch v5.4-rc2</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>block: sed-opal: fix sparse warning: convert __be64 data</title>
<updated>2019-10-03T20:21:32+00:00</updated>
<author>
<name>Randy Dunlap</name>
<email>rdunlap@infradead.org</email>
</author>
<published>2019-10-03T02:23:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a9eb49c964884654dd6394cb6abe7ceb021c9c96'/>
<id>a9eb49c964884654dd6394cb6abe7ceb021c9c96</id>
<content type='text'>
sparse warns about incorrect type when using __be64 data.
It is not being converted to CPU-endian but it should be.

Fixes these sparse warnings:

../block/sed-opal.c:375:20: warning: incorrect type in assignment (different base types)
../block/sed-opal.c:375:20:    expected unsigned long long [usertype] align
../block/sed-opal.c:375:20:    got restricted __be64 const [usertype] alignment_granularity
../block/sed-opal.c:376:25: warning: incorrect type in assignment (different base types)
../block/sed-opal.c:376:25:    expected unsigned long long [usertype] lowest_lba
../block/sed-opal.c:376:25:    got restricted __be64 const [usertype] lowest_aligned_lba

Fixes: 455a7b238cd6 ("block: Add Sed-opal library")
Cc: Scott Bauer &lt;scott.bauer@intel.com&gt;
Cc: Rafael Antognolli &lt;rafael.antognolli@intel.com&gt;
Cc: linux-block@vger.kernel.org
Reviewed-by: Jon Derrick &lt;jonathan.derrick@intel.com&gt;
Signed-off-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
sparse warns about incorrect type when using __be64 data.
It is not being converted to CPU-endian but it should be.

Fixes these sparse warnings:

../block/sed-opal.c:375:20: warning: incorrect type in assignment (different base types)
../block/sed-opal.c:375:20:    expected unsigned long long [usertype] align
../block/sed-opal.c:375:20:    got restricted __be64 const [usertype] alignment_granularity
../block/sed-opal.c:376:25: warning: incorrect type in assignment (different base types)
../block/sed-opal.c:376:25:    expected unsigned long long [usertype] lowest_lba
../block/sed-opal.c:376:25:    got restricted __be64 const [usertype] lowest_aligned_lba

Fixes: 455a7b238cd6 ("block: Add Sed-opal library")
Cc: Scott Bauer &lt;scott.bauer@intel.com&gt;
Cc: Rafael Antognolli &lt;rafael.antognolli@intel.com&gt;
Cc: linux-block@vger.kernel.org
Reviewed-by: Jon Derrick &lt;jonathan.derrick@intel.com&gt;
Signed-off-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: sed-opal: fix sparse warning: obsolete array init.</title>
<updated>2019-10-03T20:21:30+00:00</updated>
<author>
<name>Randy Dunlap</name>
<email>rdunlap@infradead.org</email>
</author>
<published>2019-10-03T02:23:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=dc301025658aef326763765b0bebeaa44569ed20'/>
<id>dc301025658aef326763765b0bebeaa44569ed20</id>
<content type='text'>
Fix sparse warning: (missing '=')
../block/sed-opal.c:133:17: warning: obsolete array initializer, use C99 syntax

Fixes: ff91064ea37c ("block: sed-opal: check size of shadow mbr")
Cc: linux-block@vger.kernel.org
Cc: Jonas Rabenstein &lt;jonas.rabenstein@studium.uni-erlangen.de&gt;
Cc: David Kozub &lt;zub@linux.fjfi.cvut.cz&gt;
Reviewed-by: Scott Bauer &lt;sbauer@plzdonthack.me&gt;
Reviewed-by:  Revanth Rajashekar &lt;revanth.rajashekar@intel.com&gt;
Signed-off-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix sparse warning: (missing '=')
../block/sed-opal.c:133:17: warning: obsolete array initializer, use C99 syntax

Fixes: ff91064ea37c ("block: sed-opal: check size of shadow mbr")
Cc: linux-block@vger.kernel.org
Cc: Jonas Rabenstein &lt;jonas.rabenstein@studium.uni-erlangen.de&gt;
Cc: David Kozub &lt;zub@linux.fjfi.cvut.cz&gt;
Reviewed-by: Scott Bauer &lt;sbauer@plzdonthack.me&gt;
Reviewed-by:  Revanth Rajashekar &lt;revanth.rajashekar@intel.com&gt;
Signed-off-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>blk-mq: apply normal plugging for HDD</title>
<updated>2019-09-27T17:40:21+00:00</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2019-09-27T07:24:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3154df262db52f1d27e01020e871f4345ec2f9a8'/>
<id>3154df262db52f1d27e01020e871f4345ec2f9a8</id>
<content type='text'>
Some HDD drive may expose multiple hardware queues, such as MegraRaid.
Let's apply the normal plugging for such devices because sequential IO
may benefit a lot from plug merging.

Cc: Bart Van Assche &lt;bvanassche@acm.org&gt;
Cc: Hannes Reinecke &lt;hare@suse.com&gt;
Cc: Dave Chinner &lt;dchinner@redhat.com&gt;
Reviewed-by: Damien Le Moal &lt;damien.lemoal@wdc.com&gt;
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Some HDD drive may expose multiple hardware queues, such as MegraRaid.
Let's apply the normal plugging for such devices because sequential IO
may benefit a lot from plug merging.

Cc: Bart Van Assche &lt;bvanassche@acm.org&gt;
Cc: Hannes Reinecke &lt;hare@suse.com&gt;
Cc: Dave Chinner &lt;dchinner@redhat.com&gt;
Reviewed-by: Damien Le Moal &lt;damien.lemoal@wdc.com&gt;
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>blk-mq: honor IO scheduler for multiqueue devices</title>
<updated>2019-09-27T17:38:28+00:00</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2019-09-27T07:24:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a12de1d42d74ef3c80e9fb9a2da94daaef747869'/>
<id>a12de1d42d74ef3c80e9fb9a2da94daaef747869</id>
<content type='text'>
If a device is using multiple queues, the IO scheduler may be bypassed.
This may hurt performance for some slow MQ devices, and it also breaks
zoned devices which depend on mq-deadline for respecting the write order
in one zone.

Don't bypass io scheduler if we have one setup.

This patch can double sequential write performance basically on MQ
scsi_debug when mq-deadline is applied.

Cc: Bart Van Assche &lt;bvanassche@acm.org&gt;
Cc: Hannes Reinecke &lt;hare@suse.com&gt;
Cc: Dave Chinner &lt;dchinner@redhat.com&gt;
Reviewed-by: Javier González &lt;javier@javigon.com&gt;
Reviewed-by: Damien Le Moal &lt;damien.lemoal@wdc.com&gt;
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If a device is using multiple queues, the IO scheduler may be bypassed.
This may hurt performance for some slow MQ devices, and it also breaks
zoned devices which depend on mq-deadline for respecting the write order
in one zone.

Don't bypass io scheduler if we have one setup.

This patch can double sequential write performance basically on MQ
scsi_debug when mq-deadline is applied.

Cc: Bart Van Assche &lt;bvanassche@acm.org&gt;
Cc: Hannes Reinecke &lt;hare@suse.com&gt;
Cc: Dave Chinner &lt;dchinner@redhat.com&gt;
Reviewed-by: Javier González &lt;javier@javigon.com&gt;
Reviewed-by: Damien Le Moal &lt;damien.lemoal@wdc.com&gt;
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: fix null pointer dereference in blk_mq_rq_timed_out()</title>
<updated>2019-09-27T13:01:25+00:00</updated>
<author>
<name>Yufen Yu</name>
<email>yuyufen@huawei.com</email>
</author>
<published>2019-09-27T08:19:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8d6996630c03d7ceeabe2611378fea5ca1c3f1b3'/>
<id>8d6996630c03d7ceeabe2611378fea5ca1c3f1b3</id>
<content type='text'>
We got a null pointer deference BUG_ON in blk_mq_rq_timed_out()
as following:

[  108.825472] BUG: kernel NULL pointer dereference, address: 0000000000000040
[  108.827059] PGD 0 P4D 0
[  108.827313] Oops: 0000 [#1] SMP PTI
[  108.827657] CPU: 6 PID: 198 Comm: kworker/6:1H Not tainted 5.3.0-rc8+ #431
[  108.829503] Workqueue: kblockd blk_mq_timeout_work
[  108.829913] RIP: 0010:blk_mq_check_expired+0x258/0x330
[  108.838191] Call Trace:
[  108.838406]  bt_iter+0x74/0x80
[  108.838665]  blk_mq_queue_tag_busy_iter+0x204/0x450
[  108.839074]  ? __switch_to_asm+0x34/0x70
[  108.839405]  ? blk_mq_stop_hw_queue+0x40/0x40
[  108.839823]  ? blk_mq_stop_hw_queue+0x40/0x40
[  108.840273]  ? syscall_return_via_sysret+0xf/0x7f
[  108.840732]  blk_mq_timeout_work+0x74/0x200
[  108.841151]  process_one_work+0x297/0x680
[  108.841550]  worker_thread+0x29c/0x6f0
[  108.841926]  ? rescuer_thread+0x580/0x580
[  108.842344]  kthread+0x16a/0x1a0
[  108.842666]  ? kthread_flush_work+0x170/0x170
[  108.843100]  ret_from_fork+0x35/0x40

The bug is caused by the race between timeout handle and completion for
flush request.

When timeout handle function blk_mq_rq_timed_out() try to read
'req-&gt;q-&gt;mq_ops', the 'req' have completed and reinitiated by next
flush request, which would call blk_rq_init() to clear 'req' as 0.

After commit 12f5b93145 ("blk-mq: Remove generation seqeunce"),
normal requests lifetime are protected by refcount. Until 'rq-&gt;ref'
drop to zero, the request can really be free. Thus, these requests
cannot been reused before timeout handle finish.

However, flush request has defined .end_io and rq-&gt;end_io() is still
called even if 'rq-&gt;ref' doesn't drop to zero. After that, the 'flush_rq'
can be reused by the next flush request handle, resulting in null
pointer deference BUG ON.

We fix this problem by covering flush request with 'rq-&gt;ref'.
If the refcount is not zero, flush_end_io() return and wait the
last holder recall it. To record the request status, we add a new
entry 'rq_status', which will be used in flush_end_io().

Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Keith Busch &lt;keith.busch@intel.com&gt;
Cc: Bart Van Assche &lt;bvanassche@acm.org&gt;
Cc: stable@vger.kernel.org # v4.18+
Reviewed-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Reviewed-by: Bob Liu &lt;bob.liu@oracle.com&gt;
Signed-off-by: Yufen Yu &lt;yuyufen@huawei.com&gt;

-------
v2:
 - move rq_status from struct request to struct blk_flush_queue
v3:
 - remove unnecessary '{}' pair.
v4:
 - let spinlock to protect 'fq-&gt;rq_status'
v5:
 - move rq_status after flush_running_idx member of struct blk_flush_queue
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We got a null pointer deference BUG_ON in blk_mq_rq_timed_out()
as following:

[  108.825472] BUG: kernel NULL pointer dereference, address: 0000000000000040
[  108.827059] PGD 0 P4D 0
[  108.827313] Oops: 0000 [#1] SMP PTI
[  108.827657] CPU: 6 PID: 198 Comm: kworker/6:1H Not tainted 5.3.0-rc8+ #431
[  108.829503] Workqueue: kblockd blk_mq_timeout_work
[  108.829913] RIP: 0010:blk_mq_check_expired+0x258/0x330
[  108.838191] Call Trace:
[  108.838406]  bt_iter+0x74/0x80
[  108.838665]  blk_mq_queue_tag_busy_iter+0x204/0x450
[  108.839074]  ? __switch_to_asm+0x34/0x70
[  108.839405]  ? blk_mq_stop_hw_queue+0x40/0x40
[  108.839823]  ? blk_mq_stop_hw_queue+0x40/0x40
[  108.840273]  ? syscall_return_via_sysret+0xf/0x7f
[  108.840732]  blk_mq_timeout_work+0x74/0x200
[  108.841151]  process_one_work+0x297/0x680
[  108.841550]  worker_thread+0x29c/0x6f0
[  108.841926]  ? rescuer_thread+0x580/0x580
[  108.842344]  kthread+0x16a/0x1a0
[  108.842666]  ? kthread_flush_work+0x170/0x170
[  108.843100]  ret_from_fork+0x35/0x40

The bug is caused by the race between timeout handle and completion for
flush request.

When timeout handle function blk_mq_rq_timed_out() try to read
'req-&gt;q-&gt;mq_ops', the 'req' have completed and reinitiated by next
flush request, which would call blk_rq_init() to clear 'req' as 0.

After commit 12f5b93145 ("blk-mq: Remove generation seqeunce"),
normal requests lifetime are protected by refcount. Until 'rq-&gt;ref'
drop to zero, the request can really be free. Thus, these requests
cannot been reused before timeout handle finish.

However, flush request has defined .end_io and rq-&gt;end_io() is still
called even if 'rq-&gt;ref' doesn't drop to zero. After that, the 'flush_rq'
can be reused by the next flush request handle, resulting in null
pointer deference BUG ON.

We fix this problem by covering flush request with 'rq-&gt;ref'.
If the refcount is not zero, flush_end_io() return and wait the
last holder recall it. To record the request status, we add a new
entry 'rq_status', which will be used in flush_end_io().

Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Keith Busch &lt;keith.busch@intel.com&gt;
Cc: Bart Van Assche &lt;bvanassche@acm.org&gt;
Cc: stable@vger.kernel.org # v4.18+
Reviewed-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Reviewed-by: Bob Liu &lt;bob.liu@oracle.com&gt;
Signed-off-by: Yufen Yu &lt;yuyufen@huawei.com&gt;

-------
v2:
 - move rq_status from struct request to struct blk_flush_queue
v3:
 - remove unnecessary '{}' pair.
v4:
 - let spinlock to protect 'fq-&gt;rq_status'
v5:
 - move rq_status after flush_running_idx member of struct blk_flush_queue
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rq-qos: get rid of redundant wbt_update_limits()</title>
<updated>2019-09-27T07:13:10+00:00</updated>
<author>
<name>Yufen Yu</name>
<email>yuyufen@huawei.com</email>
</author>
<published>2019-09-17T12:04:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2af2783f2ea4f273598557b247e320509247df80'/>
<id>2af2783f2ea4f273598557b247e320509247df80</id>
<content type='text'>
We have updated limits after calling wbt_set_min_lat(). No need to
update again.

Reviewed-by: Bob Liu &lt;bob.liu@oracle.com&gt;
Signed-off-by: Yufen Yu &lt;yuyufen@huawei.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We have updated limits after calling wbt_set_min_lat(). No need to
update again.

Reviewed-by: Bob Liu &lt;bob.liu@oracle.com&gt;
Signed-off-by: Yufen Yu &lt;yuyufen@huawei.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>iocost: bump up default latency targets for hard disks</title>
<updated>2019-09-26T07:12:01+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2019-09-25T23:03:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7afcccafa59fb63b58f863a6c5e603a34625955b'/>
<id>7afcccafa59fb63b58f863a6c5e603a34625955b</id>
<content type='text'>
The default hard disk param sets latency targets at 50ms.  As the
default target percentiles are zero, these don't directly regulate
vrate; however, they're still used to calculate the period length -
100ms in this case.

This is excessively low.  A SATA drive with QD32 saturated with random
IOs can easily reach avg completion latency of several hundred msecs.
A period duration which is substantially lower than avg completion
latency can lead to wildly fluctuating vrate.

Let's bump up the default latency targets to 250ms so that the period
duration is sufficiently long.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The default hard disk param sets latency targets at 50ms.  As the
default target percentiles are zero, these don't directly regulate
vrate; however, they're still used to calculate the period length -
100ms in this case.

This is excessively low.  A SATA drive with QD32 saturated with random
IOs can easily reach avg completion latency of several hundred msecs.
A period duration which is substantially lower than avg completion
latency can lead to wildly fluctuating vrate.

Let's bump up the default latency targets to 250ms so that the period
duration is sufficiently long.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>iocost: improve nr_lagging handling</title>
<updated>2019-09-26T07:12:00+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2019-09-25T23:03:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7cd806a9a953f234b9865c30028f47fd738ce375'/>
<id>7cd806a9a953f234b9865c30028f47fd738ce375</id>
<content type='text'>
Some IOs may span multiple periods.  As latencies are collected on
completion, the inbetween periods won't register them and may
incorrectly decide to increase vrate.  nr_lagging tracks these IOs to
avoid those situations.  Currently, whenever there are IOs which are
spanning from the previous period, busy_level is reset to 0 if
negative thus suppressing vrate increase.

This has the following two problems.

* When latency target percentiles aren't set, vrate adjustment should
  only be governed by queue depth depletion; however, the current code
  keeps nr_lagging active which pulls in latency results and can keep
  down vrate unexpectedly.

* When lagging condition is detected, it resets the entire negative
  busy_level.  This turned out to be way too aggressive on some
  devices which sometimes experience extended latencies on a small
  subset of commands.  In addition, a lagging IO will be accounted as
  latency target miss on completion anyway and resetting busy_level
  amplifies its impact unnecessarily.

This patch fixes the above two problems by disabling nr_lagging
counting when latency target percentiles aren't set and blocking vrate
increases when there are lagging IOs while leaving busy_level as-is.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Some IOs may span multiple periods.  As latencies are collected on
completion, the inbetween periods won't register them and may
incorrectly decide to increase vrate.  nr_lagging tracks these IOs to
avoid those situations.  Currently, whenever there are IOs which are
spanning from the previous period, busy_level is reset to 0 if
negative thus suppressing vrate increase.

This has the following two problems.

* When latency target percentiles aren't set, vrate adjustment should
  only be governed by queue depth depletion; however, the current code
  keeps nr_lagging active which pulls in latency results and can keep
  down vrate unexpectedly.

* When lagging condition is detected, it resets the entire negative
  busy_level.  This turned out to be way too aggressive on some
  devices which sometimes experience extended latencies on a small
  subset of commands.  In addition, a lagging IO will be accounted as
  latency target miss on completion anyway and resetting busy_level
  amplifies its impact unnecessarily.

This patch fixes the above two problems by disabling nr_lagging
counting when latency target percentiles aren't set and blocking vrate
increases when there are lagging IOs while leaving busy_level as-is.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>iocost: better trace vrate changes</title>
<updated>2019-09-26T07:11:58+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2019-09-25T23:02:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=25d41e4aadb0788b4fae8a8fca90f437b9ebd727'/>
<id>25d41e4aadb0788b4fae8a8fca90f437b9ebd727</id>
<content type='text'>
vrate_adj tracepoint traces vrate changes; however, it does so only
when busy_level is non-zero.  busy_level turning to zero can sometimes
be as interesting an event.  This patch also enables vrate_adj
tracepoint on other vrate related events - busy_level changes and
non-zero nr_lagging.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
vrate_adj tracepoint traces vrate changes; however, it does so only
when busy_level is non-zero.  busy_level turning to zero can sometimes
be as interesting an event.  This patch also enables vrate_adj
tracepoint on other vrate related events - busy_level changes and
non-zero nr_lagging.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: don't release queue's sysfs lock during switching elevator</title>
<updated>2019-09-26T06:45:51+00:00</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2019-09-23T15:12:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b89f625e28d44552083f43752f62d8621ded0a04'/>
<id>b89f625e28d44552083f43752f62d8621ded0a04</id>
<content type='text'>
cecf5d87ff20 ("block: split .sysfs_lock into two locks") starts to
release &amp; acquire sysfs_lock before registering/un-registering elevator
queue during switching elevator for avoiding potential deadlock from
showing &amp; storing 'queue/iosched' attributes and removing elevator's
kobject.

Turns out there isn't such deadlock because 'q-&gt;sysfs_lock' isn't
required in .show &amp; .store of queue/iosched's attributes, and just
elevator's sysfs lock is acquired in elv_iosched_store() and
elv_iosched_show(). So it is safe to hold queue's sysfs lock when
registering/un-registering elevator queue.

The biggest issue is that commit cecf5d87ff20 assumes that concurrent
write on 'queue/scheduler' can't happen. However, this assumption isn't
true, because kernfs_fop_write() only guarantees that concurrent write
aren't called on the same open file, but the write could be from
different open on the file. So we can't release &amp; re-acquire queue's
sysfs lock during switching elevator, otherwise use-after-free on
elevator could be triggered.

Fixes the issue by not releasing queue's sysfs lock during switching
elevator.

Fixes: cecf5d87ff20 ("block: split .sysfs_lock into two locks")
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Hannes Reinecke &lt;hare@suse.com&gt;
Cc: Greg KH &lt;gregkh@linuxfoundation.org&gt;
Cc: Mike Snitzer &lt;snitzer@redhat.com&gt;
Reviewed-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
cecf5d87ff20 ("block: split .sysfs_lock into two locks") starts to
release &amp; acquire sysfs_lock before registering/un-registering elevator
queue during switching elevator for avoiding potential deadlock from
showing &amp; storing 'queue/iosched' attributes and removing elevator's
kobject.

Turns out there isn't such deadlock because 'q-&gt;sysfs_lock' isn't
required in .show &amp; .store of queue/iosched's attributes, and just
elevator's sysfs lock is acquired in elv_iosched_store() and
elv_iosched_show(). So it is safe to hold queue's sysfs lock when
registering/un-registering elevator queue.

The biggest issue is that commit cecf5d87ff20 assumes that concurrent
write on 'queue/scheduler' can't happen. However, this assumption isn't
true, because kernfs_fop_write() only guarantees that concurrent write
aren't called on the same open file, but the write could be from
different open on the file. So we can't release &amp; re-acquire queue's
sysfs lock during switching elevator, otherwise use-after-free on
elevator could be triggered.

Fixes the issue by not releasing queue's sysfs lock during switching
elevator.

Fixes: cecf5d87ff20 ("block: split .sysfs_lock into two locks")
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Hannes Reinecke &lt;hare@suse.com&gt;
Cc: Greg KH &lt;gregkh@linuxfoundation.org&gt;
Cc: Mike Snitzer &lt;snitzer@redhat.com&gt;
Reviewed-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
</feed>
