<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/block, branch linux-3.16.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>block: fix an integer overflow in logical block size</title>
<updated>2020-04-28T18:03:28+00:00</updated>
<author>
<name>Mikulas Patocka</name>
<email>mpatocka@redhat.com</email>
</author>
<published>2020-01-15T13:35:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f042797e7c9dd3d3bef34ad547c4516e3a191741'/>
<id>f042797e7c9dd3d3bef34ad547c4516e3a191741</id>
<content type='text'>
commit ad6bf88a6c19a39fb3b0045d78ea880325dfcf15 upstream.

Logical block size has type unsigned short. That means that it can be at
most 32768. However, there are architectures that can run with 64k pages
(for example arm64) and on these architectures, it may be possible to
create block devices with 64k block size.

For exmaple (run this on an architecture with 64k pages):

Mount will fail with this error because it tries to read the superblock using 2-sector
access:
  device-mapper: writecache: I/O is not aligned, sector 2, size 1024, block size 65536
  EXT4-fs (dm-0): unable to read superblock

This patch changes the logical block size from unsigned short to unsigned
int to avoid the overflow.

Reviewed-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Reviewed-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit ad6bf88a6c19a39fb3b0045d78ea880325dfcf15 upstream.

Logical block size has type unsigned short. That means that it can be at
most 32768. However, there are architectures that can run with 64k pages
(for example arm64) and on these architectures, it may be possible to
create block devices with 64k block size.

For exmaple (run this on an architecture with 64k pages):

Mount will fail with this error because it tries to read the superblock using 2-sector
access:
  device-mapper: writecache: I/O is not aligned, sector 2, size 1024, block size 65536
  EXT4-fs (dm-0): unable to read superblock

This patch changes the logical block size from unsigned short to unsigned
int to avoid the overflow.

Reviewed-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Reviewed-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>blk-mq: make sure that line break can be printed</title>
<updated>2020-02-11T20:03:25+00:00</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2019-11-04T08:26:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e311c02df0d3d1c96d2702e1f628bb694d2c990a'/>
<id>e311c02df0d3d1c96d2702e1f628bb694d2c990a</id>
<content type='text'>
commit d2c9be89f8ebe7ebcc97676ac40f8dec1cf9b43a upstream.

8962842ca5ab ("blk-mq: avoid sysfs buffer overflow with too many CPU cores")
avoids sysfs buffer overflow, and reserves one character for line break.
However, the last snprintf() doesn't get correct 'size' parameter passed
in, so fixed it.

Fixes: 8962842ca5ab ("blk-mq: avoid sysfs buffer overflow with too many CPU cores")
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit d2c9be89f8ebe7ebcc97676ac40f8dec1cf9b43a upstream.

8962842ca5ab ("blk-mq: avoid sysfs buffer overflow with too many CPU cores")
avoids sysfs buffer overflow, and reserves one character for line break.
However, the last snprintf() doesn't get correct 'size' parameter passed
in, so fixed it.

Fixes: 8962842ca5ab ("blk-mq: avoid sysfs buffer overflow with too many CPU cores")
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>blk-mq: avoid sysfs buffer overflow with too many CPU cores</title>
<updated>2020-02-11T20:03:24+00:00</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2019-11-02T08:02:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=bd775ed7c79b77e1f0a159d2a4b3efecf5a5ff79'/>
<id>bd775ed7c79b77e1f0a159d2a4b3efecf5a5ff79</id>
<content type='text'>
commit 8962842ca5abdcf98e22ab3b2b45a103f0408b95 upstream.

It is reported that sysfs buffer overflow can be triggered if the system
has too many CPU cores(&gt;841 on 4K PAGE_SIZE) when showing CPUs of
hctx via /sys/block/$DEV/mq/$N/cpu_list.

Use snprintf to avoid the potential buffer overflow.

This version doesn't change the attribute format, and simply stops
showing CPU numbers if the buffer is going to overflow.

Fixes: 676141e48af7("blk-mq: don't dump CPU -&gt; hw queue map on driver load")
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 8962842ca5abdcf98e22ab3b2b45a103f0408b95 upstream.

It is reported that sysfs buffer overflow can be triggered if the system
has too many CPU cores(&gt;841 on 4K PAGE_SIZE) when showing CPUs of
hctx via /sys/block/$DEV/mq/$N/cpu_list.

Use snprintf to avoid the potential buffer overflow.

This version doesn't change the attribute format, and simply stops
showing CPU numbers if the buffer is going to overflow.

Fixes: 676141e48af7("blk-mq: don't dump CPU -&gt; hw queue map on driver load")
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>blk-mq: fix deadlock when reading cpu_list</title>
<updated>2020-02-11T20:03:24+00:00</updated>
<author>
<name>Akinobu Mita</name>
<email>akinobu.mita@gmail.com</email>
</author>
<published>2015-09-26T17:09:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b843cd306a170d007f1438e67d7282095e6f8126'/>
<id>b843cd306a170d007f1438e67d7282095e6f8126</id>
<content type='text'>
commit 60de074ba1e8f327db19bc33d8530131ac01695d upstream.

CPU hotplug handling for blk-mq (blk_mq_queue_reinit) acquires
all_q_mutex in blk_mq_queue_reinit_notify() and then removes sysfs
entries by blk_mq_sysfs_unregister().  Removing sysfs entry needs to
be blocked until the active reference of the kernfs_node to be zero.

On the other hand, reading blk_mq_hw_sysfs_cpu sysfs entry (e.g.
/sys/block/nullb0/mq/0/cpu_list) acquires all_q_mutex in
blk_mq_hw_sysfs_cpus_show().

If these happen at the same time, a deadlock can happen.  Because one
can wait for the active reference to be zero with holding all_q_mutex,
and the other tries to acquire all_q_mutex with holding the active
reference.

The reason that all_q_mutex is acquired in blk_mq_hw_sysfs_cpus_show()
is to avoid reading an imcomplete hctx-&gt;cpumask.  Since reading sysfs
entry for blk-mq needs to acquire q-&gt;sysfs_lock, we can avoid deadlock
and reading an imcomplete hctx-&gt;cpumask by protecting q-&gt;sysfs_lock
while hctx-&gt;cpumask is being updated.

Signed-off-by: Akinobu Mita &lt;akinobu.mita@gmail.com&gt;
Reviewed-by: Ming Lei &lt;tom.leiming@gmail.com&gt;
Cc: Ming Lei &lt;tom.leiming@gmail.com&gt;
Cc: Wanpeng Li &lt;wanpeng.li@hotmail.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 60de074ba1e8f327db19bc33d8530131ac01695d upstream.

CPU hotplug handling for blk-mq (blk_mq_queue_reinit) acquires
all_q_mutex in blk_mq_queue_reinit_notify() and then removes sysfs
entries by blk_mq_sysfs_unregister().  Removing sysfs entry needs to
be blocked until the active reference of the kernfs_node to be zero.

On the other hand, reading blk_mq_hw_sysfs_cpu sysfs entry (e.g.
/sys/block/nullb0/mq/0/cpu_list) acquires all_q_mutex in
blk_mq_hw_sysfs_cpus_show().

If these happen at the same time, a deadlock can happen.  Because one
can wait for the active reference to be zero with holding all_q_mutex,
and the other tries to acquire all_q_mutex with holding the active
reference.

The reason that all_q_mutex is acquired in blk_mq_hw_sysfs_cpus_show()
is to avoid reading an imcomplete hctx-&gt;cpumask.  Since reading sysfs
entry for blk-mq needs to acquire q-&gt;sysfs_lock, we can avoid deadlock
and reading an imcomplete hctx-&gt;cpumask by protecting q-&gt;sysfs_lock
while hctx-&gt;cpumask is being updated.

Signed-off-by: Akinobu Mita &lt;akinobu.mita@gmail.com&gt;
Reviewed-by: Ming Lei &lt;tom.leiming@gmail.com&gt;
Cc: Ming Lei &lt;tom.leiming@gmail.com&gt;
Cc: Wanpeng Li &lt;wanpeng.li@hotmail.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sbitmap: fix improper use of smp_mb__before_atomic()</title>
<updated>2019-10-05T15:19:44+00:00</updated>
<author>
<name>Andrea Parri</name>
<email>andrea.parri@amarulasolutions.com</email>
</author>
<published>2019-05-20T17:23:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=25a7ef987b00844ea30e88cec82e78b23e8546b3'/>
<id>25a7ef987b00844ea30e88cec82e78b23e8546b3</id>
<content type='text'>
commit a0934fd2b1208458e55fc4b48f55889809fce666 upstream.

This barrier only applies to the read-modify-write operations; in
particular, it does not apply to the atomic_set() primitive.

Replace the barrier with an smp_mb().

Fixes: 6c0ca7ae292ad ("sbitmap: fix wakeup hang after sbq resize")
Reported-by: "Paul E. McKenney" &lt;paulmck@linux.ibm.com&gt;
Reported-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Andrea Parri &lt;andrea.parri@amarulasolutions.com&gt;
Reviewed-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Omar Sandoval &lt;osandov@fb.com&gt;
Cc: Ming Lei &lt;ming.lei@redhat.com&gt;
Cc: linux-block@vger.kernel.org
Cc: "Paul E. McKenney" &lt;paulmck@linux.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
[bwh: Backported to 3.16: adjust filename]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit a0934fd2b1208458e55fc4b48f55889809fce666 upstream.

This barrier only applies to the read-modify-write operations; in
particular, it does not apply to the atomic_set() primitive.

Replace the barrier with an smp_mb().

Fixes: 6c0ca7ae292ad ("sbitmap: fix wakeup hang after sbq resize")
Reported-by: "Paul E. McKenney" &lt;paulmck@linux.ibm.com&gt;
Reported-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Andrea Parri &lt;andrea.parri@amarulasolutions.com&gt;
Reviewed-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Omar Sandoval &lt;osandov@fb.com&gt;
Cc: Ming Lei &lt;ming.lei@redhat.com&gt;
Cc: linux-block@vger.kernel.org
Cc: "Paul E. McKenney" &lt;paulmck@linux.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
[bwh: Backported to 3.16: adjust filename]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: do not leak memory in bio_copy_user_iov()</title>
<updated>2019-08-13T11:39:04+00:00</updated>
<author>
<name>Jérôme Glisse</name>
<email>jglisse@redhat.com</email>
</author>
<published>2019-04-10T20:27:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=06b4ee3a744ff3193d5fc172e6300c3917a734d8'/>
<id>06b4ee3a744ff3193d5fc172e6300c3917a734d8</id>
<content type='text'>
commit a3761c3c91209b58b6f33bf69dd8bb8ec0c9d925 upstream.

When bio_add_pc_page() fails in bio_copy_user_iov() we should free
the page we just allocated otherwise we are leaking it.

Cc: linux-block@vger.kernel.org
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Reviewed-by: Chaitanya Kulkarni &lt;chaitanya.kulkarni@wdc.com&gt;
Signed-off-by: Jérôme Glisse &lt;jglisse@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit a3761c3c91209b58b6f33bf69dd8bb8ec0c9d925 upstream.

When bio_add_pc_page() fails in bio_copy_user_iov() we should free
the page we just allocated otherwise we are leaking it.

Cc: linux-block@vger.kernel.org
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Reviewed-by: Chaitanya Kulkarni &lt;chaitanya.kulkarni@wdc.com&gt;
Signed-off-by: Jérôme Glisse &lt;jglisse@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>partitions/aix: append null character to print data from disk</title>
<updated>2018-12-16T22:08:28+00:00</updated>
<author>
<name>Mauricio Faria de Oliveira</name>
<email>mfo@canonical.com</email>
</author>
<published>2018-07-26T01:46:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=c87826868f5b09cc724fbc8396e1f19db756ebcd'/>
<id>c87826868f5b09cc724fbc8396e1f19db756ebcd</id>
<content type='text'>
commit d43fdae7bac2def8c4314b5a49822cb7f08a45f1 upstream.

Even if properly initialized, the lvname array (i.e., strings)
is read from disk, and might contain corrupt data (e.g., lack
the null terminating character for strings).

So, make sure the partition name string used in pr_warn() has
the null terminating character.

Fixes: 6ceea22bbbc8 ("partitions: add aix lvm partition support files")
Suggested-by: Daniel J. Axtens &lt;daniel.axtens@canonical.com&gt;
Signed-off-by: Mauricio Faria de Oliveira &lt;mfo@canonical.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit d43fdae7bac2def8c4314b5a49822cb7f08a45f1 upstream.

Even if properly initialized, the lvname array (i.e., strings)
is read from disk, and might contain corrupt data (e.g., lack
the null terminating character for strings).

So, make sure the partition name string used in pr_warn() has
the null terminating character.

Fixes: 6ceea22bbbc8 ("partitions: add aix lvm partition support files")
Suggested-by: Daniel J. Axtens &lt;daniel.axtens@canonical.com&gt;
Signed-off-by: Mauricio Faria de Oliveira &lt;mfo@canonical.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>partitions/aix: fix usage of uninitialized lv_info and lvname structures</title>
<updated>2018-12-16T22:08:28+00:00</updated>
<author>
<name>Mauricio Faria de Oliveira</name>
<email>mfo@canonical.com</email>
</author>
<published>2018-07-26T01:46:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=452a0aac138b603626c2af666d1626e49e3dbf75'/>
<id>452a0aac138b603626c2af666d1626e49e3dbf75</id>
<content type='text'>
commit 14cb2c8a6c5dae57ee3e2da10fa3db2b9087e39e upstream.

The if-block that sets a successful return value in aix_partition()
uses 'lvip[].pps_per_lv' and 'n[].name' potentially uninitialized.

For example, if 'numlvs' is zero or alloc_lvn() fails, neither is
initialized, but are used anyway if alloc_pvd() succeeds after it.

So, make the alloc_pvd() call conditional on their initialization.

This has been hit when attaching an apparently corrupted/stressed
AIX LUN, misleading the kernel to pr_warn() invalid data and hang.

    [...] partition (null) (11 pp's found) is not contiguous
    [...] partition (null) (2 pp's found) is not contiguous
    [...] partition (null) (3 pp's found) is not contiguous
    [...] partition (null) (64 pp's found) is not contiguous

Fixes: 6ceea22bbbc8 ("partitions: add aix lvm partition support files")
Signed-off-by: Mauricio Faria de Oliveira &lt;mfo@canonical.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 14cb2c8a6c5dae57ee3e2da10fa3db2b9087e39e upstream.

The if-block that sets a successful return value in aix_partition()
uses 'lvip[].pps_per_lv' and 'n[].name' potentially uninitialized.

For example, if 'numlvs' is zero or alloc_lvn() fails, neither is
initialized, but are used anyway if alloc_pvd() succeeds after it.

So, make the alloc_pvd() call conditional on their initialization.

This has been hit when attaching an apparently corrupted/stressed
AIX LUN, misleading the kernel to pr_warn() invalid data and hang.

    [...] partition (null) (11 pp's found) is not contiguous
    [...] partition (null) (2 pp's found) is not contiguous
    [...] partition (null) (3 pp's found) is not contiguous
    [...] partition (null) (64 pp's found) is not contiguous

Fixes: 6ceea22bbbc8 ("partitions: add aix lvm partition support files")
Signed-off-by: Mauricio Faria de Oliveira &lt;mfo@canonical.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: move bio_integrity_{intervals,bytes} into blkdev.h</title>
<updated>2018-12-16T22:08:26+00:00</updated>
<author>
<name>Greg Edwards</name>
<email>gedwards@ddn.com</email>
</author>
<published>2018-07-25T14:22:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=ed1a4f65447b5475afaf2f13a74cddf332e3e5ad'/>
<id>ed1a4f65447b5475afaf2f13a74cddf332e3e5ad</id>
<content type='text'>
commit 359f642700f2ff05d9c94cd9216c97af7b8e9553 upstream.

This allows bio_integrity_bytes() to be called from drivers instead of
open coding it.

Acked-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Signed-off-by: Greg Edwards &lt;gedwards@ddn.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
[bwh: Backported to 3.16: bio_integrity_intervals() was called
 bio_integrity_hw_sectors() and had a different implementation.  Move it
 without renaming.]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 359f642700f2ff05d9c94cd9216c97af7b8e9553 upstream.

This allows bio_integrity_bytes() to be called from drivers instead of
open coding it.

Acked-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Signed-off-by: Greg Edwards &lt;gedwards@ddn.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
[bwh: Backported to 3.16: bio_integrity_intervals() was called
 bio_integrity_hw_sectors() and had a different implementation.  Move it
 without renaming.]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sbitmap: fix race in wait batch accounting</title>
<updated>2018-11-20T18:04:55+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2018-05-14T18:17:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f6c2bb725ab6cb8fb0b3feacb5a0cf60e7fd4942'/>
<id>f6c2bb725ab6cb8fb0b3feacb5a0cf60e7fd4942</id>
<content type='text'>
commit c854ab5773be1c1a0d3cef0c3a3261f2c48ab7f8 upstream.

If we have multiple callers of sbq_wake_up(), we can end up in a
situation where the wait_cnt will continually go more and more
negative. Consider the case where our wake batch is 1, hence
wait_cnt will start out as 1.

wait_cnt == 1

CPU0				CPU1
atomic_dec_return(), cnt == 0
				atomic_dec_return(), cnt == -1
				cmpxchg(-1, 0) (succeeds)
				[wait_cnt now 0]
cmpxchg(0, 1) (fails)

This ends up with wait_cnt being 0, we'll wakeup immediately
next time. Going through the same loop as above again, and
we'll have wait_cnt -1.

For the case where we have a larger wake batch, the only
difference is that the starting point will be higher. We'll
still end up with continually smaller batch wakeups, which
defeats the purpose of the rolling wakeups.

Always reset the wait_cnt to the batch value. Then it doesn't
matter who wins the race. But ensure that whomever does win
the race is the one that increments the ws index and wakes up
our batch count, loser gets to call __sbq_wake_up() again to
account his wakeups towards the next active wait state index.

Fixes: 6c0ca7ae292a ("sbitmap: fix wakeup hang after sbq resize")
Reviewed-by: Omar Sandoval &lt;osandov@fb.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
[bwh: Backported to 3.16:
 - Rename almost everything
 - Adjust filename, context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit c854ab5773be1c1a0d3cef0c3a3261f2c48ab7f8 upstream.

If we have multiple callers of sbq_wake_up(), we can end up in a
situation where the wait_cnt will continually go more and more
negative. Consider the case where our wake batch is 1, hence
wait_cnt will start out as 1.

wait_cnt == 1

CPU0				CPU1
atomic_dec_return(), cnt == 0
				atomic_dec_return(), cnt == -1
				cmpxchg(-1, 0) (succeeds)
				[wait_cnt now 0]
cmpxchg(0, 1) (fails)

This ends up with wait_cnt being 0, we'll wakeup immediately
next time. Going through the same loop as above again, and
we'll have wait_cnt -1.

For the case where we have a larger wake batch, the only
difference is that the starting point will be higher. We'll
still end up with continually smaller batch wakeups, which
defeats the purpose of the rolling wakeups.

Always reset the wait_cnt to the batch value. Then it doesn't
matter who wins the race. But ensure that whomever does win
the race is the one that increments the ws index and wakes up
our batch count, loser gets to call __sbq_wake_up() again to
account his wakeups towards the next active wait state index.

Fixes: 6c0ca7ae292a ("sbitmap: fix wakeup hang after sbq resize")
Reviewed-by: Omar Sandoval &lt;osandov@fb.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
[bwh: Backported to 3.16:
 - Rename almost everything
 - Adjust filename, context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
</feed>
