<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/block/genhd.c, branch linux-4.4.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>block: fix use-after-free in disk_part_iter_next</title>
<updated>2021-01-17T12:55:14+00:00</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2020-12-21T04:33:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a09652089d496a4ecb925e2cb95a6523f22ae538'/>
<id>a09652089d496a4ecb925e2cb95a6523f22ae538</id>
<content type='text'>
commit aebf5db917055b38f4945ed6d621d9f07a44ff30 upstream.

Make sure that bdgrab() is done on the 'block_device' instance before
referring to it for avoiding use-after-free.

Cc: &lt;stable@vger.kernel.org&gt;
Reported-by: syzbot+825f0f9657d4e528046e@syzkaller.appspotmail.com
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit aebf5db917055b38f4945ed6d621d9f07a44ff30 upstream.

Make sure that bdgrab() is done on the 'block_device' instance before
referring to it for avoiding use-after-free.

Cc: &lt;stable@vger.kernel.org&gt;
Reported-by: syzbot+825f0f9657d4e528046e@syzkaller.appspotmail.com
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>block: fix del_gendisk() vs blkdev_ioctl crash</title>
<updated>2017-04-27T07:09:34+00:00</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2015-12-29T22:02:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=6ddbac9aa800b99761a05ba520d62a0d8063a5b8'/>
<id>6ddbac9aa800b99761a05ba520d62a0d8063a5b8</id>
<content type='text'>
commit ac34f15e0c6d2fd58480052b6985f6991fb53bcc upstream.

When tearing down a block device early in its lifetime, userspace may
still be performing discovery actions like blkdev_ioctl() to re-read
partitions.

The nvdimm_revalidate_disk() implementation depends on
disk-&gt;driverfs_dev to be valid at entry.  However, it is set to NULL in
del_gendisk() and fatally this is happening *before* the disk device is
deleted from userspace view.

There's no reason for del_gendisk() to clear -&gt;driverfs_dev.  That
device is the parent of the disk.  It is guaranteed to not be freed
until the disk, as a child, drops its -&gt;parent reference.

We could also fix this issue locally in nvdimm_revalidate_disk() by
using disk_to_dev(disk)-&gt;parent, but lets fix it globally since
-&gt;driverfs_dev follows the lifetime of the parent.  Longer term we
should probably just add a @parent parameter to add_disk(), and stop
carrying this pointer in the gendisk.

 BUG: unable to handle kernel NULL pointer dereference at           (null)
 IP: [&lt;ffffffffa00340a8&gt;] nvdimm_revalidate_disk+0x18/0x90 [libnvdimm]
 CPU: 2 PID: 538 Comm: systemd-udevd Tainted: G           O    4.4.0-rc5 #2257
 [..]
 Call Trace:
  [&lt;ffffffff8143e5c7&gt;] rescan_partitions+0x87/0x2c0
  [&lt;ffffffff810f37f9&gt;] ? __lock_is_held+0x49/0x70
  [&lt;ffffffff81438c62&gt;] __blkdev_reread_part+0x72/0xb0
  [&lt;ffffffff81438cc5&gt;] blkdev_reread_part+0x25/0x40
  [&lt;ffffffff8143982d&gt;] blkdev_ioctl+0x4fd/0x9c0
  [&lt;ffffffff811246c9&gt;] ? current_kernel_time64+0x69/0xd0
  [&lt;ffffffff812916dd&gt;] block_ioctl+0x3d/0x50
  [&lt;ffffffff81264c38&gt;] do_vfs_ioctl+0x308/0x560
  [&lt;ffffffff8115dbd1&gt;] ? __audit_syscall_entry+0xb1/0x100
  [&lt;ffffffff810031d6&gt;] ? do_audit_syscall_entry+0x66/0x70
  [&lt;ffffffff81264f09&gt;] SyS_ioctl+0x79/0x90
  [&lt;ffffffff81902672&gt;] entry_SYSCALL_64_fastpath+0x12/0x76

Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Jens Axboe &lt;axboe@fb.com&gt;
Reported-by: Robert Hu &lt;robert.hu@intel.com&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit ac34f15e0c6d2fd58480052b6985f6991fb53bcc upstream.

When tearing down a block device early in its lifetime, userspace may
still be performing discovery actions like blkdev_ioctl() to re-read
partitions.

The nvdimm_revalidate_disk() implementation depends on
disk-&gt;driverfs_dev to be valid at entry.  However, it is set to NULL in
del_gendisk() and fatally this is happening *before* the disk device is
deleted from userspace view.

There's no reason for del_gendisk() to clear -&gt;driverfs_dev.  That
device is the parent of the disk.  It is guaranteed to not be freed
until the disk, as a child, drops its -&gt;parent reference.

We could also fix this issue locally in nvdimm_revalidate_disk() by
using disk_to_dev(disk)-&gt;parent, but lets fix it globally since
-&gt;driverfs_dev follows the lifetime of the parent.  Longer term we
should probably just add a @parent parameter to add_disk(), and stop
carrying this pointer in the gendisk.

 BUG: unable to handle kernel NULL pointer dereference at           (null)
 IP: [&lt;ffffffffa00340a8&gt;] nvdimm_revalidate_disk+0x18/0x90 [libnvdimm]
 CPU: 2 PID: 538 Comm: systemd-udevd Tainted: G           O    4.4.0-rc5 #2257
 [..]
 Call Trace:
  [&lt;ffffffff8143e5c7&gt;] rescan_partitions+0x87/0x2c0
  [&lt;ffffffff810f37f9&gt;] ? __lock_is_held+0x49/0x70
  [&lt;ffffffff81438c62&gt;] __blkdev_reread_part+0x72/0xb0
  [&lt;ffffffff81438cc5&gt;] blkdev_reread_part+0x25/0x40
  [&lt;ffffffff8143982d&gt;] blkdev_ioctl+0x4fd/0x9c0
  [&lt;ffffffff811246c9&gt;] ? current_kernel_time64+0x69/0xd0
  [&lt;ffffffff812916dd&gt;] block_ioctl+0x3d/0x50
  [&lt;ffffffff81264c38&gt;] do_vfs_ioctl+0x308/0x560
  [&lt;ffffffff8115dbd1&gt;] ? __audit_syscall_entry+0xb1/0x100
  [&lt;ffffffff810031d6&gt;] ? do_audit_syscall_entry+0x66/0x70
  [&lt;ffffffff81264f09&gt;] SyS_ioctl+0x79/0x90
  [&lt;ffffffff81902672&gt;] entry_SYSCALL_64_fastpath+0x12/0x76

Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Jens Axboe &lt;axboe@fb.com&gt;
Reported-by: Robert Hu &lt;robert.hu@intel.com&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>block: fix bdi vs gendisk lifetime mismatch</title>
<updated>2016-08-20T16:09:24+00:00</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2016-07-31T18:15:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=0d301856de347a43fa87833dba61d3239211429f'/>
<id>0d301856de347a43fa87833dba61d3239211429f</id>
<content type='text'>
commit df08c32ce3be5be138c1dbfcba203314a3a7cd6f upstream.

The name for a bdi of a gendisk is derived from the gendisk's devt.
However, since the gendisk is destroyed before the bdi it leaves a
window where a new gendisk could dynamically reuse the same devt while a
bdi with the same name is still live.  Arrange for the bdi to hold a
reference against its "owner" disk device while it is registered.
Otherwise we can hit sysfs duplicate name collisions like the following:

 WARNING: CPU: 10 PID: 2078 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x64/0x80
 sysfs: cannot create duplicate filename '/devices/virtual/bdi/259:1'

 Hardware name: HP ProLiant DL580 Gen8, BIOS P79 05/06/2015
  0000000000000286 0000000002c04ad5 ffff88006f24f970 ffffffff8134caec
  ffff88006f24f9c0 0000000000000000 ffff88006f24f9b0 ffffffff8108c351
  0000001f0000000c ffff88105d236000 ffff88105d1031e0 ffff8800357427f8
 Call Trace:
  [&lt;ffffffff8134caec&gt;] dump_stack+0x63/0x87
  [&lt;ffffffff8108c351&gt;] __warn+0xd1/0xf0
  [&lt;ffffffff8108c3cf&gt;] warn_slowpath_fmt+0x5f/0x80
  [&lt;ffffffff812a0d34&gt;] sysfs_warn_dup+0x64/0x80
  [&lt;ffffffff812a0e1e&gt;] sysfs_create_dir_ns+0x7e/0x90
  [&lt;ffffffff8134faaa&gt;] kobject_add_internal+0xaa/0x320
  [&lt;ffffffff81358d4e&gt;] ? vsnprintf+0x34e/0x4d0
  [&lt;ffffffff8134ff55&gt;] kobject_add+0x75/0xd0
  [&lt;ffffffff816e66b2&gt;] ? mutex_lock+0x12/0x2f
  [&lt;ffffffff8148b0a5&gt;] device_add+0x125/0x610
  [&lt;ffffffff8148b788&gt;] device_create_groups_vargs+0xd8/0x100
  [&lt;ffffffff8148b7cc&gt;] device_create_vargs+0x1c/0x20
  [&lt;ffffffff811b775c&gt;] bdi_register+0x8c/0x180
  [&lt;ffffffff811b7877&gt;] bdi_register_dev+0x27/0x30
  [&lt;ffffffff813317f5&gt;] add_disk+0x175/0x4a0

Reported-by: Yi Zhang &lt;yizhan@redhat.com&gt;
Tested-by: Yi Zhang &lt;yizhan@redhat.com&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

Fixed up missing 0 return in bdi_register_owner().

Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit df08c32ce3be5be138c1dbfcba203314a3a7cd6f upstream.

The name for a bdi of a gendisk is derived from the gendisk's devt.
However, since the gendisk is destroyed before the bdi it leaves a
window where a new gendisk could dynamically reuse the same devt while a
bdi with the same name is still live.  Arrange for the bdi to hold a
reference against its "owner" disk device while it is registered.
Otherwise we can hit sysfs duplicate name collisions like the following:

 WARNING: CPU: 10 PID: 2078 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x64/0x80
 sysfs: cannot create duplicate filename '/devices/virtual/bdi/259:1'

 Hardware name: HP ProLiant DL580 Gen8, BIOS P79 05/06/2015
  0000000000000286 0000000002c04ad5 ffff88006f24f970 ffffffff8134caec
  ffff88006f24f9c0 0000000000000000 ffff88006f24f9b0 ffffffff8108c351
  0000001f0000000c ffff88105d236000 ffff88105d1031e0 ffff8800357427f8
 Call Trace:
  [&lt;ffffffff8134caec&gt;] dump_stack+0x63/0x87
  [&lt;ffffffff8108c351&gt;] __warn+0xd1/0xf0
  [&lt;ffffffff8108c3cf&gt;] warn_slowpath_fmt+0x5f/0x80
  [&lt;ffffffff812a0d34&gt;] sysfs_warn_dup+0x64/0x80
  [&lt;ffffffff812a0e1e&gt;] sysfs_create_dir_ns+0x7e/0x90
  [&lt;ffffffff8134faaa&gt;] kobject_add_internal+0xaa/0x320
  [&lt;ffffffff81358d4e&gt;] ? vsnprintf+0x34e/0x4d0
  [&lt;ffffffff8134ff55&gt;] kobject_add+0x75/0xd0
  [&lt;ffffffff816e66b2&gt;] ? mutex_lock+0x12/0x2f
  [&lt;ffffffff8148b0a5&gt;] device_add+0x125/0x610
  [&lt;ffffffff8148b788&gt;] device_create_groups_vargs+0xd8/0x100
  [&lt;ffffffff8148b7cc&gt;] device_create_vargs+0x1c/0x20
  [&lt;ffffffff811b775c&gt;] bdi_register+0x8c/0x180
  [&lt;ffffffff811b7877&gt;] bdi_register_dev+0x27/0x30
  [&lt;ffffffff813317f5&gt;] add_disk+0x175/0x4a0

Reported-by: Yi Zhang &lt;yizhan@redhat.com&gt;
Tested-by: Yi Zhang &lt;yizhan@redhat.com&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

Fixed up missing 0 return in bdi_register_owner().

Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>block: fix use-after-free in seq file</title>
<updated>2016-08-16T07:30:50+00:00</updated>
<author>
<name>Vegard Nossum</name>
<email>vegard.nossum@oracle.com</email>
</author>
<published>2016-07-29T08:40:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=9a95c0cfc6f21b9ac66269d4782ea5a0f58cdf91'/>
<id>9a95c0cfc6f21b9ac66269d4782ea5a0f58cdf91</id>
<content type='text'>
commit 77da160530dd1dc94f6ae15a981f24e5f0021e84 upstream.

I got a KASAN report of use-after-free:

    ==================================================================
    BUG: KASAN: use-after-free in klist_iter_exit+0x61/0x70 at addr ffff8800b6581508
    Read of size 8 by task trinity-c1/315
    =============================================================================
    BUG kmalloc-32 (Not tainted): kasan: bad access detected
    -----------------------------------------------------------------------------

    Disabling lock debugging due to kernel taint
    INFO: Allocated in disk_seqf_start+0x66/0x110 age=144 cpu=1 pid=315
            ___slab_alloc+0x4f1/0x520
            __slab_alloc.isra.58+0x56/0x80
            kmem_cache_alloc_trace+0x260/0x2a0
            disk_seqf_start+0x66/0x110
            traverse+0x176/0x860
            seq_read+0x7e3/0x11a0
            proc_reg_read+0xbc/0x180
            do_loop_readv_writev+0x134/0x210
            do_readv_writev+0x565/0x660
            vfs_readv+0x67/0xa0
            do_preadv+0x126/0x170
            SyS_preadv+0xc/0x10
            do_syscall_64+0x1a1/0x460
            return_from_SYSCALL_64+0x0/0x6a
    INFO: Freed in disk_seqf_stop+0x42/0x50 age=160 cpu=1 pid=315
            __slab_free+0x17a/0x2c0
            kfree+0x20a/0x220
            disk_seqf_stop+0x42/0x50
            traverse+0x3b5/0x860
            seq_read+0x7e3/0x11a0
            proc_reg_read+0xbc/0x180
            do_loop_readv_writev+0x134/0x210
            do_readv_writev+0x565/0x660
            vfs_readv+0x67/0xa0
            do_preadv+0x126/0x170
            SyS_preadv+0xc/0x10
            do_syscall_64+0x1a1/0x460
            return_from_SYSCALL_64+0x0/0x6a

    CPU: 1 PID: 315 Comm: trinity-c1 Tainted: G    B           4.7.0+ #62
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
     ffffea0002d96000 ffff880119b9f918 ffffffff81d6ce81 ffff88011a804480
     ffff8800b6581500 ffff880119b9f948 ffffffff8146c7bd ffff88011a804480
     ffffea0002d96000 ffff8800b6581500 fffffffffffffff4 ffff880119b9f970
    Call Trace:
     [&lt;ffffffff81d6ce81&gt;] dump_stack+0x65/0x84
     [&lt;ffffffff8146c7bd&gt;] print_trailer+0x10d/0x1a0
     [&lt;ffffffff814704ff&gt;] object_err+0x2f/0x40
     [&lt;ffffffff814754d1&gt;] kasan_report_error+0x221/0x520
     [&lt;ffffffff8147590e&gt;] __asan_report_load8_noabort+0x3e/0x40
     [&lt;ffffffff83888161&gt;] klist_iter_exit+0x61/0x70
     [&lt;ffffffff82404389&gt;] class_dev_iter_exit+0x9/0x10
     [&lt;ffffffff81d2e8ea&gt;] disk_seqf_stop+0x3a/0x50
     [&lt;ffffffff8151f812&gt;] seq_read+0x4b2/0x11a0
     [&lt;ffffffff815f8fdc&gt;] proc_reg_read+0xbc/0x180
     [&lt;ffffffff814b24e4&gt;] do_loop_readv_writev+0x134/0x210
     [&lt;ffffffff814b4c45&gt;] do_readv_writev+0x565/0x660
     [&lt;ffffffff814b8a17&gt;] vfs_readv+0x67/0xa0
     [&lt;ffffffff814b8de6&gt;] do_preadv+0x126/0x170
     [&lt;ffffffff814b92ec&gt;] SyS_preadv+0xc/0x10

This problem can occur in the following situation:

open()
 - pread()
    - .seq_start()
       - iter = kmalloc() // succeeds
       - seqf-&gt;private = iter
    - .seq_stop()
       - kfree(seqf-&gt;private)
 - pread()
    - .seq_start()
       - iter = kmalloc() // fails
    - .seq_stop()
       - class_dev_iter_exit(seqf-&gt;private) // boom! old pointer

As the comment in disk_seqf_stop() says, stop is called even if start
failed, so we need to reinitialise the private pointer to NULL when seq
iteration stops.

An alternative would be to set the private pointer to NULL when the
kmalloc() in disk_seqf_start() fails.

Signed-off-by: Vegard Nossum &lt;vegard.nossum@oracle.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 77da160530dd1dc94f6ae15a981f24e5f0021e84 upstream.

I got a KASAN report of use-after-free:

    ==================================================================
    BUG: KASAN: use-after-free in klist_iter_exit+0x61/0x70 at addr ffff8800b6581508
    Read of size 8 by task trinity-c1/315
    =============================================================================
    BUG kmalloc-32 (Not tainted): kasan: bad access detected
    -----------------------------------------------------------------------------

    Disabling lock debugging due to kernel taint
    INFO: Allocated in disk_seqf_start+0x66/0x110 age=144 cpu=1 pid=315
            ___slab_alloc+0x4f1/0x520
            __slab_alloc.isra.58+0x56/0x80
            kmem_cache_alloc_trace+0x260/0x2a0
            disk_seqf_start+0x66/0x110
            traverse+0x176/0x860
            seq_read+0x7e3/0x11a0
            proc_reg_read+0xbc/0x180
            do_loop_readv_writev+0x134/0x210
            do_readv_writev+0x565/0x660
            vfs_readv+0x67/0xa0
            do_preadv+0x126/0x170
            SyS_preadv+0xc/0x10
            do_syscall_64+0x1a1/0x460
            return_from_SYSCALL_64+0x0/0x6a
    INFO: Freed in disk_seqf_stop+0x42/0x50 age=160 cpu=1 pid=315
            __slab_free+0x17a/0x2c0
            kfree+0x20a/0x220
            disk_seqf_stop+0x42/0x50
            traverse+0x3b5/0x860
            seq_read+0x7e3/0x11a0
            proc_reg_read+0xbc/0x180
            do_loop_readv_writev+0x134/0x210
            do_readv_writev+0x565/0x660
            vfs_readv+0x67/0xa0
            do_preadv+0x126/0x170
            SyS_preadv+0xc/0x10
            do_syscall_64+0x1a1/0x460
            return_from_SYSCALL_64+0x0/0x6a

    CPU: 1 PID: 315 Comm: trinity-c1 Tainted: G    B           4.7.0+ #62
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
     ffffea0002d96000 ffff880119b9f918 ffffffff81d6ce81 ffff88011a804480
     ffff8800b6581500 ffff880119b9f948 ffffffff8146c7bd ffff88011a804480
     ffffea0002d96000 ffff8800b6581500 fffffffffffffff4 ffff880119b9f970
    Call Trace:
     [&lt;ffffffff81d6ce81&gt;] dump_stack+0x65/0x84
     [&lt;ffffffff8146c7bd&gt;] print_trailer+0x10d/0x1a0
     [&lt;ffffffff814704ff&gt;] object_err+0x2f/0x40
     [&lt;ffffffff814754d1&gt;] kasan_report_error+0x221/0x520
     [&lt;ffffffff8147590e&gt;] __asan_report_load8_noabort+0x3e/0x40
     [&lt;ffffffff83888161&gt;] klist_iter_exit+0x61/0x70
     [&lt;ffffffff82404389&gt;] class_dev_iter_exit+0x9/0x10
     [&lt;ffffffff81d2e8ea&gt;] disk_seqf_stop+0x3a/0x50
     [&lt;ffffffff8151f812&gt;] seq_read+0x4b2/0x11a0
     [&lt;ffffffff815f8fdc&gt;] proc_reg_read+0xbc/0x180
     [&lt;ffffffff814b24e4&gt;] do_loop_readv_writev+0x134/0x210
     [&lt;ffffffff814b4c45&gt;] do_readv_writev+0x565/0x660
     [&lt;ffffffff814b8a17&gt;] vfs_readv+0x67/0xa0
     [&lt;ffffffff814b8de6&gt;] do_preadv+0x126/0x170
     [&lt;ffffffff814b92ec&gt;] SyS_preadv+0xc/0x10

This problem can occur in the following situation:

open()
 - pread()
    - .seq_start()
       - iter = kmalloc() // succeeds
       - seqf-&gt;private = iter
    - .seq_stop()
       - kfree(seqf-&gt;private)
 - pread()
    - .seq_start()
       - iter = kmalloc() // fails
    - .seq_stop()
       - class_dev_iter_exit(seqf-&gt;private) // boom! old pointer

As the comment in disk_seqf_stop() says, stop is called even if start
failed, so we need to reinitialise the private pointer to NULL when seq
iteration stops.

An alternative would be to set the private pointer to NULL when the
kmalloc() in disk_seqf_start() fails.

Signed-off-by: Vegard Nossum &lt;vegard.nossum@oracle.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>block: Inline blk_integrity in struct gendisk</title>
<updated>2015-10-21T20:42:42+00:00</updated>
<author>
<name>Martin K. Petersen</name>
<email>martin.petersen@oracle.com</email>
</author>
<published>2015-10-21T17:19:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=25520d55cdb6ee289abc68f553d364d22478ff54'/>
<id>25520d55cdb6ee289abc68f553d364d22478ff54</id>
<content type='text'>
Up until now the_integrity profile has been dynamically allocated and
attached to struct gendisk after the disk has been made active.

This causes problems because NVMe devices need to register the profile
prior to the partition table being read due to a mandatory metadata
buffer requirement. In addition, DM goes through hoops to deal with
preallocating, but not initializing integrity profiles.

Since the integrity profile is small (4 bytes + a pointer), Christoph
suggested moving it to struct gendisk proper. This requires several
changes:

 - Moving the blk_integrity definition to genhd.h.

 - Inlining blk_integrity in struct gendisk.

 - Removing the dynamic allocation code.

 - Adding helper functions which allow gendisk to set up and tear down
   the integrity sysfs dir when a disk is added/deleted.

 - Adding a blk_integrity_revalidate() callback for updating the stable
   pages bdi setting.

 - The calls that depend on whether a device has an integrity profile or
   not now key off of the bi-&gt;profile pointer.

 - Simplifying the integrity support routines in DM (Mike Snitzer).

Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Reported-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Sagi Grimberg &lt;sagig@mellanox.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Up until now the_integrity profile has been dynamically allocated and
attached to struct gendisk after the disk has been made active.

This causes problems because NVMe devices need to register the profile
prior to the partition table being read due to a mandatory metadata
buffer requirement. In addition, DM goes through hoops to deal with
preallocating, but not initializing integrity profiles.

Since the integrity profile is small (4 bytes + a pointer), Christoph
suggested moving it to struct gendisk proper. This requires several
changes:

 - Moving the blk_integrity definition to genhd.h.

 - Inlining blk_integrity in struct gendisk.

 - Removing the dynamic allocation code.

 - Adding helper functions which allow gendisk to set up and tear down
   the integrity sysfs dir when a disk is added/deleted.

 - Adding a blk_integrity_revalidate() callback for updating the stable
   pages bdi setting.

 - The calls that depend on whether a device has an integrity profile or
   not now key off of the bi-&gt;profile pointer.

 - Simplifying the integrity support routines in DM (Mike Snitzer).

Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Reported-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Sagi Grimberg &lt;sagig@mellanox.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: partition: convert percpu ref</title>
<updated>2015-07-17T14:41:53+00:00</updated>
<author>
<name>Ming Lei</name>
<email>tom.leiming@gmail.com</email>
</author>
<published>2015-07-16T03:16:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=6c71013ecb7e2bddbed9f5b95e7aed22c491daa9'/>
<id>6c71013ecb7e2bddbed9f5b95e7aed22c491daa9</id>
<content type='text'>
Percpu refcount is the perfect match for partition's case,
and the conversion is quite straight.

With the convertion, one pair of atomic inc/dec can be saved
for accounting block I/O, which is run in hot path of block I/O.

Signed-off-by: Ming Lei &lt;tom.leiming@gmail.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Percpu refcount is the perfect match for partition's case,
and the conversion is quite straight.

With the convertion, one pair of atomic inc/dec can be saved
for accounting block I/O, which is run in hot path of block I/O.

Signed-off-by: Ming Lei &lt;tom.leiming@gmail.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: partition: introduce hd_free_part()</title>
<updated>2015-07-17T14:41:53+00:00</updated>
<author>
<name>Ming Lei</name>
<email>tom.leiming@gmail.com</email>
</author>
<published>2015-07-16T03:16:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b54e5ed8f285d62c0d242c4ef9da90937994db02'/>
<id>b54e5ed8f285d62c0d242c4ef9da90937994db02</id>
<content type='text'>
So the helper can be used in both generic partition
case and part0 case.

Signed-off-by: Ming Lei &lt;tom.leiming@gmail.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
So the helper can be used in both generic partition
case and part0 case.

Signed-off-by: Ming Lei &lt;tom.leiming@gmail.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-4.2/writeback' of git://git.kernel.dk/linux-block</title>
<updated>2015-06-25T23:00:17+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-06-25T23:00:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e4bc13adfd016fc1036838170288b5680d1a98b0'/>
<id>e4bc13adfd016fc1036838170288b5680d1a98b0</id>
<content type='text'>
Pull cgroup writeback support from Jens Axboe:
 "This is the big pull request for adding cgroup writeback support.

  This code has been in development for a long time, and it has been
  simmering in for-next for a good chunk of this cycle too.  This is one
  of those problems that has been talked about for at least half a
  decade, finally there's a solution and code to go with it.

  Also see last weeks writeup on LWN:

        http://lwn.net/Articles/648292/"

* 'for-4.2/writeback' of git://git.kernel.dk/linux-block: (85 commits)
  writeback, blkio: add documentation for cgroup writeback support
  vfs, writeback: replace FS_CGROUP_WRITEBACK with SB_I_CGROUPWB
  writeback: do foreign inode detection iff cgroup writeback is enabled
  v9fs: fix error handling in v9fs_session_init()
  bdi: fix wrong error return value in cgwb_create()
  buffer: remove unusued 'ret' variable
  writeback: disassociate inodes from dying bdi_writebacks
  writeback: implement foreign cgroup inode bdi_writeback switching
  writeback: add lockdep annotation to inode_to_wb()
  writeback: use unlocked_inode_to_wb transaction in inode_congested()
  writeback: implement unlocked_inode_to_wb transaction and use it for stat updates
  writeback: implement [locked_]inode_to_wb_and_lock_list()
  writeback: implement foreign cgroup inode detection
  writeback: make writeback_control track the inode being written back
  writeback: relocate wb[_try]_get(), wb_put(), inode_{attach|detach}_wb()
  mm: vmscan: disable memcg direct reclaim stalling if cgroup writeback support is in use
  writeback: implement memcg writeback domain based throttling
  writeback: reset wb_domain-&gt;dirty_limit[_tstmp] when memcg domain size changes
  writeback: implement memcg wb_domain
  writeback: update wb_over_bg_thresh() to use wb_domain aware operations
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull cgroup writeback support from Jens Axboe:
 "This is the big pull request for adding cgroup writeback support.

  This code has been in development for a long time, and it has been
  simmering in for-next for a good chunk of this cycle too.  This is one
  of those problems that has been talked about for at least half a
  decade, finally there's a solution and code to go with it.

  Also see last weeks writeup on LWN:

        http://lwn.net/Articles/648292/"

* 'for-4.2/writeback' of git://git.kernel.dk/linux-block: (85 commits)
  writeback, blkio: add documentation for cgroup writeback support
  vfs, writeback: replace FS_CGROUP_WRITEBACK with SB_I_CGROUPWB
  writeback: do foreign inode detection iff cgroup writeback is enabled
  v9fs: fix error handling in v9fs_session_init()
  bdi: fix wrong error return value in cgwb_create()
  buffer: remove unusued 'ret' variable
  writeback: disassociate inodes from dying bdi_writebacks
  writeback: implement foreign cgroup inode bdi_writeback switching
  writeback: add lockdep annotation to inode_to_wb()
  writeback: use unlocked_inode_to_wb transaction in inode_congested()
  writeback: implement unlocked_inode_to_wb transaction and use it for stat updates
  writeback: implement [locked_]inode_to_wb_and_lock_list()
  writeback: implement foreign cgroup inode detection
  writeback: make writeback_control track the inode being written back
  writeback: relocate wb[_try]_get(), wb_put(), inode_{attach|detach}_wb()
  mm: vmscan: disable memcg direct reclaim stalling if cgroup writeback support is in use
  writeback: implement memcg writeback domain based throttling
  writeback: reset wb_domain-&gt;dirty_limit[_tstmp] when memcg domain size changes
  writeback: implement memcg wb_domain
  writeback: update wb_over_bg_thresh() to use wb_domain aware operations
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>block: fix ext_dev_lock lockdep report</title>
<updated>2015-06-11T15:01:40+00:00</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2015-06-11T03:47:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=4d66e5e9b6d720d8463e11d027bd4ad91c8b1318'/>
<id>4d66e5e9b6d720d8463e11d027bd4ad91c8b1318</id>
<content type='text'>
 =================================
 [ INFO: inconsistent lock state ]
 4.1.0-rc7+ #217 Tainted: G           O
 ---------------------------------
 inconsistent {SOFTIRQ-ON-W} -&gt; {IN-SOFTIRQ-W} usage.
 swapper/6/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
  (ext_devt_lock){+.?...}, at: [&lt;ffffffff8143a60c&gt;] blk_free_devt+0x3c/0x70
 {SOFTIRQ-ON-W} state was registered at:
   [&lt;ffffffff810bf6b1&gt;] __lock_acquire+0x461/0x1e70
   [&lt;ffffffff810c1947&gt;] lock_acquire+0xb7/0x290
   [&lt;ffffffff818ac3a8&gt;] _raw_spin_lock+0x38/0x50
   [&lt;ffffffff8143a07d&gt;] blk_alloc_devt+0x6d/0xd0  &lt;-- take the lock in process context
[..]
  [&lt;ffffffff810bf64e&gt;] __lock_acquire+0x3fe/0x1e70
  [&lt;ffffffff810c00ad&gt;] ? __lock_acquire+0xe5d/0x1e70
  [&lt;ffffffff810c1947&gt;] lock_acquire+0xb7/0x290
  [&lt;ffffffff8143a60c&gt;] ? blk_free_devt+0x3c/0x70
  [&lt;ffffffff818ac3a8&gt;] _raw_spin_lock+0x38/0x50
  [&lt;ffffffff8143a60c&gt;] ? blk_free_devt+0x3c/0x70
  [&lt;ffffffff8143a60c&gt;] blk_free_devt+0x3c/0x70    &lt;-- take the lock in softirq
  [&lt;ffffffff8143bfec&gt;] part_release+0x1c/0x50
  [&lt;ffffffff8158edf6&gt;] device_release+0x36/0xb0
  [&lt;ffffffff8145ac2b&gt;] kobject_cleanup+0x7b/0x1a0
  [&lt;ffffffff8145aad0&gt;] kobject_put+0x30/0x70
  [&lt;ffffffff8158f147&gt;] put_device+0x17/0x20
  [&lt;ffffffff8143c29c&gt;] delete_partition_rcu_cb+0x16c/0x180
  [&lt;ffffffff8143c130&gt;] ? read_dev_sector+0xa0/0xa0
  [&lt;ffffffff810e0e0f&gt;] rcu_process_callbacks+0x2ff/0xa90
  [&lt;ffffffff810e0dcf&gt;] ? rcu_process_callbacks+0x2bf/0xa90
  [&lt;ffffffff81067e2e&gt;] __do_softirq+0xde/0x600

Neil sees this in his tests and it also triggers on pmem driver unbind
for the libnvdimm tests.  This fix is on top of an initial fix by Keith
for incorrect usage of mutex_lock() in this path: 2da78092dda1 "block:
Fix dev_t minor allocation lifetime".  Both this and 2da78092dda1 are
candidates for -stable.

Fixes: 2da78092dda1 ("block: Fix dev_t minor allocation lifetime")
Cc: &lt;stable@vger.kernel.org&gt;
Cc: Keith Busch &lt;keith.busch@intel.com&gt;
Reported-by: NeilBrown &lt;neilb@suse.de&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
 =================================
 [ INFO: inconsistent lock state ]
 4.1.0-rc7+ #217 Tainted: G           O
 ---------------------------------
 inconsistent {SOFTIRQ-ON-W} -&gt; {IN-SOFTIRQ-W} usage.
 swapper/6/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
  (ext_devt_lock){+.?...}, at: [&lt;ffffffff8143a60c&gt;] blk_free_devt+0x3c/0x70
 {SOFTIRQ-ON-W} state was registered at:
   [&lt;ffffffff810bf6b1&gt;] __lock_acquire+0x461/0x1e70
   [&lt;ffffffff810c1947&gt;] lock_acquire+0xb7/0x290
   [&lt;ffffffff818ac3a8&gt;] _raw_spin_lock+0x38/0x50
   [&lt;ffffffff8143a07d&gt;] blk_alloc_devt+0x6d/0xd0  &lt;-- take the lock in process context
[..]
  [&lt;ffffffff810bf64e&gt;] __lock_acquire+0x3fe/0x1e70
  [&lt;ffffffff810c00ad&gt;] ? __lock_acquire+0xe5d/0x1e70
  [&lt;ffffffff810c1947&gt;] lock_acquire+0xb7/0x290
  [&lt;ffffffff8143a60c&gt;] ? blk_free_devt+0x3c/0x70
  [&lt;ffffffff818ac3a8&gt;] _raw_spin_lock+0x38/0x50
  [&lt;ffffffff8143a60c&gt;] ? blk_free_devt+0x3c/0x70
  [&lt;ffffffff8143a60c&gt;] blk_free_devt+0x3c/0x70    &lt;-- take the lock in softirq
  [&lt;ffffffff8143bfec&gt;] part_release+0x1c/0x50
  [&lt;ffffffff8158edf6&gt;] device_release+0x36/0xb0
  [&lt;ffffffff8145ac2b&gt;] kobject_cleanup+0x7b/0x1a0
  [&lt;ffffffff8145aad0&gt;] kobject_put+0x30/0x70
  [&lt;ffffffff8158f147&gt;] put_device+0x17/0x20
  [&lt;ffffffff8143c29c&gt;] delete_partition_rcu_cb+0x16c/0x180
  [&lt;ffffffff8143c130&gt;] ? read_dev_sector+0xa0/0xa0
  [&lt;ffffffff810e0e0f&gt;] rcu_process_callbacks+0x2ff/0xa90
  [&lt;ffffffff810e0dcf&gt;] ? rcu_process_callbacks+0x2bf/0xa90
  [&lt;ffffffff81067e2e&gt;] __do_softirq+0xde/0x600

Neil sees this in his tests and it also triggers on pmem driver unbind
for the libnvdimm tests.  This fix is on top of an initial fix by Keith
for incorrect usage of mutex_lock() in this path: 2da78092dda1 "block:
Fix dev_t minor allocation lifetime".  Both this and 2da78092dda1 are
candidates for -stable.

Fixes: 2da78092dda1 ("block: Fix dev_t minor allocation lifetime")
Cc: &lt;stable@vger.kernel.org&gt;
Cc: Keith Busch &lt;keith.busch@intel.com&gt;
Reported-by: NeilBrown &lt;neilb@suse.de&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>writeback: separate out include/linux/backing-dev-defs.h</title>
<updated>2015-06-02T14:33:34+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-05-22T21:13:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=66114cad64bf76a155fec1f0fff0de771cf909d5'/>
<id>66114cad64bf76a155fec1f0fff0de771cf909d5</id>
<content type='text'>
With the planned cgroup writeback support, backing-dev related
declarations will be more widely used across block and cgroup;
unfortunately, including backing-dev.h from include/linux/blkdev.h
makes cyclic include dependency quite likely.

This patch separates out backing-dev-defs.h which only has the
essential definitions and updates blkdev.h to include it.  c files
which need access to more backing-dev details now include
backing-dev.h directly.  This takes backing-dev.h off the common
include dependency chain making it a lot easier to use it across block
and cgroup.

v2: fs/fat build failure fixed.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
With the planned cgroup writeback support, backing-dev related
declarations will be more widely used across block and cgroup;
unfortunately, including backing-dev.h from include/linux/blkdev.h
makes cyclic include dependency quite likely.

This patch separates out backing-dev-defs.h which only has the
essential definitions and updates blkdev.h to include it.  c files
which need access to more backing-dev details now include
backing-dev.h directly.  This takes backing-dev.h off the common
include dependency chain making it a lot easier to use it across block
and cgroup.

v2: fs/fat build failure fixed.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
