linux-stable.git/block, branch v3.2.84

block: fix use-after-free in seq file

2016-11-20T01:01:30+00:00

commit 77da160530dd1dc94f6ae15a981f24e5f0021e84 upstream.

I got a KASAN report of use-after-free:

    ==================================================================
    BUG: KASAN: use-after-free in klist_iter_exit+0x61/0x70 at addr ffff8800b6581508
    Read of size 8 by task trinity-c1/315
    =============================================================================
    BUG kmalloc-32 (Not tainted): kasan: bad access detected
    -----------------------------------------------------------------------------

    Disabling lock debugging due to kernel taint
    INFO: Allocated in disk_seqf_start+0x66/0x110 age=144 cpu=1 pid=315
            ___slab_alloc+0x4f1/0x520
            __slab_alloc.isra.58+0x56/0x80
            kmem_cache_alloc_trace+0x260/0x2a0
            disk_seqf_start+0x66/0x110
            traverse+0x176/0x860
            seq_read+0x7e3/0x11a0
            proc_reg_read+0xbc/0x180
            do_loop_readv_writev+0x134/0x210
            do_readv_writev+0x565/0x660
            vfs_readv+0x67/0xa0
            do_preadv+0x126/0x170
            SyS_preadv+0xc/0x10
            do_syscall_64+0x1a1/0x460
            return_from_SYSCALL_64+0x0/0x6a
    INFO: Freed in disk_seqf_stop+0x42/0x50 age=160 cpu=1 pid=315
            __slab_free+0x17a/0x2c0
            kfree+0x20a/0x220
            disk_seqf_stop+0x42/0x50
            traverse+0x3b5/0x860
            seq_read+0x7e3/0x11a0
            proc_reg_read+0xbc/0x180
            do_loop_readv_writev+0x134/0x210
            do_readv_writev+0x565/0x660
            vfs_readv+0x67/0xa0
            do_preadv+0x126/0x170
            SyS_preadv+0xc/0x10
            do_syscall_64+0x1a1/0x460
            return_from_SYSCALL_64+0x0/0x6a

    CPU: 1 PID: 315 Comm: trinity-c1 Tainted: G    B           4.7.0+ #62
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
     ffffea0002d96000 ffff880119b9f918 ffffffff81d6ce81 ffff88011a804480
     ffff8800b6581500 ffff880119b9f948 ffffffff8146c7bd ffff88011a804480
     ffffea0002d96000 ffff8800b6581500 fffffffffffffff4 ffff880119b9f970
    Call Trace:
     dump_stack+0x65/0x84
     print_trailer+0x10d/0x1a0
     object_err+0x2f/0x40
     kasan_report_error+0x221/0x520
     __asan_report_load8_noabort+0x3e/0x40
     klist_iter_exit+0x61/0x70
     class_dev_iter_exit+0x9/0x10
     disk_seqf_stop+0x3a/0x50
     seq_read+0x4b2/0x11a0
     proc_reg_read+0xbc/0x180
     do_loop_readv_writev+0x134/0x210
     do_readv_writev+0x565/0x660
     vfs_readv+0x67/0xa0
     do_preadv+0x126/0x170
     SyS_preadv+0xc/0x10

This problem can occur in the following situation:

open()
 - pread()
    - .seq_start()
       - iter = kmalloc() // succeeds
       - seqf->private = iter
    - .seq_stop()
       - kfree(seqf->private)
 - pread()
    - .seq_start()
       - iter = kmalloc() // fails
    - .seq_stop()
       - class_dev_iter_exit(seqf->private) // boom! old pointer

As the comment in disk_seqf_stop() says, stop is called even if start
failed, so we need to reinitialise the private pointer to NULL when seq
iteration stops.

An alternative would be to set the private pointer to NULL when the
kmalloc() in disk_seqf_start() fails.
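The failure sequence above can be reproduced with a small userspace model. This is a sketch under stated assumptions: fake_seq_file, fake_iter, seqf_start() and seqf_stop() are hypothetical stand-ins for struct seq_file, struct class_dev_iter and the disk_seqf_* callbacks, and free() stands in for class_dev_iter_exit() plus kfree(). The point is only that ->stop() clears the private pointer after releasing it, so a later ->stop() following a failed ->start() sees NULL instead of a stale pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct class_dev_iter and struct seq_file. */
struct fake_iter { int pos; };
struct fake_seq_file { void *private; };

static int frees; /* counts iterator releases, for illustration only */

static void release_iter(struct fake_iter *iter)
{
    /* In the kernel this would be class_dev_iter_exit() + kfree(). */
    frees++;
    free(iter);
}

/* ->start(): fail_alloc simulates kmalloc() returning NULL. */
static void *seqf_start(struct fake_seq_file *seqf, bool fail_alloc)
{
    struct fake_iter *iter = fail_alloc ? NULL : malloc(sizeof(*iter));

    if (!iter)
        return NULL; /* private is left untouched on failure */
    iter->pos = 0;
    seqf->private = iter;
    return iter;
}

/* ->stop(): runs even if ->start() failed, so it must cope with NULL. */
static void seqf_stop(struct fake_seq_file *seqf)
{
    struct fake_iter *iter = seqf->private;

    if (iter) {
        release_iter(iter);
        seqf->private = NULL; /* the fix: never leave a dangling pointer */
    }
}
```

Driving it through the two pread() calls from the scenario (start succeeds, stop; start fails, stop) releases the iterator exactly once. Without the `seqf->private = NULL` line, the second stop would release the already-freed iterator again.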

Signed-off-by: Vegard Nossum 
Acked-by: Tejun Heo 
Signed-off-by: Jens Axboe 
Signed-off-by: Ben Hutchings

genhd: check for int overflow in disk_expand_part_tbl()

2015-02-20T00:49:25+00:00

commit 5fabcb4c33fe11c7e3afdf805fde26c1a54d0953 upstream.

We can get here from blkdev_ioctl() -> blkpg_ioctl() -> add_partition()
with a user-supplied partno value. If we pass in 0x7fffffff, the
new target (partno + 1) in disk_expand_part_tbl() overflows the 'int', and
we access beyond the end of ptbl->part[] and even write to it when we
do the rcu_assign_pointer() to assign the new partition.
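The check itself is not shown in this message. As a sketch of one portable way to guard the computation, the hypothetical helper part_tbl_target() below rejects partno before adding 1, since signed overflow is undefined behaviour in ISO C. It is an illustration, not the upstream code.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/*
 * Hypothetical helper: compute the table length needed to hold index
 * partno (i.e. partno + 1) without ever evaluating an overflowing
 * signed addition.
 */
static int part_tbl_target(int partno, int *target)
{
    if (partno < 0 || partno == INT_MAX)
        return -EINVAL; /* partno + 1 would be invalid or overflow */
    *target = partno + 1;
    return 0;
}
```

With this guard, the user-supplied 0x7fffffff from the report is rejected with -EINVAL instead of producing a negative table size.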

Reported-by: David Ramos 
Signed-off-by: Jens Axboe 
Signed-off-by: Ben Hutchings

scsi: Fix error handling in SCSI_IOCTL_SEND_COMMAND

2014-12-14T16:23:50+00:00

commit 84ce0f0e94ac97217398b3b69c21c7a62ebeed05 upstream.

When sg_scsi_ioctl() fails to prepare the request to submit in
blk_rq_map_kern(), we jump to a label where we just end up copying the
(luckily zeroed-out) kernel buffer to userspace instead of reporting an
error. Fix the problem by jumping to the right label.
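The pattern behind the fix can be sketched in isolation. send_command() and its map_ok flag are hypothetical: map_ok simulates blk_rq_map_kern() succeeding, and memcpy() stands in for copying the result to userspace. When mapping fails, the error path has to skip the copy entirely and return the error.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of the ioctl path; map_ok simulates blk_rq_map_kern(). */
static int send_command(int map_ok, char *user_buf, const char *kbuf,
                        size_t len)
{
    int err = 0;

    if (!map_ok) {
        err = -1;   /* report the failure ... */
        goto error; /* ... and skip the copy to userspace */
    }

    /* Only a request that was actually prepared has data to return. */
    memcpy(user_buf, kbuf, len);
error:
    return err;
}
```

On failure the caller gets an error and its buffer is left untouched, rather than a successful-looking copy of a zeroed kernel buffer.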

CC: Jens Axboe 
CC: linux-scsi@vger.kernel.org
Coverity-id: 1226871
Signed-off-by: Jan Kara 

Fixed up the now-unused 'out' label.

Signed-off-by: Jens Axboe 
Signed-off-by: Ben Hutchings

block: fix alignment_offset math that assumes io_min is a power-of-2

2014-12-14T16:23:47+00:00

commit b8839b8c55f3fdd60dc36abcda7e0266aff7985c upstream.

The math in both blk_stack_limits() and queue_limit_alignment_offset()
assumes that a block device's io_min (aka minimum_io_size) is always a
power-of-2.  Fix the math so that it works for non-power-of-2 io_min.

This issue (of alignment_offset != 0) became apparent when testing
dm-thinp with a thinp blocksize that matches a RAID6 stripesize of
1280K.  Commit fdfb4c8c1 ("dm thin: set minimum_io_size to pool's data
block size") unlocked the potential for alignment_offset != 0 due to
the dm-thin-pool's io_min possibly being a non-power-of-2.
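The difference between power-of-2-only arithmetic and general arithmetic can be shown with two hypothetical helpers. They only contrast masking with `& (g - 1)` against `% g` in an alignment-offset style computation; the exact shape is an assumption, not the upstream code. In the kernel, a 64-bit modulo on 32-bit architectures would typically need a helper such as sector_div().

```c
#include <assert.h>
#include <stdint.h>

/* Mask-based: only correct when granularity is a power of 2. */
static unsigned int offset_mask(uint64_t sector, unsigned int granularity,
                                unsigned int alignment_offset)
{
    unsigned int alignment = (unsigned int)((sector << 9) & (granularity - 1));

    return (granularity + alignment_offset - alignment) & (granularity - 1);
}

/* Modulo-based: correct for any non-zero granularity. */
static unsigned int offset_mod(uint64_t sector, unsigned int granularity,
                               unsigned int alignment_offset)
{
    assert(granularity != 0);
    unsigned int alignment = (unsigned int)((sector << 9) % granularity);

    return (granularity + alignment_offset - alignment) % granularity;
}
```

With the 1280K (1310720-byte) io_min from the dm-thinp case and a perfectly aligned start (sector 0, alignment_offset 0), offset_mod() returns 0, while offset_mask() returns 1048576, a spurious non-zero alignment offset. For a power-of-2 granularity such as 4096 both helpers agree.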

Signed-off-by: Mike Snitzer 
Acked-by: Martin K. Petersen 
Signed-off-by: Jens Axboe 
Signed-off-by: Ben Hutchings

genhd: fix leftover might_sleep() in blk_free_devt()

2014-11-05T20:27:49+00:00

commit 46f341ffcfb5d8530f7d1e60f3be06cce6661b62 upstream.

Commit 2da78092 changed the locking from a mutex to a spinlock,
so we no longer sleep in this context. But there was a leftover
might_sleep() in there, which now triggers since we do the final
free from an RCU callback. Get rid of it.

Reported-by: Pontus Fuchs 
Signed-off-by: Jens Axboe 
Signed-off-by: Ben Hutchings

block: Fix dev_t minor allocation lifetime

2014-11-05T20:27:40+00:00

commit 2da78092dda13f1efd26edbbf99a567776913750 upstream.

Release the dev_t minor when all references are closed to prevent
another device from acquiring the same major/minor.

Since the partition's release may be invoked from call_rcu's soft-irq
context, the ext_dev_idr's mutex had to be replaced with a spinlock so
as not to sleep.
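The lifetime rule in the first paragraph can be sketched as a reference-counted minor. Everything here is hypothetical: NR_MINORS, fake_part, alloc_minor(), part_get() and part_put() only model the rule that a minor goes back to the pool on the last put. The idr, the spinlock and the RCU callback are left out.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_MINORS 8 /* hypothetical, tiny pool for illustration */

static bool minor_in_use[NR_MINORS];

struct fake_part {
    int minor;
    int refcount;
};

/* Allocate the lowest free minor; returns -1 if the pool is exhausted. */
static int alloc_minor(void)
{
    for (int i = 0; i < NR_MINORS; i++) {
        if (!minor_in_use[i]) {
            minor_in_use[i] = true;
            return i;
        }
    }
    return -1;
}

static void part_get(struct fake_part *p)
{
    p->refcount++;
}

/* Drop a reference; the minor is returned to the pool only on the last put. */
static void part_put(struct fake_part *p)
{
    assert(p->refcount > 0);
    if (--p->refcount == 0) {
        minor_in_use[p->minor] = false;
        p->minor = -1;
    }
}
```

While an opener still holds a reference to a deleted partition, its minor cannot be handed to a new device. Only after the final part_put() is it reused.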

Signed-off-by: Keith Busch 
Signed-off-by: Jens Axboe 
[bwh: Backported to 3.2:
 - Adjust filename
 - idr insertion API is different, and blk_alloc_devt() is preallocating
   a node in a different way]
Signed-off-by: Ben Hutchings

block: don't assume last put of shared tags is for the host

2014-09-13T22:41:37+00:00

commit d45b3279a5a2252cafcd665bbf2db8c9b31ef783 upstream.

There is no inherent reason why the last put of a tag structure must be
the one for the Scsi_Host, as device model objects can be held for
arbitrary periods.  Merge blk_free_tags and __blk_free_tags into a single
function that just releases a reference and get rid of the BUG() when the
host reference wasn't the last.

Signed-off-by: Christoph Hellwig 
Signed-off-by: Jens Axboe 
Signed-off-by: Ben Hutchings

blktrace: fix accounting of partially completed requests

2014-04-30T15:23:20+00:00

commit af5040da01ef980670b3741b3e10733ee3e33566 upstream.

trace_block_rq_complete does not take into account that a request can
be partially completed, so we can get the following incorrect output
from blkparse:

  C   R 232 + 240 [0]
  C   R 240 + 232 [0]
  C   R 248 + 224 [0]
  C   R 256 + 216 [0]

but should be:

  C   R 232 + 8 [0]
  C   R 240 + 8 [0]
  C   R 248 + 8 [0]
  C   R 256 + 8 [0]

The summary statistics of completed requests and the final throughput
will also be incorrect.

This patch takes into account the real completion size of the request and
fixes the wrong completion accounting.
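The corrected accounting can be modelled outside the kernel. This sketch assumes, as the blkparse output above suggests, a 240-sector request at sector 232 completed 8 sectors (4096 bytes) at a time. model_completions() and struct completion_event are hypothetical. Each event reports the bytes actually completed shifted down to sectors (nr_bytes >> 9) rather than the sectors still outstanding.

```c
#include <assert.h>
#include <stdint.h>

struct completion_event {
    uint64_t sector;         /* first sector covered by this completion */
    unsigned int nr_sectors; /* sectors reported for this completion */
};

/*
 * Hypothetical model: complete total_sectors starting at start in chunks
 * of chunk_bytes. Returns the number of events written (at most max_ev).
 */
static int model_completions(uint64_t start, unsigned int total_sectors,
                             unsigned int chunk_bytes,
                             struct completion_event *ev, int max_ev)
{
    uint64_t sector = start;
    unsigned int remaining = total_sectors;
    int n = 0;

    assert(chunk_bytes >= 512);
    while (remaining > 0 && n < max_ev) {
        unsigned int done = chunk_bytes >> 9;

        if (done > remaining)
            done = remaining;
        ev[n].sector = sector;
        ev[n].nr_sectors = done; /* buggy accounting reported 'remaining' */
        sector += done;
        remaining -= done;
        n++;
    }
    return n;
}
```

For that request this yields 30 events, the first ones being 232 + 8, 240 + 8, 248 + 8 and 256 + 8, matching the expected output.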

Signed-off-by: Roman Pen 
CC: Steven Rostedt 
CC: Frederic Weisbecker 
CC: Ingo Molnar 
CC: linux-kernel@vger.kernel.org
Signed-off-by: Jens Axboe 
[bwh: Backported to 3.2: drop change in blk-mq.c]
Signed-off-by: Ben Hutchings

block: add cond_resched() to potentially long running ioctl discard loop

2014-04-01T23:58:51+00:00

commit c8123f8c9cb517403b51aa41c3c46ff5e10b2c17 upstream.

When mkfs issues a full device discard and the device only
supports discards of a smallish size, we can loop in
blkdev_issue_discard() for a long time. If preemption isn't enabled,
this can turn into a soft lockup and the kernel will
start complaining.

Add an explicit cond_resched() at the end of the loop to avoid
that.
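The shape of such a loop is sketched below under simplifying assumptions. issue_discard_chunks() and max_discard_sectors are hypothetical names, bio construction and submission are elided, and sched_yield() serves as a userspace stand-in for the kernel's cond_resched(). The large request is split into device-sized chunks, with a yield point at the end of each iteration.

```c
#include <assert.h>
#include <sched.h>
#include <stdint.h>

/*
 * Hypothetical model of the discard loop: split nr_sects into chunks of
 * at most max_discard_sectors and yield after each one.
 */
static unsigned long issue_discard_chunks(uint64_t sector, uint64_t nr_sects,
                                          uint64_t max_discard_sectors)
{
    unsigned long chunks = 0;

    assert(max_discard_sectors > 0);
    while (nr_sects) {
        uint64_t req = nr_sects < max_discard_sectors ?
                       nr_sects : max_discard_sectors;

        /* ... build and submit a discard for [sector, sector + req) ... */
        sector += req;
        nr_sects -= req;
        chunks++;
        sched_yield(); /* stand-in for cond_resched() */
    }
    return chunks;
}
```

The smaller the device's discard limit, the more iterations a full-device discard takes, which is why a yield point inside the loop matters without preemption.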

Signed-off-by: Jens Axboe 
Signed-off-by: Ben Hutchings

blk-core: Fix memory corruption if blkcg_init_queue fails

2014-01-03T04:33:18+00:00

commit fff4996b7db7955414ac74386efa5e07fd766b50 upstream.

If blkcg_init_queue fails, blk_alloc_queue_node doesn't call bdi_destroy
to clean up structures allocated by the backing dev.

------------[ cut here ]------------
WARNING: at lib/debugobjects.c:260 debug_print_object+0x85/0xa0()
ODEBUG: free active (active state 0) object type: percpu_counter hint:           (null)
Modules linked in: dm_loop dm_mod ip6table_filter ip6_tables uvesafb cfbcopyarea cfbimgblt cfbfillrect fbcon font bitblit fbcon_rotate fbcon_cw fbcon_ud fbcon_ccw softcursor fb fbdev ipt_MASQUERADE iptable_nat nf_nat_ipv4 msr nf_conntrack_ipv4 nf_defrag_ipv4 xt_state ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc tun ipv6 cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_ondemand cpufreq_conservative spadfs fuse hid_generic usbhid hid raid0 md_mod dmi_sysfs nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack lm85 hwmon_vid snd_usb_audio snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_hwdep snd_usbmidi_lib snd_rawmidi snd soundcore acpi_cpufreq freq_table mperf sata_svw serverworks kvm_amd ide_core ehci_pci ohci_hcd libata ehci_hcd kvm usbcore tg3 usb_common libphy k10temp pcspkr ptp i2c_piix4 i2c_core evdev microcode hwmon rtc_cmos pps_core e100 skge floppy mii processor button unix
CPU: 0 PID: 2739 Comm: lvchange Tainted: G        W    3.10.15-devel #14
Hardware name: empty empty/S3992-E, BIOS 'V1.06   ' 06/09/2009
 0000000000000009 ffff88023c3c1ae8 ffffffff813c8fd4 ffff88023c3c1b20
 ffffffff810399eb ffff88043d35cd58 ffffffff81651940 ffff88023c3c1bf8
 ffffffff82479d90 0000000000000005 ffff88023c3c1b80 ffffffff81039a67
Call Trace:
 dump_stack+0x19/0x1b
 warn_slowpath_common+0x6b/0xa0
 warn_slowpath_fmt+0x47/0x50
 ? debug_check_no_obj_freed+0xcf/0x250
 debug_print_object+0x85/0xa0
 debug_check_no_obj_freed+0x203/0x250
 kmem_cache_free+0x20c/0x3a0
 blk_alloc_queue_node+0x2a9/0x2c0
 blk_alloc_queue+0xe/0x10
 dm_create+0x1a3/0x530 [dm_mod]
 ? list_version_get_info+0xe0/0xe0 [dm_mod]
 dev_create+0x57/0x2b0 [dm_mod]
 ? list_version_get_info+0xe0/0xe0 [dm_mod]
 ? list_version_get_info+0xe0/0xe0 [dm_mod]
 ctl_ioctl+0x268/0x500 [dm_mod]
 ? get_lock_stats+0x22/0x70
 dm_ctl_ioctl+0xe/0x20 [dm_mod]
 do_vfs_ioctl+0x2ed/0x520
 ? fget_light+0x377/0x4e0
 SyS_ioctl+0x4b/0x90
 system_call_fastpath+0x1a/0x1f
---[ end trace 4b5ff0d55673d986 ]---
------------[ cut here ]------------

This fix should be backported to stable kernels starting with 2.6.37. Note
that in the kernels prior to 3.5 the affected code is different, but the
bug is still there - bdi_init is called and bdi_destroy isn't.
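The backport note describes a missing step in an unwind path. The sketch below shows the usual goto-based unwinding under stated assumptions. fake_bdi_init(), fake_bdi_destroy(), bdi_live and alloc_queue() are hypothetical stand-ins for bdi_init(), bdi_destroy() and blk_alloc_queue_node(), and blkcg_fails simulates blkcg_init_queue() returning an error. Each step that succeeded is undone in reverse order when a later step fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical counter standing in for live backing-dev state. */
static int bdi_live;

static int fake_bdi_init(void)
{
    bdi_live++;
    return 0;
}

static void fake_bdi_destroy(void)
{
    bdi_live--;
}

struct fake_queue { int dummy; };

static struct fake_queue *alloc_queue(bool blkcg_fails)
{
    struct fake_queue *q = calloc(1, sizeof(*q));

    if (!q)
        return NULL;
    if (fake_bdi_init())
        goto fail_q;
    if (blkcg_fails)
        goto fail_bdi; /* the missing step: tear down the bdi too */
    return q;

fail_bdi:
    fake_bdi_destroy();
fail_q:
    free(q);
    return NULL;
}
```

After a simulated blkcg failure, no backing-dev state is left behind when the queue memory is freed, which is the condition the ODEBUG warning above was flagging.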

Signed-off-by: Mikulas Patocka 
Acked-by: Tejun Heo 
Signed-off-by: Jens Axboe 
[bwh: Backported to 3.2: add bdi_destroy() to the single error path
 after the call to bdi_init()]
Signed-off-by: Ben Hutchings