linux-stable.git/drivers/scsi, branch v3.4.112

mvsas: Fix NULL pointer dereference in mvs_slot_task_free

2016-04-27T10:55:29+00:00

commit 2280521719e81919283b82902ac24058f87dfc1b upstream.

When pci_pool_alloc fails in mvs_task_prep then task->lldd_task stays
NULL but it's later used in mvs_abort_task as slot which is passed
to mvs_slot_task_free causing NULL pointer dereference.

Just return from mvs_slot_task_free when passed with NULL slot.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=101891
Signed-off-by: Dāvis Mosāns 
Reviewed-by: Tomas Henzl 
Reviewed-by: Johannes Thumshirn 
Signed-off-by: James Bottomley 
Signed-off-by: Zefan Li

sg_start_req(): make sure that there's not too many elements in iovec

2016-03-21T01:17:54+00:00

commit 451a2886b6bf90e2fb378f7c46c655450fb96e81 upstream.

unfortunately, allowing an arbitrary 16bit value means a possibility of
overflow in the calculation of total number of pages in bio_map_user_iov() -
we rely on there being no more than PAGE_SIZE members of sum in the
first loop there.  If that sum wraps around, we end up allocating
too small array of pointers to pages and it's easy to overflow it in
the second loop.

X-Coverup: TINC (and there's no lumber cartel either)
Signed-off-by: Al Viro 
[lizf: Backported to 3.4: s/MAX_UIOVEC/UIO_MAXIOV]
Signed-off-by: Zefan Li

libfc: Fix fc_fcp_cleanup_each_cmd()

2016-03-21T01:17:52+00:00

commit 8f2777f53e3d5ad8ef2a176a4463a5c8e1a16431 upstream.

Since fc_fcp_cleanup_cmd() can sleep this function must not
be called while holding a spinlock. This patch avoids that
fc_fcp_cleanup_each_cmd() triggers the following bug:

BUG: scheduling while atomic: sg_reset/1512/0x00000202
1 lock held by sg_reset/1512:
 #0:  (&(&fsp->scsi_pkt_lock)->rlock){+.-...}, at: [] fc_fcp_cleanup_each_cmd.isra.21+0xa5/0x150 [libfc]
Preemption disabled at:[] fc_fcp_cleanup_each_cmd.isra.21+0xa5/0x150 [libfc]
Call Trace:
 [] dump_stack+0x4f/0x7b
 [] __schedule_bug+0x6c/0xd0
 [] __schedule+0x71a/0xa10
 [] schedule+0x32/0x80
 [] fc_seq_set_resp+0xac/0x100 [libfc]
 [] fc_exch_done+0x41/0x60 [libfc]
 [] fc_fcp_cleanup_each_cmd.isra.21+0xcf/0x150 [libfc]
 [] fc_eh_device_reset+0x1c3/0x270 [libfc]
 [] scsi_try_bus_device_reset+0x29/0x60
 [] scsi_ioctl_reset+0x258/0x2d0
 [] scsi_ioctl+0x150/0x440
 [] sd_ioctl+0xad/0x120
 [] blkdev_ioctl+0x1b6/0x810
 [] block_ioctl+0x38/0x40
 [] do_vfs_ioctl+0x2f8/0x530
 [] SyS_ioctl+0x81/0xa0
 [] system_call_fastpath+0x16/0x7a

Signed-off-by: Bart Van Assche 
Signed-off-by: Vasu Dev 
Signed-off-by: James Bottomley 
Signed-off-by: Zefan Li

libiscsi: Fix host busy blocking during connection teardown

2016-03-21T01:17:52+00:00

commit 660d0831d1494a6837b2f810d08b5be092c1f31d upstream.

In case of hw iscsi offload, an host can have N-number of active
connections. There can be IO's running on some connections which
make host->host_busy always TRUE. Now if logout from a connection
is tried then the code gets into an infinite loop as host->host_busy
is always TRUE.

 iscsi_conn_teardown(....)
 {
   .........
    /*
     * Block until all in-progress commands for this connection
     * time out or fail.
     */
     for (;;) {
      spin_lock_irqsave(session->host->host_lock, flags);
      if (!atomic_read(&session->host->host_busy)) { /* OK for ERL == 0 */
	      spin_unlock_irqrestore(session->host->host_lock, flags);
              break;
      }
     spin_unlock_irqrestore(session->host->host_lock, flags);
     msleep_interruptible(500);
     iscsi_conn_printk(KERN_INFO, conn, "iscsi conn_destroy(): "
                 "host_busy %d host_failed %d\n",
	          atomic_read(&session->host->host_busy),
	          session->host->host_failed);

	................
	...............
     }
  }

This is not an issue with software-iscsi/iser as each cxn is a separate
host.

Fix:
Acquiring eh_mutex in iscsi_conn_teardown() before setting
session->state = ISCSI_STATE_TERMINATE.

Signed-off-by: John Soni Jose 
Reviewed-by: Mike Christie 
Reviewed-by: Chris Leech 
Signed-off-by: James Bottomley 
[lizf: Backported to 3.4: adjust context]
Signed-of-by: Zefan Li

st: null pointer dereference panic caused by use after kref_put by st_open

2016-03-21T01:17:44+00:00

commit e7ac6c6666bec0a354758a1298d3231e4a635362 upstream.

Two SLES11 SP3 servers encountered similar crashes simultaneously
following some kind of SAN/tape target issue:

...
qla2xxx [0000:81:00.0]-801c:3: Abort command issued nexus=3:0:2 --  1 2002.
qla2xxx [0000:81:00.0]-801c:3: Abort command issued nexus=3:0:2 --  1 2002.
qla2xxx [0000:81:00.0]-8009:3: DEVICE RESET ISSUED nexus=3:0:2 cmd=ffff882f89c2c7c0.
qla2xxx [0000:81:00.0]-800c:3: do_reset failed for cmd=ffff882f89c2c7c0.
qla2xxx [0000:81:00.0]-800f:3: DEVICE RESET FAILED: Task management failed nexus=3:0:2 cmd=ffff882f89c2c7c0.
qla2xxx [0000:81:00.0]-8009:3: TARGET RESET ISSUED nexus=3:0:2 cmd=ffff882f89c2c7c0.
qla2xxx [0000:81:00.0]-800c:3: do_reset failed for cmd=ffff882f89c2c7c0.
qla2xxx [0000:81:00.0]-800f:3: TARGET RESET FAILED: Task management failed nexus=3:0:2 cmd=ffff882f89c2c7c0.
qla2xxx [0000:81:00.0]-8012:3: BUS RESET ISSUED nexus=3:0:2.
qla2xxx [0000:81:00.0]-802b:3: BUS RESET SUCCEEDED nexus=3:0:2.
qla2xxx [0000:81:00.0]-505f:3: Link is operational (8 Gbps).
qla2xxx [0000:81:00.0]-8018:3: ADAPTER RESET ISSUED nexus=3:0:2.
qla2xxx [0000:81:00.0]-00af:3: Performing ISP error recovery - ha=ffff88bf04d18000.
 rport-3:0-0: blocked FC remote port time out: removing target and saving binding
qla2xxx [0000:81:00.0]-505f:3: Link is operational (8 Gbps).
qla2xxx [0000:81:00.0]-8017:3: ADAPTER RESET SUCCEEDED nexus=3:0:2.
 rport-2:0-0: blocked FC remote port time out: removing target and saving binding
sg_rq_end_io: device detached
BUG: unable to handle kernel NULL pointer dereference at 00000000000002a8
IP: [] __pm_runtime_idle+0x28/0x90
PGD 7e6586f067 PUD 7e5af06067 PMD 0 [1739975.390354] Oops: 0002 [#1] SMP
CPU 0
...
Supported: No, Proprietary modules are loaded [1739975.390463]
Pid: 27965, comm: ABCD Tainted: PF           X 3.0.101-0.29-default #1 HP ProLiant DL580 Gen8
RIP: 0010:[]  [] __pm_runtime_idle+0x28/0x90
RSP: 0018:ffff8839dc1e7c68  EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff883f0592fc00 RCX: 0000000000000090
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000138
RBP: 0000000000000138 R08: 0000000000000010 R09: ffffffff81bd39d0
R10: 00000000000009c0 R11: ffffffff81025790 R12: 0000000000000001
R13: ffff883022212b80 R14: 0000000000000004 R15: ffff883022212b80
FS:  00007f8e54560720(0000) GS:ffff88407f800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000000002a8 CR3: 0000007e6ced6000 CR4: 00000000001407f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ABCD (pid: 27965, threadinfo ffff8839dc1e6000, task ffff883592e0c640)
Stack:
 ffff883f0592fc00 00000000fffffffa 0000000000000001 ffff883022212b80
 ffff883eff772400 ffffffffa03fa309 0000000000000000 0000000000000000
 ffffffffa04003a0 ffff883f063196c0 ffff887f0379a930 ffffffff8115ea1e
Call Trace:
 [] st_open+0x129/0x240 [st]
 [] chrdev_open+0x13e/0x200
 [] __dentry_open+0x198/0x310
 [] do_last+0x1f4/0x800
 [] path_openat+0xd9/0x420
 [] do_filp_open+0x4c/0xc0
 [] do_sys_open+0x17f/0x250
 [] system_call_fastpath+0x16/0x1b
 [<00007f8e4f617fd0>] 0x7f8e4f617fcf
Code: eb d3 90 48 83 ec 28 40 f6 c6 04 48 89 6c 24 08 4c 89 74 24 20 48 89 fd 48 89 1c 24 4c 89 64 24 10 41 89 f6 4c 89 6c 24 18 74 11  ff 8f 70 01 00 00 0f 94 c0 45 31 ed 84 c0 74 2b 4c 8d a5 a0
RIP  [] __pm_runtime_idle+0x28/0x90
 RSP 
CR2: 00000000000002a8

Analysis reveals the cause of the crash to be due to STp->device
being NULL. The pointer was NULLed via scsi_tape_put(STp) when it
calls scsi_tape_release(). In st_open() we jump to err_out after
scsi_block_when_processing_errors() completes and returns the
device as offline (sdev_state was SDEV_DEL):

1180 /* Open the device. Needs to take the BKL only because of incrementing the SCSI host
1181    module count. */
1182 static int st_open(struct inode *inode, struct file *filp)
1183 {
1184         int i, retval = (-EIO);
1185         int resumed = 0;
1186         struct scsi_tape *STp;
1187         struct st_partstat *STps;
1188         int dev = TAPE_NR(inode);
1189         char *name;
...
1217         if (scsi_autopm_get_device(STp->device) < 0) {
1218                 retval = -EIO;
1219                 goto err_out;
1220         }
1221         resumed = 1;
1222         if (!scsi_block_when_processing_errors(STp->device)) {
1223                 retval = (-ENXIO);
1224                 goto err_out;
1225         }
...
1264  err_out:
1265         normalize_buffer(STp->buffer);
1266         spin_lock(&st_use_lock);
1267         STp->in_use = 0;
1268         spin_unlock(&st_use_lock);
1269         scsi_tape_put(STp); <-- STp->device = 0 after this
1270         if (resumed)
1271                 scsi_autopm_put_device(STp->device);
1272         return retval;

The ref count for the struct scsi_tape had already been reduced
to 1 when the .remove method of the st module had been called.
The kref_put() in scsi_tape_put() caused scsi_tape_release()
to be called:

0266 static void scsi_tape_put(struct scsi_tape *STp)
0267 {
0268         struct scsi_device *sdev = STp->device;
0269
0270         mutex_lock(&st_ref_mutex);
0271         kref_put(&STp->kref, scsi_tape_release); <-- calls this
0272         scsi_device_put(sdev);
0273         mutex_unlock(&st_ref_mutex);
0274 }

In scsi_tape_release() the struct scsi_device in the struct
scsi_tape gets set to NULL:

4273 static void scsi_tape_release(struct kref *kref)
4274 {
4275         struct scsi_tape *tpnt = to_scsi_tape(kref);
4276         struct gendisk *disk = tpnt->disk;
4277
4278         tpnt->device = NULL; <<<---- where the dev is nulled
4279
4280         if (tpnt->buffer) {
4281                 normalize_buffer(tpnt->buffer);
4282                 kfree(tpnt->buffer->reserved_pages);
4283                 kfree(tpnt->buffer);
4284         }
4285
4286         disk->private_data = NULL;
4287         put_disk(disk);
4288         kfree(tpnt);
4289         return;
4290 }

Although the problem was reported on SLES11.3 the problem appears
in linux-next as well.

The crash is fixed by reordering the code so we no longer access
the struct scsi_tape after the kref_put() is done on it in st_open().

Signed-off-by: Shane Seymour 
Signed-off-by: Darren Lavender 
Reviewed-by: Johannes Thumshirn 
Acked-by: Kai Mäkisara 
Signed-off-by: James Bottomley 
Signed-off-by: Zefan Li

ipr: Increase default adapter init stage change timeout

2015-10-22T01:20:03+00:00

commit 45c44b5ff9caa743ed9c2bfd44307c536c9caf1e upstream.

Increase the default init stage change timeout from 15 seconds to 30 seconds.
This resolves issues we have seen with some adapters not transitioning
to the first init stage within 15 seconds, which results in adapter
initialization failures.

Signed-off-by: Brian King 
Signed-off-by: James Bottomley 
Signed-off-by: Zefan Li

sd: Disable support for 256 byte/sector disks

2015-09-18T01:20:42+00:00

commit 74856fbf441929918c49ff262ace9835048e4e6a upstream.

256 bytes per sector support has been broken since 2.6.X,
and no-one stepped up to fix this.
So disable support for it.

Signed-off-by: Mark Hounschell 
Signed-off-by: Hannes Reinecke 
Signed-off-by: James Bottomley 
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li

3w-9xxx: fix command completion race

2015-09-18T01:20:34+00:00

commit 118c855b5623f3e2e6204f02623d88c09e0c34de upstream.

The 3w-9xxx driver needs to tear down the dma mappings before returning
the command to the midlayer, as there is no guarantee the sglist and
count are valid after that point.  Also remove the dma mapping helpers
which have another inherent race due to the request_id index.

Signed-off-by: Christoph Hellwig 
Acked-by: Adam Radford 
Signed-off-by: James Bottomley 
Signed-off-by: Zefan Li

3w-xxxx: fix command completion race

2015-09-18T01:20:34+00:00

commit 9cd9554615cba14f0877cc9972a6537ad2bdde61 upstream.

The 3w-xxxx driver needs to tear down the dma mappings before returning
the command to the midlayer, as there is no guarantee the sglist and
count are valid after that point.  Also remove the dma mapping helpers
which have another inherent race due to the request_id index.

Signed-off-by: Christoph Hellwig 
Acked-by: Adam Radford 
Signed-off-by: James Bottomley 
Signed-off-by: Zefan Li

3w-sas: fix command completion race

2015-09-18T01:20:33+00:00

commit 579d69bc1fd56d5af5761969aa529d1d1c188300 upstream.

The 3w-sas driver needs to tear down the dma mappings before returning
the command to the midlayer, as there is no guarantee the sglist and
count are valid after that point.  Also remove the dma mapping helpers
which have another inherent race due to the request_id index.

Signed-off-by: Christoph Hellwig 
Reported-by: Torsten Luettgert 
Tested-by: Bernd Kardatzki 
Acked-by: Adam Radford 
Signed-off-by: James Bottomley 
Signed-off-by: Zefan Li