linux-stable.git/drivers/scsi, branch v4.4.45

qla2xxx: Fix crash due to null pointer access

2017-01-26T07:23:48+00:00

commit fc1ffd6cb38a1c1af625b9833c41928039e733f5 upstream.

During code inspection, while investigating following stack trace
seen on one of the test setup, we found out there was possibility
of memory leak becuase driver was not unwinding the stack properly.

This issue has not been reproduced in a test environment or on a
customer setup.

Here's stack trace that was seen.

[1469877.797315] Call Trace:
[1469877.799940]  [] qla2x00_mem_alloc+0xb09/0x10c0 [qla2xxx]
[1469877.806980]  [] qla2x00_probe_one+0x86a/0x1b50 [qla2xxx]
[1469877.814013]  [] ? __pm_runtime_resume+0x51/0xa0
[1469877.820265]  [] ? _raw_spin_lock_irqsave+0x25/0x90
[1469877.826776]  [] ? _raw_spin_unlock_irqrestore+0x6d/0x80
[1469877.833720]  [] ? preempt_count_sub+0xb1/0x100
[1469877.839885]  [] ? _raw_spin_unlock_irqrestore+0x4c/0x80
[1469877.846830]  [] local_pci_probe+0x4c/0xb0
[1469877.852562]  [] ? preempt_count_sub+0xb1/0x100
[1469877.858727]  [] pci_call_probe+0x89/0xb0

Signed-off-by: Quinn Tran 
Signed-off-by: Himanshu Madhani 
Reviewed-by: Christoph Hellwig 
[ bvanassche: Fixed spelling in patch description ]
Signed-off-by: Bart Van Assche 
Signed-off-by: Greg Kroah-Hartman

scsi: mvsas: fix command_active typo

2017-01-12T10:22:49+00:00

commit af15769ffab13d777e55fdef09d0762bf0c249c4 upstream.

gcc-7 notices that the condition in mvs_94xx_command_active looks
suspicious:

drivers/scsi/mvsas/mv_94xx.c: In function 'mvs_94xx_command_active':
drivers/scsi/mvsas/mv_94xx.c:671:15: error: '<<' in boolean context, did you mean '<' ? [-Werror=int-in-bool-context]

This was introduced when the mv_printk() statement got added, and leads
to the condition being ignored. This is probably harmless.

Changing '&&' to '&' makes the code look reasonable, as we check the
command bit before setting and printing it.

Fixes: a4632aae8b66 ("[SCSI] mvsas: Add new macros and functions")
Signed-off-by: Arnd Bergmann 
Reviewed-by: Johannes Thumshirn 
Signed-off-by: Martin K. Petersen 
Signed-off-by: Greg Kroah-Hartman

sg_write()/bsg_write() is not fit to be called under KERNEL_DS

2017-01-09T07:07:53+00:00

commit 128394eff343fc6d2f32172f03e24829539c5835 upstream.

Both damn things interpret userland pointers embedded into the payload;
worse, they are actually traversing those.  Leaving aside the bad
API design, this is very much _not_ safe to call with KERNEL_DS.
Bail out early if that happens.

Signed-off-by: Al Viro 
Signed-off-by: Greg Kroah-Hartman

scsi: avoid a permanent stop of the scsi device's request queue

2017-01-09T07:07:48+00:00

commit d2a145252c52792bc59e4767b486b26c430af4bb upstream.

A race between scanning and fc_remote_port_delete() may result in a
permanent stop if the device gets blocked before scsi_sysfs_add_sdev()
and unblocked after.  The reason is that blocking a device sets both the
SDEV_BLOCKED state and the QUEUE_FLAG_STOPPED.  However,
scsi_sysfs_add_sdev() unconditionally sets SDEV_RUNNING which causes the
device to be ignored by scsi_target_unblock() and thus never have its
QUEUE_FLAG_STOPPED cleared leading to a device which is apparently
running but has a stopped queue.

We actually have two places where SDEV_RUNNING is set: once in
scsi_add_lun() which respects the blocked flag and once in
scsi_sysfs_add_sdev() which doesn't.  Since the second set is entirely
spurious, simply remove it to fix the problem.

Reported-by: Zengxi Chen 
Signed-off-by: Wei Fang 
Reviewed-by: Ewan D. Milne 
Signed-off-by: Martin K. Petersen 
Signed-off-by: Greg Kroah-Hartman

scsi: megaraid_sas: Do not set MPI2_TYPE_CUDA for JBOD FP path for FW which does not support JBOD sequence map

2017-01-09T07:07:47+00:00

commit d5573584429254a14708cf8375c47092b5edaf2c upstream.

Signed-off-by: Sumit Saxena 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Tomas Henzl 
Signed-off-by: Martin K. Petersen 
Signed-off-by: Greg Kroah-Hartman

scsi: megaraid_sas: For SRIOV enabled firmware, ensure VF driver waits for 30secs before reset

2017-01-09T07:07:47+00:00

commit 18e1c7f68a5814442abad849abe6eacbf02ffd7c upstream.

For SRIOV enabled firmware, if there is a OCR(online controller reset)
possibility driver set the convert flag to 1, which is not happening if
there are outstanding commands even after 180 seconds.  As driver does
not set convert flag to 1 and still making the OCR to run, VF(Virtual
function) driver is directly writing on to the register instead of
waiting for 30 seconds. Setting convert flag to 1 will cause VF driver
will wait for 30 secs before going for reset.

Signed-off-by: Kiran Kumar Kasturi 
Signed-off-by: Sumit Saxena 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Tomas Henzl 
Signed-off-by: Martin K. Petersen 
Signed-off-by: Greg Kroah-Hartman

scsi: mpt3sas: Unblock device after controller reset

2016-12-02T08:09:02+00:00

commit 7ff723ad0f87feba43dda45fdae71206063dd7d4 upstream.

While issuing any ATA passthrough command to firmware the driver will
block the device. But it will unblock the device only if the I/O
completes through the ISR path. If a controller reset occurs before
command completion the device will remain in blocked state.

Make sure we unblock the device following a controller reset if an ATA
passthrough command was queued.

[mkp: clarified patch description]

Fixes: ac6c2a93bd07 ("mpt3sas: Fix for SATA drive in blocked state, after diag reset")
Signed-off-by: Suganath Prabu S 
Signed-off-by: Martin K. Petersen 
Signed-off-by: Greg Kroah-Hartman

scsi: mpt3sas: Fix secure erase premature termination

2016-12-02T08:09:01+00:00

commit 18f6084a989ba1b38702f9af37a2e4049a924be6 upstream.

This is a work around for a bug with LSI Fusion MPT SAS2 when perfoming
secure erase. Due to the very long time the operation takes, commands
issued during the erase will time out and will trigger execution of the
abort hook. Even though the abort hook is called for the specific
command which timed out, this leads to entire device halt
(scsi_state terminated) and premature termination of the secure erase.

Set device state to busy while ATA passthrough commands are in progress.

[mkp: hand applied to 4.9/scsi-fixes, tweaked patch description]

Signed-off-by: Andrey Grodzovsky 
Acked-by: Sreekanth Reddy 
Cc: 
Cc: Sathya Prakash 
Cc: Chaitra P B 
Cc: Suganath Prabu Subramani 
Cc: Sreekanth Reddy 
Cc: Hannes Reinecke 
Signed-off-by: Martin K. Petersen 
Signed-off-by: Greg Kroah-Hartman

scsi: mpt3sas: Fix for block device of raid exists even after deleting raid disk

2016-11-18T09:48:36+00:00

commit 6d3a56ed098566bc83d6c2afa74b4199c12ea074 upstream.

While merging mpt3sas & mpt2sas code, we added the is_warpdrive check
condition on the wrong line

---------------------------------------------------------------------------
 scsih_target_alloc(struct scsi_target *starget)
                        sas_target_priv_data->handle = raid_device->handle;
                        sas_target_priv_data->sas_address = raid_device->wwid;
                        sas_target_priv_data->flags |= MPT_TARGET_FLAGS_VOLUME;
-                       raid_device->starget = starget;
+                       sas_target_priv_data->raid_device = raid_device;
+                       if (ioc->is_warpdrive)
+                               raid_device->starget = starget;
                }
                spin_unlock_irqrestore(&ioc->raid_device_lock, flags);
                return 0;
------------------------------------------------------------------------------

That check should be for the line sas_target_priv_data->raid_device =
raid_device;

Due to above hunk, we are not initializing raid_device's starget for
raid volumes, and so during raid disk deletion driver is not calling
scsi_remove_target() API as driver observes starget field of
raid_device's structure as NULL.

Signed-off-by: Sreekanth Reddy 
Fixes: 7786ab6aff9 ("mpt3sas: Ported WarpDrive product SSS6200 support")
Signed-off-by: Martin K. Petersen 
Signed-off-by: Greg Kroah-Hartman

scsi: qla2xxx: Fix scsi scan hang triggered if adapter fails during init

2016-11-18T09:48:35+00:00

commit a5dd506e1584e91f3e7500ab9a165aa1b49eabd4 upstream.

A system can get hung task timeouts if a qlogic board fails during
initialization (if the board breaks again or fails the init). The hang
involves the scsi scan.

In a nutshell, since commit beb9e315e6e0 ("qla2xxx: Prevent removal and
board_disable race"):

...it is possible to have freed ha (base_vha->hw) early by a call to
qla2x00_remove_one when pdev->enable_cnt equals zero:

       if (!atomic_read(&pdev->enable_cnt)) {
               scsi_host_put(base_vha->host);
               kfree(ha);
               pci_set_drvdata(pdev, NULL);
               return;

Almost always, the scsi_host_put above frees the vha structure
(attached to the end of the Scsi_Host we're putting) since it's the last
put, and life is good.  However, if we are entering this routine because
the adapter has broken sometime during initialization AND a scsi scan is
already in progress (and has done its own scsi_host_get), vha will not
be freed. What's worse, the scsi scan will access the freed ha structure
through qla2xxx_scan_finished:

        if (time > vha->hw->loop_reset_delay * HZ)
                return 1;

The scsi scan keeps checking to see if a scan is complete by calling
qla2xxx_scan_finished. There is a timeout value that limits the length
of time a scan can take (hw->loop_reset_delay, usually set to 5
seconds), but this definition is in the data structure (hw) that can get
freed early.

This can yield unpredictable results, the worst of which is that the
scsi scan can hang indefinitely. This happens when the freed structure
gets reused and loop_reset_delay gets overwritten with garbage, which
the scan obliviously uses as its timeout value.

The fix for this is simple: at the top of qla2xxx_scan_finished, check
for the UNLOADING bit in the vha structure (_vha is not freed at this
point).  If UNLOADING is set, we exit the scan for this adapter
immediately. After this last reference to the ha structure, we'll exit
the scan for this adapter, and continue on.

This problem is hard to hit, but I have run into it doing negative
testing many times now (with a test specifically designed to bring it
out), so I can verify that this fix works. My testing has been against a
RHEL7 driver variant, but the bug and patch are equally relevant to to
the upstream driver.

Fixes: beb9e315e6e0 ("qla2xxx: Prevent removal and board_disable race")
Signed-off-by: Bill Kuzeja 
Acked-by: Himanshu Madhani 
Signed-off-by: Martin K. Petersen 
Signed-off-by: Greg Kroah-Hartman