linux-stable.git/drivers/scsi, branch linux-4.15.y

scsi: qla2xxx: Fix small memory leak in qla2x00_probe_one on probe failure

2018-04-19T06:55:11+00:00

commit 6d6340672ba3a99c4cf7af79c2edf7aa25595c84 upstream.

The code that fixes the crashes in the following commit introduced a small
memory leak:

commit 6a2cf8d3663e ("scsi: qla2xxx: Fix crashes in qla2x00_probe_one on probe failure")

Fixing this requires a bit of reworking, which I've explained. Also provide
some code cleanup.

There is a small window in qla2x00_probe_one where if qla2x00_alloc_queues
fails, we end up never freeing req and rsp and leak 0xc0 and 0xc8 bytes
respectively (the sizes of req and rsp).

I originally put in checks to test for this condition which were based on
the incorrect assumption that if ha->rsp_q_map and ha->req_q_map were
allocated, then rsp and req were allocated as well. This is incorrect.
There is a window between these allocations:

       ret = qla2x00_mem_alloc(ha, req_length, rsp_length, &req, &rsp);
                goto probe_hw_failed;

[if successful, both rsp and req allocated]

       base_vha = qla2x00_create_host(sht, ha);
                goto probe_hw_failed;

       ret = qla2x00_request_irqs(ha, rsp);
                goto probe_failed;

       if (qla2x00_alloc_queues(ha, req, rsp)) {
                goto probe_failed;

[if successful, now ha->rsp_q_map and ha->req_q_map allocated]

To simplify this, we should just set req and rsp to NULL after we free
them. Sounds simple enough? The problem is that req and rsp are pointers
defined in the qla2x00_probe_one and they are not always passed by reference
to the routines that free them.

Here are paths which can free req and rsp:

PATH 1:
qla2x00_probe_one
   ret = qla2x00_mem_alloc(ha, req_length, rsp_length, &req, &rsp);
   [req and rsp are passed by reference, but if this fails, we currently
    do not NULL out req and rsp. Easily fixed]

PATH 2:
qla2x00_probe_one
   failing in qla2x00_request_irqs or qla2x00_alloc_queues
      probe_failed:
         qla2x00_free_device(base_vha);
            qla2x00_free_req_que(ha, req)
            qla2x00_free_rsp_que(ha, rsp)

PATH 3:
qla2x00_probe_one:
   failing in qla2x00_mem_alloc or qla2x00_create_host
      probe_hw_failed:
         qla2x00_free_req_que(ha, req)
         qla2x00_free_rsp_que(ha, rsp)

PATH 1: This should currently work, but it doesn't because rsp and rsp are
not set to NULL in qla2x00_mem_alloc. Easily remedied.

PATH 2: req and rsp aren't passed in at all to qla2x00_free_device but are
derived from ha->req_q_map[0] and ha->rsp_q_map[0]. These are only set up if
qla2x00_alloc_queues succeeds.

In qla2x00_free_queues, we are protected from crashing if these don't exist
because req_qid_map and rsp_qid_map are only set on their allocation. We are
guarded in this way:

        for (cnt = 0; cnt < ha->max_req_queues; cnt++) {
                if (!test_bit(cnt, ha->req_qid_map))
                        continue;

PATH 3: This works. We haven't freed req or rsp yet (or they were never
allocated if qla2x00_mem_alloc failed), so we'll attempt to free them here.

To summarize, there are a few small changes to make this work correctly and
(and for some cleanup):

1) (For PATH 1) Set *rsp and *req to NULL in case of failure in
qla2x00_mem_alloc so these are correctly set to NULL back in
qla2x00_probe_one

2) After jumping to probe_failed: and calling qla2x00_free_device,
explicitly set rsp and req to NULL so further calls with these pointers do
not crash, i.e. the free queue calls in the probe_hw_failed section we fall
through to.

3) Fix return code check in the call to qla2x00_alloc_queues. We currently
drop the return code on the floor. The probe fails but the caller of the
probe doesn't have an error code, so it attaches to pci. This can result in
a crash on module shutdown.

4) Remove unnecessary NULL checks in qla2x00_free_req_que,
qla2x00_free_rsp_que, and the egregious NULL checks before kfrees and vfrees
in qla2x00_mem_free.

I tested this out running a scenario where the card breaks at various times
during initialization. I made sure I forced every error exit path in
qla2x00_probe_one.

Cc:  # v4.16
Fixes: 6a2cf8d3663e ("scsi: qla2xxx: Fix crashes in qla2x00_probe_one on probe failure")
Signed-off-by: Bill Kuzeja 
Acked-by: Himanshu Madhani 
Signed-off-by: Martin K. Petersen 
Signed-off-by: Greg Kroah-Hartman

scsi: megaraid_sas: unload flag should be set after scsi_remove_host is called

2018-04-12T10:31:12+00:00

[ Upstream commit f3f7920b3910171b2999c7dc2335eb9f583e44f2 ]

Issue - Driver returns DID_NO_CONNECT when unload is in progress,
indicated using instance->unload flag. In case of dynamic unload of
driver, this flag is set before calling scsi_remove_host(). While doing
manual driver unload, user will see lots of prints for Sync Cache
command with DID_NO_CONNECT status.

Fix - Set the instance->unload flag after scsi_remove_host(). Allow
device removal process to be completed and do not block any command
before that.  SCSI commands (like SYNC_CACHE) are received (as part of
scsi_remove_host) by driver during unload will be submitted further down
to the drives.

Signed-off-by: Sumit Saxena 
Signed-off-by: Shivasharan S 
Signed-off-by: Martin K. Petersen 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

scsi: megaraid_sas: Error handling for invalid ldcount provided by firmware in RAID map

2018-04-12T10:31:12+00:00

[ Upstream commit 7ada701d0d5e5c6d357e157a72b841db3e8d03f4 ]

Currently driver does not validate ldcount provided by firmware.  If the
value is invalid, fail RAID map validation accordingly.  This issue is
rare to hit in field and is fixed as part of code review.

Signed-off-by: Sumit Saxena 
Signed-off-by: Shivasharan S 
Signed-off-by: Martin K. Petersen 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

scsi: libsas: initialize sas_phy status according to response of DISCOVER

2018-04-12T10:31:10+00:00

[ Upstream commit affc67788fe5dfffad5cda3d461db5cf2b2ff2b0 ]

The status of SAS PHY is in sas_phy->enabled. There is an issue that the
status of a remote SAS PHY may be initialized incorrectly: if disable
remote SAS PHY through sysfs interface (such as echo 0 >
/sys/class/sas_phy/phy-1:0:0/enable), then reboot the system, and we
will find the status of remote SAS PHY which is disabled before is
1 (cat /sys/class/sas_phy/phy-1:0:0/enable). But actually the status of
remote SAS PHY is disabled and the device attached is not found.

In SAS protocol, NEGOTIATED LOGICAL LINK RATE field of DISCOVER response
is 0x1 when remote SAS PHY is disabled. So initialize sas_phy->enabled
according to the value of NEGOTIATED LOGICAL LINK RATE field.

Signed-off-by: chenxiang 
Reviewed-by: John Garry 
Signed-off-by: Jason Yan 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Hannes Reinecke 
Signed-off-by: Martin K. Petersen 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

scsi: libsas: fix error when getting phy events

2018-04-12T10:31:10+00:00

[ Upstream commit 2b23d9509fd7174b362482cf5f3b5f9a2265bc33 ]

The intend purpose here was to goto out if smp_execute_task() returned
error. Obviously something got screwed up. We will never get these link
error statistics below:

~:/sys/class/sas_phy/phy-1:0:12 # cat invalid_dword_count
0
~:/sys/class/sas_phy/phy-1:0:12 # cat running_disparity_error_count
0
~:/sys/class/sas_phy/phy-1:0:12 # cat loss_of_dword_sync_count
0
~:/sys/class/sas_phy/phy-1:0:12 # cat phy_reset_problem_count
0

Obviously we should goto error handler if smp_execute_task() returns
non-zero.

Fixes: 2908d778ab3e ("[SCSI] aic94xx: new driver")
Signed-off-by: Jason Yan 
CC: John Garry 
CC: chenqilin 
CC: chenxiang 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Martin K. Petersen 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

scsi: libsas: fix memory leak in sas_smp_get_phy_events()

2018-04-12T10:31:10+00:00

[ Upstream commit 4a491b1ab11ca0556d2fda1ff1301e862a2d44c4 ]

We've got a memory leak with the following producer:

while true;
do cat /sys/class/sas_phy/phy-1:0:12/invalid_dword_count >/dev/null;
done

The buffer req is allocated and not freed after we return. Fix it.

Fixes: 2908d778ab3e ("[SCSI] aic94xx: new driver")
Signed-off-by: Jason Yan 
CC: John Garry 
CC: chenqilin 
CC: chenxiang 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Hannes Reinecke 
Signed-off-by: Martin K. Petersen 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

scsi: libsas: Use dynamic alloced work to avoid sas event lost

2018-04-12T10:31:10+00:00

[ Upstream commit 1c393b970e0f4070e4376d45f89a2d19a5c895d0 ]

Now libsas hotplug work is static, every sas event type has its own
static work, LLDD driver queues the hotplug work into shost->work_q.  If
LLDD driver burst posts lots hotplug events to libsas, the hotplug
events may pending in the workqueue like

shost->work_q
new work[PORTE_BYTES_DMAED] --> |[PHYE_LOSS_OF_SIGNAL][PORTE_BYTES_DMAED] -> processing
                                |<-------wait worker to process-------->|

In this case, a new PORTE_BYTES_DMAED event coming, libsas try to queue
it to shost->work_q, but this work is already pending, so it would be
lost. Finally, libsas delete the related sas port and sas devices, but
LLDD driver expect libsas add the sas port and devices(last sas event).

This patch use dynamic allocated work to avoid this issue.

Signed-off-by: Yijing Wang 
CC: John Garry 
CC: Johannes Thumshirn 
CC: Ewan Milne 
CC: Christoph Hellwig 
CC: Tomas Henzl 
CC: Dan Williams 
Reviewed-by: Hannes Reinecke 
Signed-off-by: Jason Yan 
Signed-off-by: Martin K. Petersen 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

scsi: mpt3sas: Proper handling of set/clear of "ATA command pending" flag.

2018-04-12T10:31:05+00:00

[ Upstream commit f49d4aed1315a7b766d855f1367142e682b0cc87 ]

1. In IO path, setting of "ATA command pending" flag early before device
   removal, invalid device handle etc., checks causes any new commands
   to be always returned with SAM_STAT_BUSY and when the driver removes
   the drive the SML issues SYNC Cache command and that command is
   always returned with SAM_STAT_BUSY and thus making SYNC Cache command
   to requeued.

2. If the driver gets an ATA PT command for a SATA drive then the driver
   set "ATA command pending" flag in device specific data structure not
   to allow any further commands until the ATA PT command is completed.
   However, after setting the flag if the driver decides to return the
   command back to upper layers without actually issuing to the firmware
   (i.e., returns from qcmd failure return paths) then the corresponding
   flag is not cleared and this prevents the driver from sending any new
   commands to the drive.

This patch fixes above two issues by setting of "ATA command pending"
flag after checking for whether device deleted, invalid device handle,
device busy with task management. And by setting "ATA command pending"
flag to false in all of the qcmd failure return paths after setting the
flag.

Signed-off-by: Chaitra P B 
Signed-off-by: Suganath Prabu S 
Signed-off-by: Martin K. Petersen 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

scsi: libiscsi: Allow sd_shutdown on bad transport

2018-04-12T10:31:05+00:00

[ Upstream commit d754941225a7dbc61f6dd2173fa9498049f9a7ee ]

If, for any reason, userland shuts down iscsi transport interfaces
before proper logouts - like when logging in to LUNs manually, without
logging out on server shutdown, or when automated scripts can't
umount/logout from logged LUNs - kernel will hang forever on its
sd_sync_cache() logic, after issuing the SYNCHRONIZE_CACHE cmd to all
still existent paths.

PID: 1 TASK: ffff8801a69b8000 CPU: 1 COMMAND: "systemd-shutdow"
 #0 [ffff8801a69c3a30] __schedule at ffffffff8183e9ee
 #1 [ffff8801a69c3a80] schedule at ffffffff8183f0d5
 #2 [ffff8801a69c3a98] schedule_timeout at ffffffff81842199
 #3 [ffff8801a69c3b40] io_schedule_timeout at ffffffff8183e604
 #4 [ffff8801a69c3b70] wait_for_completion_io_timeout at ffffffff8183fc6c
 #5 [ffff8801a69c3bd0] blk_execute_rq at ffffffff813cfe10
 #6 [ffff8801a69c3c88] scsi_execute at ffffffff815c3fc7
 #7 [ffff8801a69c3cc8] scsi_execute_req_flags at ffffffff815c60fe
 #8 [ffff8801a69c3d30] sd_sync_cache at ffffffff815d37d7
 #9 [ffff8801a69c3da8] sd_shutdown at ffffffff815d3c3c

This happens because iscsi_eh_cmd_timed_out(), the transport layer
timeout helper, would tell the queue timeout function (scsi_times_out)
to reset the request timer over and over, until the session state is
back to logged in state. Unfortunately, during server shutdown, this
might never happen again.

Other option would be "not to handle" the issue in the transport
layer. That would trigger the error handler logic, which would also need
the session state to be logged in again.

Best option, for such case, is to tell upper layers that the command was
handled during the transport layer error handler helper, marking it as
DID_NO_CONNECT, which will allow completion and inform about the
problem.

After the session was marked as ISCSI_STATE_FAILED, due to the first
timeout during the server shutdown phase, all subsequent cmds will fail
to be queued, allowing upper logic to fail faster.

Signed-off-by: Rafael David Tinoco 
Reviewed-by: Lee Duncan 
Signed-off-by: Martin K. Petersen 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

scsi: lpfc: Fix issues connecting with nvme initiator

2018-03-24T10:02:51+00:00

[ Upstream commit e06351a002214d152142906a546006e3446d1ef7 ]

In the lpfc discovery engine, when as a nvme target, where the driver
was performing mailbox io with the adapter for port login when a NVME
PRLI is received from the host. Rather than queue and eventually get
back to sending a response after the mailbox traffic, the driver
rejected the io with an error response.

Turns out this particular initiator didn't like the rejection values
(unable to process command/command in progress) so it never attempted a
retry of the PRLI. Thus the host never established nvme connectivity
with the lpfc target.

By changing the rejection values (to Logical Busy/nothing more), the
initiator accepted the response and would retry the PRLI, resulting in
nvme connectivity.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
Reviewed-by: Hannes Reinecke 
Signed-off-by: Martin K. Petersen 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman