linux-stable.git/drivers/scsi, branch v5.7.7

scsi: lpfc: Avoid another null dereference in lpfc_sli4_hba_unset()

2020-06-30T19:36:09+00:00

[ Upstream commit 46da547e21d6cefceec3fb3dba5ebbca056627fc ]

Commit cdb42becdd40 ("scsi: lpfc: Replace io_channels for nvme and fcp with
general hdw_queues per cpu") has introduced static checker warnings for
potential null dereferences in 'lpfc_sli4_hba_unset()' and commit 1ffdd2c0440d
("scsi: lpfc: resolve static checker warning in lpfc_sli4_hba_unset") has
tried to fix it.  However, yet another potential null dereference is
remaining.  This commit fixes it.

This bug was discovered and resolved using Coverity Static Analysis
Security Testing (SAST) by Synopsys, Inc.

Link: https://lore.kernel.org/r/20200623084122.30633-1-sjpark@amazon.com
Fixes: 1ffdd2c0440d ("scsi: lpfc: resolve static checker warning inlpfc_sli4_hba_unset")
Fixes: cdb42becdd40 ("scsi: lpfc: Replace io_channels for nvme and fcp with general hdw_queues per cpu")
Reviewed-by: James Smart 
Signed-off-by: SeongJae Park 
Signed-off-by: Martin K. Petersen 
Signed-off-by: Sasha Levin

scsi: qla2xxx: Keep initiator ports after RSCN

2020-06-30T19:35:57+00:00

commit 632f24f09d5b7c8a2f94932c3391ca957ae76cc4 upstream.

The driver performs SCR (state change registration) in all modes including
pure target mode.

For each RSCN, scan_needed flag is set in qla2x00_handle_rscn() for the
port mentioned in the RSCN and fabric rescan is scheduled. During the
rescan, GNN_FT handler, qla24xx_async_gnnft_done() deletes session of the
port that caused the RSCN.

In target mode, the session deletion has an impact on ATIO handler,
qlt_24xx_atio_pkt(). Target responds with SAM STATUS BUSY to I/O incoming
from the deleted session. qlt_handle_cmd_for_atio() and
qlt_handle_task_mgmt() return -EFAULT if they are not able to find session
of the command/TMF, and that results in invocation of qlt_send_busy():

  qlt_24xx_atio_pkt_all_vps: qla_target(0): type 6 ox_id 0014
  qla_target(0): Unable to send command to target, sending BUSY status

Such response causes command timeout on the initiator. Error handler thread
on the initiator will be spawned to abort the commands:

  scsi 23:0:0:0: tag#0 abort scheduled
  scsi 23:0:0:0: tag#0 aborting command
  qla2xxx [0000:af:00.0]-188c:23: Entered qla24xx_abort_command.
  qla2xxx [0000:af:00.0]-801c:23: Abort command issued nexus=23:0:0 -- 0 2003.

Command abort is rejected by target and fails (2003), error handler then
tries to perform DEVICE RESET and TARGET RESET but they're also doomed to
fail because TMFs are ignored for the deleted sessions.

Then initiator makes BUS RESET that resets the link via
qla2x00_full_login_lip(). BUS RESET succeeds and brings initiator port up,
SAN switch detects that and sends RSCN to the target port and it fails
again the same way as described above. It never goes out of the loop.

The change breaks the RSCN loop by keeping initiator sessions mentioned in
RSCN payload in all modes, including dual and pure target mode.

Link: https://lore.kernel.org/r/20200605144435.27023-1-r.bolshakov@yadro.com
Fixes: 2037ce49d30a ("scsi: qla2xxx: Fix stale session")
Cc: Quinn Tran 
Cc: Arun Easi 
Cc: Nilesh Javali 
Cc: Bart Van Assche 
Cc: Daniel Wagner 
Cc: Himanshu Madhani 
Cc: Martin Wilck 
Cc: stable@vger.kernel.org # v5.4+
Reviewed-by: Daniel Wagner 
Reviewed-by: Shyam Sundar 
Reviewed-by: Himanshu Madhani 
Signed-off-by: Roman Bolshakov 
Signed-off-by: Martin K. Petersen 
Signed-off-by: Greg Kroah-Hartman

scsi: ufs-bsg: Fix runtime PM imbalance on error

2020-06-24T15:49:15+00:00

[ Upstream commit a1e17eb03e69bb61bd1b1a14610436b7b9be12d9 ]

When ufs_bsg_alloc_desc_buffer() returns an error code, a pairing runtime
PM usage counter decrement is needed to keep the counter balanced.

Link: https://lore.kernel.org/r/20200522045932.31795-1-dinghao.liu@zju.edu.cn
Fixes: 74e5e468b664 (scsi: ufs-bsg: Wake the device before sending raw upiu commands)
Reviewed-by: Avri Altman 
Signed-off-by: Dinghao Liu 
Signed-off-by: Martin K. Petersen 
Signed-off-by: Sasha Levin

scsi: acornscsi: Fix an error handling path in acornscsi_probe()

2020-06-24T15:49:09+00:00

[ Upstream commit 42c76c9848e13dbe0538d7ae0147a269dfa859cb ]

'ret' is known to be 0 at this point.  Explicitly return -ENOMEM if one of
the 'ecardm_iomap()' calls fail.

Link: https://lore.kernel.org/r/20200530081622.577888-1-christophe.jaillet@wanadoo.fr
Fixes: e95a1b656a98 ("[ARM] rpc: acornscsi: update to new style ecard driver")
Signed-off-by: Christophe JAILLET 
Signed-off-by: Martin K. Petersen 
Signed-off-by: Sasha Levin

scsi: ufs: Don't update urgent bkops level when toggling auto bkops

2020-06-24T15:49:02+00:00

[ Upstream commit be32acff43800c87dc5c707f5d47cc607b76b653 ]

Urgent bkops level is used to compare against actual bkops status read from
UFS device. Urgent bkops level is set during initialization and might be
updated in exception event handler during runtime. But it should not be
updated to the actual bkops status every time when auto bkops is toggled.
Otherwise, if urgent bkops level is updated to 0, auto bkops shall always
be kept enabled.

Link: https://lore.kernel.org/r/1590632686-17866-1-git-send-email-cang@codeaurora.org
Fixes: 24366c2afbb0 ("scsi: ufs: Recheck bkops level if bkops is disabled")
Reviewed-by: Stanley Chu 
Signed-off-by: Can Guo 
Signed-off-by: Martin K. Petersen 
Signed-off-by: Sasha Levin

scsi: iscsi: Fix reference count leak in iscsi_boot_create_kobj

2020-06-24T15:49:02+00:00

[ Upstream commit 0267ffce562c8bbf9b57ebe0e38445ad04972890 ]

kobject_init_and_add() takes reference even when it fails. If this
function returns an error, kobject_put() must be called to properly
clean up the memory associated with the object.

Link: https://lore.kernel.org/r/20200528201353.14849-1-wu000273@umn.edu
Reviewed-by: Lee Duncan 
Signed-off-by: Qiushi Wu 
Signed-off-by: Martin K. Petersen 
Signed-off-by: Sasha Levin

scsi: ufs: ti-j721e-ufs: Fix unwinding of pm_runtime changes

2020-06-24T15:48:57+00:00

[ Upstream commit 22617e21633142dd2b81541cb3b95d6fb59aa85f ]

Fix unwinding of pm_runtime changes when bailing out of driver probe due to
a failure and also on removal of driver.

Link: https://lore.kernel.org/r/20200526100340.15032-1-vigneshr@ti.com
Fixes: 6979e56cec97 ("scsi: ufs: Add driver for TI wrapper for Cadence UFS IP")
Reported-by: Dinghao Liu 
Signed-off-by: Vignesh Raghavendra 
Signed-off-by: Martin K. Petersen 
Signed-off-by: Sasha Levin

scsi: iscsi: Fix deadlock on recovery path during GFP_IO reclaim

2020-06-24T15:48:57+00:00

[ Upstream commit 7e7cd796f2776d055351d80328f45633bbb0aae5 ]

iSCSI suffers from a deadlock in case a management command submitted via
the netlink socket sleeps on an allocation while holding the rx_queue_mutex
if that allocation causes a memory reclaim that writebacks to a failed
iSCSI device.  The recovery procedure can never make progress to recover
the failed disk or abort outstanding IO operations to complete the reclaim
(since rx_queue_mutex is locked), thus locking the system.

Nevertheless, just marking all allocations under rx_queue_mutex as GFP_NOIO
(or locking the userspace process with something like PF_MEMALLOC_NOIO) is
not enough, since the iSCSI command code relies on other subsystems that
try to grab locked mutexes, whose threads are GFP_IO, leading to the same
deadlock. One instance where this situation can be observed is in the
backtraces below, stitched from multiple bugs reports, involving the kobj
uevent sent when a session is created.

The root of the problem is not the fact that iSCSI does GFP_IO allocations,
that is acceptable. The actual problem is that rx_queue_mutex has a very
large granularity, covering every unrelated netlink command execution at
the same time as the error recovery path.

The proposed fix leverages the recently added mechanism to stop failed
connections from the kernel, by enabling it to execute even though a
management command from the netlink socket is being run (rx_queue_mutex is
held), provided that the command is known to be safe.  It splits the
rx_queue_mutex in two mutexes, one protecting from concurrent command
execution from the netlink socket, and one protecting stop_conn from racing
with other connection management operations that might conflict with it.

It is not very pretty, but it is the simplest way to resolve the deadlock.
I considered making it a lock per connection, but some external mutex would
still be needed to deal with iscsi_if_destroy_conn.

The patch was tested by forcing a memory shrinker (unrelated, but used
bufio/dm-verity) to reclaim iSCSI pages every time
ISCSI_UEVENT_CREATE_SESSION happens, which is reasonable to simulate
reclaims that might happen with GFP_KERNEL on that path.  Then, a faulty
hung target causes a connection to fail during intensive IO, at the same
time a new session is added by iscsid.

The following stacktraces are stiches from several bug reports, showing a
case where the deadlock can happen.

 iSCSI-write
         holding: rx_queue_mutex
         waiting: uevent_sock_mutex

         kobject_uevent_env+0x1bd/0x419
         kobject_uevent+0xb/0xd
         device_add+0x48a/0x678
         scsi_add_host_with_dma+0xc5/0x22d
         iscsi_host_add+0x53/0x55
         iscsi_sw_tcp_session_create+0xa6/0x129
         iscsi_if_rx+0x100/0x1247
         netlink_unicast+0x213/0x4f0
         netlink_sendmsg+0x230/0x3c0

 iscsi_fail iscsi_conn_failure
         waiting: rx_queue_mutex

         schedule_preempt_disabled+0x325/0x734
         __mutex_lock_slowpath+0x18b/0x230
         mutex_lock+0x22/0x40
         iscsi_conn_failure+0x42/0x149
         worker_thread+0x24a/0xbc0

 EventManager_
         holding: uevent_sock_mutex
         waiting: dm_bufio_client->lock

         dm_bufio_lock+0xe/0x10
         shrink+0x34/0xf7
         shrink_slab+0x177/0x5d0
         do_try_to_free_pages+0x129/0x470
         try_to_free_mem_cgroup_pages+0x14f/0x210
         memcg_kmem_newpage_charge+0xa6d/0x13b0
         __alloc_pages_nodemask+0x4a3/0x1a70
         fallback_alloc+0x1b2/0x36c
         __kmalloc_node_track_caller+0xb9/0x10d0
         __alloc_skb+0x83/0x2f0
         kobject_uevent_env+0x26b/0x419
         dm_kobject_uevent+0x70/0x79
         dev_suspend+0x1a9/0x1e7
         ctl_ioctl+0x3e9/0x411
         dm_ctl_ioctl+0x13/0x17
         do_vfs_ioctl+0xb3/0x460
         SyS_ioctl+0x5e/0x90

 MemcgReclaimerD"
         holding: dm_bufio_client->lock
         waiting: stuck io to finish (needs iscsi_fail thread to progress)

         schedule at ffffffffbd603618
         io_schedule at ffffffffbd603ba4
         do_io_schedule at ffffffffbdaf0d94
         __wait_on_bit at ffffffffbd6008a6
         out_of_line_wait_on_bit at ffffffffbd600960
         wait_on_bit.constprop.10 at ffffffffbdaf0f17
         __make_buffer_clean at ffffffffbdaf18ba
         __cleanup_old_buffer at ffffffffbdaf192f
         shrink at ffffffffbdaf19fd
         do_shrink_slab at ffffffffbd6ec000
         shrink_slab at ffffffffbd6ec24a
         do_try_to_free_pages at ffffffffbd6eda09
         try_to_free_mem_cgroup_pages at ffffffffbd6ede7e
         mem_cgroup_resize_limit at ffffffffbd7024c0
         mem_cgroup_write at ffffffffbd703149
         cgroup_file_write at ffffffffbd6d9c6e
         sys_write at ffffffffbd6662ea
         system_call_fastpath at ffffffffbdbc34a2

Link: https://lore.kernel.org/r/20200520022959.1912856-1-krisman@collabora.com
Reported-by: Khazhismel Kumykov 
Reviewed-by: Lee Duncan 
Signed-off-by: Gabriel Krisman Bertazi 
Signed-off-by: Martin K. Petersen 
Signed-off-by: Sasha Levin

scsi: ufs-qcom: Fix scheduling while atomic issue

2020-06-24T15:48:57+00:00

[ Upstream commit 3be60b564de49875e47974c37fabced893cd0931 ]

ufs_qcom_dump_dbg_regs() uses usleep_range, a sleeping function, but can be
called from atomic context in the following flow:

ufshcd_intr -> ufshcd_sl_intr -> ufshcd_check_errors ->
ufshcd_print_host_regs -> ufshcd_vops_dbg_register_dump ->
ufs_qcom_dump_dbg_regs

This causes a boot crash on the Lenovo Miix 630 when the interrupt is
handled on the idle thread.

Fix the issue by switching to udelay().

Link: https://lore.kernel.org/r/20200525204125.46171-1-jeffrey.l.hugo@gmail.com
Fixes: 9c46b8676271 ("scsi: ufs-qcom: dump additional testbus registers")
Reviewed-by: Bean Huo 
Reviewed-by: Avri Altman 
Signed-off-by: Jeffrey Hugo 
Signed-off-by: Martin K. Petersen 
Signed-off-by: Sasha Levin

scsi: core: Fix incorrect usage of shost_for_each_device

2020-06-24T15:48:51+00:00

[ Upstream commit 4dea170f4fb225984b4f2f1cf0a41d485177b905 ]

shost_for_each_device(sdev, shost) \
	for ((sdev) = __scsi_iterate_devices((shost), NULL); \
	     (sdev); \
	     (sdev) = __scsi_iterate_devices((shost), (sdev)))

When terminating shost_for_each_device() iteration with break or return,
scsi_device_put() should be used to prevent stale scsi device references
from being left behind.

Link: https://lore.kernel.org/r/20200518074420.39275-1-yebin10@huawei.com
Reviewed-by: Bart Van Assche 
Signed-off-by: Ye Bin 
Signed-off-by: Martin K. Petersen 
Signed-off-by: Sasha Levin