linux-stable.git/drivers/scsi, branch v3.2.30

Fix 'Device not ready' issue on mpt2sas

2012-09-19T14:04:29+00:00

commit 14216561e164671ce147458653b1fea06a4ada1e upstream.

This is a particularly nasty SCSI ATA Translation Layer (SATL) problem.

SAT-2 says (section 8.12.2)

        if the device is in the stopped state as the result of
        processing a START STOP UNIT command (see 9.11), then the SATL
        shall terminate the TEST UNIT READY command with CHECK CONDITION
        status with the sense key set to NOT READY and the additional
        sense code of LOGICAL UNIT NOT READY, INITIALIZING COMMAND
        REQUIRED;

The mpt2sas internal SATL seems to implement this.  The result is very confusing
standby behaviour (using hdparm -y).  If you suspend a drive and then send
another command, it usually wakes up.  However, if the next command is a TEST
UNIT READY, the SATL sees that the drive is suspended and follows the SATL
rules for this, returning NOT READY to all subsequent commands.  This means
that the ordering of TEST UNIT READY is crucial: if you send a TUR and then a
command, you get NOT READY back for both.  If you send a command and then a
TUR, you get GOOD status because the preceding command woke the drive.
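A toy model of this ordering effect, assuming the SATL latches NOT READY once a TEST UNIT READY has observed the stopped drive.  The struct and function names are hypothetical and do not come from mpt2sas:

```c
#include <stdbool.h>

/* Hypothetical model of the SATL behaviour described above. */
enum satl_status { SATL_GOOD, SATL_NOT_READY };

struct satl_model {
	bool drive_stopped; /* drive put into standby, e.g. with hdparm -y */
	bool latched;       /* a TUR has seen the stopped state */
};

/* TEST UNIT READY: a stopped drive makes the SATL latch NOT READY
 * (LOGICAL UNIT NOT READY, INITIALIZING COMMAND REQUIRED). */
static enum satl_status satl_tur(struct satl_model *s)
{
	if (s->drive_stopped)
		s->latched = true;
	return s->latched ? SATL_NOT_READY : SATL_GOOD;
}

/* Any other command: refused once latched; otherwise the ATA command
 * wakes the drive and succeeds. */
static enum satl_status satl_command(struct satl_model *s)
{
	if (s->latched)
		return SATL_NOT_READY;
	s->drive_stopped = false;
	return SATL_GOOD;
}
```

Starting from a stopped drive, TUR followed by a command yields NOT READY twice, while a command followed by TUR yields GOOD twice.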

This bit us badly because

commit 85ef06d1d252f6a2e73b678591ab71caad4667bb
Author: Tejun Heo 
Date:   Fri Jul 1 16:17:47 2011 +0200

    block: flush MEDIA_CHANGE from drivers on close(2)

changed our ordering of TEST UNIT READY commands, meaning that SATA drives
connected to an mpt2sas now suspend and refuse to wake (because the mpt2sas
SATL sees the suspend *before* the drives get awoken by the next ATA command),
resulting in lots of failed commands.

The standard is completely nuts, forcing this inconsistent behaviour, but we
have to work around it.

The fix for this is twofold:

   1. Set the allow_restart flag so that we wake the drive when we see it has
      been suspended.

   2. Return all TEST UNIT READY statuses directly to the mid layer without
      any further error handling.  This prevents a media-check TUR from
      triggering error handling that may offline the device.
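The two parts of the fix can be sketched as a single disposition decision.  This is a simplified model, not mid-layer code: only allow_restart and the TEST UNIT READY opcode (0x00) correspond to real things; the enum, struct and decide() are hypothetical:

```c
#include <stdbool.h>

#define TEST_UNIT_READY 0x00 /* SCSI opcode for TEST UNIT READY */

/* Hypothetical outcomes for a completed command. */
enum disposition {
	DISP_RETURN_TO_MIDLAYER, /* complete as-is, no further error handling */
	DISP_RESTART_THEN_RETRY, /* send START STOP UNIT to wake the drive */
	DISP_ERROR_HANDLING      /* normal error handling path */
};

struct device_model {
	bool allow_restart;      /* part 1: set for devices behind mpt2sas */
};

/* not_ready_init_required: NOT READY with INITIALIZING COMMAND REQUIRED. */
static enum disposition decide(const struct device_model *dev,
			       unsigned char opcode, bool not_ready_init_required)
{
	/* Part 2: TUR status goes straight back, so a media-check TUR
	 * never drives the device into error handling. */
	if (opcode == TEST_UNIT_READY)
		return DISP_RETURN_TO_MIDLAYER;

	/* Part 1: with allow_restart set, the suspended drive is woken. */
	if (not_ready_init_required && dev->allow_restart)
		return DISP_RESTART_THEN_RETRY;

	return not_ready_init_required ? DISP_ERROR_HANDLING
				       : DISP_RETURN_TO_MIDLAYER;
}
```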

Reported-by: Matthias Prager 
Signed-off-by: James Bottomley 
Signed-off-by: Ben Hutchings

megaraid_sas: Move poll_aen_lock initializer

2012-09-19T14:04:28+00:00

commit bd8d6dd43a77bfd2b8fef5b094b9d6095e169dee upstream.

The following patch moves the poll_aen_lock initializer from
megasas_probe_one() to megasas_init().  This prevents a crash when a user
loads the driver and tries to issue a poll() system call on the ioctl
interface with no adapters present.
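A minimal userspace model of why the initializer has to move, assuming a "lock" that records whether it was initialised.  Every name here is hypothetical; the real code uses a kernel spinlock:

```c
#include <stdbool.h>

/* Hypothetical lock that can tell whether it was ever initialised. */
struct model_lock { bool initialised; bool held; };

static struct model_lock poll_aen_lock_model;
static int adapter_count;

static void model_lock_init(struct model_lock *l)
{
	l->initialised = true;
	l->held = false;
}

/* Returns false where the real driver would touch an uninitialised lock. */
static bool model_lock_use(struct model_lock *l)
{
	if (!l->initialised)
		return false;
	l->held = true;   /* acquire ... */
	l->held = false;  /* ... and release */
	return true;
}

/* After the fix: the lock is set up once at module load ... */
static void model_megasas_init(void) { model_lock_init(&poll_aen_lock_model); }

/* ... and the per-adapter probe no longer does it. */
static void model_megasas_probe_one(void) { adapter_count++; }

/* poll() on the ioctl node may arrive with adapter_count == 0. */
static bool model_poll(void) { return model_lock_use(&poll_aen_lock_model); }
```

Before model_megasas_init() runs, model_poll() reports the unsafe case; afterwards it succeeds even though no adapter was probed.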

Signed-off-by: Kashyap Desai 
Signed-off-by: Adam Radford 
Signed-off-by: James Bottomley 
Signed-off-by: Ben Hutchings

mpt2sas: Fix for Driver oops, when loading driver with max_queue_depth command line option to a very small value

2012-09-19T14:04:28+00:00

commit 338b131a3269881c7431234855c93c219b0979b6 upstream.

If the specified max_queue_depth setting is less than the expected
number of internal commands, the driver calculates the queue depth as a
negative number. Because the variable is an unsigned 16-bit integer, this
negative number is actually a very large number. The driver then asks for
a very large amount of memory for message frames, resulting in an oops
because the memory allocation routines cannot handle such a large request.
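The wraparound can be reproduced in a few lines of C.  The function and the numbers are illustrative only, not the mpt2sas calculation:

```c
#include <stdint.h>

/* Hypothetical: subtract the internal-command reserve from the requested
 * depth and store the result in an unsigned 16-bit variable. */
static uint16_t io_queue_depth(uint16_t max_queue_depth, uint16_t internal_cmds)
{
	/* The subtraction happens in int (8 - 16 == -8); converting back to
	 * uint16_t is modulo 65536, so -8 becomes 65528. */
	return (uint16_t)(max_queue_depth - internal_cmds);
}
```

With max_queue_depth=8 and 16 internal commands the "depth" becomes 65528 instead of a small negative number, which is what drives the huge allocation.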

So, to prevent this kind of oops, the driver needs to take
max_queue_depth as the SCSI mid layer's can_queue value. The overall
number of message frames required for I/O is then the minimum of either
(max_queue_depth plus internal commands) or the IOC global credits.
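A sketch of that sizing rule, assuming max_queue_depth counts only I/O commands.  Parameter names are hypothetical; the point is that internal commands are added rather than subtracted, so nothing can underflow:

```c
#include <stdint.h>

/* Hypothetical sizing: frames = min(max_queue_depth + internal, credits). */
static uint32_t request_frames(uint32_t max_queue_depth,
			       uint32_t internal_cmds,
			       uint32_t global_credits)
{
	uint32_t wanted = max_queue_depth + internal_cmds;

	/* The IOC cannot accept more frames than its global credits. */
	return wanted < global_credits ? wanted : global_credits;
}
```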

Signed-off-by: Sreekanth Reddy 
Signed-off-by: James Bottomley 
Signed-off-by: Ben Hutchings

libsas: fix sas_discover_devices return code handling

2012-08-02T13:37:56+00:00

commit b17caa174a7e1fd2e17b26e210d4ee91c4c28b37 upstream.

commit 198439e4 [SCSI] libsas: do not set res = 0 in sas_ex_discover_dev()
commit 19252de6 [SCSI] libsas: fix wide port hotplug issues

The above commits seem to have confused the return value of
sas_ex_discover_dev, which is non-zero on failure, with that of
sas_ex_join_wide_port, which just indicates short-circuiting discovery on
already established ports.  The result is random discovery failures
depending on configuration.

Calls to sas_ex_join_wide_port are the source of the trouble as its
return value is errantly assigned to 'res'.  Convert it to bool and stop
returning its result up the stack.
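A simplified model of the corrected handling.  These are hypothetical stand-ins, not the libsas functions: the join helper returns bool and is only used to short-circuit, never assigned to the error code:

```c
#include <stdbool.h>

/* Hypothetical: true if the phy was absorbed into an existing port. */
static bool join_wide_port(int phy_id, int existing_port_phy)
{
	return phy_id == existing_port_phy; /* toy rule for illustration */
}

/* Hypothetical: non-zero means discovery failed. */
static int discover_dev(int phy_id)
{
	return phy_id < 0 ? -1 : 0;
}

static int discover_phy(int phy_id, int existing_port_phy)
{
	int res = 0;

	/* Already part of an established port: nothing more to discover.
	 * Assigning this bool to res would make "joined" look like failure. */
	if (join_wide_port(phy_id, existing_port_phy))
		return res;

	res = discover_dev(phy_id);
	return res;
}
```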

Tested-by: Dan Melnic 
Reported-by: Dan Melnic 
Signed-off-by: Dan Williams 
Reviewed-by: Jack Wang 
Signed-off-by: James Bottomley 
Signed-off-by: Ben Hutchings

libsas: continue revalidation

2012-08-02T13:37:56+00:00

commit 26f2f199ff150d8876b2641c41e60d1c92d2fb81 upstream.

Continue running revalidation until no more broadcast devices are
discovered.  Fixes cases where re-discovery completes too early in a
domain with multiple expanders with pending re-discovery events.
Servicing BCNs can get backed up behind error recovery.
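A toy model of "keep going until nothing new is found", assuming each expander has a count of queued re-discovery events and one pass services at most one event per expander.  All names are hypothetical:

```c
#include <stdbool.h>

#define NR_EXPANDERS 4

/* One revalidation pass: service at most one pending event per expander.
 * Returns true if any expander had something to re-discover. */
static bool revalidate_pass(int pending[NR_EXPANDERS])
{
	bool found = false;

	for (int i = 0; i < NR_EXPANDERS; i++) {
		if (pending[i] > 0) {
			pending[i]--;
			found = true;
		}
	}
	return found;
}

/* Keep running passes until one finds nothing; a single pass would leave
 * backed-up events unserviced.  Returns the number of productive passes. */
static int revalidate_domain(int pending[NR_EXPANDERS])
{
	int passes = 0;

	while (revalidate_pass(pending))
		passes++;
	return passes;
}
```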

Signed-off-by: Dan Williams 
Signed-off-by: James Bottomley 
Signed-off-by: Ben Hutchings

fix eh wakeup (scsi_schedule_eh vs scsi_restart_operations)

2012-08-02T13:37:56+00:00

commit 57fc2e335fd3c2f898ee73570dc81426c28dc7b4 upstream.

Rapid ata hotplug on a libsas controller results in cases where libsas
is waiting indefinitely on eh to perform an ata probe.

A race exists between scsi_schedule_eh() and scsi_restart_operations()
when scsi_restart_operations() issues i/o to other devices in the sas
domain.  When this happens, the host state transitions from
SHOST_RECOVERY (set by scsi_schedule_eh) back to SHOST_RUNNING while
->host_busy is non-zero, so we put the eh thread to sleep even though
->host_eh_scheduled is active.

Before putting the error handler to sleep we need to check if the
host_state needs to return to SHOST_RECOVERY for another trip through
eh.  Since i/o that is released by scsi_restart_operations has been
blocked for at least one eh cycle, this implementation allows those
i/o's to run before another eh cycle starts to discourage hung task
timeouts.
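A simplified model of the lost wakeup and the fix, assuming (as a simplification) that a command completion only wakes eh when the host is in recovery and has gone idle.  HOST_RUNNING/HOST_RECOVERY stand in for SHOST_RUNNING/SHOST_RECOVERY; everything else is hypothetical:

```c
#include <stdbool.h>

enum host_state { HOST_RUNNING, HOST_RECOVERY };

struct host_model {
	enum host_state state;
	int host_busy;          /* commands in flight */
	bool host_eh_scheduled; /* set by a scsi_schedule_eh()-like call */
	bool eh_woken;          /* did a completion wake the eh thread? */
};

/* End of an eh cycle, just before the eh thread would sleep. */
static void finish_eh_cycle(struct host_model *h, int released_io, bool with_fix)
{
	h->state = HOST_RUNNING;
	h->host_busy += released_io;  /* i/o released to other devices */
	if (with_fix && h->host_eh_scheduled)
		h->state = HOST_RECOVERY; /* arrange another trip through eh */
}

/* Command completion: only a host in recovery wakes eh once idle. */
static void complete_command(struct host_model *h)
{
	h->host_busy--;
	if (h->state == HOST_RECOVERY && h->host_busy == 0)
		h->eh_woken = true;
}
```

Without the fix the released i/o completes in the RUNNING state and nobody wakes eh; with it, the released i/o still runs first and its last completion starts the next eh cycle.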

Reported-by: Tom Jackson 
Tested-by: Tom Jackson 
Signed-off-by: Dan Williams 
Signed-off-by: James Bottomley 
Signed-off-by: Ben Hutchings

fix hot unplug vs async scan race

2012-08-02T13:37:55+00:00

commit 3b661a92e869ebe2358de8f4b3230ad84f7fce51 upstream.

The following crash results from cases where the end_device has been
removed before scsi_sysfs_add_sdev has had a chance to run.

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
 IP: [] sysfs_create_dir+0x32/0xb6
 ...
 Call Trace:
  [] kobject_add_internal+0x120/0x1e3
  [] ? trace_hardirqs_on+0xd/0xf
  [] kobject_add_varg+0x41/0x50
  [] kobject_add+0x64/0x66
  [] device_add+0x12d/0x63a
  [] ? _raw_spin_unlock_irqrestore+0x47/0x56
  [] ? module_refcount+0x89/0xa0
  [] scsi_sysfs_add_sdev+0x4e/0x28a
  [] do_scan_async+0x9c/0x145

...teach scsi_sysfs_add_devices() to check for deleted devices before
trying to add them, and teach scsi_remove_target() how to remove targets
that have not been added via device_add().
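Both halves of the fix can be modelled with one hypothetical struct (the real change touches devices on the add side and targets on the remove side; this sketch merges them for brevity):

```c
#include <stdbool.h>

enum dev_state { DEV_CREATED, DEV_RUNNING, DEV_DEL };

struct dev_model {
	enum dev_state state;
	bool registered; /* stands in for a successful device_add() */
};

/* Async scan: register only objects that were not deleted meanwhile. */
static int add_devices(struct dev_model *devs, int n)
{
	int added = 0;

	for (int i = 0; i < n; i++) {
		if (devs[i].state == DEV_DEL)
			continue;        /* hot-unplugged before the scan ran */
		devs[i].registered = true;
		devs[i].state = DEV_RUNNING;
		added++;
	}
	return added;
}

/* Removal: unregister only what was registered, but always tear down. */
static void remove_device(struct dev_model *dev)
{
	if (dev->registered)
		dev->registered = false; /* device_del() counterpart */
	dev->state = DEV_DEL;
}
```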

Reported-by: Dariusz Majchrzak 
Signed-off-by: Dan Williams 
Signed-off-by: James Bottomley 
Signed-off-by: Ben Hutchings

Avoid dangling pointer in scsi_requeue_command()

2012-08-02T13:37:55+00:00

commit 940f5d47e2f2e1fa00443921a0abf4822335b54d upstream.

When we call scsi_unprep_request() the command associated with the request
gets destroyed and therefore drops its reference on the device.  If this was
the only reference, the device may get released and we end up with a NULL
pointer deref when we call blk_requeue_request.
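A refcount model of the ordering problem, assuming the fix pins the device with an extra reference across the unprep and requeue steps.  All names are hypothetical stand-ins for the device reference counting:

```c
#include <stdbool.h>

struct dev_model { int refs; bool released; };

static void get_dev(struct dev_model *d) { d->refs++; }

static void put_dev(struct dev_model *d)
{
	if (--d->refs == 0)
		d->released = true; /* device memory would be freed here */
}

/* Returns true if the device was still alive when the requeue ran. */
static bool requeue_command(struct dev_model *d, bool pin)
{
	bool alive_at_requeue;

	if (pin)
		get_dev(d);               /* hold the device across requeue */
	put_dev(d);                       /* unprep: command drops its ref */
	alive_at_requeue = !d->released; /* the requeue would happen here */
	if (pin)
		put_dev(d);
	return alive_at_requeue;
}
```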

Reported-by: Mike Christie 
Signed-off-by: Bart Van Assche 
Reviewed-by: Mike Christie 
Reviewed-by: Tejun Heo 
[jejb: enhance comment and add commit log for stable]
Signed-off-by: James Bottomley 
Signed-off-by: Ben Hutchings

Fix device removal NULL pointer dereference

2012-08-02T13:37:54+00:00

commit 67bd94130015c507011af37858989b199c52e1de upstream.

Use blk_queue_dead() to test whether the queue is dead instead
of !sdev. Since scsi_prep_fn() may be invoked concurrently with
__scsi_remove_device(), keep the queuedata (sdev) pointer in
__scsi_remove_device(). This patch fixes a kernel oops that
can be triggered by USB device removal. See also
http://www.spinics.net/lists/linux-scsi/msg56254.html.
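Why the NULL test stops working once queuedata is kept valid during removal, in a hypothetical model (the dead flag plays the role of the state tested by blk_queue_dead()):

```c
#include <stdbool.h>
#include <stddef.h>

struct queue_model {
	bool dead;       /* queue has been marked dead */
	void *queuedata; /* the sdev pointer, now kept during removal */
};

enum prep_result { PREP_OK, PREP_KILL };

/* Old style: removal detected by queuedata being NULL. */
static enum prep_result prep_old(const struct queue_model *q)
{
	return q->queuedata == NULL ? PREP_KILL : PREP_OK;
}

/* New style: test the queue state instead of the pointer. */
static enum prep_result prep_new(const struct queue_model *q)
{
	return q->dead ? PREP_KILL : PREP_OK;
}
```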

Other changes included in this patch:
- Swap the blk_cleanup_queue() and kfree() calls in
  scsi_host_dev_release() to make that code easier to grasp.
- Remove the queue dead check from scsi_run_queue() since the
  queue state can change anyway at any point in that function
  where the queue lock is not held.
- Remove the queue dead check from the start of scsi_request_fn()
  since it is redundant with the scsi_device_online() check.

Reported-by: Jun'ichi Nomura 
Signed-off-by: Bart Van Assche 
Reviewed-by: Mike Christie 
Reviewed-by: Tejun Heo 
Signed-off-by: James Bottomley 
Signed-off-by: Ben Hutchings

Fix NULL dereferences in scsi_cmd_to_driver

2012-08-02T13:37:37+00:00

commit 222a806af830fda34ad1f6bc991cd226916de060 upstream.

Avoid crashing if the private_data pointer happens to be NULL. This has
been seen sometimes when a host reset happens, notably when there are
many LUNs:

host3: Assigned Port ID 0c1601
scsi host3: libfc: Host reset succeeded on port (0c1601)
BUG: unable to handle kernel NULL pointer dereference at 0000000000000350
IP: [] scsi_send_eh_cmnd+0x58/0x3a0

Process scsi_eh_3 (pid: 4144, threadinfo ffff88030920c000, task ffff880326b160c0)
Stack:
 000000010372e6ba 0000000000000282 000027100920dca0 ffffffffa0038ee0
 0000000000000000 0000000000030003 ffff88030920dc80 ffff88030920dc80
 00000002000e0000 0000000a00004000 ffff8803242f7760 ffff88031326ed80
Call Trace:
 [] ? lock_timer_base+0x70/0x70
 [] scsi_eh_tur+0x3e/0xc0
 [] scsi_eh_test_devices+0x76/0x170
 [] scsi_eh_host_reset+0x85/0x160
 [] scsi_eh_ready_devs+0x91/0x110
 [] scsi_unjam_host+0xed/0x1f0
 [] scsi_error_handler+0x1a8/0x200
 [] ? scsi_unjam_host+0x1f0/0x1f0
 [] kthread+0x9e/0xb0
 [] kernel_thread_helper+0x4/0x10
 [] ? kthread_freezable_should_stop+0x70/0x70
 [] ? gs_change+0x13/0x13
Code: 25 28 00 00 00 48 89 45 c8 31 c0 48 8b 87 80 00 00 00 48 8d b5 60 ff ff ff 89 d1 48 89 fb 41 89 d6 4c 89 fa 48 8b 80 b8 00 00 00
 <48> 8b 80 50 03 00 00 48 8b 00 48 89 85 38 ff ff ff 48 8b 07 4c
RIP  [] scsi_send_eh_cmnd+0x58/0x3a0
 RSP 
CR2: 0000000000000350
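A model of the defensive lookup, assuming the driver pointer is reached through a disk's private_data and that callers fall back to default behaviour when no driver is found.  The types and the simplified eh_action signature are hypothetical:

```c
#include <stddef.h>

struct driver_model {
	int (*eh_action)(int result); /* optional per-driver eh hook */
};

/* gendisk-like object: private_data points at the driver pointer. */
struct disk_model {
	struct driver_model **private_data;
};

static struct driver_model *cmd_to_driver(const struct disk_model *disk)
{
	if (disk == NULL || disk->private_data == NULL)
		return NULL; /* e.g. seen around a host reset */
	return *disk->private_data;
}

/* Caller: keep the unmodified result when no driver hook is available. */
static int eh_result(const struct disk_model *disk, int result)
{
	struct driver_model *drv = cmd_to_driver(disk);

	if (drv != NULL && drv->eh_action != NULL)
		return drv->eh_action(result);
	return result;
}
```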

Signed-off-by: Mark Rustad 
Tested-by: Marcus Dennis 
Signed-off-by: James Bottomley 
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings