<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/drivers/scsi/ipr.c, branch v4.16</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>scsi: ipr: fix incorrect indentation of assignment statement</title>
<updated>2017-12-05T03:56:03+00:00</updated>
<author>
<name>Colin Ian King</name>
<email>colin.king@canonical.com</email>
</author>
<published>2017-12-01T13:33:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b82378e682d7128a5d26a4c68fa748e13fdd996a'/>
<id>b82378e682d7128a5d26a4c68fa748e13fdd996a</id>
<content type='text'>
Remove one extraneous level of indentation on an assignment statement.

Signed-off-by: Colin Ian King &lt;colin.king@canonical.com&gt;
Acked-by: Brian King &lt;brking@linux.vnet.ibm.com&gt;
Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Remove one extraneous level of indentation on an assignment statement.

Signed-off-by: Colin Ian King &lt;colin.king@canonical.com&gt;
Acked-by: Brian King &lt;brking@linux.vnet.ibm.com&gt;
Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>treewide: Remove TIMER_FUNC_TYPE and TIMER_DATA_TYPE casts</title>
<updated>2017-11-22T00:35:54+00:00</updated>
<author>
<name>Kees Cook</name>
<email>keescook@chromium.org</email>
</author>
<published>2017-10-23T07:40:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=841b86f3289dbe858daeceec36423d4ea286fac2'/>
<id>841b86f3289dbe858daeceec36423d4ea286fac2</id>
<content type='text'>
With all callbacks converted, and the timer callback prototype
switched over, the TIMER_FUNC_TYPE cast is no longer needed,
so remove it. Conversion was done with the following scripts:

    perl -pi -e 's|\(TIMER_FUNC_TYPE\)||g' \
        $(git grep TIMER_FUNC_TYPE | cut -d: -f1 | sort -u)

    perl -pi -e 's|\(TIMER_DATA_TYPE\)||g' \
        $(git grep TIMER_DATA_TYPE | cut -d: -f1 | sort -u)

The now unused macros are also dropped from include/linux/timer.h.

Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
With all callbacks converted, and the timer callback prototype
switched over, the TIMER_FUNC_TYPE cast is no longer needed,
so remove it. Conversion was done with the following scripts:

    perl -pi -e 's|\(TIMER_FUNC_TYPE\)||g' \
        $(git grep TIMER_FUNC_TYPE | cut -d: -f1 | sort -u)

    perl -pi -e 's|\(TIMER_DATA_TYPE\)||g' \
        $(git grep TIMER_DATA_TYPE | cut -d: -f1 | sort -u)

The now unused macros are also dropped from include/linux/timer.h.

Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>scsi: ipr: Convert timers to use timer_setup()</title>
<updated>2017-11-01T18:27:07+00:00</updated>
<author>
<name>Kees Cook</name>
<email>keescook@chromium.org</email>
</author>
<published>2017-08-18T23:53:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=738c6ec546aaba5acde6a4e1d2b7f56aec249d62'/>
<id>738c6ec546aaba5acde6a4e1d2b7f56aec249d62</id>
<content type='text'>
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Cc: Brian King &lt;brking@us.ibm.com&gt;
Cc: "James E.J. Bottomley" &lt;jejb@linux.vnet.ibm.com&gt;
Cc: "Martin K. Petersen" &lt;martin.petersen@oracle.com&gt;
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Reviewed-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Cc: Brian King &lt;brking@us.ibm.com&gt;
Cc: "James E.J. Bottomley" &lt;jejb@linux.vnet.ibm.com&gt;
Cc: "Martin K. Petersen" &lt;martin.petersen@oracle.com&gt;
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Reviewed-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>scsi: ipr: Set no_report_opcodes for RAID arrays</title>
<updated>2017-08-23T02:23:36+00:00</updated>
<author>
<name>Brian King</name>
<email>brking@linux.vnet.ibm.com</email>
</author>
<published>2017-08-18T21:17:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=723cd772fde2344a9810eeaf5106787d535ec4a4'/>
<id>723cd772fde2344a9810eeaf5106787d535ec4a4</id>
<content type='text'>
Since ipr RAID arrays do not support the MAINTENANCE_IN /
MI_REPORT_SUPPORTED_OPERATION_CODES, set no_report_opcodes to prevent it
from being sent.

Signed-off-by: Brian King &lt;brking@linux.vnet.ibm.com&gt;
Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since ipr RAID arrays do not support the MAINTENANCE_IN /
MI_REPORT_SUPPORTED_OPERATION_CODES, set no_report_opcodes to prevent it
from being sent.

Signed-off-by: Brian King &lt;brking@linux.vnet.ibm.com&gt;
Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>scsi: ipr: Fix scsi-mq lockdep issue</title>
<updated>2017-08-08T15:49:51+00:00</updated>
<author>
<name>Brian King</name>
<email>brking@linux.vnet.ibm.com</email>
</author>
<published>2017-08-01T15:21:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b0e17a9b0df29590c45dfb296f541270a5941f41'/>
<id>b0e17a9b0df29590c45dfb296f541270a5941f41</id>
<content type='text'>
Fixes the following lockdep warning that can occur when scsi-mq is
enabled with ipr due to ipr calling scsi_unblock_requests from irq
context. The fix is to move the call to scsi_unblock_requests to ipr's
existing workqueue.

stack backtrace:
CPU: 28 PID: 0 Comm: swapper/28 Not tainted 4.13.0-rc2-gcc6x-gf74c89b #1
Call Trace:
[c000001fffe97550] [c000000000b50818] dump_stack+0xe8/0x160 (unreliable)
[c000001fffe97590] [c0000000001586d0] print_usage_bug+0x2d0/0x390
[c000001fffe97640] [c000000000158f34] mark_lock+0x7a4/0x8e0
[c000001fffe976f0] [c00000000015a000] __lock_acquire+0x6a0/0x1a70
[c000001fffe97860] [c00000000015befc] lock_acquire+0xec/0x2e0
[c000001fffe97930] [c000000000b71514] _raw_spin_lock+0x44/0x70
[c000001fffe97960] [c0000000005b60f4] blk_mq_sched_dispatch_requests+0xa4/0x2a0
[c000001fffe979c0] [c0000000005acac0] __blk_mq_run_hw_queue+0x100/0x2c0
[c000001fffe97a00] [c0000000005ad478] __blk_mq_delay_run_hw_queue+0x118/0x130
[c000001fffe97a40] [c0000000005ad61c] blk_mq_start_hw_queues+0x6c/0xa0
[c000001fffe97a80] [c000000000797aac] scsi_kick_queue+0x2c/0x60
[c000001fffe97aa0] [c000000000797cf0] scsi_run_queue+0x210/0x360
[c000001fffe97b10] [c00000000079b888] scsi_run_host_queues+0x48/0x80
[c000001fffe97b40] [c0000000007b6090] ipr_ioa_bringdown_done+0x70/0x1e0
[c000001fffe97bc0] [c0000000007bc860] ipr_reset_ioa_job+0x80/0xf0
[c000001fffe97bf0] [c0000000007b4d50] ipr_reset_timer_done+0xd0/0x100
[c000001fffe97c30] [c0000000001937bc] call_timer_fn+0xdc/0x4b0
[c000001fffe97cf0] [c000000000193d08] expire_timers+0x178/0x330
[c000001fffe97d60] [c0000000001940c8] run_timer_softirq+0xb8/0x120
[c000001fffe97de0] [c000000000b726a8] __do_softirq+0x168/0x6d8
[c000001fffe97ef0] [c0000000000df2c8] irq_exit+0x108/0x150
[c000001fffe97f10] [c000000000017bf4] __do_irq+0x2a4/0x4a0
[c000001fffe97f90] [c00000000002da50] call_do_irq+0x14/0x24
[c0000007fad93aa0] [c000000000017e8c] do_IRQ+0x9c/0x140
[c0000007fad93af0] [c000000000008b98] hardware_interrupt_common+0x138/0x140

Reported-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Signed-off-by: Brian King &lt;brking@linux.vnet.ibm.com&gt;
Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fixes the following lockdep warning that can occur when scsi-mq is
enabled with ipr due to ipr calling scsi_unblock_requests from irq
context. The fix is to move the call to scsi_unblock_requests to ipr's
existing workqueue.

stack backtrace:
CPU: 28 PID: 0 Comm: swapper/28 Not tainted 4.13.0-rc2-gcc6x-gf74c89b #1
Call Trace:
[c000001fffe97550] [c000000000b50818] dump_stack+0xe8/0x160 (unreliable)
[c000001fffe97590] [c0000000001586d0] print_usage_bug+0x2d0/0x390
[c000001fffe97640] [c000000000158f34] mark_lock+0x7a4/0x8e0
[c000001fffe976f0] [c00000000015a000] __lock_acquire+0x6a0/0x1a70
[c000001fffe97860] [c00000000015befc] lock_acquire+0xec/0x2e0
[c000001fffe97930] [c000000000b71514] _raw_spin_lock+0x44/0x70
[c000001fffe97960] [c0000000005b60f4] blk_mq_sched_dispatch_requests+0xa4/0x2a0
[c000001fffe979c0] [c0000000005acac0] __blk_mq_run_hw_queue+0x100/0x2c0
[c000001fffe97a00] [c0000000005ad478] __blk_mq_delay_run_hw_queue+0x118/0x130
[c000001fffe97a40] [c0000000005ad61c] blk_mq_start_hw_queues+0x6c/0xa0
[c000001fffe97a80] [c000000000797aac] scsi_kick_queue+0x2c/0x60
[c000001fffe97aa0] [c000000000797cf0] scsi_run_queue+0x210/0x360
[c000001fffe97b10] [c00000000079b888] scsi_run_host_queues+0x48/0x80
[c000001fffe97b40] [c0000000007b6090] ipr_ioa_bringdown_done+0x70/0x1e0
[c000001fffe97bc0] [c0000000007bc860] ipr_reset_ioa_job+0x80/0xf0
[c000001fffe97bf0] [c0000000007b4d50] ipr_reset_timer_done+0xd0/0x100
[c000001fffe97c30] [c0000000001937bc] call_timer_fn+0xdc/0x4b0
[c000001fffe97cf0] [c000000000193d08] expire_timers+0x178/0x330
[c000001fffe97d60] [c0000000001940c8] run_timer_softirq+0xb8/0x120
[c000001fffe97de0] [c000000000b726a8] __do_softirq+0x168/0x6d8
[c000001fffe97ef0] [c0000000000df2c8] irq_exit+0x108/0x150
[c000001fffe97f10] [c000000000017bf4] __do_irq+0x2a4/0x4a0
[c000001fffe97f90] [c00000000002da50] call_do_irq+0x14/0x24
[c0000007fad93aa0] [c000000000017e8c] do_IRQ+0x9c/0x140
[c0000007fad93af0] [c000000000008b98] hardware_interrupt_common+0x138/0x140

Reported-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Signed-off-by: Brian King &lt;brking@linux.vnet.ibm.com&gt;
Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi</title>
<updated>2017-05-04T19:19:44+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2017-05-04T19:19:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8d5e72dfdf0fa29a21143fd72746c6f43295ce9f'/>
<id>8d5e72dfdf0fa29a21143fd72746c6f43295ce9f</id>
<content type='text'>
Pull SCSI updates from James Bottomley:
 "This update includes the usual round of major driver updates
  (hisi_sas, ufs, fnic, cxlflash, be2iscsi, ipr, stex). There's also the
  usual amount of cosmetic and spelling stuff"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (155 commits)
  scsi: qla4xxx: fix spelling mistake: "Tempalate" -&gt; "Template"
  scsi: stex: make S6flag static
  scsi: mac_esp: fix to pass correct device identity to free_irq()
  scsi: aacraid: pci_alloc_consistent() failures on ARM64
  scsi: ufs: make ufshcd_get_lists_status() register operation obvious
  scsi: ufs: use MASK_EE_STATUS
  scsi: mac_esp: Replace bogus memory barrier with spinlock
  scsi: fcoe: make fcoe_e_d_tov and fcoe_r_a_tov static
  scsi: sd_zbc: Do not write lock zones for reset
  scsi: sd_zbc: Remove superfluous assignments
  scsi: sd: sd_zbc: Rename sd_zbc_setup_write_cmnd
  scsi: Improve scsi_get_sense_info_fld
  scsi: sd: Cleanup sd_done sense data handling
  scsi: sd: Improve sd_completed_bytes
  scsi: sd: Fix function descriptions
  scsi: mpt3sas: remove redundant wmb
  scsi: mpt: Move scsi_remove_host() out of mptscsih_remove_host()
  scsi: sg: reset 'res_in_use' after unlinking reserved array
  scsi: mvumi: remove code handling zero scsi_sg_count(scmd) case
  scsi: fusion: fix spelling mistake: "Persistancy" -&gt; "Persistency"
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull SCSI updates from James Bottomley:
 "This update includes the usual round of major driver updates
  (hisi_sas, ufs, fnic, cxlflash, be2iscsi, ipr, stex). There's also the
  usual amount of cosmetic and spelling stuff"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (155 commits)
  scsi: qla4xxx: fix spelling mistake: "Tempalate" -&gt; "Template"
  scsi: stex: make S6flag static
  scsi: mac_esp: fix to pass correct device identity to free_irq()
  scsi: aacraid: pci_alloc_consistent() failures on ARM64
  scsi: ufs: make ufshcd_get_lists_status() register operation obvious
  scsi: ufs: use MASK_EE_STATUS
  scsi: mac_esp: Replace bogus memory barrier with spinlock
  scsi: fcoe: make fcoe_e_d_tov and fcoe_r_a_tov static
  scsi: sd_zbc: Do not write lock zones for reset
  scsi: sd_zbc: Remove superfluous assignments
  scsi: sd: sd_zbc: Rename sd_zbc_setup_write_cmnd
  scsi: Improve scsi_get_sense_info_fld
  scsi: sd: Cleanup sd_done sense data handling
  scsi: sd: Improve sd_completed_bytes
  scsi: sd: Fix function descriptions
  scsi: mpt3sas: remove redundant wmb
  scsi: mpt: Move scsi_remove_host() out of mptscsih_remove_host()
  scsi: sg: reset 'res_in_use' after unlinking reserved array
  scsi: mvumi: remove code handling zero scsi_sg_count(scmd) case
  scsi: fusion: fix spelling mistake: "Persistancy" -&gt; "Persistency"
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge remote-tracking branch 'mkp-scsi/4.11/scsi-fixes' into fixes</title>
<updated>2017-04-12T14:29:17+00:00</updated>
<author>
<name>James Bottomley</name>
<email>James.Bottomley@HansenPartnership.com</email>
</author>
<published>2017-04-12T14:29:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0e1bfea999daa27c801b19617a6ef8b8ec4adc75'/>
<id>0e1bfea999daa27c801b19617a6ef8b8ec4adc75</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>scsi: ipr: do not set DID_PASSTHROUGH on CHECK CONDITION</title>
<updated>2017-04-12T01:40:19+00:00</updated>
<author>
<name>Mauricio Faria de Oliveira</name>
<email>mauricfo@linux.vnet.ibm.com</email>
</author>
<published>2017-04-11T14:46:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=785a470496d8e0a32e3d39f376984eb2c98ca5b3'/>
<id>785a470496d8e0a32e3d39f376984eb2c98ca5b3</id>
<content type='text'>
On a dual controller setup with multipath enabled, some MEDIUM ERRORs
caused both paths to be failed, thus I/O got queued/blocked since the
'queue_if_no_path' feature is enabled by default on IPR controllers.

This example disabled 'queue_if_no_path' so the I/O failure is seen at
the sg_dd program.  Notice that after the sg_dd test-case, both paths
are in 'failed' state, and both path/priority groups are in 'enabled'
state (not 'active') -- which would block I/O with 'queue_if_no_path'.

    # sg_dd if=/dev/dm-2 bs=4096 count=1 dio=1 verbose=4 blk_sgio=0
    &lt;...&gt;
    read(unix): count=4096, res=-1
    sg_dd: reading, skip=0 : Input/output error
    &lt;...&gt;

    # dmesg
    [...] sd 2:2:16:0: [sds] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    [...] sd 2:2:16:0: [sds] Sense Key : Medium Error [current]
    [...] sd 2:2:16:0: [sds] Add. Sense: Unrecovered read error - recommend rewrite the data
    [...] sd 2:2:16:0: [sds] CDB: Read(10) 28 00 00 00 00 00 00 00 20 00
    [...] blk_update_request: I/O error, dev sds, sector 0
    [...] device-mapper: multipath: Failing path 65:32.
    &lt;...&gt;
    [...] device-mapper: multipath: Failing path 65:224.

    # multipath -l
    1IBM_IPR-0_59C2AE0000001F80 dm-2 IBM     ,IPR-0   59C2AE00
    size=5.2T features='0' hwhandler='1 alua' wp=rw
    |-+- policy='service-time 0' prio=0 status=enabled
    | `- 2:2:16:0 sds  65:32  failed undef running
    `-+- policy='service-time 0' prio=0 status=enabled
      `- 1:2:7:0  sdae 65:224 failed undef running

This is not the desired behavior. The dm-multipath explicitly checks
for the MEDIUM ERROR case (and a few others) so not to fail the path
(e.g., I/O to other sectors could potentially happen without problems).
See dm-mpath.c :: do_end_io_bio() -&gt; noretry_error() !-&gt;! fail_path().

The problem trace is:

1) ipr_scsi_done()  // SENSE KEY/CHECK CONDITION detected, go to..
2) ipr_erp_start()  // ipr_is_gscsi() and masked_ioasc OK, go to..
3) ipr_gen_sense()  // masked_ioasc is IPR_IOASC_MED_DO_NOT_REALLOC,
                    // so set DID_PASSTHROUGH.

4) scsi_decide_disposition()  // check for DID_PASSTHROUGH and return
                              // early on, faking a DID_OK.. *instead*
                              // of reaching scsi_check_sense().

                              // Had it reached the latter, that would
                              // set host_byte to DID_MEDIUM_ERROR.

5) scsi_finish_command()
6) scsi_io_completion()
7) __scsi_error_from_host_byte()  // That would be converted to -ENODATA
&lt;...&gt;
8) dm_softirq_done()
9) multipath_end_io()
10) do_end_io()
11) noretry_error()  // And that is checked in dm-mpath :: noretry_error()
                     // which would cause fail_path() not to be called.

With this patch applied, the I/O is failed but the paths are not.  This
multipath device continues accepting more I/O requests without blocking.
(and notice the different host byte/driver byte handling per SCSI layer).

    # dmesg
    [...] sd 2:2:7:0: [sdaf] Done: SUCCESS Result: hostbyte=0x13 driverbyte=DRIVER_OK
    [...] sd 2:2:7:0: [sdaf] CDB: Read(10) 28 00 00 00 00 00 00 00 40 00
    [...] sd 2:2:7:0: [sdaf] Sense Key : Medium Error [current]
    [...] sd 2:2:7:0: [sdaf] Add. Sense: Unrecovered read error - recommend rewrite the data
    [...] blk_update_request: critical medium error, dev sdaf, sector 0
    [...] blk_update_request: critical medium error, dev dm-6, sector 0
    [...] sd 2:2:7:0: [sdaf] Done: SUCCESS Result: hostbyte=0x13 driverbyte=DRIVER_OK
    [...] sd 2:2:7:0: [sdaf] CDB: Read(10) 28 00 00 00 00 00 00 00 10 00
    [...] sd 2:2:7:0: [sdaf] Sense Key : Medium Error [current]
    [...] sd 2:2:7:0: [sdaf] Add. Sense: Unrecovered read error - recommend rewrite the data
    [...] blk_update_request: critical medium error, dev sdaf, sector 0
    [...] blk_update_request: critical medium error, dev dm-6, sector 0
    [...] Buffer I/O error on dev dm-6, logical block 0, async page read

    # multipath -l 1IBM_IPR-0_59C2AE0000001F80
    1IBM_IPR-0_59C2AE0000001F80 dm-6 IBM     ,IPR-0   59C2AE00
    size=5.2T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
    |-+- policy='service-time 0' prio=0 status=active
    | `- 2:2:7:0  sdaf 65:240 active undef running
    `-+- policy='service-time 0' prio=0 status=enabled
      `- 1:2:7:0  sdh  8:112  active undef running

Signed-off-by: Mauricio Faria de Oliveira &lt;mauricfo@linux.vnet.ibm.com&gt;
Acked-by: Brian King &lt;brking@linux.vnet.ibm.com&gt;
Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
On a dual controller setup with multipath enabled, some MEDIUM ERRORs
caused both paths to be failed, thus I/O got queued/blocked since the
'queue_if_no_path' feature is enabled by default on IPR controllers.

This example disabled 'queue_if_no_path' so the I/O failure is seen at
the sg_dd program.  Notice that after the sg_dd test-case, both paths
are in 'failed' state, and both path/priority groups are in 'enabled'
state (not 'active') -- which would block I/O with 'queue_if_no_path'.

    # sg_dd if=/dev/dm-2 bs=4096 count=1 dio=1 verbose=4 blk_sgio=0
    &lt;...&gt;
    read(unix): count=4096, res=-1
    sg_dd: reading, skip=0 : Input/output error
    &lt;...&gt;

    # dmesg
    [...] sd 2:2:16:0: [sds] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    [...] sd 2:2:16:0: [sds] Sense Key : Medium Error [current]
    [...] sd 2:2:16:0: [sds] Add. Sense: Unrecovered read error - recommend rewrite the data
    [...] sd 2:2:16:0: [sds] CDB: Read(10) 28 00 00 00 00 00 00 00 20 00
    [...] blk_update_request: I/O error, dev sds, sector 0
    [...] device-mapper: multipath: Failing path 65:32.
    &lt;...&gt;
    [...] device-mapper: multipath: Failing path 65:224.

    # multipath -l
    1IBM_IPR-0_59C2AE0000001F80 dm-2 IBM     ,IPR-0   59C2AE00
    size=5.2T features='0' hwhandler='1 alua' wp=rw
    |-+- policy='service-time 0' prio=0 status=enabled
    | `- 2:2:16:0 sds  65:32  failed undef running
    `-+- policy='service-time 0' prio=0 status=enabled
      `- 1:2:7:0  sdae 65:224 failed undef running

This is not the desired behavior. The dm-multipath explicitly checks
for the MEDIUM ERROR case (and a few others) so not to fail the path
(e.g., I/O to other sectors could potentially happen without problems).
See dm-mpath.c :: do_end_io_bio() -&gt; noretry_error() !-&gt;! fail_path().

The problem trace is:

1) ipr_scsi_done()  // SENSE KEY/CHECK CONDITION detected, go to..
2) ipr_erp_start()  // ipr_is_gscsi() and masked_ioasc OK, go to..
3) ipr_gen_sense()  // masked_ioasc is IPR_IOASC_MED_DO_NOT_REALLOC,
                    // so set DID_PASSTHROUGH.

4) scsi_decide_disposition()  // check for DID_PASSTHROUGH and return
                              // early on, faking a DID_OK.. *instead*
                              // of reaching scsi_check_sense().

                              // Had it reached the latter, that would
                              // set host_byte to DID_MEDIUM_ERROR.

5) scsi_finish_command()
6) scsi_io_completion()
7) __scsi_error_from_host_byte()  // That would be converted to -ENODATA
&lt;...&gt;
8) dm_softirq_done()
9) multipath_end_io()
10) do_end_io()
11) noretry_error()  // And that is checked in dm-mpath :: noretry_error()
                     // which would cause fail_path() not to be called.

With this patch applied, the I/O is failed but the paths are not.  This
multipath device continues accepting more I/O requests without blocking.
(and notice the different host byte/driver byte handling per SCSI layer).

    # dmesg
    [...] sd 2:2:7:0: [sdaf] Done: SUCCESS Result: hostbyte=0x13 driverbyte=DRIVER_OK
    [...] sd 2:2:7:0: [sdaf] CDB: Read(10) 28 00 00 00 00 00 00 00 40 00
    [...] sd 2:2:7:0: [sdaf] Sense Key : Medium Error [current]
    [...] sd 2:2:7:0: [sdaf] Add. Sense: Unrecovered read error - recommend rewrite the data
    [...] blk_update_request: critical medium error, dev sdaf, sector 0
    [...] blk_update_request: critical medium error, dev dm-6, sector 0
    [...] sd 2:2:7:0: [sdaf] Done: SUCCESS Result: hostbyte=0x13 driverbyte=DRIVER_OK
    [...] sd 2:2:7:0: [sdaf] CDB: Read(10) 28 00 00 00 00 00 00 00 10 00
    [...] sd 2:2:7:0: [sdaf] Sense Key : Medium Error [current]
    [...] sd 2:2:7:0: [sdaf] Add. Sense: Unrecovered read error - recommend rewrite the data
    [...] blk_update_request: critical medium error, dev sdaf, sector 0
    [...] blk_update_request: critical medium error, dev dm-6, sector 0
    [...] Buffer I/O error on dev dm-6, logical block 0, async page read

    # multipath -l 1IBM_IPR-0_59C2AE0000001F80
    1IBM_IPR-0_59C2AE0000001F80 dm-6 IBM     ,IPR-0   59C2AE00
    size=5.2T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
    |-+- policy='service-time 0' prio=0 status=active
    | `- 2:2:7:0  sdaf 65:240 active undef running
    `-+- policy='service-time 0' prio=0 status=enabled
      `- 1:2:7:0  sdh  8:112  active undef running

Signed-off-by: Mauricio Faria de Oliveira &lt;mauricfo@linux.vnet.ibm.com&gt;
Acked-by: Brian King &lt;brking@linux.vnet.ibm.com&gt;
Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>scsi: ipr: Fix SATA EH hang</title>
<updated>2017-03-23T16:04:05+00:00</updated>
<author>
<name>Brian King</name>
<email>brking@linux.vnet.ibm.com</email>
</author>
<published>2017-03-15T21:58:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ef97d8ae12eb20d81bea5d16a0c1e33dcbe44492'/>
<id>ef97d8ae12eb20d81bea5d16a0c1e33dcbe44492</id>
<content type='text'>
This patch fixes a hang that can occur in ATA EH with ipr. With ipr's
usage of libata, commands should never end up on ap-&gt;eh_done_q. The
timeout function we use for ipr, even for SATA devices, is
scsi_times_out, so ATA_QCFLAG_EH_SCHEDULED never gets set for ipr and EH
is driven completely by ipr and SCSI. The SCSI EH thread ends up calling
ipr's eh_device_reset_handler, which then calls
ata_std_error_handler. This ends up calling ipr_sata_reset, which issues
a reset to the device. This should result in all pending commands
getting failed back and having ata_qc_complete called for them, which
should end up clearing ATA_QCFLAG_FAILED as qc-&gt;flags gets zeroed in
ata_qc_free.  This ensures that when we end up in ata_eh_finish, we
don't do anything more with the command.

On adapters that only support a single interrupt and when running with
two MSI-X vectors or less, the adapter firmware guarantees that
responses to all outstanding commands are sent back prior to sending the
response to the SATA reset command.  On newer adapters supporting
multiple HRRQs, however, this can no longer be guaranteed, since the
command responses and reset response may be processed on different
HRRQs.

If ipr returns from ipr_sata_reset before the outstanding command was
returned, this sends us down the path of __ata_eh_qc_complete which then
moves the associated scsi_cmd from the work_q in
scsi_eh_bus_device_reset to ap-&gt;eh_done_q, which then will sit there
forever and we will be wedged.

This patch fixes this up by ensuring that any outstanding commands are
flushed before returning from eh_device_reset_handler for a SATA device.

Reported-by: David Jeffery &lt;djeffery@redhat.com&gt;
Signed-off-by: Brian King &lt;brking@linux.vnet.ibm.com&gt;
Reviewed-by: Wendy Xiong &lt;wenxiong@linux.vnet.ibm.com&gt;
Tested-by: Wendy Xiong &lt;wenxiong@linux.vnet.ibm.com&gt;
Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch fixes a hang that can occur in ATA EH with ipr. With ipr's
usage of libata, commands should never end up on ap-&gt;eh_done_q. The
timeout function we use for ipr, even for SATA devices, is
scsi_times_out, so ATA_QCFLAG_EH_SCHEDULED never gets set for ipr and EH
is driven completely by ipr and SCSI. The SCSI EH thread ends up calling
ipr's eh_device_reset_handler, which then calls
ata_std_error_handler. This ends up calling ipr_sata_reset, which issues
a reset to the device. This should result in all pending commands
getting failed back and having ata_qc_complete called for them, which
should end up clearing ATA_QCFLAG_FAILED as qc-&gt;flags gets zeroed in
ata_qc_free.  This ensures that when we end up in ata_eh_finish, we
don't do anything more with the command.

On adapters that only support a single interrupt and when running with
two MSI-X vectors or less, the adapter firmware guarantees that
responses to all outstanding commands are sent back prior to sending the
response to the SATA reset command.  On newer adapters supporting
multiple HRRQs, however, this can no longer be guaranteed, since the
command responses and reset response may be processed on different
HRRQs.

If ipr returns from ipr_sata_reset before the outstanding command was
returned, this sends us down the path of __ata_eh_qc_complete which then
moves the associated scsi_cmd from the work_q in
scsi_eh_bus_device_reset to ap-&gt;eh_done_q, which then will sit there
forever and we will be wedged.

This patch fixes this up by ensuring that any outstanding commands are
flushed before returning from eh_device_reset_handler for a SATA device.

Reported-by: David Jeffery &lt;djeffery@redhat.com&gt;
Signed-off-by: Brian King &lt;brking@linux.vnet.ibm.com&gt;
Reviewed-by: Wendy Xiong &lt;wenxiong@linux.vnet.ibm.com&gt;
Tested-by: Wendy Xiong &lt;wenxiong@linux.vnet.ibm.com&gt;
Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>scsi: ipr: Error path locking fixes</title>
<updated>2017-03-23T16:04:05+00:00</updated>
<author>
<name>Brian King</name>
<email>brking@linux.vnet.ibm.com</email>
</author>
<published>2017-03-15T21:58:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f646f325a829d73abc708088d2b67cd11b775fdf'/>
<id>f646f325a829d73abc708088d2b67cd11b775fdf</id>
<content type='text'>
This patch closes up some potential race conditions observed in the
error handling paths in ipr while debugging an issue resulting in a hang
with SATA error handling. These patches ensure we are holding the
correct lock when adding and removing commands from the free and pending
queues in some error scenarios.

Signed-off-by: Brian King &lt;brking@linux.vnet.ibm.com&gt;
Reviewed-by: Wendy Xiong &lt;wenxiong@linux.vnet.ibm.com&gt;
Tested-by: Wendy Xiong &lt;wenxiong@linux.vnet.ibm.com&gt;
Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch closes up some potential race conditions observed in the
error handling paths in ipr while debugging an issue resulting in a hang
with SATA error handling. These patches ensure we are holding the
correct lock when adding and removing commands from the free and pending
queues in some error scenarios.

Signed-off-by: Brian King &lt;brking@linux.vnet.ibm.com&gt;
Reviewed-by: Wendy Xiong &lt;wenxiong@linux.vnet.ibm.com&gt;
Tested-by: Wendy Xiong &lt;wenxiong@linux.vnet.ibm.com&gt;
Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
