[SCSI] scsi_lib: pause between error retries

During cable pull tests on our 16G FC adapter, we are seeing errors, typically reads to close targets, which fail due to CRC or framing errors caused by the cable being pull (return status DID_ERROR). The adapter detects the error on one of the first frames received, marks the FC exchange as dead (further frames go to bit bucket) and signals the host of the error. This action is so quick, and coupled with fast host CPUs, creates a scenario in which the midlayer sees the failure and retries the io almost immediately. We've seen link traces with the retry on the link while the original i/o is still being processed by the target. We're also seeing the time window for the "link to pull-apart" and the physical interface to report disconnected to be in the few millisecond range. Which means, we're encountering scenarios where the full retry count is exhausted (all with error) by the midlayer before the link disconnect state is detected. We looked at 8G FC behavior and occasionally see the same behavior, but as the link was slower, it rarely could exhaust all retries before the link reported disconnect. What is needed is a slight delay between io retries due to DID_ERROR to cover this error. It is inappropriate to put this delay in the driver, as the error is indistinguishable from other link-related errors, nor does the driver track whether the io is a retry or not. This is also easier than tracking between-io-error bursts that are seen in this scenario. The patch below updates the retry path so that it inserts a delay as if the target was busy. The busy delay is on the order of 6ms. This delay is sufficient to ensure the link down condition is reported before the retry count is exhausted (at most 1 retry is seen). Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com> Signed-off-by: James Smart <james.smart@emulex.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
author: James Smart <james.smart@emulex.com> 2011-07-06 18:45:17 +0200
committer: James Bottomley <JBottomley@Parallels.com> 2011-07-27 12:06:01 +0200
commit: 573e5913536a1393362265cfea9e708aa10fdf16 (patch)
tree: 35973a3bc15593ad75b69c2fc7154e7ac27ca39c /drivers/scsi/scsi_lib.c
parent: [SCSI] mpt2sas: WarpDrive Infinite command retries due to wrong scsi command ... (diff)
download: linux-573e5913536a1393362265cfea9e708aa10fdf16.tar.xz
linux-573e5913536a1393362265cfea9e708aa10fdf16.zip
1 files changed, 1 insertions, 0 deletions
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 28d9c9d6b4b4..fc3f168decb4 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -137,6 +137,7 @@ static int __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
 		host->host_blocked = host->max_host_blocked;
 		break;
 	case SCSI_MLQUEUE_DEVICE_BUSY:
+	case SCSI_MLQUEUE_EH_RETRY:
 		device->device_blocked = device->max_device_blocked;
 		break;
 	case SCSI_MLQUEUE_TARGET_BUSY:
author	James Smart <james.smart@emulex.com>	2011-07-06 18:45:17 +0200
committer	James Bottomley <JBottomley@Parallels.com>	2011-07-27 12:06:01 +0200
commit	573e5913536a1393362265cfea9e708aa10fdf16 (patch)
tree	35973a3bc15593ad75b69c2fc7154e7ac27ca39c /drivers/scsi/scsi_lib.c
parent	[SCSI] mpt2sas: WarpDrive Infinite command retries due to wrong scsi command ... (diff)
download	linux-573e5913536a1393362265cfea9e708aa10fdf16.tar.xz linux-573e5913536a1393362265cfea9e708aa10fdf16.zip