diff options
author | Sagi Grimberg <sagi@grimberg.me> | 2019-08-30 20:00:59 +0200 |
---|---|---|
committer | Sagi Grimberg <sagi@grimberg.me> | 2019-09-12 17:50:45 +0200 |
commit | 205da24343013e0bd62475800df79cd053f22326 (patch) | |
tree | 41207ad7d7fa959f7cb7a402fe4bc6dd11f2cc51 /drivers/nvme | |
parent | nvme: make nvme_report_ns_ids propagate error back (diff) | |
download | linux-205da24343013e0bd62475800df79cd053f22326.tar.xz linux-205da24343013e0bd62475800df79cd053f22326.zip |
nvme: fix ns removal hang when failing to revalidate due to a transient error
If a controller reset is racing with a namespace revalidation, the
revalidation (admin) I/O will surely fail, but we should not remove the
namespace as we will execute the I/O when the controller is back up.
Same for spurious allocation errors (return -ENOMEM).
Fix this by checking the specific error code in nvme_revalidate_disk and
if it is a transient error (for example non DNR nvme statuses or
a negative ENOMEM as allocation failure), do not remove the namespace as
it will either recover when the controller is back up and schedule
a subsequent scan, or the controller is going away and the namespaces
will be removed anyways.
This fixes a hang namespace scanning racing with a controller reset and
also sporious I/O errors in path failover coditions where the
controller reset is racing with the namespace scan work with multipath
enabled.
Reported-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Diffstat (limited to 'drivers/nvme')
-rw-r--r-- | drivers/nvme/host/core.c | 8 |
1 files changed, 7 insertions, 1 deletions
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index f15a77dd3115..fad04282148d 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -1765,7 +1765,13 @@ static int nvme_revalidate_disk(struct gendisk *disk) free_id: kfree(id); out: - if (ret > 0) + /* + * Only fail the function if we got a fatal error back from the + * device, otherwise ignore the error and just move on. + */ + if (ret == -ENOMEM || (ret > 0 && !(ret & NVME_SC_DNR))) + ret = 0; + else if (ret > 0) ret = blk_status_to_errno(nvme_error_status(ret)); return ret; } |