From c06ba7b892a50b48522ad441a40053f483dfee9e Mon Sep 17 00:00:00 2001 From: Janne Grunau Date: Tue, 17 Jan 2023 19:25:00 +0100 Subject: nvme-apple: reset controller during shutdown This is a functional revert of c76b8308e4c9 ("nvme-apple: fix controller shutdown in apple_nvme_disable"). The commit broke suspend/resume since apple_nvme_reset_work() tries to disable the controller on resume. This does not work for the apple NVMe controller since register access only works while the co-processor firmware is running. Disabling the NVMe controller in the shutdown path is also required for shutting the co-processor down. The original code was appropriate for this hardware. Add a comment to prevent a similar breaking changes in the future. Fixes: c76b8308e4c9 ("nvme-apple: fix controller shutdown in apple_nvme_disable") Reported-by: Janne Grunau Link: https://lore.kernel.org/all/20230110174745.GA3576@jannau.net/ Signed-off-by: Janne Grunau [hch: updated with a more descriptive comment from Hector Martin] Signed-off-by: Christoph Hellwig --- drivers/nvme/host/apple.c | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-) (limited to 'drivers/nvme/host') diff --git a/drivers/nvme/host/apple.c b/drivers/nvme/host/apple.c index bf1c60edb7f9..146c9e63ce77 100644 --- a/drivers/nvme/host/apple.c +++ b/drivers/nvme/host/apple.c @@ -829,7 +829,23 @@ static void apple_nvme_disable(struct apple_nvme *anv, bool shutdown) apple_nvme_remove_cq(anv); } - nvme_disable_ctrl(&anv->ctrl, shutdown); + /* + * Always disable the NVMe controller after shutdown. + * We need to do this to bring it back up later anyway, and we + * can't do it while the firmware is not running (e.g. in the + * resume reset path before RTKit is initialized), so for Apple + * controllers it makes sense to unconditionally do it here. + * Additionally, this sequence of events is reliable, while + * others (like disabling after bringing back the firmware on + * resume) seem to run into trouble under some circumstances. + * + * Both U-Boot and m1n1 also use this convention (i.e. an ANS + * NVMe controller is handed off with firmware shut down, in an + * NVMe disabled state, after a clean shutdown). + */ + if (shutdown) + nvme_disable_ctrl(&anv->ctrl, shutdown); + nvme_disable_ctrl(&anv->ctrl, false); } WRITE_ONCE(anv->ioq.enabled, false); -- cgit v1.2.3 From c0a4a1eafbd48e02829045bba3e6163c03037276 Mon Sep 17 00:00:00 2001 From: Janne Grunau Date: Tue, 17 Jan 2023 19:25:01 +0100 Subject: nvme-apple: only reset the controller when RTKit is running NVMe controller register access hangs indefinitely when the co-processor is not running. A missed reset is preferable over a hanging thread since it could be recoverable. Signed-off-by: Janne Grunau Signed-off-by: Christoph Hellwig --- drivers/nvme/host/apple.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'drivers/nvme/host') diff --git a/drivers/nvme/host/apple.c b/drivers/nvme/host/apple.c index 146c9e63ce77..b317ce6c4ec3 100644 --- a/drivers/nvme/host/apple.c +++ b/drivers/nvme/host/apple.c @@ -1001,11 +1001,11 @@ static void apple_nvme_reset_work(struct work_struct *work) goto out; } - if (anv->ctrl.ctrl_config & NVME_CC_ENABLE) - apple_nvme_disable(anv, false); - /* RTKit must be shut down cleanly for the (soft)-reset to work */ if (apple_rtkit_is_running(anv->rtk)) { + /* reset the controller if it is enabled */ + if (anv->ctrl.ctrl_config & NVME_CC_ENABLE) + apple_nvme_disable(anv, false); dev_dbg(anv->dev, "Trying to shut down RTKit before reset."); ret = apple_rtkit_shutdown(anv->rtk); if (ret) -- cgit v1.2.3 From 1c5842085851f786eba24a39ecd02650ad892064 Mon Sep 17 00:00:00 2001 From: Keith Busch Date: Wed, 18 Jan 2023 08:44:16 -0800 Subject: nvme-pci: fix timeout request state check Polling the completion can progress the request state to IDLE, either inline with the completion, or through softirq. Either way, the state may not be COMPLETED, so don't check for that. We only care if the state isn't IN_FLIGHT. This is fixing an issue where the driver aborts an IO that we just completed. Seeing the "aborting" message instead of "polled" is very misleading as to where the timeout problem resides. Fixes: bf392a5dc02a9b ("nvme-pci: Remove tag from process cq") Signed-off-by: Keith Busch Signed-off-by: Christoph Hellwig --- drivers/nvme/host/pci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'drivers/nvme/host') diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index a2553b7d9bb8..1ff8843bc4b3 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -1362,7 +1362,7 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req) else nvme_poll_irqdisable(nvmeq); - if (blk_mq_request_completed(req)) { + if (blk_mq_rq_state(req) != MQ_RQ_IN_FLIGHT) { dev_warn(dev->ctrl.device, "I/O %d QID %d timeout, completion polled\n", req->tag, nvmeq->qid); -- cgit v1.2.3