summaryrefslogtreecommitdiffstats
path: root/drivers/nvme (follow)
Commit message (Collapse)AuthorAgeFilesLines
* nvme: relax APST default max latency to 100msKai-Heng Feng2017-06-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Christoph Hellwig suggests we should to make APST work out of the box. Hence relax the the default max latency to make them able to enter deepest power state on default. Here are id-ctrl excerpts from two high latency NVMes: vid : 0x14a4 ssvid : 0x1b4b mn : CX2-GB1024-Q11 NVMe LITEON 1024GB ps 3 : mp:0.1000W non-operational enlat:5000 exlat:5000 rrt:3 rrl:3 rwt:3 rwl:3 idle_power:- active_power:- ps 4 : mp:0.0100W non-operational enlat:50000 exlat:100000 rrt:4 rrl:4 rwt:4 rwl:4 idle_power:- active_power:- vid : 0x15b7 ssvid : 0x1b4b mn : A400 NVMe SanDisk 512GB ps 3 : mp:0.0500W non-operational enlat:51000 exlat:10000 rrt:0 rrl:0 rwt:0 rwl:0 idle_power:- active_power:- ps 4 : mp:0.0055W non-operational enlat:1000000 exlat:100000 rrt:0 rrl:0 rwt:0 rwl:0 idle_power:- active_power:- Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme: only consider exit latency when choosing useful non-op power statesKai-Heng Feng2017-06-071-6/+9
| | | | | | | | | | | | | When a NVMe is in non-op states, the latency is exlat. The latency will be enlat + exlat only when the NVMe tries to transit from operational state right atfer it begins to transit to non-operational state, which should be a rare case. Therefore, as Andy Lutomirski suggests, use exlat only when deciding power states to trainsit to. Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme-fc: fix missing put reference on controller create failureJames Smart2017-06-071-0/+1
| | | | | | | | | The failure case, of a create controller request, called nvme_uninit_ctrl() but didn't do a put to allow the nvme controller to be deleted. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme-fc: on lldd/transport io error, terminate associationJames Smart2017-06-071-2/+17
| | | | | | | | | | | | | | | | Per FC-NVME, when lldd or transport detects an i/o error, the connection must be terminated, which in turn requires the association to be termianted. Currently the transport simply creates a nvme completion status of transport error and returns the io. The FC-NVME spec makes the mandate as initiator and host, depending on the error, can get out of sync on outstanding io counts (sqhd/sqtail). Implement the association teardown on lldd or transport detected errors. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
* nvme-rdma: fast fail incoming requests while we reconnectSagi Grimberg2017-06-071-15/+29
| | | | | | | | | | | | | | | | | | | When we encounter an transport/controller errors, error recovery kicks in which performs: 1. stops io/admin queues 2. moves transport queues out of LIVE state 3. fast fail pending io 4. schedule periodic reconnects. But we also need to fast fail incoming IO taht enters after we already scheduled. Given that our queue is not LIVE anymore, simply restart the request queues to fail in .queue_rq Reported-by: Alex Turin <alex@vastdata.com> Reported-by: shahar.salzman <shahar.salzman@gmail.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org
* nvme-pci: fix multiple ctrl removal schedulingRakesh Pandit2017-06-071-7/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit c5f6ce97c1210 tries to address multiple resets but fails as work_busy doesn't involve any synchronization and can fail. This is reproducible easily as can be seen by WARNING below which is triggered with line: WARN_ON(dev->ctrl.state == NVME_CTRL_RESETTING) Allowing multiple resets can result in multiple controller removal as well if different conditions inside nvme_reset_work fail and which might deadlock on device_release_driver. [ 480.327007] WARNING: CPU: 3 PID: 150 at drivers/nvme/host/pci.c:1900 nvme_reset_work+0x36c/0xec0 [ 480.327008] Modules linked in: rfcomm fuse nf_conntrack_netbios_ns nf_conntrack_broadcast... [ 480.327044] btusb videobuf2_core ghash_clmulni_intel snd_hwdep cfg80211 acer_wmi hci_uart.. [ 480.327065] CPU: 3 PID: 150 Comm: kworker/u16:2 Not tainted 4.12.0-rc1+ #13 [ 480.327065] Hardware name: Acer Predator G9-591/Mustang_SLS, BIOS V1.10 03/03/2016 [ 480.327066] Workqueue: nvme nvme_reset_work [ 480.327067] task: ffff880498ad8000 task.stack: ffffc90002218000 [ 480.327068] RIP: 0010:nvme_reset_work+0x36c/0xec0 [ 480.327069] RSP: 0018:ffffc9000221bdb8 EFLAGS: 00010246 [ 480.327070] RAX: 0000000000460000 RBX: ffff880498a98128 RCX: dead000000000200 [ 480.327070] RDX: 0000000000000001 RSI: ffff8804b1028020 RDI: ffff880498a98128 [ 480.327071] RBP: ffffc9000221be50 R08: 0000000000000000 R09: 0000000000000000 [ 480.327071] R10: ffffc90001963ce8 R11: 000000000000020d R12: ffff880498a98000 [ 480.327072] R13: ffff880498a53500 R14: ffff880498a98130 R15: ffff880498a98128 [ 480.327072] FS: 0000000000000000(0000) GS:ffff8804c1cc0000(0000) knlGS:0000000000000000 [ 480.327073] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 480.327074] CR2: 00007ffcf3c37f78 CR3: 0000000001e09000 CR4: 00000000003406e0 [ 480.327074] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 480.327075] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 480.327075] Call Trace: [ 480.327079] ? __switch_to+0x227/0x400 [ 480.327081] process_one_work+0x18c/0x3a0 [ 480.327082] worker_thread+0x4e/0x3b0 [ 480.327084] kthread+0x109/0x140 [ 480.327085] ? process_one_work+0x3a0/0x3a0 [ 480.327087] ? kthread_park+0x60/0x60 [ 480.327102] ret_from_fork+0x2c/0x40 [ 480.327103] Code: e8 5a dc ff ff 85 c0 41 89 c1 0f..... This patch addresses the problem by using state of controller to decide whether reset should be queued or not as state change is synchronizated using controller spinlock. Also cancel_work_sync is used to make sure remove cancels the reset_work and waits for it to finish. This patch also changes return value from -ENODEV to more appropriate -EBUSY if nvme_reset fails to change state. Fixes: c5f6ce97c1210 ("nvme: don't schedule multiple resets") Signed-off-by: Rakesh Pandit <rakesh@tuxera.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme: fix hang in remove pathMing Lei2017-06-071-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to start admin queues too in nvme_kill_queues() for avoiding hang in remove path[1]. This patch is very similar with 806f026f9b901eaf(nvme: use blk_mq_start_hw_queues() in nvme_kill_queues()). [1] hang stack trace [<ffffffff813c9716>] blk_execute_rq+0x56/0x80 [<ffffffff815cb6e9>] __nvme_submit_sync_cmd+0x89/0xf0 [<ffffffff815ce7be>] nvme_set_features+0x5e/0x90 [<ffffffff815ce9f6>] nvme_configure_apst+0x166/0x200 [<ffffffff815cef45>] nvme_set_latency_tolerance+0x35/0x50 [<ffffffff8157bd11>] apply_constraint+0xb1/0xc0 [<ffffffff8157cbb4>] dev_pm_qos_constraints_destroy+0xf4/0x1f0 [<ffffffff8157b44a>] dpm_sysfs_remove+0x2a/0x60 [<ffffffff8156d951>] device_del+0x101/0x320 [<ffffffff8156db8a>] device_unregister+0x1a/0x60 [<ffffffff8156dc4c>] device_destroy+0x3c/0x50 [<ffffffff815cd295>] nvme_uninit_ctrl+0x45/0xa0 [<ffffffff815d4858>] nvme_remove+0x78/0x110 [<ffffffff81452b69>] pci_device_remove+0x39/0xb0 [<ffffffff81572935>] device_release_driver_internal+0x155/0x210 [<ffffffff81572a02>] device_release_driver+0x12/0x20 [<ffffffff815d36fb>] nvme_remove_dead_ctrl_work+0x6b/0x70 [<ffffffff810bf3bc>] process_one_work+0x18c/0x3a0 [<ffffffff810bf61e>] worker_thread+0x4e/0x3b0 [<ffffffff810c5ac9>] kthread+0x109/0x140 [<ffffffff8185800c>] ret_from_fork+0x2c/0x40 [<ffffffffffffffff>] 0xffffffffffffffff Fixes: c5552fde102fc("nvme: Enable autonomous power state transitions") Reported-by: Rakesh Pandit <rakesh@tuxera.com> Tested-by: Rakesh Pandit <rakesh@tuxera.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme: Quirk APST on Intel 600P/P3100 devicesAndy Lutomirski2017-05-261-0/+2
| | | | | | | | | | | | They have known firmware bugs. A fix is apparently in the works -- once fixed firmware is available, someone from Intel (Hi, Keith!) can adjust the quirk accordingly. Cc: stable@vger.kernel.org # v4.11 Cc: Kai-Heng Feng <kai.heng.feng@canonical.com> Cc: Mario Limonciello <mario_limonciello@dell.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme: only setup block integrity if supported by the driverChristoph Hellwig2017-05-263-19/+33
| | | | | | | | | | | | Currently only the PCIe driver supports metadata, so we should not claim integrity support for the other drivers. This prevents nasty crashes with targets that advertise metadata support on fabrics. Also use the opportunity to factor out some code into a separate helper that isn't even compiled if CONFIG_BLK_DEV_INTEGRITY is disabled. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com>
* nvme: replace is_flags field in nvme_ctrl_ops with a flags fieldChristoph Hellwig2017-05-265-5/+6
| | | | | | | So that we can have more flags for transport-specific behavior. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com>
* nvme-pci: consistencly use ctrl->device for loggingChristoph Hellwig2017-05-261-6/+6
| | | | | | | | This is what most of the code already does and gives much more useful prefixes than the device embedded in the pci_dev. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com>
* nvme_fc: remove extra controller reference taken on reconnectJames Smart2017-05-221-3/+2
| | | | | | | | | | fix extra controller reference taken on reconnect by moving reference to initial controller create Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme_fc: correct nvme status set on abortJames Smart2017-05-221-2/+2
| | | | | | | | | | correct nvme status set on abort. Patch that changed status to being actual nvme status crossed in the night with the patch that added abort values. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme_fc: set logging level on resets/deletesJames Smart2017-05-221-10/+4
| | | | | | | | | | | | | | | | Per the review by Sagi on: http://lists.infradead.org/pipermail/linux-nvme/2017-April/009261.html Looked at existing warn vs info vs err dev_xxx levels for the messages printed on reconnects and deletes: - Resets due to error and resets transitioned to deletes are dev_warn - Other reset/disconnect messages are dev_info - Removed chatty io queue related messages Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme_fc: revise comment on teardownJames Smart2017-05-221-4/+4
| | | | | | | | | | | | | | | | Per the recommendation by Sagi on: http://lists.infradead.org/pipermail/linux-nvme/2017-April/009261.html An extra reference was pointed out. There's no issue with the references, but rather a literal interpretation of what the comment is saying. Reword the comment to avoid confusion. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme_fc: Support ctrl_loss_tmoJames Smart2017-05-221-67/+49
| | | | | | | | | | | | | | | | | Sync with Sagi's recent addition of ctrl_loss_tmo in the core fabrics layer. Remove local connect limits and connect_attempts variable. Use fabrics new nr_connects variable and use of nvmf_should_reconnect() Refactor duplicate reconnect failure code. Addresses review comment by Sagi on controller reset support: http://lists.infradead.org/pipermail/linux-nvme/2017-April/009261.html Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme_fc: get rid of local reconnect_delayJames Smart2017-05-221-6/+4
| | | | | | | | | | Remove the local copy of reconnect_delay. Use the value in the controller options directly. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme: avoid to use blk_mq_abort_requeue_list()Ming Lei2017-05-221-2/+3
| | | | | | | | | | | | | | | | | | | | | | NVMe may add request into requeue list simply and not kick off the requeue if hw queues are stopped. Then blk_mq_abort_requeue_list() is called in both nvme_kill_queues() and nvme_ns_remove() for dealing with this issue. Unfortunately blk_mq_abort_requeue_list() is absolutely a race maker, for example, one request may be requeued during the aborting. So this patch just calls blk_mq_kick_requeue_list() in nvme_kill_queues() to handle this issue like what nvme_start_queues() does. Now all requests in requeue list when queues are stopped will be handled by blk_mq_kick_requeue_list() when queues are restarted, either in nvme_start_queues() or in nvme_kill_queues(). Cc: stable@vger.kernel.org Reported-by: Zhang Yi <yizhan@redhat.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme: use blk_mq_start_hw_queues() in nvme_kill_queues()Ming Lei2017-05-221-1/+7
| | | | | | | | | | | | | | | | | | | | | | | Inside nvme_kill_queues(), we have to start hw queues for draining requests in sw queues, .dispatch list and requeue list, so use blk_mq_start_hw_queues() instead of blk_mq_start_stopped_hw_queues() which only run queues if queues are stopped, but the queues may have been started already, for example nvme_start_queues() is called in reset work function. blk_mq_start_hw_queues() run hw queues in current context, instead of running asynchronously like before. Given nvme_kill_queues() is run from either remove context or reset worker context, both are fine to run hw queue directly. And the mutex of namespaces_mutex isn't a problem too becasue nvme_start_freeze() runs hw queue in this way already. Cc: stable@vger.kernel.org Reported-by: Zhang Yi <yizhan@redhat.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme-rdma: support devices with queue size < 32Marta Rybczynska2017-05-221-4/+14
| | | | | | | | | | | | | | | | | | | In the case of small NVMe-oF queue size (<32) we may enter a deadlock caused by the fact that the IB completions aren't sent waiting for 32 and the send queue will fill up. The error is seen as (using mlx5): [ 2048.693355] mlx5_0:mlx5_ib_post_send:3765:(pid 7273): [ 2048.693360] nvme nvme1: nvme_rdma_post_send failed with error code -12 This patch changes the way the signaling is done so that it depends on the queue depth now. The magic define has been removed completely. Cc: stable@vger.kernel.org Signed-off-by: Marta Rybczynska <marta.rybczynska@kalray.eu> Signed-off-by: Samuel Jones <sjones@kalray.eu> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
* Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds2017-05-217-5/+25
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block fixes from Jens Axboe: "A small collection of fixes that should go into this cycle. - a pull request from Christoph for NVMe, which ended up being manually applied to avoid pulling in newer bits in master. Mostly fibre channel fixes from James, but also a few fixes from Jon and Vijay - a pull request from Konrad, with just a single fix for xen-blkback from Gustavo. - a fuseblk bdi fix from Jan, fixing a regression in this series with the dynamic backing devices. - a blktrace fix from Shaohua, replacing sscanf() with kstrtoull(). - a request leak fix for drbd from Lars, fixing a regression in the last series with the kref changes. This will go to stable as well" * 'for-linus' of git://git.kernel.dk/linux-block: nvmet: release the sq ref on rdma read errors nvmet-fc: remove target cpu scheduling flag nvme-fc: stop queues on error detection nvme-fc: require target or discovery role for fc-nvme targets nvme-fc: correct port role bits nvme: unmap CMB and remove sysfs file in reset path blktrace: fix integer parse fuseblk: Fix warning in super_setup_bdi_name() block: xen-blkback: add null check to avoid null pointer dereference drbd: fix request leak introduced by locking/atomic, kref: Kill kref_sub()
| * nvmet: release the sq ref on rdma read errorsVijay Immanuel2017-05-203-0/+8
| | | | | | | | | | | | | | | | | | | | | | On rdma read errors, release the sq ref that was taken when the req was initialized. This avoids a hang in nvmet_sq_destroy() when the queue is being freed. Signed-off-by: Vijay Immanuel <vijayi@attalasystems.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
| * nvmet-fc: remove target cpu scheduling flagJames Smart2017-05-202-4/+1
| | | | | | | | | | | | | | | | | | Remove NVMET_FCTGTFEAT_NEEDS_CMD_CPUSCHED. It's unnecessary. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
| * nvme-fc: stop queues on error detectionJames Smart2017-05-201-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Per the recommendation by Sagi on: http://lists.infradead.org/pipermail/linux-nvme/2017-April/009261.html Rather than waiting for reset work thread to stop queues and abort the ios, immediately stop the queues on error detection. Reset thread will restop the queues (as it's called on other paths), but it does not appear to have a side effect. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
| * nvme-fc: require target or discovery role for fc-nvme targetsJames Smart2017-05-201-0/+6
| | | | | | | | | | | | | | | | | | | | In order to create an association, the remoteport must be serving either a target role or a discovery role. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
| * nvme: unmap CMB and remove sysfs file in reset pathJon Derrick2017-05-201-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CMB doesn't get unmapped until removal while getting remapped on every reset. Add the unmapping and sysfs file removal to the reset path in nvme_pci_disable to match the mapping path in nvme_pci_enable. Fixes: 202021c1a ("nvme : Add sysfs entry for NVMe CMBs when appropriate") Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> Acked-by: Keith Busch <keith.busch@intel.com> Reviewed-By: Stephen Bates <sbates@raithlin.com> Cc: <stable@vger.kernel.org> # 4.9+ Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
* | Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds2017-05-111-4/+5
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block fixes from Jens Axboe: "A smaller collection of fixes that should go into -rc1. This contains: - A fix from Christoph, fixing a regression with the WRITE_SAME and partial completions. Caused a BUG() on ppc. - Fixup for __blk_mq_stop_hw_queues(), it should be static. From Colin. - Removal of dmesg error messages on elevator switching, when invoked from sysfs. From me. - Fix for blk-stat, using this_cpu_ptr() in a section only protected by rcu_read_lock(). This breaks when PREEMPT_RCU is enabled. From me. - Two fixes for BFQ from Paolo, one fixing a crash and one updating the documentation. - An error handling lightnvm memory leak, from Rakesh. - The previous blk-mq hot unplug lock reversal depends on the CPU hotplug rework that isn't in mainline yet. This caused a lockdep splat when people unplugged CPUs with blk-mq devices. From Wanpeng. - A regression fix for DIF/DIX on blk-mq. From Wen" * 'for-linus' of git://git.kernel.dk/linux-block: block: handle partial completions for special payload requests blk-mq: NVMe 512B/4K+T10 DIF/DIX format returns I/O error on dd with split op blk-stat: don't use this_cpu_ptr() in a preemptable section elevator: remove redundant warnings on IO scheduler switch block, bfq: stress that low_latency must be off to get max throughput block, bfq: use pointer entity->sched_data only if set nvme: lightnvm: fix memory leak blk-mq: make __blk_mq_stop_hw_queues static lightnvm: remove unused rq parameter of nvme_nvm_rqtocmd() to kill warning block/mq: fix potential deadlock during cpu hotplug
| * nvme: lightnvm: fix memory leakRakesh Pandit2017-05-101-1/+2
| | | | | | | | | | | | | | | | | | | | Free up kmalloc allocated memory if failure happens while handling L2P table transfer in nvme_nvm_get_l2p_tbl. Fixes: 8e79b5cb ("lightnvm: move block provisioning to targets") Signed-off-by: Rakesh Pandit <rakesh@tuxera.com> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * lightnvm: remove unused rq parameter of nvme_nvm_rqtocmd() to kill warningGeert Uytterhoeven2017-05-081-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With gcc 4.1.2: drivers/nvme/host/lightnvm.c: In function ‘nvme_nvm_submit_io’: drivers/nvme/host/lightnvm.c:498: warning: ‘rq’ is used uninitialized in this function Indeed, since commit 2e13f33a2464fc3a ("lightnvm: create cmd before allocating request"), the request is passed to nvme_nvm_rqtocmd() before it is allocated. Fortunately, as of commit 91276162de9476b8 ("lightnvm: refactor end_io functions for sync"), nvme_nvm_rqtocmd () no longer uses the passed request, so this warning is a false positive. Drop the unused parameter to clean up the code and kill the warning. Fixes: 2e13f33a2464fc3a ("lightnvm: create cmd before allocating request") Fixes: 91276162de9476b8 ("lightnvm: refactor end_io functions for sync") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Jens Axboe <axboe@fb.com>
* | Merge tag 'pci-v4.12-changes' of ↵Linus Torvalds2017-05-091-17/+13
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: - add framework for supporting PCIe devices in Endpoint mode (Kishon Vijay Abraham I) - use non-postable PCI config space mappings when possible (Lorenzo Pieralisi) - clean up and unify mmap of PCI BARs (David Woodhouse) - export and unify Function Level Reset support (Christoph Hellwig) - avoid FLR for Intel 82579 NICs (Sasha Neftin) - add pci_request_irq() and pci_free_irq() helpers (Christoph Hellwig) - short-circuit config access failures for disconnected devices (Keith Busch) - remove D3 sleep delay when possible (Adrian Hunter) - freeze PME scan before suspending devices (Lukas Wunner) - stop disabling MSI/MSI-X in pci_device_shutdown() (Prarit Bhargava) - disable boot interrupt quirk for ASUS M2N-LR (Stefan Assmann) - add arch-specific alignment control to improve device passthrough by avoiding multiple BARs in a page (Yongji Xie) - add sysfs sriov_drivers_autoprobe to control VF driver binding (Bodong Wang) - allow slots below PCI-to-PCIe "reverse bridges" (Bjorn Helgaas) - fix crashes when unbinding host controllers that don't support removal (Brian Norris) - add driver for MicroSemi Switchtec management interface (Logan Gunthorpe) - add driver for Faraday Technology FTPCI100 host bridge (Linus Walleij) - add i.MX7D support (Andrey Smirnov) - use generic MSI support for Aardvark (Thomas Petazzoni) - make Rockchip driver modular (Brian Norris) - advertise 128-byte Read Completion Boundary support for Rockchip (Shawn Lin) - advertise PCI_EXP_LNKSTA_SLC for Rockchip root port (Shawn Lin) - convert atomic_t to refcount_t in HV driver (Elena Reshetova) - add CPU IRQ affinity in HV driver (K. Y. Srinivasan) - fix PCI bus removal in HV driver (Long Li) - add support for ThunderX2 DMA alias topology (Jayachandran C) - add ThunderX pass2.x 2nd node MCFG quirk (Tomasz Nowicki) - add ITE 8893 bridge DMA alias quirk (Jarod Wilson) - restrict Cavium ACS quirk only to CN81xx/CN83xx/CN88xx devices (Manish Jaggi) * tag 'pci-v4.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (146 commits) PCI: Don't allow unbinding host controllers that aren't prepared ARM: DRA7: clockdomain: Change the CLKTRCTRL of CM_PCIE_CLKSTCTRL to SW_WKUP MAINTAINERS: Add PCI Endpoint maintainer Documentation: PCI: Add userguide for PCI endpoint test function tools: PCI: Add sample test script to invoke pcitest tools: PCI: Add a userspace tool to test PCI endpoint Documentation: misc-devices: Add Documentation for pci-endpoint-test driver misc: Add host side PCI driver for PCI test function device PCI: Add device IDs for DRA74x and DRA72x dt-bindings: PCI: dra7xx: Add DT bindings to enable unaligned access PCI: dwc: dra7xx: Workaround for errata id i870 dt-bindings: PCI: dra7xx: Add DT bindings for PCI dra7xx EP mode PCI: dwc: dra7xx: Add EP mode support PCI: dwc: dra7xx: Facilitate wrapper and MSI interrupts to be enabled independently dt-bindings: PCI: Add DT bindings for PCI designware EP mode PCI: dwc: designware: Add EP mode support Documentation: PCI: Add binding documentation for pci-test endpoint function ixgbe: Use pcie_flr() instead of duplicating it IB/hfi1: Use pcie_flr() instead of duplicating it PCI: imx6: Fix spelling mistake: "contol" -> "control" ...
| * nvme/pci: Switch to pci_request_irq()Christoph Hellwig2017-04-181-17/+13
| | | | | | | | | | | | Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Keith Busch <keith.busch@intel.com>
* | lightnvm: create cmd before allocating requestJavier González2017-05-041-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | Create nvme command before allocating a request using nvme_alloc_request, which uses the command direction. Up until now, the command has been zeroized, so all commands have been allocated as a read operation. Signed-off-by: Javier González <javier@cnexlabs.com> Reviewed-by: Matias Bjørling <matias@cnexlabs.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
* | blk-mq: update ->init_request and ->exit_request prototypesChristoph Hellwig2017-05-024-41/+39
| | | | | | | | | | | | | | | | | | | | | | | | Remove the request_idx parameter, which can't be used safely now that we support I/O schedulers with blk-mq. Except for a superflous check in mtip32xx it was unused anyway. Also pass the tag_set instead of just the driver data - this allows drivers to avoid some code duplication in a follow on cleanup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
* | Merge branch 'nvme-4.12' of git://git.infradead.org/nvme into ↵Jens Axboe2017-04-275-370/+748
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | for-4.12/post-merge Christoph writes: "A couple more updates for 4.12. The biggest pile is fc and lpfc updates from James, but there are various small fixes and cleanups as well." Fixes up a few merge issues, and also a warning in lpfc_nvmet_rcv_unsol_abort() if CONFIG_NVME_TARGET_FC isn't enabled. Signed-off-by: Jens Axboe <axboe@fb.com>
| * | nvme-scsi: remove nvme_trans_security_protocolChristoph Hellwig2017-04-271-13/+0
| | | | | | | | | | | | | | | | | | | | | | | | This function just returns the same error code and sense data as the default statement in the switch in the caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com>
| * | nvme-lightnvm: add missing endianess conversion in nvme_nvm_end_ioChristoph Hellwig2017-04-251-1/+1
| | | | | | | | | | | | | | | | | | | | | Found by sparse. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Matias Bjørling <matias@cnexlabs.com>
| * | nvme-scsi: Consider LBA format in IO splitting calculationJon Derrick2017-04-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current command submission code uses a sector-based value when considering the maximum number of blocks per command. With a 4k-formatted namespace and a command exceeding max hardware limits, this calculation doesn't split IOs which should be split and fails in the nvme layer. This patch fixes that calculation and enables IO splitting in these circumstances. Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> Reviewed-by: Jens Axboe <axboe@fb.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvme-fc: avoid memory corruption caused by calling nvmf_free_options() twiceEwan D. Milne2017-04-251-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | Do not call nvmf_free_options() from the nvme_fc_ctlr destructor if nvme_fc_create_ctrl() returns an error, because nvmf_create_ctrl() frees the options when an error is returned. Signed-off-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvmet-fcloop: mark two symbols staticChristoph Hellwig2017-04-241-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | Found by sparse. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
| * | nvmet-fc: properly endian swap sq_headChristoph Hellwig2017-04-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Found by sparse. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
| * | nvmet-fc: mark the sqhd field as __le16Christoph Hellwig2017-04-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | That's what it's used as. Found by sparse. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
| * | nvmet-fc: fix endianess annoations for nvmet_fc_format_rsp_hdrChristoph Hellwig2017-04-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The passed in desc_len is a big endian value, so mark it as such. Found by sparse. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
| * | nvmet-fc: mark nvmet_fc_handle_fcp_rqst staticChristoph Hellwig2017-04-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Found by sparse. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
| * | nvme-fc: mark two symbols staticChristoph Hellwig2017-04-241-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | Found by sparse. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
| * | nvme_fc: add controller reset supportJames Smart2017-04-241-315/+616
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch actually does quite a few things. When looking to add controller reset support, the organization modeled after rdma was very fragmented. rdma duplicates the reset and teardown paths and does different things to the block layer on the two paths. The code to build up the controller is also duplicated between the initial creation and the reset/error recovery paths. So I decided to make this sane. I reorganized the controller creation and teardown so that there is a connect path and a disconnect path. Initial creation obviously uses the connect path. Controller teardown will use the disconnect path, followed last access code. Controller reset will use the disconnect path to stop operation, and then the connect path to re-establish the controller. Along the way, several things were fixed - aens were not properly set up. They are allocated differently from the per-request structure on the blk queues. - aens were oddly torn down. the prior patch corrected to abort, but we still need to dma unmap and free relative elements. - missed a few ref counting points: in aen completion and on i/o's that fail - controller initial create failure paths were still confused vs teardown before converting to ref counting vs after we convert to refcounting. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvme_fc: add aen abort to teardownJames Smart2017-04-241-52/+139
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add abort support for aens. Commonized the op abort to apply to aen or real ios (caused some reorg/routine movement). Abort path sets termination flag in prep for next patch that will be watching i/o abort completion before proceeding with controller teardown. Now that we're aborting aens, the "exit" code that simply cleared out their context no longer applies. Also clarified how we detect an AEN vs a normal io - by a flag, not by whether a rq exists or the a rqno is out of range. Note: saw some interesting cases where if the queues are stopped and we're waiting for the aborts, the core layer can call the complete_rq callback for the io. So the io completion synchronizes link side completion with possible blk layer completion under error. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvme_fc: fix command id checkJames Smart2017-04-241-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The code validates the command_id in the response to the original sqe command. But prior code was using the rq->rqno as the sqe command id. The core layer overwrites what the transport set there originally. Use the actual sqe content. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
* | | nvme: Add nvme_core.force_apst to ignore the NO_APST quirkAndy Lutomirski2017-04-251-1/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We're probably going to be stuck quirking APST off on an over-broad range of devices for 4.11. Let's make it easy to override the quirk for testing. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
* | | nvme: Display raw APST configuration via DYNAMIC_DEBUGAndy Lutomirski2017-04-251-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Debugging APST is currently a bit of a pain. This gives optional simple log messages that describe the APST state. The easiest way to use this is probably with the nvme_core.dyndbg=+p module parameter. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
* | | nvme: Fix APST commentAndy Lutomirski2017-04-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | There was a typo in the description of the timeout heuristic. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>