summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'for-4.10/block' of git://git.kernel.dk/linux-blockLinus Torvalds2016-12-13225-3091/+14275
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block layer updates from Jens Axboe: "This is the main block pull request this series. Contrary to previous release, I've kept the core and driver changes in the same branch. We always ended up having dependencies between the two for obvious reasons, so makes more sense to keep them together. That said, I'll probably try and keep more topical branches going forward, especially for cycles that end up being as busy as this one. The major parts of this pull request is: - Improved support for O_DIRECT on block devices, with a small private implementation instead of using the pig that is fs/direct-io.c. From Christoph. - Request completion tracking in a scalable fashion. This is utilized by two components in this pull, the new hybrid polling and the writeback queue throttling code. - Improved support for polling with O_DIRECT, adding a hybrid mode that combines pure polling with an initial sleep. From me. - Support for automatic throttling of writeback queues on the block side. This uses feedback from the device completion latencies to scale the queue on the block side up or down. From me. - Support from SMR drives in the block layer and for SD. From Hannes and Shaun. - Multi-connection support for nbd. From Josef. - Cleanup of request and bio flags, so we have a clear split between which are bio (or rq) private, and which ones are shared. From Christoph. - A set of patches from Bart, that improve how we handle queue stopping and starting in blk-mq. - Support for WRITE_ZEROES from Chaitanya. - Lightnvm updates from Javier/Matias. - Supoort for FC for the nvme-over-fabrics code. From James Smart. - A bunch of fixes from a whole slew of people, too many to name here" * 'for-4.10/block' of git://git.kernel.dk/linux-block: (182 commits) blk-stat: fix a few cases of missing batch flushing blk-flush: run the queue when inserting blk-mq flush elevator: make the rqhash helpers exported blk-mq: abstract out blk_mq_dispatch_rq_list() helper blk-mq: add blk_mq_start_stopped_hw_queue() block: improve handling of the magic discard payload blk-wbt: don't throttle discard or write zeroes nbd: use dev_err_ratelimited in io path nbd: reset the setup task for NBD_CLEAR_SOCK nvme-fabrics: Add FC LLDD loopback driver to test FC-NVME nvme-fabrics: Add target support for FC transport nvme-fabrics: Add host support for FC transport nvme-fabrics: Add FC transport LLDD api definitions nvme-fabrics: Add FC transport FC-NVME definitions nvme-fabrics: Add FC transport error codes to nvme.h Add type 0x28 NVME type code to scsi fc headers nvme-fabrics: patch target code in prep for FC transport support nvme-fabrics: set sqe.command_id in core not transports parser: add u64 number parser nvme-rdma: align to generic ib_event logging helper ...
| * blk-stat: fix a few cases of missing batch flushingJens Axboe2016-12-091-0/+8
| | | | | | | | | | | | | | | | Everytime we need to read ->nr_samples, we should have flushed the batch first. The non-mq read path also needs to flush the batch. Signed-off-by: Jens Axboe <axboe@fb.com>
| * blk-flush: run the queue when inserting blk-mq flushJens Axboe2016-12-091-1/+1
| | | | | | | | | | | | | | | | | | Currently we pass in to run the queue async, but don't flag the queue to be run. We don't need to run it async here, but we should run it. So fixup the parameters. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Hannes Reinecke <hare@suse.com>
| * elevator: make the rqhash helpers exportedJens Axboe2016-12-092-4/+9
| | | | | | | | | | Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Hannes Reinecke <hare@suse.com>
| * blk-mq: abstract out blk_mq_dispatch_rq_list() helperJens Axboe2016-12-092-38/+48
| | | | | | | | | | | | | | | | Takes a list of requests, and dispatches it. Moves any residual requests to the dispatch list. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Hannes Reinecke <hare@suse.com>
| * blk-mq: add blk_mq_start_stopped_hw_queue()Jens Axboe2016-12-092-7/+13
| | | | | | | | | | | | | | | | We have a variant for all hardware queues, but not one for a single hardware queue. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Hannes Reinecke <hare@suse.com>
| * block: improve handling of the magic discard payloadChristoph Hellwig2016-12-0913-138/+76
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of allocating a single unused biovec for discard requests, send them down without any payload. Instead we allow the driver to add a "special" payload using a biovec embedded into struct request (unioned over other fields never used while in the driver), and overloading the number of segments for this case. This has a couple of advantages: - we don't have to allocate the bio_vec - the amount of special casing for discard requests in the block layer is significantly reduced - using this same scheme for other request types is trivial, which will be important for implementing the new WRITE_ZEROES op on devices where it actually requires a payload (e.g. SCSI) - we can get rid of playing games with the request length, as we'll never touch it and completions will work just fine - it will allow us to support ranged discard operations in the future by merging non-contiguous discard bios into a single request - last but not least it removes a lot of code This patch is the common base for my WIP series for ranges discards and to remove discard_zeroes_data in favor of always using REQ_OP_WRITE_ZEROES, so it would be good to get it in quickly. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
| * blk-wbt: don't throttle discard or write zeroesChristoph Hellwig2016-12-091-3/+2
| | | | | | | | | | | | | | | | Both of these are metadata only commands that are not issued by the writeback code and not directly relevant to the writeback bandwith. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
| * nbd: use dev_err_ratelimited in io pathJosef Bacik2016-12-081-11/+12
| | | | | | | | | | | | | | | | | | While doing stress tests we noticed that we'd get a lot of dmesg spam if we suddenly disconnected the nbd device out of band. Rate limit the messages in the io path in order to deal with this. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * nbd: reset the setup task for NBD_CLEAR_SOCKJosef Bacik2016-12-081-0/+1
| | | | | | | | | | | | | | | | | | If an app exits before running NBD_DO_IT but after adding sockets we can end up not being allowed to do a new nbd device. Fix this by making NBD_CLEAR_SOCK reset the setup_task. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * Merge branch 'nvmf-4.10' of git://git.infradead.org/nvme-fabrics into ↵Jens Axboe2016-12-0624-34/+7313
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | for-4.10/block Sagi writes: The major addition here is the nvme FC transport implementation from James. What else: - some cleanups and memory leak fixes in the host side fabrics code from Bart - possible rcu violation fix from Sasha - logging change from Max - small include cleanup
| | * nvme-fabrics: Add FC LLDD loopback driver to test FC-NVMEJames Smart2016-12-064-0/+1164
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add FC LLDD loopback driver to test FC host and target transport within nvme-fabrics To aid in the development and testing of the lower-level api of the FC transport, this loopback driver has been created to act as if it were a FC hba driver supporting both the host interfaces as well as the target interfaces with the nvme FC transport. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvme-fabrics: Add target support for FC transportJames Smart2016-12-064-0/+2302
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implements the FC-NVME T11 definition of how nvme fabric capsules are performed on an FC fabric. Utilizes a lower-layer API to FC host adapters to send/receive FC-4 LS operations and perform the FCP transactions necessary to perform and FCP IO request for NVME. The T11 definitions for FC-4 Link Services are implemented which create NVMeOF connections. Implements the hooks with nvmet layer to pass NVME commands to it for processing and posting of data/response base to the host via the different connections. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvme-fabrics: Add host support for FC transportJames Smart2016-12-064-0/+2607
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implements the FC-NVME T11 definition of how nvme fabric capsules are performed on an FC fabric. Utilizes a lower-layer API to FC host adapters to send/receive FC-4 LS operations and FCP operations that comprise NVME over FC operation. The T11 definitions for FC-4 Link Services are implemented which create NVMeOF connections. Implements the hooks with blk-mq to then submit admin and io requests to the different connections. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvme-fabrics: Add FC transport LLDD api definitionsJames Smart2016-12-062-0/+852
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Host: - LLDD registration with the host transport - registering host ports (local ports) and target ports seen on fabric (remote ports) - Data structures and call points for FC-4 LS's and FCP IO requests Target: - LLDD registration with the target transport - registering nvme subsystem ports (target ports) - Data structures and call points for reception of FC-4 LS's and FCP IO requests, and callbacks to perform data and rsp transfers for the io. Add to MAINTAINERS file Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvme-fabrics: Add FC transport FC-NVME definitionsJames Smart2016-12-062-0/+274
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Formats for Cmd, Data, Rsp IUs - Formats FC-4 LS definitions - Add to MAINTAINERS file Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvme-fabrics: Add FC transport error codes to nvme.hJames Smart2016-12-061-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * Add type 0x28 NVME type code to scsi fc headersJames Smart2016-12-061-0/+2
| | | | | | | | | | | | | | | | | | | | | Signed-off-by: James Smart <james.smart@broadcom.com> Acked-by: Johannes Thumshirn <jth@kernel.org> Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvme-fabrics: patch target code in prep for FC transport supportJames Smart2016-12-061-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Add FC transport type decoding - Add FC address family decoding Signed-off-by: James Smart <james.smart@broadcom.com> Acked-by: Johannes Thumshirn <jth@kernel.org> Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvme-fabrics: set sqe.command_id in core not transportsJames Smart2016-12-064-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, core.c sets command_id only on rd/wr commands, leaving it to the transport to set it again to ensure the request had a command id. Move location of set in core so applies to all commands. Remove transport sets. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
| | * parser: add u64 number parserJames Smart2016-12-062-0/+48
| | | | | | | | | | | | | | | | | | | | | Will be used by the nvme-fabrics FC transport in parsing options Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
| | * nvme-rdma: align to generic ib_event logging helperMax Gurtovoy2016-12-061-1/+3
| | | | | | | | | | | | | | | | | | Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvmet-rdma: align to generic ib_event logging helperMax Gurtovoy2016-12-061-1/+2
| | | | | | | | | | | | | | | Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvme-rdma: remove redundant defineSagi Grimberg2016-12-061-1/+0
| | | | | | | | | | | | | | | Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
| | * nvme-fabrics: Adjust source code indentationBart Van Assche2016-12-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Adjust indentation such that arguments are aligned. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
| | * nvme/scsi: Remove set-but-not-used variablesBart Van Assche2016-12-061-9/+2
| | | | | | | | | | | | | | | | | | | | | Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
| | * nvmet: Fix possible infinite loop triggered on hot namespace removalSolganik Alexander2016-12-063-14/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When removing a namespace we delete it from the subsystem namespaces list with list_del_init which allows us to know if it is enabled or not. The problem is that list_del_init initialize the list next and does not respect the RCU list-traversal we do on the IO path for locating a namespace. Instead we need to use list_del_rcu which is allowed to run concurrently with the _rcu list-traversal primitives (keeps list next intact) and guarantees concurrent nvmet_find_naespace forward progress. By changing that, we cannot rely on ns->dev_link for knowing if the namspace is enabled, so add enabled indicator entry to nvmet_ns for that. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Solganik Alexander <sashas@lightbitslabs.com> Cc: <stable@vger.kernel.org> # v4.8+
| | * nvme-fabrics: Fix a memory leak in an nvmf_create_ctrl() error pathBart Van Assche2016-12-061-2/+1
| | | | | | | | | | | | | | | | | | | | | Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
| | * nvme-fabrics: Fix memory leaks in nvmf_parse_options()Bart Van Assche2016-12-061-0/+2
| | | | | | | | | | | | | | | | | | | | | Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
| | * nvme-rdma: force queue size to respect controller capabilitySamuel Jones2016-12-061-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Queue size needs to respect the Maximum Queue Entries Supported advertised by the controller in its Capability register. Signed-off-by: Samuel Jones <sjones@kalray.eu> Reviewed-by: Christoph Hellwig <hch@lst.de> [sagig: fixed queue_size adjustment according to Daniel Verkamp <daniel.verkamp@intel.com> comment] Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
| | * nvmet-rdma: Fix REJ status codeBart Van Assche2016-12-061-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | nvmet_sq_init() returns a value <= 0. nvmet_rdma_cm_reject() expects a second argument that is a NVME_RDMA_CM_* constant. Hence this patch. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Sagi Grimberg <sagi@grimbeg.me> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
| * | blk-mq: blk_account_io_start() takes a boolJens Axboe2016-12-051-1/+1
| | | | | | | | | | | | | | | | | | Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
| * | block: fix unintended fallthrough in generic_make_request_checks()Nicolai Stange2016-12-051-0/+1
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit e73c23ff736e ("block: add async variant of blkdev_issue_zeroout") messages like the following show up: EXT4-fs (dm-1): Delayed block allocation failed for inode 2368848 at logical offset 0 with max blocks 1 with error 95 EXT4-fs (dm-1): This should not happen!! Data will be lost Due to the following fallthrough introduced with commit 2d253440b5af ("block: Define zoned block device operations"), generic_make_request_checks() would accept a REQ_OP_WRITE_SAME bio only if the block device supports "write same" *and* is a zoned one: switch (bio_op(bio)) { [...] case REQ_OP_WRITE_SAME: if (!bdev_write_same(bio->bi_bdev)) goto not_supported; case REQ_OP_ZONE_REPORT: case REQ_OP_ZONE_RESET: if (!bdev_is_zoned(bio->bi_bdev)) goto not_supported; break; [...] } Thus, although the bio setup as done by __blkdev_issue_write_same() from commit e73c23ff736e ("block: add async variant of blkdev_issue_zeroout") would succeed, its actual submission would not, resulting in the EOPNOTSUPP == 95. Fix this by removing the fallthrough which, due to the lack of an explicit comment, seems to be unintended anyway. Fixes: e73c23ff736e ("block: add async variant of blkdev_issue_zeroout") Fixes: 2d253440b5af ("block: Define zoned block device operations") Signed-off-by: Nicolai Stange <nicstange@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
| * nbd: fix 64-bit divisionJens Axboe2016-12-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | We have this: ERROR: "__aeabi_ldivmod" [drivers/block/nbd.ko] undefined! ERROR: "__divdi3" [drivers/block/nbd.ko] undefined! nbd.c:(.text+0x247c72): undefined reference to `__divdi3' due to a recent commit, that did 64-bit division. Use the proper divider function so that 32-bit compiles don't break. Fixes: ef77b515243b ("nbd: use loff_t for blocksize and nbd_set_size args") Signed-off-by: Jens Axboe <axboe@fb.com>
| * nbd: use loff_t for blocksize and nbd_set_size argsJosef Bacik2016-12-031-4/+4
| | | | | | | | | | | | | | | | | | | | If we have large devices (say like the 40t drive I was trying to test with) we will end up overflowing the int arguments to nbd_set_size and not get the right size for our device. Fix this by using loff_t everywhere so I don't have to think about this again. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * blk-stat: fix a typoShaohua Li2016-12-031-1/+1
| | | | | | | | | | | | Signed-off-by: Shaohua Li <shli@fb.com> Fixes: cf43e6be865a ("block: add scalable completion tracking of requests") Signed-off-by: Jens Axboe <axboe@fb.com>
| * block: factor out req_set_nomergeRitesh Harjani2016-12-011-9/+10
| | | | | | | | | | | | | | | | | | | | | | Factor out common code for setting REQ_NOMERGE flag which is being used out at certain places and make it a helper instead, req_set_nomerge(). Signed-off-by: Ritesh Harjani <riteshh@codeaurora.org> Get rid of the inline. Signed-off-by: Jens Axboe <axboe@fb.com>
| * block: protect iterate_bdevs() against concurrent closeRabin Vincent2016-12-011-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a block device is closed while iterate_bdevs() is handling it, the following NULL pointer dereference occurs because bdev->b_disk is NULL in bdev_get_queue(), which is called from blk_get_backing_dev_info() (in turn called by the mapping_cap_writeback_dirty() call in __filemap_fdatawrite_range()): BUG: unable to handle kernel NULL pointer dereference at 0000000000000508 IP: [<ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20 PGD 9e62067 PUD 9ee8067 PMD 0 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: CPU: 1 PID: 2422 Comm: sync Not tainted 4.5.0-rc7+ #400 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) task: ffff880009f4d700 ti: ffff880009f5c000 task.ti: ffff880009f5c000 RIP: 0010:[<ffffffff81314790>] [<ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20 RSP: 0018:ffff880009f5fe68 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88000ec17a38 RCX: ffffffff81a4e940 RDX: 7fffffffffffffff RSI: 0000000000000000 RDI: ffff88000ec176c0 RBP: ffff880009f5fe68 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: ffff88000ec17860 R13: ffffffff811b25c0 R14: ffff88000ec178e0 R15: ffff88000ec17a38 FS: 00007faee505d700(0000) GS:ffff88000fb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000508 CR3: 0000000009e8a000 CR4: 00000000000006e0 Stack: ffff880009f5feb8 ffffffff8112e7f5 0000000000000000 7fffffffffffffff 0000000000000000 0000000000000000 7fffffffffffffff 0000000000000001 ffff88000ec178e0 ffff88000ec17860 ffff880009f5fec8 ffffffff8112e81f Call Trace: [<ffffffff8112e7f5>] __filemap_fdatawrite_range+0x85/0x90 [<ffffffff8112e81f>] filemap_fdatawrite+0x1f/0x30 [<ffffffff811b25d6>] fdatawrite_one_bdev+0x16/0x20 [<ffffffff811bc402>] iterate_bdevs+0xf2/0x130 [<ffffffff811b2763>] sys_sync+0x63/0x90 [<ffffffff815d4272>] entry_SYSCALL_64_fastpath+0x12/0x76 Code: 0f 1f 44 00 00 48 8b 87 f0 00 00 00 55 48 89 e5 <48> 8b 80 08 05 00 00 5d RIP [<ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20 RSP <ffff880009f5fe68> CR2: 0000000000000508 ---[ end trace 2487336ceb3de62d ]--- The crash is easily reproducible by running the following command, if an msleep(100) is inserted before the call to func() in iterate_devs(): while :; do head -c1 /dev/nullb0; done > /dev/null & while :; do sync; done Fix it by holding the bd_mutex across the func() call and only calling func() if the bdev is opened. Cc: stable@vger.kernel.org Fixes: 5c0d6b60a0ba ("vfs: Create function for iterating over block devices") Reported-and-tested-by: Wei Fang <fangwei1@huawei.com> Signed-off-by: Rabin Vincent <rabinv@axis.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
| * block: mtip32xx: set error code on failurePan Bian2016-12-011-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix bug https://bugzilla.kernel.org/show_bug.cgi?id=188531. In function mtip_block_initialize(), variable rv takes the return value, and its value should be negative on errors. rv is initialized as 0 and is not reset when the call to ida_pre_get() fails. So 0 may be returned. The return value 0 indicates that there is no error, which may be inconsistent with the execution status. This patch fixes the bug by explicitly assigning -ENOMEM to rv on the branch that ida_pre_get() fails. Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * nvmet: add support for the Write Zeroes commandChaitanya Kulkarni2016-12-012-1/+31
| | | | | | | | | | | | | | | | | | | | | | Add support for handling write zeroes command on target. Call into __blkdev_issue_zeroout, which the block layer expands into the best suitable variant of zeroing the LBAs. Allow write zeroes operation to deallocate the LBAs when calling __blkdev_issue_zeroout. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@hgst.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
| * nvme: add support for the Write Zeroes commandChaitanya Kulkarni2016-12-011-0/+21
| | | | | | | | | | | | | | | | | | | | | | | | Allow write zeroes operations (REQ_OP_WRITE_ZEROES) on the block device, if the device supports optional command bit set for write zeroes. Add support to setup write zeroes command. Set maximum possible write zeroes sectors in one write zeroes command according to nvme write zeroes command definition. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@hgst.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
| * nvme.h: add Write Zeroes definitionsChaitanya Kulkarni2016-12-011-0/+20
| | | | | | | | | | | | | | | | | | Add the command structure, optional command set support (ONCS) bit and a new error code for the Write Zeroes command. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@hgst.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
| * block: add support for REQ_OP_WRITE_ZEROESChaitanya Kulkarni2016-12-0111-19/+153
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a new block layer operation to zero out a range of LBAs. This allows to implement zeroing for devices that don't use either discard with a predictable zero pattern or WRITE SAME of zeroes. The prominent example of that is NVMe with the Write Zeroes command, but in the future, this should also help with improving the way zeroing discards work. For this operation, suitable entry is exported in sysfs which indicate the number of maximum bytes allowed in one write zeroes operation by the device. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@hgst.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
| * block: add async variant of blkdev_issue_zerooutChaitanya Kulkarni2016-12-012-34/+84
| | | | | | | | | | | | | | | | | | | | Similar to __blkdev_issue_discard this variant allows submitting the final bio asynchronously and chaining multiple ranges into a single completion. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@hgst.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
| * block: Check partition alignment on zoned block devicesDamien Le Moal2016-12-011-0/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Both blkdev_report_zones and blkdev_reset_zones can operate on a partition of a zoned block device. However, the first and last zones reported for a partition make sense only if the partition start sector and size are aligned on the device zone size. The same applies for zone reset. Resetting the first or the last zone of a partition straddling zones may impact neighboring partitions. Finally, if a partition start sector is not at the beginning of a sequential zone, it will be impossible to write to the first sectors of the partition on a host-managed device. Avoid all these problems and incoherencies by ignoring partitions that are not zone aligned. Note: Even with CONFIG_BLK_DEV_ZONED disabled, bdev_is_zoned() will report the correct disk zoning type (host-aware, host-managed or none) but bdev_zone_size() will always return 0 for zoned block devices (i.e. the zone size is unknown). So test this as a way to ensure that a zoned block device is being handled as such. As a result, for a host-aware devices, unaligned zone partitions will be accepted with CONFIG_BLK_DEV_ZONED disabled. That is, the disk will be treated as a regular block device (as it should). If zoned block device support is enabled, only aligned partitions will be accepted. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * lightnvm: transform target get/set bad blockJavier González2016-11-294-11/+85
| | | | | | | | | | | | | | | | | | | | Since targets are given a virtual target device, it is necessary to translate all communication between targets and the backend device. Implement the translation layer for get/set bad block table. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
| * lightnvm: use target nvm on target-specific ops.Javier González2016-11-294-16/+21
| | | | | | | | | | | | | | | | | | On target-specific operations pass on nvm_tgt_dev instead of the generic nvm device. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
| * lightnvm: introduce max_phys_sects helper functionJavier González2016-11-292-0/+9
| | | | | | | | | | | | | | | | | | | | Target devices do not have access to the device driver operations. Introduce a helper function that exposes the max. number of physical sectors supported by the underlying device. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
| * lightnvm: introduce helpers for generic ops in rrpcJavier González2016-11-294-19/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | Avoid calling media manager and device-specific operations directly from rrpc. Create helper functions on lightnvm's core instead. Signed-off-by: Javier González <javier@cnexlabs.com> Made it work with null_blk as well. Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
| * lightnvm: eliminate nvm_lun abstraction in mmJavier González2016-11-297-164/+419
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to naturally support multi-target instances on an Open-Channel SSD, targets should own the LUNs they get blocks from and manage provisioning internally. This is done in several steps. Since targets own the LUNs the are instantiated on top of and manage the free block list internally, there is no need for a LUN abstraction in the media manager. LUNs are intrinsically managed as in the physical layout (ch:0,lun:0, ..., ch:0,lun:n, ch:1,lun:0, ch:1,lun:n, ..., ch:m,lun:0, ch:m,lun:n) and given to the targets based on the target creation ioctl. This simplifies LUN management and clears the path for a partition manager to sit directly underneath LightNVM targets. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>