summaryrefslogtreecommitdiffstats
path: root/drivers/md/dm.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
* dm: store dm_target_io in bio front_padMikulas Patocka2012-10-121-59/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | Use the recently-added bio front_pad field to allocate struct dm_target_io. Prior to this patch, dm_target_io was allocated from a mempool. For each dm_target_io, there is exactly one bio allocated from a bioset. This patch merges these two allocations into one allocation: we create a bioset with front_pad equal to the size of dm_target_io so that every bio allocated from the bioset has sizeof(struct dm_target_io) bytes before it. We allocate a bio and use the bytes before the bio as dm_target_io. _tio_cache is removed and the tio_pool mempool is now only used for request-based devices. This idea was introduced by Kent Overstreet. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: Kent Overstreet <koverstreet@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: tj@kernel.org Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Bill Pemberton <wfp5p@viridian.itc.virginia.edu> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* Merge branch 'for-3.7/core' of git://git.kernel.dk/linux-blockLinus Torvalds2012-10-111-52/+22
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block IO update from Jens Axboe: "Core block IO bits for 3.7. Not a huge round this time, it contains: - First series from Kent cleaning up and generalizing bio allocation and freeing. - WRITE_SAME support from Martin. - Mikulas patches to prevent O_DIRECT crashes when someone changes the block size of a device. - Make bio_split() work on data-less bio's (like trim/discards). - A few other minor fixups." Fixed up silent semantic mis-merge as per Mikulas Patocka and Andrew Morton. It is due to the VM no longer using a prio-tree (see commit 6b2dbba8b6ac: "mm: replace vma prio_tree with an interval tree"). So make set_blocksize() use mapping_mapped() instead of open-coding the internal VM knowledge that has changed. * 'for-3.7/core' of git://git.kernel.dk/linux-block: (26 commits) block: makes bio_split support bio without data scatterlist: refactor the sg_nents scatterlist: add sg_nents fs: fix include/percpu-rwsem.h export error percpu-rw-semaphore: fix documentation typos fs/block_dev.c:1644:5: sparse: symbol 'blkdev_mmap' was not declared blockdev: turn a rw semaphore into a percpu rw semaphore Fix a crash when block device is read and block size is changed at the same time block: fix request_queue->flags initialization block: lift the initial queue bypass mode on blk_register_queue() instead of blk_init_allocated_queue() block: ioctl to zero block ranges block: Make blkdev_issue_zeroout use WRITE SAME block: Implement support for WRITE SAME block: Consolidate command flag and queue limit checks for merges block: Clean up special command handling logic block/blk-tag.c: Remove useless kfree block: remove the duplicated setting for congestion_threshold block: reject invalid queue attribute values block: Add bio_clone_bioset(), bio_clone_kmalloc() block: Consolidate bio_alloc_bioset(), bio_kmalloc() ...
| * block: Add bio_clone_bioset(), bio_clone_kmalloc()Kent Overstreet2012-09-091-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, there was bio_clone() but it only allocated from the fs bio set; as a result various users were open coding it and using __bio_clone(). This changes bio_clone() to become bio_clone_bioset(), and then we add bio_clone() and bio_clone_kmalloc() as wrappers around it, making use of the functionality the last patch adedd. This will also help in a later patch changing how bio cloning works. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: NeilBrown <neilb@suse.de> CC: Alasdair Kergon <agk@redhat.com> CC: Boaz Harrosh <bharrosh@panasas.com> CC: Jeff Garzik <jeff@garzik.org> Acked-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * dm: Use bioset's front_pad for dm_rq_clone_bio_infoKent Overstreet2012-09-091-28/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, dm_rq_clone_bio_info needed to be freed by the bio's destructor to avoid a memory leak in the blk_rq_prep_clone() error path. This gets rid of a memory allocation and means we can kill dm_rq_bio_destructor. The _rq_bio_info_cache kmem cache is unused now and needs to be deleted, but due to the way io_pool is used and overloaded this looks not quite trivial so I'm leaving it for a later patch. v6: Fix comment on struct dm_rq_clone_bio_info, per Tejun Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Alasdair Kergon <agk@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * block: Ues bi_pool for bio_integrity_alloc()Kent Overstreet2012-09-091-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that bios keep track of where they were allocated from, bio_integrity_alloc_bioset() becomes redundant. Remove bio_integrity_alloc_bioset() and drop bio_set argument from the related functions and make them use bio->bi_pool. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * block: Generalized bio pool freeingKent Overstreet2012-09-091-20/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the old code, when you allocate a bio from a bio pool you have to implement your own destructor that knows how to find the bio pool the bio was originally allocated from. This adds a new field to struct bio (bi_pool) and changes bio_alloc_bioset() to use it. This makes various bio destructors unnecessary, so they're then deleted. v6: Explain the temporary if statement in bio_put Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: NeilBrown <neilb@suse.de> CC: Alasdair Kergon <agk@redhat.com> CC: Nicholas Bellinger <nab@linux-iscsi.org> CC: Lars Ellenberg <lars.ellenberg@linbit.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | dm: retain table limits when swapping to new table with no devicesMike Snitzer2012-09-271-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a safety net that will re-use the DM device's existing limits in the event that DM device has a temporary table that doesn't have any component devices. This is to reduce the chance that requests not respecting the hardware limits will reach the device. DM recalculates queue limits based only on devices which currently exist in the table. This creates a problem in the event all devices are temporarily removed such as all paths being lost in multipath. DM will reset the limits to the maximum permissible, which can then assemble requests which exceed the limits of the paths when the paths are restored. The request will fail the blk_rq_check_limits() test when sent to a path with lower limits, and will be retried without end by multipath. This became a much bigger issue after v3.6 commit fe86cdcef ("block: do not artificially constrain max_sectors for stacking drivers"). Reported-by: David Jeffery <djeffery@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* | dm: handle requests beyond end of device instead of using BUG_ONMike Snitzer2012-09-271-18/+38
|/ | | | | | | | | | | | | | | | | | | | | | The access beyond the end of device BUG_ON that was introduced to dm_request_fn via commit 29e4013de7ad950280e4b2208 ("dm: implement REQ_FLUSH/FUA support for request-based dm") was an overly drastic (but simple) response to this situation. I have received a report that this BUG_ON was hit and now think it would be better to use dm_kill_unmapped_request() to fail the clone and original request with -EIO. map_request() will assign the valid target returned by dm_table_find_target to tio->ti. But when the target isn't valid tio->ti is never assigned (because map_request isn't called); so add a check for tio->ti != NULL to dm_done(). Reported-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: stable@vger.kernel.org # v2.6.37+ Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm: introduce split_discard_requestsMikulas Patocka2012-07-271-1/+4
| | | | | | | | | | | | This patch introduces a new variable split_discard_requests. It can be set by targets so that discard requests are split on max_io_len boundaries. When split_discard_requests is not set, discard requests are only split on boundaries between targets, as was the case before this patch. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm: support non power of two target max_io_lenMike Snitzer2012-07-271-8/+27
| | | | | | | | | | | | | Remove the restriction that limits a target's specified maximum incoming I/O size to be a power of 2. Rename this setting from 'split_io' to the less-ambiguous 'max_io_len'. Change it from sector_t to uint32_t, which is plenty big enough, and introduce a wrapper function dm_set_target_max_io_len() to set it. Use sector_div() to process it now that it is not necessarily a power of 2. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm: clear bi_end_io on remapping failureHannes Reinecke2012-03-281-0/+1
| | | | | | | As a precaution, set bi_end_io to NULL when failing to remap. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* fs: move code out of buffer.cAl Viro2012-01-041-1/+0
| | | | | | | | | | | | | | | Move invalidate_bdev, block_sync_page into fs/block_dev.c. Export kill_bdev as well, so brd doesn't have to open code it. Reduce buffer_head.h requirement accordingly. Removed a rather large comment from invalidate_bdev, as it looked a bit obsolete to bother moving. The small comment replacing it says enough. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge branch 'for-3.2/core' of git://git.kernel.dk/linux-blockLinus Torvalds2011-11-051-18/+7
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-3.2/core' of git://git.kernel.dk/linux-block: (29 commits) block: don't call blk_drain_queue() if elevator is not up blk-throttle: use queue_is_locked() instead of lockdep_is_held() blk-throttle: Take blkcg->lock while traversing blkcg->policy_list blk-throttle: Free up policy node associated with deleted rule block: warn if tag is greater than real_max_depth. block: make gendisk hold a reference to its queue blk-flush: move the queue kick into blk-flush: fix invalid BUG_ON in blk_insert_flush block: Remove the control of complete cpu from bio. block: fix a typo in the blk-cgroup.h file block: initialize the bounce pool if high memory may be added later block: fix request_queue lifetime handling by making blk_queue_cleanup() properly shutdown block: drop @tsk from attempt_plug_merge() and explain sync rules block: make get_request[_wait]() fail if queue is dead block: reorganize throtl_get_tg() and blk_throtl_bio() block: reorganize queue draining block: drop unnecessary blk_get/put_queue() in scsi_cmd_ioctl() and blk_get_tg() block: pass around REQ_* flags instead of broken down booleans during request alloc/free block: move blk_throtl prototypes to block/blk.h block: fix genhd refcounting in blkio_policy_parse_and_set() ... Fix up trivial conflicts due to "mddev_t" -> "struct mddev" conversion and making the request functions be of type "void" instead of "int" in - drivers/md/{faulty.c,linear.c,md.c,md.h,multipath.c,raid0.c,raid1.c,raid10.c,raid5.c} - drivers/staging/zram/zram_drv.c
| * block: remove support for bio remapping from ->make_requestChristoph Hellwig2011-09-121-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is very little benefit in allowing to let a ->make_request instance update the bios device and sector and loop around it in __generic_make_request when we can archive the same through calling generic_make_request from the driver and letting the loop in generic_make_request handle it. Note that various drivers got the return value from ->make_request and returned non-zero values for errors. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * block: rename __make_request() to blk_queue_bio()Jens Axboe2011-09-121-1/+1
| | | | | | | | | | | | Now that it's exported, lets put it in a more sane namespace. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * block: export __make_requestChristoph Hellwig2011-09-121-12/+1
| | | | | | | | | | | | | | | | Avoid the hacks need for request based device mappers currently by simply exporting the symbol instead of trying to get it through the back door. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* | dm: export dm get mdAlasdair G Kergon2011-10-311-0/+1
| | | | | | | | | | | | Export dm_get_md() for the new thin provisioning target to use. Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* | dm table: add immutable featureAlasdair G Kergon2011-10-311-0/+9
| | | | | | | | | | | | | | | | | | | | Introduce DM_TARGET_IMMUTABLE to indicate that the target type cannot be mixed with any other target type, and once loaded into a device, it cannot be replaced with a table containing a different type. The thin provisioning pool device will use this. Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* | dm: remove superfluous smp_mbNamhyung Kim2011-10-311-1/+0
| | | | | | | | | | | | | | | | Since set_current_state() contains a memory barrier in it, an additional barrier isn't needed. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* | dm: use local printk ratelimitNamhyung Kim2011-10-311-0/+10
|/ | | | | | | | | printk_ratelimit() shares global ratelimiting state with all other subsystems, so its usage is discouraged. Instead, define and use dm's local state. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm table: set flush capability based on underlying devicesMike Snitzer2011-08-021-1/+0
| | | | | | | | | | | | | | | | DM has always advertised both REQ_FLUSH and REQ_FUA flush capabilities regardless of whether or not a given DM device's underlying devices also advertised a need for them. Block's flush-merge changes from 2.6.39 have proven to be more costly for DM devices. Performance regressions have been reported even when DM's underlying devices do not advertise that they have a write cache. Fix the performance regressions by configuring a DM device's flushing capabilities based on those of the underlying devices' capabilities. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm: ignore merge_bvec for snapshots when safeMikulas Patocka2011-08-021-0/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new flag DMF_MERGE_IS_OPTIONAL to struct mapped_device to indicate whether the device can accept bios larger than the size its merge function returns. When set, use this to send large bios to snapshots which can split them if necessary. Snapshot I/O may be significantly fragmented and this approach seems to improve peformance. Before the patch, dm_set_device_limits restricted bio size to page size if the underlying device had a merge function and the target didn't provide a merge function. After the patch, dm_set_device_limits restricts bio size to page size if the underlying device has a merge function, doesn't have DMF_MERGE_IS_OPTIONAL flag and the target doesn't provide a merge function. The snapshot target can't provide a merge function because when the merge function is called, it is impossible to determine where the bio will be remapped. Previously this led us to impose a 4k limit, which we can now remove if the snapshot store is located on a device without a merge function. Together with another patch for optimizing full chunk writes, it improves performance from 29MB/s to 40MB/s when writing to the filesystem on snapshot store. If the snapshot store is placed on a non-dm device with a merge function (such as md-raid), device mapper still limits all bios to page size. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm table: fix discard supportMike Snitzer2011-08-021-1/+2
| | | | | | | | | | | | | | | | | | | | Remove 'discards_supported' from the dm_table structure. The same information can be easily discovered from the table's target(s) in dm_table_supports_discards(). Before this fix dm_table_supports_discards() would skip checking the individual targets' 'discards_supported' flag if any one target in the table didn't set num_discard_requests > 0. Now the per-target 'discards_supported' flag is effective at insuring the final DM device advertises discard support. But, to be clear, targets that don't support discards (!num_discard_requests) will not receive discard requests. Also DMWARN if a target sets 'discards_supported' override but forgets to set 'num_discard_requests'. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm: fix idr leak on module removalAlasdair G Kergon2011-08-021-2/+8
| | | | | | | Destroy _minor_idr when unloading the core dm module. (Found by kmemleak.) Cc: stable@kernel.org Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* block: fix non-atomic access to genhd inflight structuresShaohua Li2011-03-221-3/+4
| | | | | | | | After the stack plugging introduction, these are called lockless. Ensure that the counters are updated atomically. Signed-off-by: Shaohua Li<shaohua.li@intel.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* block: Require subsystems to explicitly allocate bio_set integrity mempoolMartin K. Petersen2011-03-171-3/+9
| | | | | | | | | | | | | | | | | | MD and DM create a new bio_set for every metadevice. Each bio_set has an integrity mempool attached regardless of whether the metadevice is capable of passing integrity metadata. This is a waste of memory. Instead we defer the allocation decision to MD and DM since we know at metadevice creation time whether integrity passthrough is needed or not. Automatic integrity mempool allocation can then be removed from bioset_create() and we make an explicit integrity allocation for the fs_bio_set. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Reported-by: Zdenek Kabelac <zkabelac@redhat.com> Acked-by: Mike Snitzer <snizer@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* block: remove per-queue pluggingJens Axboe2011-03-101-28/+5
| | | | | | | | Code has been converted over to the new explicit on-stack plugging, and delay users have been converted to use the new API for that. So lets kill off the old plugging along with aops->sync_page(). Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* dm: remove superfluous irq disablement in dm_request_fnKiyoshi Ueda2011-01-131-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | This patch changes spin_lock_irq() to spin_lock() in dm_request_fn(). This patch is just a clean-up and no functional change. The spin_lock_irq() was leftover from the early request-based dm code, where map_request() used to enable interrupts. Since current map_request() never enables interrupts, we can change it to spin_lock() to match the prior spin_unlock(). Auditing through the dm and block-layer code called from map_request(), I confirmed all functions save/restore interrupt status, so no function returning with interrupts enabled. Also I haven't observed any problem on my test environment which uses scsi and lpfc driver after heavy I/O testing with occasional path down/up. Added BUG_ON() to detect breakage in future. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm: use non reentrant workqueues if equivalentTejun Heo2011-01-131-1/+2
| | | | | | | | | | | | kmirrord_wq, kcopyd_work and md->wq are created per dm instance and serve only a single work item from the dm instance, so non-reentrant workqueues would provide the same ordering guarantees as ordered ones while allowing CPU affinity and use of the workqueues for other purposes. Switch them to non-reentrant workqueues. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm: convert workqueues to alloc_orderedTejun Heo2011-01-131-1/+1
| | | | | | | | | | Convert all create[_singlethread]_work() users to the new alloc[_ordered]_workqueue(). This conversion is mechanical and doesn't introduce any behavior change. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm: remove dm_mutex after bkl conversionMilan Broz2011-01-131-5/+4
| | | | | | | | | | | | | | | This patch replaces dm_mutex with _minor_lock in dm_blk_close() and then removes it. During the BKL conversion, commit 6e9624b8caec290d28b4c6d9ec75749df6372b87 (block: push down BKL into .open and .release) pushed lock_kernel() down into dm_blk_open/close calls. Commit 2a48fc0ab24241755dc93bfd4f01d68efab47f5a (block: autoconvert trivial BKL users to private mutex) converted it to a local mutex, but _minor_lock is sufficient. Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm: dont take i_mutex to change device sizeMike Snitzer2011-01-131-2/+3
| | | | | | | | | | | | | | | | | | | No longer needlessly hold md->bdev->bd_inode->i_mutex when changing the size of a DM device. This additional locking is unnecessary because i_size_write() is already protected by the existing critical section in dm_swap_table(). DM already has a reference on md->bdev so the associated bd_inode may be changed without lifetime concerns. A negative side-effect of having held md->bdev->bd_inode->i_mutex was that a concurrent DM device resize and flush (via fsync) would deadlock. Dropping md->bdev->bd_inode->i_mutex eliminates this potential for deadlock. The following reproducer no longer deadlocks: https://www.redhat.com/archives/dm-devel/2009-July/msg00284.html Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: stable@kernel.org
* block: trace event block fix unassigned fieldJeff Moyer2011-01-071-1/+1
| | | | | | | | | | | | | | | | The "error" field in block_bio_complete is not assigned, leaving the memory area uninitialized (keeping garbage data). Pass an additional tracepoint argument to this event to initialize this field. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> CC: Steven Rostedt <rostedt@goodmis.org> CC: Frederic Weisbecker <fweisbec@gmail.com> CC: Ingo Molnar <mingo@elte.hu> CC: Thomas Gleixner <tglx@linutronix.de> CC: Li Zefan <lizf@cn.fujitsu.com> CC: Alan.Brunelle@hp.com Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* block: Rename "block_remap" tracepoint to "block_bio_remap" to clarify the ↵Mike Snitzer2010-11-161-2/+2
| | | | | | | | | event. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* Merge branch 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds2010-10-231-317/+81
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-block: (46 commits) xen-blkfront: disable barrier/flush write support Added blk-lib.c and blk-barrier.c was renamed to blk-flush.c block: remove BLKDEV_IFL_WAIT aic7xxx_old: removed unused 'req' variable block: remove the BH_Eopnotsupp flag block: remove the BLKDEV_IFL_BARRIER flag block: remove the WRITE_BARRIER flag swap: do not send discards as barriers fat: do not send discards as barriers ext4: do not send discards as barriers jbd2: replace barriers with explicit flush / FUA usage jbd2: Modify ASYNC_COMMIT code to not rely on queue draining on barrier jbd: replace barriers with explicit flush / FUA usage nilfs2: replace barriers with explicit flush / FUA usage reiserfs: replace barriers with explicit flush / FUA usage gfs2: replace barriers with explicit flush / FUA usage btrfs: replace barriers with explicit flush / FUA usage xfs: replace barriers with explicit flush / FUA usage block: pass gfp_mask and flags to sb_issue_discard dm: convey that all flushes are processed as empty ...
| * dm: convey that all flushes are processed as emptyMike Snitzer2010-09-101-19/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rename __clone_and_map_flush to __clone_and_map_empty_flush for added clarity. Simplify logic associated with REQ_FLUSH conditionals. Introduce a BUG_ON() and add a few more helpful comments to the code so that it is clear that all flushes are empty. Cleanup __split_and_process_bio() so that an empty flush isn't processed by a 'sector_count' focused while loop. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * dm: fix locking context in queue_io()Kiyoshi Ueda2010-09-101-2/+4
| | | | | | | | | | | | | | | | | | | | | | Now queue_io() is called from dec_pending(), which may be called with interrupts disabled, so queue_io() must not enable interrupts unconditionally and must save/restore the current interrupts status. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * dm: relax ordering of bio-based flush implementationTejun Heo2010-09-101-112/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unlike REQ_HARDBARRIER, REQ_FLUSH/FUA doesn't mandate any ordering against other bio's. This patch relaxes ordering around flushes. * A flush bio is no longer deferred to workqueue directly. It's processed like other bio's but __split_and_process_bio() uses md->flush_bio as the clone source. md->flush_bio is initialized to empty flush during md initialization and shared for all flushes. * As a flush bio now travels through the same execution path as other bio's, there's no need for dedicated error handling path either. It can use the same error handling path in dec_pending(). Dedicated error handling removed along with md->flush_error. * When dec_pending() detects that a flush has completed, it checks whether the original bio has data. If so, the bio is queued to the deferred list w/ REQ_FLUSH cleared; otherwise, it's completed. * As flush sequencing is handled in the usual issue/completion path, dm_wq_work() no longer needs to handle flushes differently. Now its only responsibility is re-issuing deferred bio's the same way as _dm_request() would. REQ_FLUSH handling logic including process_flush() is dropped. * There's no reason for queue_io() and dm_wq_work() write lock dm->io_lock. queue_io() now only uses md->deferred_lock and dm_wq_work() read locks dm->io_lock. * bio's no longer need to be queued on the deferred list while a flush is in progress making DMF_QUEUE_IO_TO_THREAD unncessary. Drop it. This avoids stalling the device during flushes and simplifies the implementation. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * dm: implement REQ_FLUSH/FUA support for request-based dmTejun Heo2010-09-101-184/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch converts request-based dm to support the new REQ_FLUSH/FUA. The original request-based flush implementation depended on request_queue blocking other requests while a barrier sequence is in progress, which is no longer true for the new REQ_FLUSH/FUA. In general, request-based dm doesn't have infrastructure for cloning one source request to multiple targets, but the original flush implementation had a special mostly independent path which can issue flushes to multiple targets and sequence them. However, the capability isn't currently in use and adds a lot of complexity. Moreoever, it's unlikely to be useful in its current form as it doesn't make sense to be able to send out flushes to multiple targets when write requests can't be. This patch rips out special flush code path and deals handles REQ_FLUSH/FUA requests the same way as other requests. The only special treatment is that REQ_FLUSH requests use the block address 0 when finding target, which is enough for now. * added BUG_ON(!dm_target_is_valid(ti)) in dm_request_fn() as suggested by Mike Snitzer Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Mike Snitzer <snitzer@redhat.com> Tested-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * dm: implement REQ_FLUSH/FUA support for bio-based dmTejun Heo2010-09-101-62/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch converts bio-based dm to support REQ_FLUSH/FUA instead of now deprecated REQ_HARDBARRIER. * -EOPNOTSUPP handling logic dropped. * Preflush is handled as before but postflush is dropped and replaced with passing down REQ_FUA to member request_queues. This replaces one array wide cache flush w/ member specific FUA writes. * __split_and_process_bio() now calls __clone_and_map_flush() directly for flushes and guarantees all FLUSH bio's going to targets are zero ` length. * It's now guaranteed that all FLUSH bio's which are passed onto dm targets are zero length. bio_empty_barrier() tests are replaced with REQ_FLUSH tests. * Empty WRITE_BARRIERs are replaced with WRITE_FLUSHes. * Dropped unlikely() around REQ_FLUSH tests. Flushes are not unlikely enough to be marked with unlikely(). * Block layer now filters out REQ_FLUSH/FUA bio's if the request_queue doesn't support cache flushing. Advertise REQ_FLUSH | REQ_FUA capability. * Request based dm isn't converted yet. dm_init_request_based_queue() resets flush support to 0 for now. To avoid disturbing request based dm code, dm->flush_error is added for bio based dm while requested based dm continues to use dm->barrier_error. Lightly tested linear, stripe, raid1, snap and crypt targets. Please proceed with caution as I'm not familiar with the code base. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: dm-devel@redhat.com Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * block: deprecate barrier and replace blk_queue_ordered() with blk_queue_flush()Tejun Heo2010-09-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Barrier is deemed too heavy and will soon be replaced by FLUSH/FUA requests. Deprecate barrier. All REQ_HARDBARRIERs are failed with -EOPNOTSUPP and blk_queue_ordered() is replaced with simpler blk_queue_flush(). blk_queue_flush() takes combinations of REQ_FLUSH and FUA. If a device has write cache and can flush it, it should set REQ_FLUSH. If the device can handle FUA writes, it should also set REQ_FUA. All blk_queue_ordered() users are converted. * ORDERED_DRAIN is mapped to 0 which is the default value. * ORDERED_DRAIN_FLUSH is mapped to REQ_FLUSH. * ORDERED_DRAIN_FLUSH_FUA is mapped to REQ_FLUSH | REQ_FUA. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Boaz Harrosh <bharrosh@panasas.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Cc: David S. Miller <davem@davemloft.net> Cc: Alasdair G Kergon <agk@redhat.com> Cc: Pierre Ossman <drzeus@drzeus.cx> Cc: Stefan Weinhuber <wein@de.ibm.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* | block: autoconvert trivial BKL users to private mutexArnd Bergmann2010-10-051-5/+5
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The block device drivers have all gained new lock_kernel calls from a recent pushdown, and some of the drivers were already using the BKL before. This turns the BKL into a set of per-driver mutexes. Still need to check whether this is safe to do. file=$1 name=$2 if grep -q lock_kernel ${file} ; then if grep -q 'include.*linux.mutex.h' ${file} ; then sed -i '/include.*<linux\/smp_lock.h>/d' ${file} else sed -i 's/include.*<linux\/smp_lock.h>.*$/include <linux\/mutex.h>/g' ${file} fi sed -i ${file} \ -e "/^#include.*linux.mutex.h/,$ { 1,/^\(static\|int\|long\)/ { /^\(static\|int\|long\)/istatic DEFINE_MUTEX(${name}_mutex); } }" \ -e "s/\(un\)*lock_kernel\>[ ]*()/mutex_\1lock(\&${name}_mutex)/g" \ -e '/[ ]*cycle_kernel_lock();/d' else sed -i -e '/include.*\<smp_lock.h\>/d' ${file} \ -e '/cycle_kernel_lock()/d' fi Signed-off-by: Arnd Bergmann <arnd@arndb.de>
* dm: split discard requests on target boundariesMike Snitzer2010-08-121-24/+23
| | | | | | | | | | | | Update __clone_and_map_discard to loop across all targets in a DM device's table when it processes a discard bio. If a discard crosses a target boundary it must be split accordingly. Update __issue_target_requests and __issue_target_request to allow a cloned discard bio to have a custom start sector and size. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm: factor out max_io_len_target_boundaryMike Snitzer2010-08-121-8/+18
| | | | | | | | | | | | Split max_io_len_target_boundary out of max_io_len so that the discard support can make use of it without duplicating max_io_len code. Avoiding max_io_len's split_io logic enables DM's discard support to submit the entire discard request to a target. But discards must still be split on target boundaries. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm: use common __issue_target_request for flush and discard supportMike Snitzer2010-08-121-8/+22
| | | | | | | | | | | | Rename __flush_target to __issue_target_request now that it is used to issue both flush and discard requests. Introduce __issue_target_requests as a convenient wrapper to __issue_target_request 'num_flush_requests' or 'num_discard_requests' times per target. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm: linear support discardMike Snitzer2010-08-121-12/+53
| | | | | | | | | | | | | | | | | Allow discards to be passed through to linear mappings if at least one underlying device supports it. Discards will be forwarded only to devices that support them. A target that supports discards should set num_discard_requests to indicate how many times each discard request must be submitted to it. Verify table's underlying devices support discards prior to setting the associated DM device as capable of discards (via QUEUE_FLAG_DISCARD). Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reviewed-by: Joe Thornber <thornber@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm: rename map_info flush_request to target_request_nrMike Snitzer2010-08-121-9/+9
| | | | | | | | 'target_request_nr' is a more generic name that reflects the fact that it will be used for both flush and discard support. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm: do not initialise full request queue when bio basedMike Snitzer2010-08-121-25/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change bio-based mapped devices no longer to have a fully initialized request_queue (request_fn, elevator, etc). This means bio-based DM devices no longer register elevator sysfs attributes ('iosched/' tree or 'scheduler' other than "none"). In contrast, a request-based DM device will continue to have a full request_queue and will register elevator sysfs attributes. Therefore a user can determine a DM device's type by checking if elevator sysfs attributes exist. First allocate a minimalist request_queue structure for a DM device (needed for both bio and request-based DM). Initialization of a full request_queue is deferred until it is known that the DM device is request-based, at the end of the table load sequence. Factor DM device's request_queue initialization: - common to both request-based and bio-based into dm_init_md_queue(). - specific to request-based into dm_init_request_based_queue(). The md->type_lock mutex is used to protect md->queue, in addition to md->type, during table_load(). A DM device's first table_load will establish the immutable md->type. But md->queue initialization, based on md->type, may fail at that time (because blk_init_allocated_queue cannot allocate memory). Therefore any subsequent table_load must (re)try dm_setup_md_queue independently of establishing md->type. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm ioctl: make bio or request based device type immutableMike Snitzer2010-08-121-7/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Determine whether a mapped device is bio-based or request-based when loading its first (inactive) table and don't allow that to be changed later. This patch performs different device initialisation in each of the two cases. (We don't think it's necessary to add code to support changing between the two types.) Allowed md->type transitions: DM_TYPE_NONE to DM_TYPE_BIO_BASED DM_TYPE_NONE to DM_TYPE_REQUEST_BASED We now prevent table_load from replacing the inactive table with a conflicting type of table even after an explicit table_clear. Introduce 'type_lock' into the struct mapped_device to protect md->type and to prepare for the next patch that will change the queue initialization and allocate memory while md->type_lock is held. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> drivers/md/dm-ioctl.c | 15 +++++++++++++++ drivers/md/dm.c | 37 ++++++++++++++++++++++++++++++------- drivers/md/dm.h | 5 +++++ include/linux/dm-ioctl.h | 4 ++-- 4 files changed, 52 insertions(+), 9 deletions(-)
* dm: skip second flush on bio unsupported errorMikulas Patocka2010-08-121-2/+13
| | | | | | | | | | | | | | | | | When processing barriers, skip the second flush if processing the bio failed with -EOPNOTSUPP. This can happen with discard+barrier requests. If the device doesn't support discard, there would be two useless SYNCHRONIZE CACHE commands. The first dm_flush cannot be so easily optimized out, so we leave it there. Previously, -EOPNOTSUPP could be received in dec_pending only with empty barriers and we ignored that error, assuming the device not supporting cache flushes has cache always consistent. With the addition of discard barriers, this -EOPNOTSUPP can also be generated by discards and we must record it in md->barrier_error for process_barrier. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>