summaryrefslogtreecommitdiffstats
path: root/drivers/md (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'dm-3.8-fixes' of ↵Linus Torvalds2012-12-2228-433/+473
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm Pull dm update from Alasdair G Kergon: "Miscellaneous device-mapper fixes, cleanups and performance improvements. Of particular note: - Disable broken WRITE SAME support in all targets except linear and striped. Use it when kcopyd is zeroing blocks. - Remove several mempools from targets by moving the data into the bio's new front_pad area(which dm calls 'per_bio_data'). - Fix a race in thin provisioning if discards are misused. - Prevent userspace from interfering with the ioctl parameters and use kmalloc for the data buffer if it's small instead of vmalloc. - Throttle some annoying error messages when I/O fails." * tag 'dm-3.8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm: (36 commits) dm stripe: add WRITE SAME support dm: remove map_info dm snapshot: do not use map_context dm thin: dont use map_context dm raid1: dont use map_context dm flakey: dont use map_context dm raid1: rename read_record to bio_record dm: move target request nr to dm_target_io dm snapshot: use per_bio_data dm verity: use per_bio_data dm raid1: use per_bio_data dm: introduce per_bio_data dm kcopyd: add WRITE SAME support to dm_kcopyd_zero dm linear: add WRITE SAME support dm: add WRITE SAME support dm: prepare to support WRITE SAME dm ioctl: use kmalloc if possible dm ioctl: remove PF_MEMALLOC dm persistent data: improve improve space map block alloc failure message dm thin: use DMERR_LIMIT for errors ...
| * dm stripe: add WRITE SAME supportMike Snitzer2012-12-211-4/+6
| | | | | | | | | | | | | | | | Rename stripe_map_discard to stripe_map_range and reuse it for WRITE SAME bio processing. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm: remove map_infoMikulas Patocka2012-12-2113-50/+29
| | | | | | | | | | | | | | | | This patch removes map_info from bio-based device mapper targets. map_info is still used for request-based targets. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm snapshot: do not use map_contextMikulas Patocka2012-12-211-13/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Eliminate struct map_info from dm-snap. map_info->ptr was used in dm-snap to indicate if the bio was tracked. If map_info->ptr was non-NULL, the bio was linked in tracked_chunk_hash. This patch removes the use of map_info->ptr. We determine if the bio was tracked based on hlist_unhashed(&c->node). If hlist_unhashed is true, the bio is not tracked, if it is false, the bio is tracked. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm thin: dont use map_contextMikulas Patocka2012-12-211-36/+13
| | | | | | | | | | | | | | | | | | | | This patch removes endio_hook_pool from dm-thin and uses per-bio data instead. This patch removes any use of map_info in preparation for the next patch that removes map_info from bio-based device mapper. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm raid1: dont use map_contextMikulas Patocka2012-12-211-9/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Don't use map_info any more in dm-raid1. map_info was used for writes to hold the region number. For this purpose we add a new field dm_bio_details to dm_raid1_bio_record. map_info was used for reads to hold a pointer to dm_raid1_bio_record (if the pointer was non-NULL, bio details were saved; if the pointer was NULL, bio details were not saved). We use dm_raid1_bio_record.details->bi_bdev for this purpose. If bi_bdev is NULL, details were not saved, if bi_bdev is non-NULL, details were saved. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm flakey: dont use map_contextMikulas Patocka2012-12-211-4/+11
| | | | | | | | | | | | | | Replace map_info with a per-bio structure "struct per_bio_data" in dm-flakey. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm raid1: rename read_record to bio_recordMikulas Patocka2012-12-211-11/+11
| | | | | | | | | | | | | | | | | | | | Rename struct read_record to bio_record in dm-raid1. In the following patch, the structure will be used for both read and write bios, so rename it. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm: move target request nr to dm_target_ioMikulas Patocka2012-12-213-4/+5
| | | | | | | | | | | | | | | | | | | | This patch moves target_request_nr from map_info to dm_target_io and makes it accessible with dm_bio_get_target_request_nr. This patch is a preparation for the next patch that removes map_info. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm snapshot: use per_bio_dataMikulas Patocka2012-12-211-35/+8
| | | | | | | | | | | | | | Replace tracked_chunk_pool with per_bio_data in dm-snap. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm verity: use per_bio_dataMikulas Patocka2012-12-211-18/+4
| | | | | | | | | | | | | | Replace io_mempool with per_bio_data in dm-verity. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm raid1: use per_bio_dataMikulas Patocka2012-12-211-34/+5
| | | | | | | | | | | | | | Replace read_record_pool with per_bio_data in dm-raid1. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm: introduce per_bio_dataMikulas Patocka2012-12-213-20/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce a field per_bio_data_size in struct dm_target. Targets can set this field in the constructor. If a target sets this field to a non-zero value, "per_bio_data_size" bytes of auxiliary data are allocated for each bio submitted to the target. These data can be used for any purpose by the target and help us improve performance by removing some per-target mempools. Per-bio data is accessed with dm_per_bio_data. The argument data_size must be the same as the value per_bio_data_size in dm_target. If the target has a pointer to per_bio_data, it can get a pointer to the bio with dm_bio_from_per_bio_data() function (data_size must be the same as the value passed to dm_per_bio_data). Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm kcopyd: add WRITE SAME support to dm_kcopyd_zeroMike Snitzer2012-12-213-10/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add WRITE SAME support to dm-io and make it accessible to dm_kcopyd_zero(). dm_kcopyd_zero() provides an asynchronous interface whereas the blkdev_issue_write_same() interface is synchronous. WRITE SAME is a SCSI command that can be leveraged for more efficient zeroing of a specified logical extent of a device which supports it. Only a single zeroed logical block is transfered to the target for each WRITE SAME and the target then writes that same block across the specified extent. The dm thin target uses this. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm linear: add WRITE SAME supportMike Snitzer2012-12-211-1/+2
| | | | | | | | | | | | | | | | The linear target can already support WRITE SAME requests so signal this by setting num_write_same_requests to 1. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm: add WRITE SAME supportMike Snitzer2012-12-211-5/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | WRITE SAME bios have a payload that contain a single page. When cloning WRITE SAME bios DM has no need to modify the bi_io_vec attributes (and doing so would be detrimental). DM need only alter the start and end of the WRITE SAME bio accordingly. Rather than duplicate __clone_and_map_discard, factor out a common function that is also used by __clone_and_map_write_same. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm: prepare to support WRITE SAMEMike Snitzer2012-12-211-1/+29
| | | | | | | | | | | | | | | | | | | | | | Allow targets to opt in to WRITE SAME support by setting 'num_write_same_requests' in the dm_target structure. A dm device will only advertise WRITE SAME support if all its targets and all its underlying devices support it. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm ioctl: use kmalloc if possibleMikulas Patocka2012-12-211-13/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the parameter buffer is small enough, try to allocate it with kmalloc() rather than vmalloc(). vmalloc is noticeably slower than kmalloc because it has to manipulate page tables. In my tests, on PA-RISC this patch speeds up activation 13 times. On Opteron this patch speeds up activation by 5%. This patch introduces a new function free_params() to free the parameters and this uses new flags that record whether or not vmalloc() was used and whether or not the input buffer must be wiped after use. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm ioctl: remove PF_MEMALLOCMikulas Patocka2012-12-211-9/+4
| | | | | | | | | | | | | | | | When allocating memory for the userspace ioctl data, set some appropriate GPF flags directly instead of using PF_MEMALLOC. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm persistent data: improve improve space map block alloc failure messageJoe Thornber2012-12-211-1/+1
| | | | | | | | | | | | | | | | Improve space map error message when unable to allocate a new metadata block. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm thin: use DMERR_LIMIT for errorsMike Snitzer2012-12-211-10/+15
| | | | | | | | | | | | | | Throttle all errors logged from the IO path by dm thin. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm persistent data: use DMERR_LIMIT for errorsMike Snitzer2012-12-213-21/+20
| | | | | | | | | | | | | | | | | | Nearly all of persistent-data is in the IO path so throttle error messages with DMERR_LIMIT to limit the amount logged when something has gone wrong. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm block manager: reinstate message when validator failsMike Snitzer2012-12-211-1/+4
| | | | | | | | | | | | | | | | | | Reinstate a useful error message when the block manager buffer validator fails. This was mistakenly eliminated when the block manager was converted to use dm-bufio. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm raid: round region_size to power of twoJonathan Brassow2012-12-211-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | If the user does not supply a bitmap region_size to the dm raid target, a reasonable size is computed automatically. If this is not a power of 2, the md code will report an error later. This patch catches the problem early and rounds the region_size to the next power of two. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm thin: cleanup dead codeJoe Thornber2012-12-211-14/+5
| | | | | | | | | | | | | | | | | | Remove unused @data_block parameter from cell_defer. Change thin_bio_map to use many returns rather than setting a variable. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm thin: rename cell_defer_except to cell_defer_no_holderJoe Thornber2012-12-211-21/+21
| | | | | | | | | | | | | | | | | | Rename cell_defer_except() to cell_defer_no_holder() which describes its function more clearly. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm snapshot: optimize track_chunkMikulas Patocka2012-12-211-3/+2
| | | | | | | | | | | | | | | | | | | | track_chunk is always called with interrupts enabled. Consequently, we do not need to save and restore interrupt state in "flags" variable. This patch changes spin_lock_irqsave to spin_lock_irq and spin_unlock_irqrestore to spin_unlock_irq. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm raid: use DM_ENDIO_INCOMPLETEMikulas Patocka2012-12-211-1/+1
| | | | | | | | | | | | | | Use a defined macro DM_ENDIO_INCOMPLETE instead of a numeric constant. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm raid1: remove impossible mempool_alloc error testMikulas Patocka2012-12-211-5/+3
| | | | | | | | | | | | | | | | mempool_alloc can't fail if __GFP_WAIT is specified, so the condition that tests if read_record is non-NULL is always true. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm thin: emit ignore_discard in status when discards disabledMike Snitzer2012-12-211-2/+4
| | | | | | | | | | | | | | | | | | | | | | If "ignore_discard" is specified when creating the thin pool device then discard support is disabled for that device. The pool device's status should reflect this fact rather than stating "no_discard_passdown" (which implies discards are enabled but passdown is disabled). Reported-by: Zdenek Kabelac <zkabelac@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm persistent data: fix nested btree deletionJoe Thornber2012-12-212-3/+8
| | | | | | | | | | | | | | | | | | | | | | When deleting nested btrees, the code forgets to delete the innermost btree. The thin-metadata code serendipitously compensates for this by claiming there is one extra layer in the tree. This patch corrects both problems. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm thin: wake worker when discard is preparedJoe Thornber2012-12-211-4/+7
| | | | | | | | | | | | | | | | | | | | When discards are prepared it is best to directly wake the worker that will process them. The worker will be woken anyway, via periodic commit, but there is no reason to not wake_worker here. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm thin: fix race between simultaneous io and discards to same blockJoe Thornber2012-12-211-25/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a race when discard bios and non-discard bios are issued simultaneously to the same block. Discard support is expensive for all thin devices precisely because you have to be careful to quiesce the area you're discarding. DM thin must handle this conflicting IO pattern (simultaneous non-discard vs discard) even though a sane application shouldn't be issuing such IO. The race manifests as follows: 1. A non-discard bio is mapped in thin_bio_map. This doesn't lock out parallel activity to the same block. 2. A discard bio is issued to the same block as the non-discard bio. 3. The discard bio is locked in a dm_bio_prison_cell in process_discard to lock out parallel activity against the same block. 4. The non-discard bio's mapping continues and its all_io_entry is incremented so the bio is accounted for in the thin pool's all_io_ds which is a dm_deferred_set used to track time locality of non-discard IO. 5. The non-discard bio is finally locked in a dm_bio_prison_cell in process_bio. The race can result in deadlock, leaving the block layer hanging waiting for completion of a discard bio that never completes, e.g.: INFO: task ruby:15354 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. ruby D ffffffff8160f0e0 0 15354 15314 0x00000000 ffff8802fb08bc58 0000000000000082 ffff8802fb08bfd8 0000000000012900 ffff8802fb08a010 0000000000012900 0000000000012900 0000000000012900 ffff8802fb08bfd8 0000000000012900 ffff8803324b9480 ffff88032c6f14c0 Call Trace: [<ffffffff814e5a19>] schedule+0x29/0x70 [<ffffffff814e3d85>] schedule_timeout+0x195/0x220 [<ffffffffa06b9bc1>] ? _dm_request+0x111/0x160 [dm_mod] [<ffffffff814e589e>] wait_for_common+0x11e/0x190 [<ffffffff8107a170>] ? try_to_wake_up+0x2b0/0x2b0 [<ffffffff814e59ed>] wait_for_completion+0x1d/0x20 [<ffffffff81233289>] blkdev_issue_discard+0x219/0x260 [<ffffffff81233e79>] blkdev_ioctl+0x6e9/0x7b0 [<ffffffff8119a65c>] block_ioctl+0x3c/0x40 [<ffffffff8117539c>] do_vfs_ioctl+0x8c/0x340 [<ffffffff8119a547>] ? block_llseek+0x67/0xb0 [<ffffffff811756f1>] sys_ioctl+0xa1/0xb0 [<ffffffff810561f6>] ? sys_rt_sigprocmask+0x86/0xd0 [<ffffffff814ef099>] system_call_fastpath+0x16/0x1b The thinp-test-suite's test_discard_random_sectors reliably hits this deadlock on fast SSD storage. The fix for this race is that the all_io_entry for a bio must be incremented whilst the dm_bio_prison_cell is held for the bio's associated virtual and physical blocks. That cell locking wasn't occurring early enough in thin_bio_map. This patch fixes this. Care is taken to always call the new function inc_all_io_entry() with the relevant cells locked, but they are generally unlocked before calling issue() to try to avoid holding the cells locked across generic_submit_request. Also, now that thin_bio_map may lock bios in a cell, process_bio() is no longer the only thread that will do so. Because of this we must be sure to use cell_defer_except() to release all non-holder entries, that were added by the other thread, because they must be deferred. This patch depends on "dm thin: replace dm_cell_release_singleton with cell_defer_except". Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: stable@vger.kernel.org
| * dm thin: replace dm_cell_release_singleton with cell_defer_exceptJoe Thornber2012-12-213-39/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change existing users of the function dm_cell_release_singleton to share cell_defer_except instead, and then remove the now-unused function. Everywhere that calls dm_cell_release_singleton, the bio in question is the holder of the cell. If there are no non-holder entries in the cell then cell_defer_except behaves exactly like dm_cell_release_singleton. Conversely, if there *are* non-holder entries then dm_cell_release_singleton must not be used because those entries would need to be deferred. Consequently, it is safe to replace use of dm_cell_release_singleton with cell_defer_except. This patch is a pre-requisite for "dm thin: fix race between simultaneous io and discards to same block". Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm: disable WRITE SAMEMike Snitzer2012-12-211-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | WRITE SAME bios are not yet handled correctly by device-mapper so disable their use on device-mapper devices by setting max_write_same_sectors to zero. As an example, a ciphertext device is incompatible because the data gets changed according to the location at which it written and so the dm crypt target cannot support it. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org Cc: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm ioctl: prevent unsafe change to dm_ioctl data_sizeAlasdair G Kergon2012-12-211-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Abort dm ioctl processing if userspace changes the data_size parameter after we validated it but before we finished copying the data buffer from userspace. The dm ioctl parameters are processed in the following sequence: 1. ctl_ioctl() calls copy_params(); 2. copy_params() makes a first copy of the fixed-sized portion of the userspace parameters into the local variable "tmp"; 3. copy_params() then validates tmp.data_size and allocates a new structure big enough to hold the complete data and copies the whole userspace buffer there; 4. ctl_ioctl() reads userspace data the second time and copies the whole buffer into the pointer "param"; 5. ctl_ioctl() reads param->data_size without any validation and stores it in the variable "input_param_size"; 6. "input_param_size" is further used as the authoritative size of the kernel buffer. The problem is that userspace code could change the contents of user memory between steps 2 and 4. In particular, the data_size parameter can be changed to an invalid value after the kernel has validated it. This lets userspace force the kernel to access invalid kernel memory. The fix is to ensure that the size has not changed at step 4. This patch shouldn't have a security impact because CAP_SYS_ADMIN is required to run this code, but it should be fixed anyway. Reported-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: stable@kernel.org
| * dm persistent data: rename node to btree_nodeMikulas Patocka2012-12-214-47/+47
| | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a compilation failure on sparc32 by renaming struct node. struct node is already defined in include/linux/node.h. On sparc32, it happens to be included through other dependencies and persistent-data doesn't compile because of conflicting declarations. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* | Merge tag 'md-3.8' of git://neil.brown.name/mdLinus Torvalds2012-12-183-130/+171
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull md update from Neil Brown: "Mostly just little fixes. Probably biggest part is AVX accelerated RAID6 calculations." * tag 'md-3.8' of git://neil.brown.name/md: md/raid5: add blktrace calls md/raid5: use async_tx_quiesce() instead of open-coding it. md: Use ->curr_resync as last completed request when cleanly aborting resync. lib/raid6: build proper files on corresponding arch lib/raid6: Add AVX2 optimized gen_syndrome functions lib/raid6: Add AVX2 optimized recovery functions md: Update checkpoint of resync/recovery based on time. md:Add place to update ->recovery_cp. md.c: re-indent various 'switch' statements. md: close race between removing and adding a device. md: removed unused variable in calc_sb_1_csm.
| * | md/raid5: add blktrace callsNeilBrown2012-12-181-3/+35
| | | | | | | | | | | | | | | | | | This makes it easier to trace what raid5 is doing. Signed-off-by: NeilBrown <neilb@suse.de>
| * | md/raid5: use async_tx_quiesce() instead of open-coding it.NeilBrown2012-12-131-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | handle_stripe_expansion contains: if (tx) { async_tx_ack(tx); dma_wait_for_async_tx(tx); } which is very similar to the body of async_tx_quiesce(), except that the later handles an error from dma_wait_for_async_tx() (admittedly by panicing, but that decision belongs in the dma code, not the md code). So just us async_tx_quiesce(). Acked-by: Dan Williams <djbw@fb.com> Reported-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: NeilBrown <neilb@suse.de>
| * | md: Use ->curr_resync as last completed request when cleanly aborting resync.majianpeng2012-12-132-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a resync is aborted cleanly, ->curr_resync is a reliable record of where we got up to. If there was an error it is less reliable but we always know that ->curr_resync_completed is safe. So add a flag MD_RECOVERY_ERROR to differentiate between these cases and set recovery_cp accordingly. Signed-off-by: Jianpeng Ma <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
| * | md: Update checkpoint of resync/recovery based on time.majianpeng2012-12-131-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | md will current only only checkpoint recovery or resync ever 1/16th of the device size. As devices get larger this can become a long time an so a lot of work that might need to be duplicated after a shutdown. So add a time-based checkpoint. Every 5 minutes limits the amount of duplicated effort to at most 5 minutes, and has almost zero impact on performance. [changelog entry re-written by NeilBrown] Signed-off-by: Jianpeng Ma <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
| * | md:Add place to update ->recovery_cp.kernelmail2012-12-131-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | In resyncing, recovery_cp only updated when resync aborted or completed. But in md drives,many place used it to judge.So add a place to update. Signed-off-by: Jianpeng Ma <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
| * | md.c: re-indent various 'switch' statements.NeilBrown2012-12-111-119/+110
| | | | | | | | | | | | | | | | | | | | | | | | | | | Intent was unnecessarily deep. Also change one 'switch' which has a single case element, into an 'if'. Signed-off-by: NeilBrown <neilb@suse.de>
| * | md: close race between removing and adding a device.NeilBrown2012-12-111-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we remove a device from an md array, the final removal of the "dev-XX" sys entry is run asynchronously. If we then re-add that device immediately before the worker thread gets to run, we can end up trying to add the "dev-XX" sysfs entry back before it has been removed. So in both places where we add a device, call flush_workqueue(md_misc_wq); before taking the md lock (as holding the md lock can prevent removal to complete). Signed-off-by: NeilBrown <neilb@suse.de>
| * | md: removed unused variable in calc_sb_1_csm.NeilBrown2012-12-111-2/+1
| | | | | | | | | | | | | | | | | | | | | 'i' is unused. NeilBrown <neilb@suse.de>
* | | Merge branch 'for-3.8/drivers' of git://git.kernel.dk/linux-blockLinus Torvalds2012-12-175-50/+20
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block driver update from Jens Axboe: "Now that the core bits are in, here are the driver bits for 3.8. The branch contains: - A huge pile of drbd bits that were dumped from the 3.7 merge window. Following that, it was both made perfectly clear that there is going to be no more over-the-wall pulls and how the situation on individual pulls can be improved. - A few cleanups from Akinobu Mita for drbd and cciss. - Queue improvement for loop from Lukas. This grew into adding a generic interface for waiting/checking an even with a specific lock, allowing this to be pulled out of md and now loop and drbd is also using it. - A few fixes for xen back/front block driver from Roger Pau Monne. - Partition improvements from Stephen Warren, allowing partiion UUID to be used as an identifier." * 'for-3.8/drivers' of git://git.kernel.dk/linux-block: (609 commits) drbd: update Kconfig to match current dependencies drbd: Fix drbdsetup wait-connect, wait-sync etc... commands drbd: close race between drbd_set_role and drbd_connect drbd: respect no-md-barriers setting also when changed online via disk-options drbd: Remove obsolete check drbd: fixup after wait_even_lock_irq() addition to generic code loop: Limit the number of requests in the bio list wait: add wait_event_lock_irq() interface xen-blkfront: free allocated page xen-blkback: move free persistent grants code block: partition: msdos: provide UUIDs for partitions init: reduce PARTUUID min length to 1 from 36 block: store partition_meta_info.uuid as a string cciss: use check_signature() cciss: cleanup bitops usage drbd: use copy_highpage drbd: if the replication link breaks during handshake, keep retrying drbd: check return of kmalloc in receive_uuids drbd: Broadcast sync progress no more often than once per second drbd: don't try to clear bits once the disk has failed ...
| * | | wait: add wait_event_lock_irq() interfaceLukas Czerner2012-11-305-50/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | New wait_event{_interruptible}_lock_irq{_cmd} macros added. This commit moves the private wait_event_lock_irq() macro from MD to regular wait includes, introduces new macro wait_event_lock_irq_cmd() instead of using the old method with omitting cmd parameter which is ugly and makes a use of new macros in the MD. It also introduces the _interruptible_ variant. The use of new interface is when one have a special lock to protect data structures used in the condition, or one also needs to invoke "cmd" before putting it to sleep. All new macros are expected to be called with the lock taken. The lock is released before sleep and is reacquired afterwards. We will leave the macro with the lock held. Note to DM: IMO this should also fix theoretical race on waitqueue while using simultaneously wait_event_lock_irq() and wait_event() because of lack of locking around current state setting and wait queue removal. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Cc: Neil Brown <neilb@suse.de> Cc: David Howells <dhowells@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | Merge branch 'for-linus' of ↵Linus Torvalds2012-12-134-5/+5
|\ \ \ \ | |_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial branch from Jiri Kosina: "Usual stuff -- comment/printk typo fixes, documentation updates, dead code elimination." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits) HOWTO: fix double words typo x86 mtrr: fix comment typo in mtrr_bp_init propagate name change to comments in kernel source doc: Update the name of profiling based on sysfs treewide: Fix typos in various drivers treewide: Fix typos in various Kconfig wireless: mwifiex: Fix typo in wireless/mwifiex driver messages: i2o: Fix typo in messages/i2o scripts/kernel-doc: check that non-void fcts describe their return value Kernel-doc: Convention: Use a "Return" section to describe return values radeon: Fix typo and copy/paste error in comments doc: Remove unnecessary declarations from Documentation/accounting/getdelays.c various: Fix spelling of "asynchronous" in comments. Fix misspellings of "whether" in comments. eisa: Fix spelling of "asynchronous". various: Fix spelling of "registered" in comments. doc: fix quite a few typos within Documentation target: iscsi: fix comment typos in target/iscsi drivers treewide: fix typo of "suport" in various comments and Kconfig treewide: fix typo of "suppport" in various comments ...
| * | | md: Fix typo in drivers/mdMasanari Iida2012-10-294-5/+5
| |/ / | | | | | | | | | | | | | | | | | | Correct spelling typo in drivers/md. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>