summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'dm-3.9-changes' of ↵Linus Torvalds2013-03-0248-601/+8791
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm Pull device-mapper update from Alasdair G Kergon: "The main addition here is a long-desired target framework to allow an SSD to be used as a cache in front of a slower device. Cache tuning is delegated to interchangeable policy modules so these can be developed independently of the mechanics needed to shuffle the data around. Other than that, kcopyd users acquire a throttling parameter, ioctl buffer usage gets streamlined, more mempool reliance is reduced and there are a few other bug fixes and tidy-ups." * tag 'dm-3.9-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm: (30 commits) dm cache: add cleaner policy dm cache: add mq policy dm: add cache target dm persistent data: add bitset dm persistent data: add transactional array dm thin: remove cells from stack dm bio prison: pass cell memory in dm persistent data: add btree_walk dm: add target num_write_bios fn dm kcopyd: introduce configurable throttling dm ioctl: allow message to return data dm ioctl: optimize functions without variable params dm ioctl: introduce ioctl_flags dm: merge io_pool and tio_pool dm: remove unused _rq_bio_info_cache dm: fix limits initialization when there are no data devices dm snapshot: add missing module aliases dm persistent data: set some btree fn parms const dm: refactor bio cloning dm: rename bio cloning functions ...
| * dm cache: add cleaner policyHeinz Mauelshagen2013-03-014-0/+479
| | | | | | | | | | | | | | | | | | | | A simple cache policy that writes back all data to the origin. This is used to decommission a dm cache by emptying it. Signed-off-by: Heinz Mauelshagen <mauelshagen@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm cache: add mq policyJoe Thornber2013-03-014-0/+1279
| | | | | | | | | | | | | | | | | | | | A cache policy that uses a multiqueue ordered by recent hit count to select which blocks should be promoted and demoted. This is meant to be a general purpose policy. It prioritises reads over writes. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm: add cache targetJoe Thornber2013-03-0113-0/+4718
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a target that allows a fast device such as an SSD to be used as a cache for a slower device such as a disk. A plug-in architecture was chosen so that the decisions about which data to migrate and when are delegated to interchangeable tunable policy modules. The first general purpose module we have developed, called "mq" (multiqueue), follows in the next patch. Other modules are under development. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Heinz Mauelshagen <mauelshagen@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm persistent data: add bitsetJoe Thornber2013-03-013-0/+329
| | | | | | | | | | | | | | Add a persistent bitset as a wrapper around dm-array. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm persistent data: add transactional arrayJoe Thornber2013-03-013-0/+975
| | | | | | | | | | | | | | Add a transactional array. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm thin: remove cells from stackJoe Thornber2013-03-013-23/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch takes advantage of the new bio-prison interface where the memory is now passed in rather than using a mempool in bio-prison. This allows the map function to avoid performing potentially-blocking allocations that could lead to deadlocks: We want to avoid the cell allocation that is done in bio_detain. (The potential for mempool deadlocks still remains in other functions that use bio_detain.) Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm bio prison: pass cell memory inJoe Thornber2013-03-013-101/+176
| | | | | | | | | | | | | | | | | | | | | | | | | | Change the dm_bio_prison interface so that instead of allocating memory internally, dm_bio_detain is supplied with a pre-allocated cell each time it is called. This enables a subsequent patch to move the allocation of the struct dm_bio_prison_cell outside the thin target's mapping function so it can no longer block there. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm persistent data: add btree_walkJoe Thornber2013-03-014-0/+69
| | | | | | | | | | | | | | | | Add dm_btree_walk to iterate through the contents of a btree. This will be used by the dm cache target. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm: add target num_write_bios fnAlasdair G Kergon2013-03-012-7/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a num_write_bios function to struct target. If an instance of a target sets this, it will be queried before the target's mapping function is called on a write bio, and the response controls the number of copies of the write bio that the target will receive. This provides a convenient way for a target to send the same data to more than one device. The new cache target uses this in writethrough mode, to send the data both to the cache and the backing device. Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm kcopyd: introduce configurable throttlingMikulas Patocka2013-03-015-5/+156
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch allows the administrator to reduce the rate at which kcopyd issues I/O. Each module that uses kcopyd acquires a throttle parameter that can be set in /sys/module/*/parameters. We maintain a history of kcopyd usage by each module in the variables io_period and total_period in struct dm_kcopyd_throttle. The actual kcopyd activity is calculated as a percentage of time equal to "(100 * io_period / total_period)". This is compared with the user-defined throttle percentage threshold and if it is exceeded, we sleep. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm ioctl: allow message to return dataMikulas Patocka2013-03-012-1/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces enhanced message support that allows the device-mapper core to recognise messages that are common to all devices, and for messages to return data to userspace. Core messages are processed by the function "message_for_md". If the device mapper doesn't support the message, it is passed to the target driver. If the message returns data, the kernel sets the flag DM_MESSAGE_OUT_FLAG. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm ioctl: optimize functions without variable paramsMikulas Patocka2013-03-012-21/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Device-mapper ioctls receive and send data in a buffer supplied by userspace. The buffer has two parts. The first part contains a 'struct dm_ioctl' and has a fixed size. The second part depends on the ioctl and has a variable size. This patch recognises the specific ioctls that do not use the variable part of the buffer and skips allocating memory for it. In particular, when a device is suspended and a resume ioctl is sent, this now avoid memory allocation completely. The variable "struct dm_ioctl tmp" is moved from the function copy_params to its caller ctl_ioctl and renamed to param_kernel. It is used directly when the ioctl function doesn't need any arguments. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm ioctl: introduce ioctl_flagsMikulas Patocka2013-03-011-23/+41
| | | | | | | | | | | | | | | | | | | | | | This patch introduces flags for each ioctl function. So far, one flag is defined, IOCTL_FLAGS_NO_PARAMS. It is set if the function processing the ioctl doesn't take or produce any parameters in the section of the data buffer that has a variable size. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm: merge io_pool and tio_poolJun'ichi Nomura2013-03-011-49/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch merges io_pool and tio_pool into io_pool and cleans up related functions. Though device-mapper used to have 2 pools of objects for each dm device, the use of bioset frontbad for per-bio data has shrunk the number of pools to 1 for both bio-based and request-based device types. (See c0820cf5 "dm: introduce per_bio_data" and 94818742 "dm: Use bioset's front_pad for dm_rq_clone_bio_info") So dm no longer has to maintain 2 different pointers. No functional changes. Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm: remove unused _rq_bio_info_cacheJun'ichi Nomura2013-03-011-21/+10
| | | | | | | | | | | | | | | | Remove _rq_bio_info_cache, which is no longer used. No functional changes. Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm: fix limits initialization when there are no data devicesMike Christie2013-03-011-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dm_calculate_queue_limits will first reset the provided limits to defaults using blk_set_stacking_limits; whereby defeating the purpose of retaining the original live table's limits -- as was intended via commit 3ae706561637331aa578e52bb89ecbba5edcb7a9 ("dm: retain table limits when swapping to new table with no devices"). Fix this improper limits initialization (in the no data devices case) by avoiding the call to dm_calculate_queue_limits. [patch header revised by Mike Snitzer] Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # v3.6+ Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm snapshot: add missing module aliasesMikulas Patocka2013-03-011-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | Add module aliases so that autoloading works correctly if the user tries to activate "snapshot-origin" or "snapshot-merge" targets. Reference: https://bugzilla.redhat.com/889973 Reported-by: Chao Yang <chyang@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm persistent data: set some btree fn parms constMike Snitzer2013-03-012-9/+9
| | | | | | | | | | | | | | Mark some constant parameters constant in some dm-btree functions. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm: refactor bio cloningAlasdair G Kergon2013-03-011-68/+96
| | | | | | | | | | | | | | Refactor part of the bio splitting and cloning code to try to make it easier to understand. Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm: rename bio cloning functionsAlasdair G Kergon2013-03-011-32/+31
| | | | | | | | | | | | | | | | | | | | | | Rename functions involved in splitting and cloning bios. The sequence of functions is now: (1) __split_and_process* - entry point that selects the processing strategy (2) __send* - prepare the details for each bio needed and loop through them (3) __clone_and_map* - creates a clone and maps it Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm: rename request variables to biosAlasdair G Kergon2013-03-0115-81/+81
| | | | | | | | | | | | | | | | | | | | Use 'bio' in the name of variables and functions that deal with bios rather than 'request' to avoid confusion with the normal block layer use of 'request'. No functional changes. Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm: clean up clone_bioAlasdair G Kergon2013-03-011-45/+57
| | | | | | | | | | | | | | | | | | | | | | Remove the no-longer-used struct bio_set argument from clone_bio and split_bvec. Use tio->ti in __map_bio() instead of passing in ti. Factor out some code for setting up cloned bios. Take target_request_nr as a parameter to alloc_tio(). Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm persistent data: remove CONFIG_EXPERIMENTALKees Cook2013-03-011-1/+1
| | | | | | | | | | | | | | | | | | The CONFIG_EXPERIMENTAL config item has not carried much meaning for a while now and is almost always enabled by default. As agreed during the Linux kernel summit, remove it from any "depends on" lines in Kconfigs. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm: remove CONFIG_EXPERIMENTALAlasdair G Kergon2013-03-011-12/+12
| | | | | | | | | | | | Remove EXPERIMENTAL from all existing device-mapper targets. Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm thin: use block_size_is_power_of_twoMike Snitzer2013-03-011-12/+13
| | | | | | | | | | | | | | | | | | | | | | Use block_size_is_power_of_two() rather than checking sectors_per_block_shift directly. Also introduce local pool variable in get_bio_block() to eliminate redundant tc->pool dereferences. No functional change. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm bufio: use WRITE_FLUSH instead of REQ_FLUSHMikulas Patocka2013-03-011-1/+1
| | | | | | | | | | | | | | | | Use WRITE_FLUSH instead of REQ_FLUSH for submitted requests to make it consistent with the rest of the kernel. There is no functional change. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm table: remove superfluous variable resetWang Sheng-Hui2013-03-011-1/+0
| | | | | | | | | | | | | | | | If allocation fails, the local var *t is not used any more after kfree. Don't need to reset it to NULL. Remove the unnecesary NULL set here. Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm thin: support a non power of 2 discard_granularityMike Snitzer2013-03-011-8/+1
| | | | | | | | | | | | | | | | | | | | | | Support a non-power-of-2 discard granularity in dm-thin, now that the block layer supports this(via 8dd2cb7e880d2f77fba53b523c99133ad5054cfd "block: discard granularity might not be power of 2" and 59771079c18c44e39106f0f30054025acafadb41 "blk: avoid divide-by-zero with zero discard granularity"). Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm: fix truncated status stringsMikulas Patocka2013-03-0113-115/+99
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Avoid returning a truncated table or status string instead of setting the DM_BUFFER_FULL_FLAG when the last target of a table fills the buffer. When processing a table or status request, the function retrieve_status calls ti->type->status. If ti->type->status returns non-zero, retrieve_status assumes that the buffer overflowed and sets DM_BUFFER_FULL_FLAG. However, targets don't return non-zero values from their status method on overflow. Most targets returns always zero. If a buffer overflow happens in a target that is not the last in the table, it gets noticed during the next iteration of the loop in retrieve_status; but if a buffer overflow happens in the last target, it goes unnoticed and erroneously truncated data is returned. In the current code, the targets behave in the following way: * dm-crypt returns -ENOMEM if there is not enough space to store the key, but it returns 0 on all other overflows. * dm-thin returns errors from the status method if a disk error happened. This is incorrect because retrieve_status doesn't check the error code, it assumes that all non-zero values mean buffer overflow. * all the other targets always return 0. This patch changes the ti->type->status function to return void (because most targets don't use the return code). Overflow is detected in retrieve_status: if the status method fills up the remaining space completely, it is assumed that buffer overflow happened. Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * dm: do not replace bioset for request based dmJun'ichi Nomura2013-03-011-9/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a regression introduced in v3.8, which causes oops like this when dm-multipath is used: general protection fault: 0000 [#1] SMP RIP: 0010:[<ffffffff810fe754>] [<ffffffff810fe754>] mempool_free+0x24/0xb0 Call Trace: <IRQ> [<ffffffff81187417>] bio_put+0x97/0xc0 [<ffffffffa02247a5>] end_clone_bio+0x35/0x90 [dm_mod] [<ffffffff81185efd>] bio_endio+0x1d/0x30 [<ffffffff811f03a3>] req_bio_endio.isra.51+0xa3/0xe0 [<ffffffff811f2f68>] blk_update_request+0x118/0x520 [<ffffffff811f3397>] blk_update_bidi_request+0x27/0xa0 [<ffffffff811f343c>] blk_end_bidi_request+0x2c/0x80 [<ffffffff811f34d0>] blk_end_request+0x10/0x20 [<ffffffffa000b32b>] scsi_io_completion+0xfb/0x6c0 [scsi_mod] [<ffffffffa000107d>] scsi_finish_command+0xbd/0x120 [scsi_mod] [<ffffffffa000b12f>] scsi_softirq_done+0x13f/0x160 [scsi_mod] [<ffffffff811f9fd0>] blk_done_softirq+0x80/0xa0 [<ffffffff81044551>] __do_softirq+0xf1/0x250 [<ffffffff8142ee8c>] call_softirq+0x1c/0x30 [<ffffffff8100420d>] do_softirq+0x8d/0xc0 [<ffffffff81044885>] irq_exit+0xd5/0xe0 [<ffffffff8142f3e3>] do_IRQ+0x63/0xe0 [<ffffffff814257af>] common_interrupt+0x6f/0x6f <EOI> [<ffffffffa021737c>] srp_queuecommand+0x8c/0xcb0 [ib_srp] [<ffffffffa0002f18>] scsi_dispatch_cmd+0x148/0x310 [scsi_mod] [<ffffffffa000a38e>] scsi_request_fn+0x31e/0x520 [scsi_mod] [<ffffffff811f1e57>] __blk_run_queue+0x37/0x50 [<ffffffff811f1f69>] blk_delay_work+0x29/0x40 [<ffffffff81059003>] process_one_work+0x1c3/0x5c0 [<ffffffff8105b22e>] worker_thread+0x15e/0x440 [<ffffffff8106164b>] kthread+0xdb/0xe0 [<ffffffff8142db9c>] ret_from_fork+0x7c/0xb0 The regression was introduced by the change c0820cf5 "dm: introduce per_bio_data", where dm started to replace bioset during table replacement. For bio-based dm, it is good because clone bios do not exist during the table replacement. For request-based dm, however, (not-yet-mapped) clone bios may stay in request queue and survive during the table replacement. So freeing the old bioset could cause the oops in bio_put(). Since the size of front_pad may change only with bio-based dm, it is not necessary to replace bioset for request-based dm. Reported-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: Mikulas Patocka <mpatocka@redhat.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* | Merge branch 'for-next' of ↵Linus Torvalds2013-03-025-11/+15
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull SCSI target patches from Nicholas Bellinger: "Here are the remaining target-pending patches for v3.9-rc1. The most important one here is the immediate queue starvation regression fix for iscsi-target, which addresses a bug that's effecting v3.5+ kernels under heavy sustained READ only workloads. Thanks alot to Benjamin Estrabaud for helping to track this down! Also included is a pSCSI exception bugfix from Asias, along with a handful of other minor changes. Both bugfixes are CC'ed to stable." * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: target/pscsi: Rename sg_num to nr_vecs in pscsi_get_bio() target/pscsi: Fix page increment target/pscsi: Drop unnecessary NULL assignment to bio->bi_next target: Add __exit annotation for module_exit functions iscsi-target: Fix immediate queue starvation regression with DATAIN
| * | target/pscsi: Rename sg_num to nr_vecs in pscsi_get_bio()Asias He2013-02-281-2/+2
| | | | | | | | | | | | | | | | | | | | | It is actually a vector not a sg, so nr_vecs is better than sg_num. Signed-off-by: Asias He <asias@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
| * | target/pscsi: Fix page incrementAsias He2013-02-281-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The page++ is wrong. It makes bio_add_pc_page() pointing to a wrong page address if the 'while (len > 0 && data_len > 0) { ... }' loop is executed more than one once. Signed-off-by: Asias He <asias@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
| * | target/pscsi: Drop unnecessary NULL assignment to bio->bi_nextAsias He2013-02-281-2/+0
| | | | | | | | | | | | | | | Signed-off-by: Asias He <asias@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
| * | target: Add __exit annotation for module_exit functionsAsias He2013-02-284-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Inclues sbp_exit, fileio_module_exit, iblock_module_exit and pscsi_module_exit. Note: rd_module_exit() can not be annotated by __exit, becasue it is called by target_core_init_configfs() which is annotated by __init. Signed-off-by: Asias He <asias@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
| * | iscsi-target: Fix immediate queue starvation regression with DATAINNicholas Bellinger2013-02-281-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch addresses a v3.5+ regression in iscsi-target where TX thread process context -> handle_response_queue() execution is allowed to run unbounded while servicing constant outgoing flow of ISTATE_SEND_DATAIN response state. This ends up preventing memory release of StatSN acknowledged commands in a timely manner when under heavy large block streaming DATAIN workloads. The regression bug was initially introduced with: commit 6f3c0e69a9c20441bdc6d3b2d18b83b244384ec6 Author: Andy Grover <agrover@redhat.com> Date: Tue Apr 3 15:51:09 2012 -0700 target/iscsi: Refactor target_tx_thread immediate+response queue loops Go ahead and follow original iscsi_target_tx_thread() logic and check to break for immediate queue processing after each DataIN Sequence and/or Response PDU has been sent. Reported-by: Benjamin ESTRABAUD <be@mpstor.com> Cc: Andy Grover <agrover@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
* | | Merge tag 'scsi-for-linus' of ↵Linus Torvalds2013-03-0283-1205/+4100
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI updates from James Bottomley: "This is an assorted set of stragglers into the merge window with driver updates for qla2xxx, megaraid_sas, storvsc and ufs. It also includes pulls of the uapi tree (all the remaining SCSI pieces) and the fcoe tree (updates to fcoe and libfc)" * tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (81 commits) [SCSI] ufs: Separate PCI code into glue driver [SCSI] ufs: Segregate PCI Specific Code [SCSI] scsi: fix lpfc build when wmb() is defined as mb() [SCSI] storvsc: Handle dynamic resizing of the device [SCSI] storvsc: Restructure error handling code on command completion [SCSI] storvsc: avoid usage of WRITE_SAME [SCSI] aacraid: suppress two GCC warnings [SCSI] hpsa: check for dma_mapping_error in hpsa_passthru ioctls [SCSI] hpsa: reorganize error handling in hpsa_passthru_ioctl [SCSI] hpsa: check for dma_mapping_error in hpsa_map_sg_chain_block [SCSI] hpsa: Check for dma_mapping_error for all code paths using fill_cmd [SCSI] hpsa: Check for dma_mapping_error in hpsa_map_one [SCSI] dc395x: uninitialized variable in device_alloc() [SCSI] Fix range check in scsi_host_dif_capable() [SCSI] storvsc: Initialize the sglist [SCSI] mpt2sas: Add support for OEM specific controller [SCSI] ipr: Fix oops while resetting an ipr adapter [SCSI] fnic: Fnic Trace Utility [SCSI] fnic: New debug flags and debug log messages [SCSI] fnic: fnic driver may hit BUG_ON on device reset ...
| * \ \ [SCSI] Merge tag 'fcoe-02-19-13' into for-linusJames Bottomley2013-03-0119-298/+822
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | FCoE Updates for 3.9 Signed-off-by: James Bottomley <JBottomley@Parallels.com>
| | * | | libfcoe: Check for unusable FCFs before looking for conflicting FCFsBhanu Prakash Gollapudi2013-02-191-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When there are multiple FCFs in the fabric, and one of them becomes unavailable, the fabric name for the unavailable FCF becomes 0 along with FIP_FL_AVAIL getting reset. In this case, FCF selection logic does not select any FCF as it first checks for conflicting FCFs (since fabric name is 0, it fails the condition), instead of first checking if it is usable or not. Fix it by first checking if FCF is usable and skip that FCF, and go to the next one in the list to check if it can be selected. Signed-off-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com> Signed-off-by: Robert Love <robert.w.love@intel.com>
| | * | | libfc: XenServer fails to mount root filesystem.Krishna Mohan2013-02-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | schedule_delayed_work() is using msec instead of jiffies. On PLOGI reject from target, remote port retry is scheduled @ 20 sec instead of 2sec(FC_DEF_E_D_TOV). XenServer dom0 kernel is configured with CONFIG_HZ=100Hz Signed-off-by: Krishna Mohan <krmohan@cisco.com> Signed-off-by: Robert Love <robert.w.love@intel.com>
| | * | | libfcoe: Handle CVL while waiting to select an FCFBhanu Prakash Gollapudi2013-02-121-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a CVL is received while we wait to select best FCF, we drop it without handling it. This causes initiator and the switch to go out-of-sync. Initiator proceeds selecting one of the FCFs and tries to send FIP FLOGI. However the switch may reject the FLOGI, as it has cleared its internal state, and expects the initiator to start FIP discovery protocol. Fix this condition by resetting the fcoe controller. Signed-off-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com> Reviewed-by: Yi Zou <yi.zou@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com>
| | * | | fcoe: Fix deadlock while deleting FCoE interface with NPIV portsNeerav Parikh2013-01-281-0/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes following deadlock caused by destroying of an FCoE interface with active NPIV ports on that interface. Call Trace: [<ffffffff814b7e88>] schedule+0x64/0x66 [<ffffffff814b6b4f>] schedule_timeout+0x36/0xe3 [<ffffffff81070c55>] ? update_curr+0xd6/0x110 [<ffffffff81071f6b>] ? hrtick_update+0x1b/0x4d [<ffffffff81072405>] ? dequeue_task_fair+0x1ca/0x1d9 [<ffffffff8106a369>] ? need_resched+0x1e/0x28 [<ffffffff814b7d14>] wait_for_common+0x9b/0xf1 [<ffffffff8106e7be>] ? try_to_wake_up+0x1e0/0x1e0 [<ffffffff814b7e22>] wait_for_completion+0x1d/0x1f [<ffffffff8105ae82>] flush_workqueue+0x116/0x2a1 [<ffffffff8105b357>] drain_workqueue+0x66/0x14c [<ffffffff8105b8ef>] destroy_workqueue+0x1a/0xcf [<ffffffffa009211e>] fc_remove_host+0x154/0x17f [scsi_transport_fc] [<ffffffffa00edbb8>] fcoe_if_destroy+0x184/0x1c9 [fcoe] [<ffffffffa00edc28>] fcoe_destroy_work+0x2b/0x44 [fcoe] [<ffffffff8105a82a>] process_one_work+0x1a8/0x2a4 [<ffffffffa00edbfd>] ? fcoe_if_destroy+0x1c9/0x1c9 [fcoe] [<ffffffff8105c396>] worker_thread+0x1db/0x268 [<ffffffff810604a3>] ? wake_up_bit+0x2a/0x2a [<ffffffff8105c1bb>] ? manage_workers.clone.16+0x1f6/0x1f6 [<ffffffff8105ffd6>] kthread+0x6f/0x77 [<ffffffff814c0304>] kernel_thread_helper+0x4/0x10 [<ffffffff8105ff67>] ? kthread_freezable_should_stop+0x4b/0x4b Call Trace: [<ffffffff814b7e88>] schedule+0x64/0x66 [<ffffffff814b8041>] schedule_preempt_disabled+0xe/0x10 [<ffffffff814b70a1>] __mutex_lock_common.clone.5+0x117/0x17a [<ffffffff814b7117>] __mutex_lock_slowpath+0x13/0x15 [<ffffffff814b6f76>] mutex_lock+0x23/0x37 [<ffffffff8125b890>] ? list_del+0x11/0x30 [<ffffffffa00edc84>] fcoe_vport_destroy+0x43/0x5f [fcoe] [<ffffffffa009130a>] fc_vport_terminate+0x48/0x110 [scsi_transport_fc] [<ffffffffa00913ef>] fc_vport_sched_delete+0x1d/0x79 [scsi_transport_fc] [<ffffffff8105a82a>] process_one_work+0x1a8/0x2a4 [<ffffffffa00913d2>] ? fc_vport_terminate+0x110/0x110 [scsi_transport_fc] [<ffffffff8105c396>] worker_thread+0x1db/0x268 [<ffffffff8105c1bb>] ? manage_workers.clone.16+0x1f6/0x1f6 [<ffffffff8105ffd6>] kthread+0x6f/0x77 [<ffffffff814c0304>] kernel_thread_helper+0x4/0x10 [<ffffffff8105ff67>] ? kthread_freezable_should_stop+0x4b/0x4b [<ffffffff814c0300>] ? gs_change+0x13/0x13 A prior attempt to fix this issue is posted here: http://lists.open-fcoe.org/pipermail/devel/2012-October/012318.html or http://article.gmane.org/gmane.linux.scsi.open-fcoe.devel/11924 Based on feedback and discussion with Neil Horman it seems that the above patch may have a case where the fcoe_vport_destroy() and fcoe_destroy_work() can race; hence that patch has been withdrawn with this patch that is trying to solve the same problem in a different way. In the current approach instead of removing the fcoe_config_mutex from the vport_delete callback function; I've chosen to delete all the NPIV ports first on a given root lport before continuing with the removal of the root lport. Signed-off-by: Neerav Parikh <Neerav.Parikh@intel.com> Tested-by: Marcus Dennis <marcusx.e.dennis@intel.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Robert Love <robert.w.love@intel.com>
| | * | | fcoe: close race on link speed detection in fcoe codeNeil Horman2013-01-281-4/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When creating an fcoe interfce, we call fcoe_link_speed_update before we add the lports fcoe interface to the fc_hostlist. Since network device events like NETDEV_CHANGE are only processed if an fcoe interface is found with an underlying netdev that matches the netdev of the event. Since this processing in fcoe_device_notification is how link_speed changes get communicated to the libfc code (via fcoe_link_speed_update), we have a race condition - if a NETDEV_CHANGE event is sent after the call to fcoe_link_speed_update in fcoe_netdev_config, but before we add the interface to the fc_hostlist, we will loose the event and attributes like /sys/class/fc_host/hostX/speed will not get updated properly. Fix this by moving the add to the fc_hostlist above the serialized call to fcoe_netdev_config, ensuring that we catch netdev envents before we make a direct call to fcoe_link_speed_update. Also use this opportunity to clean up access to the fc_hostlist a bit by creating a fcoe_hostlist_del accessor and replacing the cleanup in fcoe_exit to use it properly. Tested by myself successfully [ Comment over 80 chars broken into multi-line by Robert Love to satisfy checkpatch.pl ] Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reviewed-by: Yi Zou <yi.zou@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com>
| | * | | UAPI: (Scripted) Disintegrate include/scsi/fcDavid Howells2013-01-076-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dave Jones <davej@redhat.com>
| | * | | debris left by "[SCSI] libfcoe: Remove mutex_trylock/restart_syscall checks"Al Viro2012-12-141-5/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | AFAICS, the situation for fcoe_transport_disable() seems to be the same as for fcoe_transport_enable(). IOW, shouldn't it have restart_syscall() removed as well? I don't see any in-tree ->disable() instances that could return -ERESTARTSYS, anyway... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Robert Love <robert.w.love@intel.com>
| | * | | bnx2fc: use fcoe_get_lesb/fcoe_ctlr_get_lesb() directly from libfcoeYi Zou2012-12-141-46/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Drop the bnx2fc_xxx versions as they are basically the same. Signed-off-by: Yi Zou <yi.zou@intel.com> Cc: Bhanu Prakash Gollapudi <bprakash@broadcom.com> Tested-by: Marcus Dennis <marcusx.e.dennis@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com>
| | * | | bnx2fc: use fcoe_link_speed_update() from the exported symbol in libfcoeYi Zou2012-12-141-31/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have fcoe_link_speed_update() in libfcoe ready for use now, take out the bnx2fc version which is almost the same. Signed-off-by: Yi Zou <yi.zou@intel.com> Cc: Bhanu Prakash Gollapudi <bprakash@broadcom.com> Tested-by: Marcus Dennis <marcusx.e.dennis@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com>
| | * | | bnx2fc: add support to get_netdev for bnx2f_interfaceYi Zou2012-12-141-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adds support to fcoe_port's newly added get_netdev fucntion pointer for bnx2fc. Signed-off-by: Yi Zou <yi.zou@intel.com> Cc: Bhanu Prakash Gollapudi <bprakash@broadcom.com> Tested-by: Marcus Dennis <marcusx.e.dennis@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com>
| | * | | libfcoe, fcoe: consolidate the fcoe_ctlr_get_lesb/fcoe_get_lesbYi Zou2012-12-143-40/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Similarly they can be moved into libfcoe instead of being private to fcoe now. Also add comments particularly on the term LESB to the corresponding function. Signed-off-by: Yi Zou <yi.zou@intel.com> Cc: Bhanu Prakash Gollapudi <bprakash@broadcom.com> Tested-by: Marcus Dennis <marcusx.e.dennis@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com>