linux - linux

	Commit message (Collapse)	Author	Age	Files	Lines
*	Merge tag 'dm-3.13-changes' of ↵	Linus Torvalds	2013-11-14	21	-467/+1544
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper changes from Mike Snitzer: "A set of device-mapper changes for 3.13. Improve reliability of buffer allocations for dm messages with a small number of arguments, a couple path group initialization fixes for dm multipath, a fix for resizing a dm array, various fixes and optimizations for dm cache, a fix for device mapper's Kconfig menu indentation. Features added include: - dm crypt support for activating legacy CBC TrueCrypt containers (useful for forensics of these old TCRYPT containers) - reduced dm-cache memory requirements for each block in the cache - basic support for shrinking a dm-cache's cache (fast) device - most notably, dm-cache support for managing cache coherency when deploying dm-cache with sophisticated origin volumes (that support hardware snapshots and/or clustering): these changes come in the form of a new passthrough operation mode and a cache block invalidation interface" * tag 'dm-3.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (32 commits) dm cache: resolve small nits and improve Documentation dm cache: add cache block invalidation support dm cache: add remove_cblock method to policy interface dm cache policy mq: reduce memory requirements dm cache metadata: check the metadata version when reading the superblock dm cache: add passthrough mode dm cache: cache shrinking support dm cache: promotion optimisation for writes dm cache: be much more aggressive about promoting writes to discarded blocks dm cache policy mq: implement writeback_work() and mq_{set,clear}_dirty() dm cache: optimize commit_if_needed dm space map disk: optimise sm_disk_dec_block MAINTAINERS: add reference to device-mapper's linux-dm.git tree dm: fix Kconfig menu indentation dm: allow remove to be deferred dm table: print error on preresume failure dm crypt: add TCW IV mode for old CBC TCRYPT containers dm crypt: properly handle extra key string in initialization dm cache: log error message if dm_kcopyd_copy() fails dm cache: use cell_defer() boolean argument consistently ...
\| *	dm cache: resolve small nits and improve Documentation	Mike Snitzer	2013-11-12	3	-12/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Document passthrough mode, cache shrinking, and cache invalidation. Also, use strcasecmp() and hlist_unhashed(). Reported-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
\| *	dm cache: add cache block invalidation support	Joe Thornber	2013-11-11	2	-4/+233
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Cache block invalidation is removing an entry from the cache without writing it back. Cache blocks can be invalidated via the 'invalidate_cblocks' message, which takes an arbitrary number of cblock ranges: invalidate_cblocks [<cblock>\|<cblock begin>-<cblock end>]* E.g. dmsetup message my_cache 0 invalidate_cblocks 2345 3456-4567 5678-6789 Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
\| *	dm cache: add remove_cblock method to policy interface	Joe Thornber	2013-11-11	3	-4/+57
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Implement policy_remove_cblock() and add remove_cblock method to the mq policy. These methods will be used by the following cache block invalidation patch which adds the 'invalidate_cblocks' message to the cache core. Also, update some comments in dm-cache-policy.h Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
\| *	dm cache policy mq: reduce memory requirements	Joe Thornber	2013-11-11	1	-312/+231
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Rather than storing the cblock in each cache entry, we allocate all entries in an array and infer the cblock from the entry position. Saves 4 bytes of memory per cache block. In addition, this gives us an easy way of looking up cache entries by cblock. We no longer need to keep an explicit bitset to track which cblocks have been allocated. And no searching is needed to find free cblocks. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
\| *	dm cache metadata: check the metadata version when reading the superblock	Joe Thornber	2013-11-11	1	-3/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Need to check the version to verify on-disk metadata is supported. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
\| *	dm cache: add passthrough mode	Joe Thornber	2013-11-11	4	-38/+200
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	"Passthrough" is a dm-cache operating mode (like writethrough or writeback) which is intended to be used when the cache contents are not known to be coherent with the origin device. It behaves as follows: * All reads are served from the origin device (all reads miss the cache) * All writes are forwarded to the origin device; additionally, write hits cause cache block invalidates This mode decouples cache coherency checks from cache device creation, largely to avoid having to perform coherency checks while booting. Boot scripts can create cache devices in passthrough mode and put them into service (mount cached filesystems, for example) without having to worry about coherency. Coherency that exists is maintained, although the cache will gradually cool as writes take place. Later, applications can perform coherency checks, the nature of which will depend on the type of the underlying storage. If coherency can be verified, the cache device can be transitioned to writethrough or writeback mode while still warm; otherwise, the cache contents can be discarded prior to transitioning to the desired operating mode. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Morgan Mears <Morgan.Mears@netapp.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
\| *	dm cache: cache shrinking support	Joe Thornber	2013-11-11	2	-9/+120
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Allow a cache to shrink if the blocks being removed from the cache are not dirty. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
\| *	dm cache: promotion optimisation for writes	Joe Thornber	2013-11-10	1	-6/+87
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a write block triggers promotion and covers a whole block we can avoid a copy. Introduce dm_{hook,unhook}_bio to simplify saving and restoring bio fields (bi_private is now used by overwrite). Switch writethrough support over to using these helpers too. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
\| *	dm cache: be much more aggressive about promoting writes to discarded blocks	Joe Thornber	2013-11-10	1	-21/+63
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously these promotions only got priority if there were unused cache blocks. Now we give them priority if there are any clean blocks in the cache. The fio_soak_test in the device-mapper-test-suite now gives uniform performance across subvolumes (~16 seconds). Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
\| *	dm cache policy mq: implement writeback_work() and mq_{set,clear}_dirty()	Joe Thornber	2013-11-10	2	-21/+132
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are now two multiqueues for in cache blocks. A clean one and a dirty one. writeback_work comes from the dirty one. Demotions come from the clean one. There are two benefits: - Performance improvement, since demoting a clean block is a noop. - The cache cleans itself when io load is light. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
\| *	dm cache: optimize commit_if_needed	Heinz Mauelshagen	2013-11-10	1	-5/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Check commit_requested flag _before_ calling dm_cache_changed_this_transaction() superfluously. Also, be sure to set last_commit_jiffies _after_ dm_cache_commit() completes. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
\| *	dm space map disk: optimise sm_disk_dec_block	Joe Thornber	2013-11-10	1	-17/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Don't waste time spotting blocks that have been allocated and then freed in the same transaction. The extra lookup is expensive, and I don't think it really gives us much. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
\| *	MAINTAINERS: add reference to device-mapper's linux-dm.git tree	Mike Snitzer	2013-11-10	1	-0/+1
\| \| \| \| \| \| \| \|	Signed-off-by: Mike Snitzer <snitzer@redhat.com>
\| *	dm: fix Kconfig menu indentation	Mikulas Patocka	2013-11-10	1	-11/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The option DM_LOG_USERSPACE is sub-option of DM_MIRROR, so place it right after DM_MIRROR. Doing so fixes various other Device mapper targets/features to be properly nested under "Device mapper support". Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
\| *	dm: allow remove to be deferred	Mikulas Patocka	2013-11-10	4	-12/+99
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch allows the removal of an open device to be deferred until it is closed. (Previously such a removal attempt would fail.) The deferred remove functionality is enabled by setting the flag DM_DEFERRED_REMOVE in the ioctl structure on DM_DEV_REMOVE or DM_REMOVE_ALL ioctl. On return from DM_DEV_REMOVE, the flag DM_DEFERRED_REMOVE indicates if the device was removed immediately or flagged to be removed on close - if the flag is clear, the device was removed. On return from DM_DEV_STATUS and other ioctls, the flag DM_DEFERRED_REMOVE is set if the device is scheduled to be removed on closure. A device that is scheduled to be deleted can be revived using the message "@cancel_deferred_remove". This message clears the DMF_DEFERRED_REMOVE flag so that the device won't be deleted on close. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
\| *	dm table: print error on preresume failure	Mike Snitzer	2013-11-10	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If preresume fails it is worth logging an error given that a device is left suspended due to the failure. This change was motivated by local preresume error logging that was added to the cache target ("preresume failed"). Elevating this target-agnostic context for the where the target-specific error occurred relative to the DM core's callouts makes sense. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com>
\| *	dm crypt: add TCW IV mode for old CBC TCRYPT containers	Milan Broz	2013-11-10	2	-4/+192
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	dm-crypt can already activate TCRYPT (TrueCrypt compatible) containers in LRW or XTS block encryption mode. TCRYPT containers prior to version 4.1 use CBC mode with some additional tweaks, this patch adds support for these containers. This new mode is implemented using special IV generator named TCW (TrueCrypt IV with whitening). TCW IV only supports containers that are encrypted with one cipher (Tested with AES, Twofish, Serpent, CAST5 and TripleDES). While this mode is legacy and is known to be vulnerable to some watermarking attacks (e.g. revealing of hidden disk existence) it can still be useful to activate old containers without using 3rd party software or for independent forensic analysis of such containers. (Both the userspace and kernel code is an independent implementation based on the format documentation and it completely avoids use of original source code.) The TCW IV generator uses two additional keys: Kw (whitening seed, size is always 16 bytes - TCW_WHITENING_SIZE) and Kiv (IV seed, size is always the IV size of the selected cipher). These keys are concatenated at the end of the main encryption key provided in mapping table. While whitening is completely independent from IV, it is implemented inside IV generator for simplification. The whitening value is always 16 bytes long and is calculated per sector from provided Kw as initial seed, xored with sector number and mixed with CRC32 algorithm. Resulting value is xored with ciphertext sector content. IV is calculated from the provided Kiv as initial IV seed and xored with sector number. Detailed calculation can be found in the Truecrypt documentation for version < 4.1 and will also be described on dm-crypt site, see: http://code.google.com/p/cryptsetup/wiki/DMCrypt The experimental support for activation of these containers is already present in git devel brach of cryptsetup. Signed-off-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
\| *	dm crypt: properly handle extra key string in initialization	Milan Broz	2013-11-10	1	-11/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some encryption modes use extra keys (e.g. loopAES has IV seed) which are not used in block cipher initialization but are part of key string in table constructor. This patch adds an additional field which describes the length of the extra key(s) and substracts it before real key encryption setting. The key_size always includes the size, in bytes, of the key provided in mapping table. The key_parts describes how many parts (usually keys) are contained in the whole key buffer. And key_extra_size contains size in bytes of additional keys part (this number of bytes must be subtracted because it is processed by the IV generator). \| K1 \| K2 \| .... \| K64 \| Kiv \| \|----------- key_size ----------------- \| \| \|-key_extra_size-\| \| [64 keys] \| [1 key] \| => key_parts = 65 Example where key string contains main key K, whitening key Kw and IV seed Kiv: \| K \| Kiv \| Kw \| \|--------------- key_size --------------\| \| \|-----key_extra_size------\| \| [1 key] \| [1 key] \| [1 key] \| => key_parts = 3 Because key_extra_size is calculated during IV mode setting, key initialization is moved after this step. For now, this change has no effect to supported modes (thanks to ilog2 rounding) but it is required by the following patch. Also, fix a sparse warning in crypt_iv_lmk_one(). Signed-off-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
\| *	dm cache: log error message if dm_kcopyd_copy() fails	Heinz Mauelshagen	2013-11-10	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A migration failure should be logged (albeit limited). Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
\| *	dm cache: use cell_defer() boolean argument consistently	Heinz Mauelshagen	2013-11-10	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix a few cell_defer() calls that weren't passing a bool. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
\| *	dm cache: return -EINVAL if the user specifies unknown cache policy	Mikulas Patocka	2013-11-10	2	-8/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Return -EINVAL when the specified cache policy is unknown rather than returning -ENOMEM. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
\| *	dm cache metadata: return bool from __superblock_all_zeroes	Joe Thornber	2013-11-10	1	-4/+5
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
\| *	dm cache policy mq: a few small fixes	Joe Thornber	2013-11-10	1	-10/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Rename takeout_queue to concat_queue. Fix a harmless bug in mq policies pop() function. Currently pop() always succeeds, with up coming changes this wont be the case. Fix typo in comment above pre_cache_to_cache prototype. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
\| *	dm cache policy: remove return from void policy_remove_mapping	Joe Thornber	2013-11-10	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	No need to return from a void function. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
\| *	dm cache: improve efficiency of quiescing flag management	Joe Thornber	2013-11-10	1	-22/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make the quiescing flag an atomic_t and stop protecting it with a spin lock. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
\| *	dm cache: fix a race condition between queuing new migrations and quiescing ↵	Joe Thornber	2013-11-09	1	-14/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	for a shutdown The code that was trying to do this was inadequate. The postsuspend method (in ioctl context), needs to wait for the worker thread to acknowledge the request to quiesce. Otherwise the migration count may drop to zero temporarily before the worker thread realises we're quiescing. In this case the target will be taken down, but the worker thread may have issued a new migration, which will cause an oops when it completes. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 3.9+
\| *	dm cache: io destined for the cache device can now serve as tick bios	Joe Thornber	2013-11-09	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously only origin bios could trigger ticks, which meant if all the io was destined for the cache no ticks were generated. If no ticks are generated then multiple hits, and movements in general, are attributed to the same tick. Only a stop gap fix, we need a better solution. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
\| *	dm cache policy mq: protect residency method with existing mutex	Joe Thornber	2013-11-09	1	-2/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It is safe to use a mutex in mq_residency() at this point since it is only called from ioctl context. But future-proof mq_residency() by using might_sleep() to catch new contexts that cannot sleep. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
\| *	dm array: fix bug in growing array	Joe Thornber	2013-11-05	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Entries would be lost if the old tail block was partially filled. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 3.9+
\| *	dm mpath: requeue I/O during pg_init	Hannes Reinecke	2013-11-05	1	-4/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When pg_init is running no I/O can be submitted to the underlying devices, as the path priority etc might change. When using queue_io for this, requests will be piling up within multipath as the block I/O scheduler just sees a _very fast_ device. All of this queued I/O has to be resubmitted from within multipathing once pg_init is done. This approach has the problem that it's virtually impossible to abort I/O when pg_init is running, and we're adding heavy load to the devices after pg_init since all of the queued I/O needs to be resubmitted _before_ any requests can be pulled off of the request queue and normal operation continues. This patch will requeue the I/O that triggers the pg_init call, and return 'busy' when pg_init is in progress. With these changes the block I/O scheduler will stop submitting I/O during pg_init, resulting in a quicker path switch and less I/O pressure (and memory consumption) after pg_init. Signed-off-by: Hannes Reinecke <hare@suse.de> [patch header edited for clarity and typos by Mike Snitzer] Signed-off-by: Mike Snitzer <snitzer@redhat.com>
\| *	dm mpath: fix race condition between multipath_dtr and pg_init_done	Shiva Krishna Merla	2013-11-01	1	-3/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Whenever multipath_dtr() is happening we must prevent queueing any further path activation work. Implement this by adding a new 'pg_init_disabled' flag to the multipath structure that denotes future path activation work should be skipped if it is set. By disabling pg_init and then re-enabling in flush_multipath_work() we also avoid the potential for pg_init to be initiated while suspending an mpath device. Without this patch a race condition exists that may result in a kernel panic: 1) If after pg_init_done() decrements pg_init_in_progress to 0, a call to wait_for_pg_init_completion() assumes there are no more pending path management commands. 2) If pg_init_required is set by pg_init_done(), due to retryable mode_select errors, then process_queued_ios() will again queue the path activation work. 3) If free_multipath() completes before activate_path() work is called a NULL pointer dereference like the following can be seen when accessing members of the recently destructed multipath: BUG: unable to handle kernel NULL pointer dereference at 0000000000000090 RIP: 0010:[<ffffffffa003db1b>] [<ffffffffa003db1b>] activate_path+0x1b/0x30 [dm_multipath] [<ffffffff81090ac0>] worker_thread+0x170/0x2a0 [<ffffffff81096c80>] ? autoremove_wake_function+0x0/0x40 [switch to disabling pg_init in flush_multipath_work & header edits by Mike Snitzer] Signed-off-by: Shiva Krishna Merla <shivakrishna.merla@netapp.com> Reviewed-by: Krishnasamy Somasundaram <somasundaram.krishnasamy@netapp.com> Tested-by: Speagle Andy <Andy.Speagle@netapp.com> Acked-by: Junichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
\| *	dm: allocate buffer for messages with small number of arguments using GFP_NOIO	Mikulas Patocka	2013-10-31	1	-2/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	dm-mpath and dm-thin must process messages even if some device is suspended, so we allocate argv buffer with GFP_NOIO. These messages have a small fixed number of arguments. On the other hand, dm-switch needs to process bulk data using messages so excessive use of GFP_NOIO could cause trouble. The patch also lowers the default number of arguments from 64 to 8, so that there is smaller load on GFP_NOIO allocations. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Acked-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* \|	Merge tag 'for-linus-20131112' of git://git.infradead.org/linux-mtd	Linus Torvalds	2013-11-14	65	-984/+925
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull MTD changes from Brian Norris: - Unify some compile-time differences so that we have fewer uses of #ifdef CONFIG_OF in atmel_nand - Other general cleanups (removing unused functions, options, variables, fields; use correct interfaces) - Fix BUG() for new odd-sized NAND, which report non-power-of-2 dimensions via ONFI - Miscellaneous driver fixes (SPI NOR flash; BCM47xx NAND flash; etc.) - Improve differentiation between SLC and MLC NAND -- this clarifies an ABI issue regarding the MTD "type" (in sysfs and in the MEMGETINFO ioctl), where the MTD_MLCNANDFLASH type was present but inconsistently used - Extend GPMI NAND to support multi-chip-select NAND for some platforms - Many improvements to the OMAP2/3 NAND driver, including an expanded DT binding to bring us closer to mainline support for some OMAP systems - Fix a deadlock in the error path of the Atmel NAND driver probe - Correct the error codes from MTD mmap() to conform to POSIX and the Linux Programmer's Manual. This is an acknowledged change in the MTD ABI, but I can't imagine somebody relying on the non-standard -ENOSYS error code specifically. Am I just being unimaginative? :) - Fix a few important GPMI NAND bugs (one regression from 3.12 and one long-standing race condition) - More? Read the log! * tag 'for-linus-20131112' of git://git.infradead.org/linux-mtd: (98 commits) mtd: gpmi: fix the NULL pointer mtd: gpmi: fix kernel BUG due to racing DMA operations mtd: mtdchar: return expected errors on mmap() call mtd: gpmi: only scan two chips for imx6 mtd: gpmi: Use devm_kzalloc() mtd: atmel_nand: fix bug driver will in a dead lock if no nand detected mtd: nand: use a local variable to simplify the nand_scan_tail mtd: nand: remove deprecated IRQF_DISABLED mtd: dataflash: Say if we find a device we don't support mtd: nand: omap: fix error return code in omap_nand_probe() mtd: nand_bbt: kill NAND_BBT_SCANALLPAGES mtd: m25p80: fixup device removal failure path mtd: mxc_nand: Include linux/of.h header mtd: remove duplicated include from mtdcore.c mtd: m25p80: add support for Macronix mx25l3255e mtd: nand: omap: remove selection of BCH ecc-scheme via KConfig mtd: nand: omap: updated devm_xx for all resource allocation and free calls mtd: nand: omap: use drivers/mtd/nand/nand_bch.c wrapper for BCH ECC instead of lib/bch.c mtd: nand: omap: clean-up ecc layout for BCH ecc schemes mtd: nand: omap2: clean-up BCHx_HW and BCHx_SW ECC configurations in device_probe ...
\| * \|	mtd: gpmi: fix the NULL pointer	Huang Shijie	2013-11-12	1	-7/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The imx23 board will check the fingerprint, so it will call the mx23_check_transcription_stamp. This function will use @chip->buffers->databuf as its buffer which is allocated in the nand_scan_tail(). Unfortunately, the mx23_check_transcription_stamp is called before the nand_scan_tail(). So we will meet a NULL pointer bug: -------------------------------------------------------------------- [ 1.150000] NAND device: Manufacturer ID: 0xec, Chip ID: 0xd7 (Samsung NAND 4GiB 3,3V 8-bit), 4096MiB, page size: 4096, OOB size: 8 [ 1.160000] Unable to handle kernel NULL pointer dereference at virtual address 000005d0 [ 1.170000] pgd = c0004000 [ 1.170000] [000005d0] *pgd=00000000 [ 1.180000] Internal error: Oops: 5 [#1] ARM [ 1.180000] Modules linked in: [ 1.180000] CPU: 0 PID: 1 Comm: swapper Not tainted 3.12.0 #89 [ 1.180000] task: c7440000 ti: c743a000 task.ti: c743a000 [ 1.180000] PC is at memcmp+0x10/0x54 [ 1.180000] LR is at gpmi_nand_probe+0x42c/0x894 [ 1.180000] pc : [<c025fcb0>] lr : [<c02f6a68>] psr: 20000053 [ 1.180000] sp : c743be2c ip : 600000d3 fp : ffffffff [ 1.180000] r10: 000005d0 r9 : c02f5f08 r8 : 00000000 [ 1.180000] r7 : c75858a8 r6 : c75858a8 r5 : c7585b18 r4 : c7585800 [ 1.180000] r3 : 000005d0 r2 : 00000004 r1 : c05c33e4 r0 : 000005d0 [ 1.180000] Flags: nzCv IRQs on FIQs off Mode SVC_32 ISA ARM Segment kernel [ 1.180000] Control: 0005317f Table: 40004000 DAC: 00000017 [ 1.180000] Process swapper (pid: 1, stack limit = 0xc743a1c0) -------------------------------------------------------------------- This patch rearrange the init procedure: Set the NAND_SKIP_BBTSCAN to skip the nand scan firstly, and after we set the proper settings, we will call the chip->scan_bbt() manually. Cc: stable@vger.kernel.org # 3.12 Signed-off-by: Huang Shijie <b32955@freescale.com> Reported-by: Fabio Estevam <festevam@gmail.com> Tested-by: Fabio Estevam <fabio.estevam@freescale.com> Signed-off-by: Brian Norris <computersforpeace@gmail.com>
\| * \|	mtd: gpmi: fix kernel BUG due to racing DMA operations	Huang Shijie	2013-11-11	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	[1] The gpmi uses the nand_command_lp to issue the commands to NAND chips. The gpmi issues a DMA operation with gpmi_cmd_ctrl when it handles a NAND_CMD_NONE control command. So when we read a page(NAND_CMD_READ0) from the NAND, we may send two DMA operations back-to-back. If we do not serialize the two DMA operations, we will meet a bug when 1.1) we enable CONFIG_DMA_API_DEBUG, CONFIG_DMADEVICES_DEBUG, and CONFIG_DEBUG_SG. 1.2) Use the following commands in an UART console and a SSH console: cmd 1: while true;do dd if=/dev/mtd0 of=/dev/null;done cmd 1: while true;do dd if=/dev/mmcblk0 of=/dev/null;done The kernel log shows below: ----------------------------------------------------------------- kernel BUG at lib/scatterlist.c:28! Unable to handle kernel NULL pointer dereference at virtual address 00000000 ......................... [<80044a0c>] (__bug+0x18/0x24) from [<80249b74>] (sg_next+0x48/0x4c) [<80249b74>] (sg_next+0x48/0x4c) from [<80255398>] (debug_dma_unmap_sg+0x170/0x1a4) [<80255398>] (debug_dma_unmap_sg+0x170/0x1a4) from [<8004af58>] (dma_unmap_sg+0x14/0x6c) [<8004af58>] (dma_unmap_sg+0x14/0x6c) from [<8027e594>] (mxs_dma_tasklet+0x18/0x1c) [<8027e594>] (mxs_dma_tasklet+0x18/0x1c) from [<8007d444>] (tasklet_action+0x114/0x164) ----------------------------------------------------------------- 1.3) Assume the two DMA operations is X (first) and Y (second). The root cause of the bug: Assume process P issues DMA X, and sleep on the completion @this->dma_done. X's tasklet callback is dma_irq_callback. It firstly wake up the process sleeping on the completion @this->dma_done, and then trid to unmap the scatterlist S. The waked process P will issue Y in another ARM core. Y initializes S->sg_magic to zero with sg_init_one(), while dma_irq_callback is unmapping S at the same time. See the diagram: ARM core 0 \| ARM core 1 ------------------------------------------------------------- (P issues DMA X, then sleep) --> \| \| (X's tasklet wakes P) --> \| \| \| <-- (P begin to issue DMA Y) \| (X's tasklet unmap the \| scatterlist S with dma_unmap_sg) --> \| <-- (Y calls sg_init_one() to init \| scatterlist S) \| [2] This patch serialize both the X and Y in the following way: Unmap the DMA scatterlist S firstly, and wake up the process at the end of the DMA callback, in such a way, Y will be executed after X. After this patch: ARM core 0 \| ARM core 1 ------------------------------------------------------------- (P issues DMA X, then sleep) --> \| \| (X's tasklet unmap the \| scatterlist S with dma_unmap_sg) --> \| \| (X's tasklet wakes P) --> \| \| \| <-- (P begin to issue DMA Y) \| \| <-- (Y calls sg_init_one() to init \| scatterlist S) \| Cc: stable@vger.kernel.org # 3.2 Signed-off-by: Huang Shijie <b32955@freescale.com> Signed-off-by: Brian Norris <computersforpeace@gmail.com>
\| * \|	mtd: mtdchar: return expected errors on mmap() call	Vladimir Zapolskiy	2013-11-11	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	According both to POSIX.1-2008 and Linux Programmer's Manual mmap() syscall shouldn't return undocumented ENOSYS, this change replaces the errno with more appropriate ENODEV and EACCESS. Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com> Cc: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Brian Norris <computersforpeace@gmail.com>
\| * \|	mtd: gpmi: only scan two chips for imx6	Huang Shijie	2013-11-11	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We cannot scan two chips for imx23 and imx28: imx23: the Ready-Busy1 line is not connected for some board. imx28: we do not set the pinctrl for Ready-Busy1 So we only scan two chips for imx6. Signed-off-by: Huang Shijie <b32955@freescale.com> Signed-off-by: Brian Norris <computersforpeace@gmail.com>
\| * \|	mtd: gpmi: Use devm_kzalloc()	Fabio Estevam	2013-11-07	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Using devm_kzalloc() can make the code simpler. Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Acked-by: Huang Shijie <b32955@freescale.com> Signed-off-by: Brian Norris <computersforpeace@gmail.com>
\| * \|	mtd: atmel_nand: fix bug driver will in a dead lock if no nand detected	Josh Wu	2013-11-07	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the atmel driver probe function, the code shows like following: atmel_nand_probe(...) { ... err_nand_ioremap: platform_driver_unregister(&atmel_nand_nfc_driver); return res; } If no nand flash detected, the driver probe function will goto err_nand_ioremap label. Then platform_driver_unregister() will be called. It will get the lock of atmel_nand device since it is parent of nfc_device. The problem is the lock is already hold by atmel_nand_probe itself. So system will be in a dead lock. This patch just simply removed to platform_driver_unregister() call. When atmel_nand driver is quit the platform_driver_unregister() will be called in atmel_nand_remove(). [Brian: the NAND platform probe really has no business registering/unregistering another driver; this fixes the deadlock, but we should follow up the likely racy behavior here with a better architecture] Signed-off-by: Josh Wu <josh.wu@atmel.com> Cc: <stable@vger.kernel.org> # 3.12 Signed-off-by: Brian Norris <computersforpeace@gmail.com>
\| * \|	mtd: nand: use a local variable to simplify the nand_scan_tail	Huang Shijie	2013-11-07	1	-109/+104
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are too many "chip->ecc" in the nand_scan_tail() which makes the eyes sore. This patch uses a local variable "ecc" to replace the "chip->ecc" to make the code more graceful. Do the code change with "s/chip->ecc\./ecc->/g" in the nand_scan_tail, and also change some lines by hand. Signed-off-by: Huang Shijie <b32955@freescale.com> Signed-off-by: Brian Norris <computersforpeace@gmail.com>
\| * \|	mtd: nand: remove deprecated IRQF_DISABLED	Michael Opdenacker	2013-11-07	3	-5/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch proposes to remove the use of the IRQF_DISABLED flag It's a NOOP since 2.6.35 and it will be removed one day. Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com> Signed-off-by: Brian Norris <computersforpeace@gmail.com>
\| * \|	mtd: dataflash: Say if we find a device we don't support	Mark Brown	2013-11-07	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Ensure that the error message if we identify a flash we don't know how to talk to is displayed on the console in order to aid diagnostics. While we're at convert the message to use dev_info() rather than our hand rolled version of it for consistency. Signed-off-by: Mark Brown <broonie@linaro.org> Signed-off-by: Brian Norris <computersforpeace@gmail.com>
\| * \|	mtd: nand: omap: fix error return code in omap_nand_probe()	Wei Yongjun	2013-11-07	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix to return a negative error code from the error handling case instead of 0, to more closely match the rest of this function. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Acked-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Acked-by: Pekon Gupta <pekon@ti.com> Signed-off-by: Brian Norris <computersforpeace@gmail.com>
\| * \|	mtd: nand_bbt: kill NAND_BBT_SCANALLPAGES	Brian Norris	2013-11-07	3	-38/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now that the last user of NAND_BBT_SCANALLPAGES has been removed, let's kill this peculiar BBT feature flag. Signed-off-by: Brian Norris <computersforpeace@gmail.com> Reviewed-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
\| * \|	mtd: m25p80: fixup device removal failure path	Brian Norris	2013-11-07	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Device removal should fail if MTD unregistration fails. Signed-off-by: Brian Norris <computersforpeace@gmail.com> Reviewed-by: Marek Vasut <marex@denx.de>
\| * \|	mtd: mxc_nand: Include linux/of.h header	Sachin Kamat	2013-11-07	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	'of_match_ptr' is defined in linux/of.h. Include it explicitly to avoid build breakage in the future. Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org> Signed-off-by: Brian Norris <computersforpeace@gmail.com>
\| * \|	mtd: remove duplicated include from mtdcore.c	Wei Yongjun	2013-11-07	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove duplicated include. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Acked-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Signed-off-by: Brian Norris <computersforpeace@gmail.com>
\| * \|	mtd: m25p80: add support for Macronix mx25l3255e	Brian Norris	2013-11-07	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A new 32Mbit SPI NOR flash from Macronix. Nothing special. Signed-off-by: Brian Norris <computersforpeace@gmail.com> Reviewed-by: Marek Vasut <marex@denx.de>
\| * \|	mtd: nand: omap: remove selection of BCH ecc-scheme via KConfig	Pekon Gupta	2013-11-07	1	-34/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With OMAP NAND driver updates, selection of ecc-scheme: DT enabled kernel depends on ti,nand-ecc-opt and ti,elm-id DT bindings. Non DT enabled kernel depends on elm_dev and ecc-scheme passed along with platform-data from board file. So, selection of ecc-scheme (BCH8 or BCH4) from KConfig can be removed Signed-off-by: Pekon Gupta <pekon@ti.com> Tested-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Signed-off-by: Brian Norris <computersforpeace@gmail.com>