linux - linux

	Commit message (Collapse)	Author	Files	Lines
2011-08-04	tmpfs: use kmemdup for short symlinks	Hugh Dickins	2	-21/+21
	But we've not yet removed the old swp_entry_t i_direct[16] from shmem_inode_info. That's because it was still being shared with the inline symlink. Remove it now (saving 64 or 128 bytes from shmem inode size), and use kmemdup() for short symlinks, say, those up to 128 bytes. I wonder why mpol_free_shared_policy() is done in shmem_destroy_inode() rather than shmem_evict_inode(), where we usually do such freeing? I guess it doesn't matter, and I'm not into NUMA mpol testing right now. Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Rik van Riel <riel@redhat.com> Reviewed-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-04	tmpfs: convert shmem_writepage and enable swap	Hugh Dickins	1	-51/+37
	Convert shmem_writepage() to use shmem_delete_from_page_cache() to use shmem_radix_tree_replace() to substitute swap entry for page pointer atomically in the radix tree. As with shmem_add_to_page_cache(), it's not entirely satisfactory to be copying such code from delete_from_swap_cache, but again judged easier to sell than making its other callers go through the extras. Remove the toy implementation's shmem_put_swap() and shmem_get_swap(), now unreferenced, and the hack to disable swap: it's now good to go. The way things have worked out, info->lock no longer helps to guard the shmem_swaplist: we increment swapped under shmem_swaplist_mutex only. That global mutex exclusion between shmem_writepage() and shmem_unuse() is not pretty, and we ought to find another way; but it's been forced on us by recent race discoveries, not a consequence of this patchset. And what has become of the WARN_ON_ONCE(1) free_swap_and_cache() if a swap entry was found already present? That's no longer possible, the (unknown) one inserting this page into filecache would hit the swap entry occupying that slot. Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-04	tmpfs: convert mem_cgroup shmem to radix-swap	Hugh Dickins	4	-139/+20
	Remove mem_cgroup_shmem_charge_fallback(): it was only required when we had to move swappage to filecache with GFP_NOWAIT. Remove the GFP_NOWAIT special case from mem_cgroup_cache_charge(), by moving its call out from shmem_add_to_page_cache() to two of thats three callers. But leave it doing mem_cgroup_uncharge_cache_page() on error: although asymmetrical, it's easier for all 3 callers to handle. These two changes would also be appropriate if anyone were to start using shmem_read_mapping_page_gfp() with GFP_NOWAIT. Remove mem_cgroup_get_shmem_target(): mc_handle_file_pte() can test radix_tree_exceptional_entry() to get what it needs for itself. Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-04	tmpfs: convert shmem_getpage_gfp to radix-swap	Hugh Dickins	1	-147/+112
	Convert shmem_getpage_gfp(), the engine-room of shmem, to expect page or swap entry returned from radix tree by find_lock_page(). Whereas the repetitive old method proceeded mainly under info->lock, dropping and repeating whenever one of the conditions needed was not met, now we can proceed without it, leaving shmem_add_to_page_cache() to check for a race. This way there is no need to preallocate a page, no need for an early radix_tree_preload(), no need for mem_cgroup_shmem_charge_fallback(). Move the error unwinding down to the bottom instead of repeating it throughout. ENOSPC handling is a little different from before: there is no longer any race between find_lock_page() and finding swap, but we can arrive at ENOSPC before calling shmem_recalc_inode(), which might occasionally discover freed space. Be stricter to check i_size before returning. info->lock is used for little but alloced, swapped, i_blocks updates. Move i_blocks updates out from under the max_blocks check, so even an unlimited size=0 mount can show accurate du. Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-04	tmpfs: convert shmem_unuse_inode to radix-swap	Hugh Dickins	1	-26/+107
	Convert shmem_unuse_inode() to use a lockless gang lookup of the radix tree, searching for matching swap. This is somewhat slower than the old method: because of repeated radix tree descents, because of copying entries up, but probably most because the old method noted and skipped once a vector page was cleared of swap. Perhaps we can devise a use of radix tree tagging to achieve that later. shmem_add_to_page_cache() uses shmem_radix_tree_replace() to compensate for the lockless lookup by checking that the expected entry is in place, under lock. It is not very satisfactory to be copying this much from add_to_page_cache_locked(), but I think easier to sell than insisting that every caller of add_to_page_cache*() go through the extras. Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-04	tmpfs: convert shmem_truncate_range to radix-swap	Hugh Dickins	1	-46/+146
	Disable the toy swapping implementation in shmem_writepage() - it's hard to support two schemes at once - and convert shmem_truncate_range() to a lockless gang lookup of swap entries along with pages, freeing both. Since the second loop tightens its noose until all entries of either kind have been squeezed out (and we shall make sure that there's not an instant when neither is visible), there is no longer a need for yet another pass below. shmem_radix_tree_replace() compensates for the lockless lookup by checking that the expected entry is in place, under lock, before replacing it. Here it just deletes, but will be used in later patches to substitute swap entry for page or page for swap entry. Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-04	tmpfs: copy truncate_inode_pages_range	Hugh Dickins	1	-20/+79
	Bring truncate.c's code for truncate_inode_pages_range() inline into shmem_truncate_range(), replacing its first call (there's a followup call below, but leave that one, it will disappear next). Don't play with it yet, apart from leaving out the cleancache flush, and (importantly) the nrpages == 0 skip, and moving shmem_setattr()'s partial page preparation into its partial page handling. Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-04	tmpfs: miscellaneous trivial cleanups	Hugh Dickins	3	-111/+109
	While it's at its least, make a number of boring nitpicky cleanups to shmem.c, mostly for consistency of variable naming. Things like "swap" instead of "entry", "pgoff_t index" instead of "unsigned long idx". And since everything else here is prefixed "shmem_", better change init_tmpfs() to shmem_init(). Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-04	tmpfs: demolish old swap vector support	Hugh Dickins	2	-700/+84
	The maximum size of a shmem/tmpfs file has been limited by the maximum size of its triple-indirect swap vector. With 4kB page size, maximum filesize was just over 2TB on a 32-bit kernel, but sadly one eighth of that on a 64-bit kernel. (With 8kB page size, maximum filesize was just over 4TB on a 64-bit kernel, but 16TB on a 32-bit kernel, MAX_LFS_FILESIZE being then more restrictive than swap vector layout.) It's a shame that tmpfs should be more restrictive than ramfs, and this limitation has now been noticed. Add another level to the swap vector? No, it became obscure and hard to maintain, once I complicated it to make use of highmem pages nine years ago: better choose another way. Surely, if 2.4 had had the radix tree pagecache introduced in 2.5, then tmpfs would never have invented its own peculiar radix tree: we would have fitted swap entries into the common radix tree instead, in much the same way as we fit swap entries into page tables. And why should each file have a separate radix tree for its pages and for its swap entries? The swap entries are required precisely where and when the pages are not. We want to put them together in a single radix tree: which can then avoid much of the locking which was needed to prevent them from being exchanged underneath us. This also avoids the waste of memory devoted to swap vectors, first in the shmem_inode itself, then at least two more pages once a file grew beyond 16 data pages (pages accounted by df and du, but not by memcg). Allocated upfront, to avoid allocation when under swapping pressure, but pure waste when CONFIG_SWAP is not set - I have never spattered around the ifdefs to prevent that, preferring this move to sharing the common radix tree instead. There are three downsides to sharing the radix tree. One, that it binds tmpfs more tightly to the rest of mm, either requiring knowledge of swap entries in radix tree there, or duplication of its code here in shmem.c. I believe that the simplications and memory savings (and probable higher performance, not yet measured) justify that. Two, that on HIGHMEM systems with SWAP enabled, it's the lowmem radix nodes that cannot be freed under memory pressure - whereas before it was the less precious highmem swap vector pages that could not be freed. I'm hoping that 64-bit has now been accessible for long enough, that the highmem argument has grown much less persuasive. Three, that swapoff is slower than it used to be on tmpfs files, since it's using a simple generic mechanism not tailored to it: I find this noticeable, and shall want to improve, but maybe nobody else will notice. So... now remove most of the old swap vector code from shmem.c. But, for the moment, keep the simple i_direct vector of 16 pages, with simple accessors shmem_put_swap() and shmem_get_swap(), as a toy implementation to help mark where swap needs to be handled in subsequent patches. Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-04	mm: let swap use exceptional entries	Hugh Dickins	3	-26/+66
	If swap entries are to be stored along with struct page pointers in a radix tree, they need to be distinguished as exceptional entries. Most of the handling of swap entries in radix tree will be contained in shmem.c, but a few functions in filemap.c's common code need to check for their appearance: find_get_page(), find_lock_page(), find_get_pages() and find_get_pages_contig(). So as not to slow their fast paths, tuck those checks inside the existing checks for unlikely radix_tree_deref_slot(); except for find_lock_page(), where it is an added test. And make it a BUG in find_get_pages_tag(), which is not applied to tmpfs files. A part of the reason for eliminating shmem_readpage() earlier, was to minimize the places where common code would need to allow for swap entries. The swp_entry_t known to swapfile.c must be massaged into a slightly different form when stored in the radix tree, just as it gets massaged into a pte_t when stored in page tables. In an i386 kernel this limits its information (type and page offset) to 30 bits: given 32 "types" of swapfile and 4kB pagesize, that's a maximum swapfile size of 128GB. Which is less than the 512GB we previously allowed with X86_PAE (where the swap entry can occupy the entire upper 32 bits of a pte_t), but not a new limitation on 32-bit without PAE; and there's not a new limitation on 64-bit (where swap filesize is already limited to 16TB by a 32-bit page offset). Thirty areas of 128GB is probably still enough swap for a 64GB 32-bit machine. Provide swp_to_radix_entry() and radix_to_swp_entry() conversions, and enforce filesize limit in read_swap_header(), just as for ptes. Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-04	radix_tree: exceptional entries and indices	Hugh Dickins	3	-15/+54
	A patchset to extend tmpfs to MAX_LFS_FILESIZE by abandoning its peculiar swap vector, instead keeping a file's swap entries in the same radix tree as its struct page pointers: thus saving memory, and simplifying its code and locking. This patch: The radix_tree is used by several subsystems for different purposes. A major use is to store the struct page pointers of a file's pagecache for memory management. But what if mm wanted to store something other than page pointers there too? The low bit of a radix_tree entry is already used to denote an indirect pointer, for internal use, and the unlikely radix_tree_deref_retry() case. Define the next bit as denoting an exceptional entry, and supply inline functions radix_tree_exception() to return non-0 in either unlikely case, and radix_tree_exceptional_entry() to return non-0 in the second case. If a subsystem already uses radix_tree with that bit set, no problem: it does not affect internal workings at all, but is defined for the convenience of those storing well-aligned pointers in the radix_tree. The radix_tree_gang_lookups have an implicit assumption that the caller can deduce the offset of each entry returned e.g. by the page->index of a struct page. But that may not be feasible for some kinds of item to be stored there. radix_tree_gang_lookup_slot() allow for an optional indices argument, output array in which to return those offsets. The same could be added to other radix_tree_gang_lookups, but for now keep it to the only one for which we need it. Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-04	drivers/video/backlight/aat2870_bl.c: make it buildable as a module	Andrew Morton	1	-1/+1
	i386 allmodconfig: drivers/built-in.o: In function `aat2870_bl_remove': aat2870_bl.c:(.text+0x414f9): undefined reference to `backlight_device_unregister' drivers/built-in.o: In function `aat2870_bl_probe': aat2870_bl.c:(.text+0x418fc): undefined reference to `backlight_device_register' aat2870_bl.c:(.text+0x41a31): undefined reference to `backlight_device_unregiste Cc: Jin Park <jinyoungp@nvidia.com> Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: Axel Lin <axel.lin@gmail.com> Cc: Richard Purdie <rpurdie@rpsys.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-04	drivers/video/backlight/aat2870_bl.c: fix setting max_current	Axel Lin	2	-3/+3
	- Current implementation tests wrong value for setting aat2870_bl->max_current. - In the current implementation, we cannot differentiate between 2 cases: a) if pdata->max_current is not set , or b) pdata->max_current is set to AAT2870_CURRENT_0_45 (which is also 0). Fix it by setting AAT2870_CURRENT_0_45 to be 1 and adjust the equation in aat2870_brightness() accordingly. Signed-off-by: Axel Lin <axel.lin@gmail.com> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Samuel Ortiz <sameo@linux.intel.com> Tested-by: Jin Park <jinyoungp@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-04	drivers/video/backlight/aat2870_bl.c: fix error checking for ↵	Axel Lin	1	-2/+2
	backlight_device_register backlight_device_register() returns ERR_PTR() on error. Signed-off-by: Axel Lin <axel.lin@gmail.com> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Jin Park <jinyoungp@nvidia.com> Cc: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-04	cris: add missing declaration of kgdb_init() and breakpoint()	WANG Cong	1	-0/+3
	Fix: arch/cris/arch-v10/kernel/irq.c:239: error: implicit declaration of function 'kgdb_init' arch/cris/arch-v10/kernel/irq.c:240: error: implicit declaration of function 'breakpoint' Declare these two functions. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-04	cris: fix the prototype of sync_serial_ioctl()	WANG Cong	1	-1/+1
	Fix: arch/cris/arch-v10/drivers/sync_serial.c:961: error: conflicting types for 'sync_serial_ioctl' Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-04	cris: fix a build error in sync_serial_open()	WANG Cong	1	-2/+2
	Fix: arch/cris/arch-v10/drivers/sync_serial.c:628: error: 'ret' undeclared (first use in this function) 'ret' should be 'err'. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-04	cris: fix a build error in kernel/fork.c	WANG Cong	1	-3/+3
	Fix this error: kernel/fork.c:267: error: implicit declaration of function 'alloc_thread_info_node' This is due to renaming alloc_thread_info() to alloc_thread_info_node(). [akpm@linux-foundation.org: coding-style fixes] Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-04	tpm_tis: fix build when ACPI is not enabled	Randy Dunlap	1	-1/+6
	Fix tpm_tis.c build when CONFIG_ACPI is not enabled by providing a stub function. Fixes many build errors/warnings: drivers/char/tpm/tpm_tis.c:89: error: dereferencing pointer to incomplete type drivers/char/tpm/tpm_tis.c:89: warning: type defaults to 'int' in declaration of 'type name' drivers/char/tpm/tpm_tis.c:89: error: request for member 'list' in something not a structure or union ... Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Cc: Leendert van Doorn <leendert@watson.ibm.com> Cc: James Morris <jmorris@namei.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-04	mm: page_alloc: increase __GFP_BITS_SHIFT to include __GFP_OTHER_NODE	Johannes Weiner	1	-1/+1
	__GFP_OTHER_NODE is used for NUMA allocations on behalf of other nodes. It's supposed to be passed through from the page allocator to zone_statistics(), but it never gets there as gfp_allowed_mask is not wide enough and masks out the flag early in the allocation path. The result is an accounting glitch where successful NUMA allocations by-agent are not properly attributed as local. Increase __GFP_BITS_SHIFT so that it includes __GFP_OTHER_NODE. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-04	ramoops: update module parameters	Sergiu Iordache	1	-0/+8
	Update the module parameters when platform data is used. This means that they can be read from /sys/module/ramoops/parameters in order to parse the memory area. Signed-off-by: Sergiu Iordache <sergiu@chromium.org> Cc: Marco Stornelli <marco.stornelli@gmail.com> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-04	Documentation: add pointer to name_to_dev_t for root= values	Will Drewry	3	-3/+25
	Update kernel-parameters.txt to point users to the authoritative comment for name_to_dev_t. In addition, updates other places where some name_to_dev_t behavior was discussed. All other references to root= appear to be for explicit sample usage or just side comments when discussing other kernel parameters. Signed-off-by: Will Drewry <wad@chromium.org> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-04	fs/dcache.c: fix new kernel-doc warning	Randy Dunlap	1	-0/+1
	Fix new kernel-doc warning in fs/dcache.c: Warning(fs/dcache.c:797): No description found for parameter 'sb' Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-04	ida: simplified functions for id allocation	Rusty Russell	2	-0/+71
	The current hyper-optimized functions are overkill if you simply want to allocate an id for a device. Create versions which use an internal lock. In followup patches, numerous drivers are converted to use this interface. Thanks to Tejun for feedback. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Jonathan Cameron <jic23@cam.ac.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-04	taskstats: add_del_listener() should ignore !valid listeners	Oleg Nesterov	1	-1/+1
	When send_cpu_listeners() finds the orphaned listener it marks it as !valid and drops listeners->sem. Before it takes this sem for writing, s->pid can be reused and add_del_listener() can wrongly try to re-use this entry. Change add_del_listener() to check ->valid = T. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Vasiliy Kulikov <segoon@openwall.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Cc: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-04	taskstats: add_del_listener() shouldn't use the wrong node	Oleg Nesterov	1	-9/+7
	1. Commit 26c4caea9d69 "don't allow duplicate entries in listener mode" changed add_del_listener(REGISTER) so that "next_cpu:" can reuse the listener allocated for the previous cpu, this doesn't look exactly right even if minor. Change the code to kfree() in the already-registered case, this case is unlikely anyway so the extra kmalloc_node() shouldn't hurt but looke more correct and clean. 2. use the plain list_for_each_entry() instead of _safe() to scan listeners->list. 3. Remove the unneeded INIT_LIST_HEAD(&s->list), we are going to list_add(&s->list). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Vasiliy Kulikov <segoon@openwall.com> Cc: Balbir Singh <bsingharora@gmail.com> Reviewed-by: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-04	rtc-omap: fix initialization of control register	Daniel Glöckner	1	-1/+1
	As the comment explains, the intention of the code is to clear the OMAP_RTC_CTRL_MODE_12_24 bit, but instead it only clears the OMAP_RTC_CTRL_SPLIT and OMAP_RTC_CTRL_AUTO_COMP bits, which should be kept. OMAP_RTC_CTRL_DISABLE, OMAP_RTC_CTRL_SET_32_COUNTER, OMAP_RTC_CTRL_TEST, and OMAP_RTC_CTRL_ROUND_30S are also better off being cleared. Signed-off-by: Daniel Glöckner <dg@emlix.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-04	fault-injection: add ability to export fault_attr in arbitrary directory	Akinobu Mita	7	-46/+33
	init_fault_attr_dentries() is used to export fault_attr via debugfs. But it can only export it in debugfs root directory. Per Forlin is working on mmc_fail_request which adds support to inject data errors after a completed host transfer in MMC subsystem. The fault_attr for mmc_fail_request should be defined per mmc host and export it in debugfs directory per mmc host like /sys/kernel/debug/mmc0/mmc_fail_request. init_fault_attr_dentries() doesn't help for mmc_fail_request. So this introduces fault_create_debugfs_attr() which is able to create a directory in the arbitrary directory and replace init_fault_attr_dentries(). [akpm@linux-foundation.org: extraneous semicolon, per Randy] Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Tested-by: Per Forlin <per.forlin@linaro.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-03	tools/power turbostat: fit output into 80 columns on snb-ep	Len Brown	1	-23/+23
	Reduce columns for package number to 1. If you can afford more than 9 packages, you can also afford a terminal with more than 80 columns:-) Also shave a column also off the package C-states Signed-off-by: Len Brown <len.brown@intel.com>
2011-08-03	efivars: fix warnings when CONFIG_PSTORE=n	Tony Luck	1	-3/+3
	drivers/firmware/efivars.c:161: warning: ‘utf16_strlen’ defined but not used utf16_strlen() is only used inside CONFIG_PSTORE - make this "static inline" to shut the compiler up [thanks to hpa for the suggestion]. drivers/firmware/efivars.c:602: warning: initialization from incompatible pointer type Between v1 and v2 of this patch series we decided to make the "part" number unsigned - but missed fixing the stub version of efi_pstore_write() Acked-by: Matthew Garrett <mjg@redhat.com> Acked-by: Mike Waychison <mikew@google.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2011-08-02	ACPI: delete stale reference in kernel-parameters.txt	Len Brown	1	-2/+0
	Says for acpi= See also Documentation/power/pm.txt, pci=noacpi but this file does not exist Reported-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Len Brown <len.brown@intel.com>
2011-08-02	arch/tile/mm/init.c: trivial: use BUG_ON	Julia Lawall	1	-2/+1
	Use BUG_ON(x) rather than if(x) BUG(); The semantic patch that fixes this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ identifier x; @@ -if (x) BUG(); +BUG_ON(x); @@ identifier x; @@ -if (!x) BUG(); +BUG_ON(!x); // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2011-08-02	ACPI: add missing _OSI strings	Shaohua Li	1	-1/+7
	Linux supports some optional features, but it should notify the BIOS about them via the _OSI method. Currently Linux doesn't notify any, which might make such features not work because the BIOS doesn't know about them. Jarosz has a system which needs this to make ACPI processor aggregator device work. Reported-by: "Jarosz, Sebastian" <sebastian.jarosz@intel.com> Signed-off-by: Shaohua Li <shaohua.li@intel.com> Acked-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Len Brown <len.brown@intel.com>
2011-08-02	ACPI: remove NID_INVAL	David Rientjes	1	-1/+0
	b552a8c56db8 ("ACPI: remove NID_INVAL") removed the left over uses of NID_INVAL, but didn't actually remove the definition. Remove it. Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Len Brown <len.brown@intel.com>
2011-08-02	thermal: make THERMAL_HWMON implementation fully internal	Jean Delvare	2	-47/+92
	THERMAL_HWMON is implemented inside the thermal_sys driver and has no effect on drivers implementing thermal zones, so they shouldn't see anything related to it in <linux/thermal.h>. Making the THERMAL_HWMON implementation fully internal has two advantages beyond the cleaner design: * This avoids rebuilding all thermal drivers if the THERMAL_HWMON implementation changes, or if CONFIG_THERMAL_HWMON gets enabled or disabled. * This avoids breaking the thermal kABI in these cases too, which should make distributions happy. The only drawback I can see is slightly higher memory fragmentation, as the number of kzalloc() calls will increase by one per thermal zone. But I doubt it will be a problem in practice, as I've never seen a system with more than two thermal zones. Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Rene Herman <rene.herman@gmail.com> Acked-by: Guenter Roeck <guenter.roeck@ericsson.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Len Brown <len.brown@intel.com>
2011-08-02	thermal: split hwmon lookup to a separate function	Jean Delvare	1	-6/+19
	We'll soon need to reuse it. Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Rene Herman <rene.herman@gmail.com> Acked-by: Guenter Roeck <guenter.roeck@ericsson.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Len Brown <len.brown@intel.com>
2011-08-02	thermal: hide CONFIG_THERMAL_HWMON	Jean Delvare	2	-15/+2
	It's about time to revert 16d752397301b9 ("thermal: Create CONFIG_THERMAL_HWMON=n"). Anybody running a kernel >= 2.6.40 would also be running a recent enough version of lm-sensors. Actually having CONFIG_THERMAL_HWMON is pretty convenient so instead of dropping it, we keep it but hide it. Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Rene Herman <rene.herman@gmail.com> Acked-by: Guenter Roeck <guenter.roeck@ericsson.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Len Brown <len.brown@intel.com>
2011-08-02	MAINTAINERS: Add keyword match for of_match_table to device tree section	Mark Brown	1	-0/+1
	If a patch is working with an of_match_table it's probably adding an OF binding to a driver in which case the binding ought to be reviewed by the device tree folks to make sure that things like the device matches are set up correctly. Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2011-08-02	of: constify property name parameters for helper functions	Jamie Iles	2	-9/+13
	The helper functions for reading u32 integers, u32 arrays and strings should have the property name as a const pointer. Cc: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Jamie Iles <jamie@jamieiles.com> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2011-08-02	spi/pl022: remove function cannot exit	Linus Walleij	1	-8/+3
	The remove function in the PL022 driver cannot abort the remove function any way, so restructure the code so as not to make that assumption. Remove will now proceed no matter whether it can stop the transfer queue or not. Reported-by: Russell King <linux@arm.linux.org.uk> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2011-08-02	dm table: set flush capability based on underlying devices	Mike Snitzer	2	-1/+43
	DM has always advertised both REQ_FLUSH and REQ_FUA flush capabilities regardless of whether or not a given DM device's underlying devices also advertised a need for them. Block's flush-merge changes from 2.6.39 have proven to be more costly for DM devices. Performance regressions have been reported even when DM's underlying devices do not advertise that they have a write cache. Fix the performance regressions by configuring a DM device's flushing capabilities based on those of the underlying devices' capabilities. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02	dm crypt: optionally support discard requests	Milan Broz	2	-5/+65
	Add optional parameter field to dmcrypt table and support "allow_discards" option. Discard requests bypass crypt queue processing. Bio is simple remapped to underlying device. Note that discard will be never enabled by default because of security consequences. It is up to the administrator to enable it for encrypted devices. (Note that userspace cryptsetup does not understand new optional parameters yet. Support for this will come later. Until then, you should use 'dmsetup' to enable and disable this.) Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02	dm raid: add md raid1 support	Jonathan Brassow	1	-10/+39
	Support the MD RAID1 personality through dm-raid. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02	dm raid: support metadata devices	Jonathan Brassow	3	-24/+412
	Add the ability to parse and use metadata devices to dm-raid. Although not strictly required, without the metadata devices, many features of RAID are unavailable. They are used to store a superblock and bitmap. The role, or position in the array, of each device must be recorded in its superblock. This is to help with fault handling, array reshaping, and sanity checks. RAID 4/5/6 devices must be loaded in a specific order: in this way, the 'array_position' field helps validate the correctness of the mapping when it is loaded. It can be used during reshaping to identify which devices are added/removed. Fault handling is impossible without this field. For example, when a device fails it is recorded in the superblock. If this is a RAID1 device and the offending device is removed from the array, there must be a way during subsequent array assembly to determine that the failed device was the one removed. This is done by correlating the 'array_position' field and the bit-field variable 'failed_devices'. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02	dm raid: add write_mostly parameter	Jonathan Brassow	2	-2/+28
	Add the write_mostly parameter to RAID1 dm-raid tables. This allows the user to set the WriteMostly flag on a RAID1 device that should normally be avoided for read I/O. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02	dm raid: add region_size parameter	Jonathan Brassow	2	-3/+83
	Allow the user to specify the region_size. Ensures that the supplied value meets md's constraints, viz. the number of regions does not exceed 2^21. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02	dm raid: improve table parameters documentation	Jonathan Brassow	1	-46/+78
	Add more information about some dm-raid table parameters and clarify how parameters are printed when 'dmsetup table' is issued. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02	dm ioctl: forbid multiple device specifiers	Mikulas Patocka	1	-0/+6
	Exactly one of name, uuid or device must be specified when referencing an existing device. This removes the ambiguity (risking the wrong device being updated) if two conflicting parameters were specified. Previously one parameter got used and any others were ignored silently. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02	dm ioctl: introduce __get_dev_cell	Mikulas Patocka	1	-17/+24
	Move logic to find device based on major/minor number to a separate function __get_dev_cell (similar to __get_uuid_cell and __get_name_cell). This makes the function __find_device_hash_cell more straightforward. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02	dm ioctl: fill in device parameters in more ioctls	Mikulas Patocka	2	-29/+38
	Move parameter filling from find_device to __find_device_hash_cell. This patch causes ioctls using __find_device_hash_cell (DM_DEV_REMOVE_CMD, DM_DEV_SUSPEND_CMD - resume, DM_TABLE_CLEAR_CMD) to return device parameters, bringing them into line with the other ioctls. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>