linux - linux

	Commit message (Collapse)	Author	Files	Lines
2013-12-21	aio/migratepages: make aio migrate pages sane	Benjamin LaHaise	3	-15/+53
	The arbitrary restriction on page counts offered by the core migrate_page_move_mapping() code results in rather suspicious looking fiddling with page reference counts in the aio_migratepage() operation. To fix this, make migrate_page_move_mapping() take an extra_count parameter that allows aio to tell the code about its own reference count on the page being migrated. While cleaning up aio_migratepage(), make it validate that the old page being passed in is actually what aio_migratepage() expects to prevent misbehaviour in the case of races. Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
2013-12-21	aio: fix kioctx leak introduced by "aio: Fix a trinity splat"	Benjamin LaHaise	1	-1/+2
	e34ecee2ae791df674dfb466ce40692ca6218e43 reworked the percpu reference counting to correct a bug trinity found. Unfortunately, the change lead to kioctxes being leaked because there was no final reference count to put. Add that reference count back in to fix things. Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Cc: stable@vger.kernel.org
2013-12-21	Don't set the INITRD_COMPRESS environment variable automatically	Linus Torvalds	1	-1/+3
	Commit 1bf49dd4be0b ("./Makefile: export initial ramdisk compression config option") started setting the INITRD_COMPRESS environment variable depending on which decompression models the kernel had available. That is completely broken. For example, we by default have CONFIG_RD_LZ4 enabled, and are able to decompress such an initrd, but the user tools to create such an initrd may not be availble. So trying to tell dracut to generate an lz4-compressed image just because we can decode such an image is completely inappropriate. Cc: J P <ppandit@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jan Beulich <JBeulich@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-21	mm: fix build of split ptlock code	Olof Johansson	1	-1/+1
	Commit 597d795a2a78 ('mm: do not allocate page->ptl dynamically, if spinlock_t fits to long') restructures some allocators that are compiled even if USE_SPLIT_PTLOCKS arn't used. It results in compilation failure: mm/memory.c:4282:6: error: 'struct page' has no member named 'ptl' mm/memory.c:4288:12: error: 'struct page' has no member named 'ptl' Add in the missing ifdef. Fixes: 597d795a2a78 ('mm: do not allocate page->ptl dynamically, if spinlock_t fits to long') Signed-off-by: Olof Johansson <olof@lixom.net> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-20	pstore: Don't allow high traffic options on fragile devices	Luck, Tony	4	-2/+10
	Some pstore backing devices use on board flash as persistent storage. These have limited numbers of write cycles so it is a poor idea to use them from high frequency operations. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-20	mm: do not allocate page->ptl dynamically, if spinlock_t fits to long	Kirill A. Shutemov	5	-7/+8
	In struct page we have enough space to fit long-size page->ptl there, but we use dynamically-allocated page->ptl if size(spinlock_t) is larger than sizeof(int). It hurts 64-bit architectures with CONFIG_GENERIC_LOCKBREAK, where sizeof(spinlock_t) == 8, but it easily fits into struct page. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-20	mm: page_alloc: revert NUMA aspect of fair allocation policy	Johannes Weiner	1	-10/+9
	Commit 81c0a2bb515f ("mm: page_alloc: fair zone allocator policy") meant to bring aging fairness among zones in system, but it was overzealous and badly regressed basic workloads on NUMA systems. Due to the way kswapd and page allocator interacts, we still want to make sure that all zones in any given node are used equally for all allocations to maximize memory utilization and prevent thrashing on the highest zone in the node. While the same principle applies to NUMA nodes - memory utilization is obviously improved by spreading allocations throughout all nodes - remote references can be costly and so many workloads prefer locality over memory utilization. The original change assumed that zone_reclaim_mode would be a good enough predictor for that, but it turned out to be as indicative as a coin flip. Revert the NUMA aspect of the fairness until we can find a proper way to make it configurable and agree on a sane default. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: <stable@kernel.org> # 3.12 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-20	Revert "mm: page_alloc: exclude unreclaimable allocations from zone fairness ↵	Mel Gorman	1	-2/+1
	policy" This reverts commit 73f038b863df. The NUMA behaviour of this patch is less than ideal. An alternative approch is to interleave allocations only within local zones which is implemented in the next patch. Cc: stable@vger.kernel.org Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-20	mm: Fix NULL pointer dereference in madvise(MADV_WILLNEED) support	Kirill A. Shutemov	1	-3/+2
	Sasha Levin found a NULL pointer dereference that is due to a missing page table lock, which in turn is due to the pmd entry in question being a transparent huge-table entry. The code - introduced in commit 1998cc048901 ("mm: make madvise(MADV_WILLNEED) support swap file prefetch") - correctly checks for this situation using pmd_none_or_trans_huge_or_clear_bad(), but it turns out that that function doesn't work correctly. pmd_none_or_trans_huge_or_clear_bad() expected that pmd_bad() would trigger if the transparent hugepage bit was set, but it doesn't do that if pmd_numa() is also set. Note that the NUMA bit only gets set on real NUMA machines, so people trying to reproduce this on most normal development systems would never actually trigger this. Fix it by removing the very subtle (and subtly incorrect) expectation, and instead just checking pmd_trans_huge() explicitly. Reported-by: Sasha Levin <sasha.levin@oracle.com> Acked-by: Andrea Arcangeli <aarcange@redhat.com> [ Additionally remove the now stale test for pmd_trans_huge() inside the pmd_bad() case - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-19	qla2xxx: Fix scsi_host leak on qlt_lport_register callback failure	Nicholas Bellinger	1	-0/+1
	This patch fixes a possible scsi_host reference leak in qlt_lport_register(), when a non zero return from the passed (*callback) does not call drop the local reference via scsi_host_put() before returning. This currently does not effect existing tcm_qla2xxx code as the passed callback will never fail, but fix this up regardless for future code. Cc: Chad Dupuis <chad.dupuis@qlogic.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-12-19	target: Remove extra percpu_ref_init	Andy Grover	1	-7/+1
	lun->lun_ref is also initialized in core_tpg_post_addlun, so it doesn't need to be done in core_tpg_setup_virtual_lun0. (nab: Drop left-over percpu_ref_cancel_init in failure path) Signed-off-by: Andy Grover <agrover@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-12-19	arm64: ptrace: avoid using HW_BREAKPOINT_EMPTY for disabled events	Will Deacon	1	-20/+18
	Commit 8f34a1da35ae ("arm64: ptrace: use HW_BREAKPOINT_EMPTY type for disabled breakpoints") fixed an issue with GDB trying to zero breakpoint control registers. The problem there is that the arch hw_breakpoint code will attempt to create a (disabled), execute breakpoint of length 0. This will fail validation and report unexpected failure to GDB. To avoid this, we treated disabled breakpoints as HW_BREAKPOINT_EMPTY, but that seems to have broken with recent kernels, causing watchpoints to be treated as TYPE_INST in the core code and returning ENOSPC for any further breakpoints. This patch fixes the problem by prioritising the `enable' field of the breakpoint: if it is cleared, we simply update the perf_event_attr to indicate that the thing is disabled and don't bother changing either the type or the length. This reinforces the behaviour that the breakpoint control register is essentially read-only apart from the enable bit when disabling a breakpoint. Cc: <stable@vger.kernel.org> Reported-by: Aaron Liu <liucy214@gmail.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2013-12-19	ARC: Allow conditional multiple inclusion of uapi/asm/unistd.h	Vineet Gupta	1	-1/+7
	Commit 97bc386fc12d "ARC: Add guard macro to uapi/asm/unistd.h" inhibited multiple inclusion of ARCH unistd.h. This however hosed the system since Generic syscall table generator relies on it being included twice, and in lack-of an empty table was emitted by C preprocessor. Fix that by allowing one exception to rule for the special case (just like Xtensa) Suggested-by: Chen Gang <gang.chen.5i5j@gmail.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-12-19	target/file: Update hw_max_sectors based on current block_size	Nicholas Bellinger	4	-5/+14
	This patch allows FILEIO to update hw_max_sectors based on the current max_bytes_per_io. This is required because vfs_[writev,readv]() can accept a maximum of 2048 iovecs per call, so the enforced hw_max_sectors really needs to be calculated based on block_size. This addresses a >= v3.5 bug where block_size=512 was rejecting > 1M sized I/O requests, because FD_MAX_SECTORS was hardcoded to 2048 for the block_size=4096 case. (v2: Use max_bytes_per_io instead of ->update_hw_max_sectors) Reported-by: Henrik Goldman <hg@x-formation.com> Cc: <stable@vger.kernel.org> #3.5+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-12-19	iser-target: Move INIT_WORK setup into isert_create_device_ib_res	Nicholas Bellinger	1	-2/+4
	This patch moves INIT_WORK setup for cq_desc->cq_[rx,tx]_work into isert_create_device_ib_res(), instead of being done each callback invocation in isert_cq_[rx,tx]_callback(). This also fixes a 'INFO: trying to register non-static key' warning when cancel_work_sync() is called before INIT_WORK has setup the struct work_struct. Reported-by: Or Gerlitz <ogerlitz@mellanox.com> Cc: <stable@vger.kernel.org> #3.12+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-12-19	iscsi-target: Fix incorrect np->np_thread NULL assignment	Nicholas Bellinger	2	-6/+1
	When shutting down a target there is a race condition between iscsit_del_np() and __iscsi_target_login_thread(). The latter sets the thread pointer to NULL, and the former tries to issue kthread_stop() on that pointer without any synchronization. This patch moves the np->np_thread NULL assignment into iscsit_del_np(), after kthread_stop() has completed. It also removes the signal_pending() + np_state check, and only exits when kthread_should_stop() is true. Reported-by: Hannes Reinecke <hare@suse.de> Cc: <stable@vger.kernel.org> #3.12+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-12-19	mm/hugetlb: check for pte NULL pointer in __page_check_address()	Jianguo Wu	1	-0/+4
	In __page_check_address(), if address's pud is not present, huge_pte_offset() will return NULL, we should check the return value. Signed-off-by: Jianguo Wu <wujianguo@huawei.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mel Gorman <mgorman@suse.de> Cc: qiuxishi <qiuxishi@huawei.com> Cc: Hanjun Guo <guohanjun@huawei.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-19	fix build with make 3.80	Jan Beulich	1	-13/+7
	According to Documentation/Changes, make 3.80 is still being supported for building the kernel, hence make files must not make (unconditional) use of features introduced only in newer versions. Commit 1bf49dd4be0b ("./Makefile: export initial ramdisk compression config option") however introduced "else ifeq" constructs which make 3.80 doesn't understand. Replace the logic there with more conventional (in the kernel build infrastructure) list constructs (except that the list here is intentionally limited to exactly one element). Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: P J P <ppandit@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-19	mm/mempolicy: fix !vma in new_vma_page()	Wanpeng Li	1	-6/+8
	BUG_ON(!vma) assumption is introduced by commit 0bf598d863e3 ("mbind: add BUG_ON(!vma) in new_vma_page()"), however, even if address = __vma_address(page, vma); and vma->start < address < vma->end page_address_in_vma() may still return -EFAULT because of many other conditions in it. As a result the while loop in new_vma_page() may end with vma=NULL. This patch revert the commit and also fix the potential dereference NULL pointer reported by Dan. http://marc.info/?l=linux-mm&m=137689530323257&w=2 kernel BUG at mm/mempolicy.c:1204! invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC CPU: 3 PID: 7056 Comm: trinity-child3 Not tainted 3.13.0-rc3+ #2 task: ffff8801ca5295d0 ti: ffff88005ab20000 task.ti: ffff88005ab20000 RIP: new_vma_page+0x70/0x90 RSP: 0000:ffff88005ab21db0 EFLAGS: 00010246 RAX: fffffffffffffff2 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000008040075 RSI: ffff8801c3d74600 RDI: ffffea00079a8b80 RBP: ffff88005ab21dc8 R08: 0000000000000004 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: fffffffffffffff2 R13: ffffea00079a8b80 R14: 0000000000400000 R15: 0000000000400000 FS: 00007ff49c6f4740(0000) GS:ffff880244e00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ff49c68f994 CR3: 000000005a205000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Stack: ffffea00079a8b80 ffffea00079a8bc0 ffffea00079a8ba0 ffff88005ab21e50 ffffffff811adc7a 0000000000000000 ffff8801ca5295d0 0000000464e224f8 0000000000000000 0000000000000002 0000000000000000 ffff88020ce75c00 Call Trace: migrate_pages+0x12a/0x850 SYSC_mbind+0x513/0x6a0 SyS_mbind+0xe/0x10 ia32_do_call+0x13/0x13 Code: 85 c0 75 2f 4c 89 e1 48 89 da 31 f6 bf da 00 02 00 65 44 8b 04 25 08 f7 1c 00 e8 ec fd ff ff 5b 41 5c 41 5d 5d c3 0f 1f 44 00 00 <0f> 0b 66 0f 1f 44 00 00 4c 89 e6 48 89 df ba 01 00 00 00 e8 48 RIP [<ffffffff8119f200>] new_vma_page+0x70/0x90 RSP <ffff88005ab21db0> Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Reported-by: Dave Jones <davej@redhat.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-19	MAINTAINERS: add Davidlohr as GPT maintainer	Davidlohr Bueso	1	-0/+6
	Add a new entry for the GPT standard. Any future changes will now be routed through linux-efi. Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Acked-by: Matt Fleming <matt.fleming@intel.com> Cc: Jens Axboe <axboe@kernel.dk> Acked-by: Matt Domsch <Matt_Domsch@dell.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-19	mm/memory-failure.c: recheck PageHuge() after hugetlb page migrate successfully	Jianguo Wu	1	-4/+10
	After a successful hugetlb page migration by soft offline, the source page will either be freed into hugepage_freelists or buddy(over-commit page). If page is in buddy, page_hstate(page) will be NULL. It will hit a NULL pointer dereference in dequeue_hwpoisoned_huge_page(). BUG: unable to handle kernel NULL pointer dereference at 0000000000000058 IP: [<ffffffff81163761>] dequeue_hwpoisoned_huge_page+0x131/0x1d0 PGD c23762067 PUD c24be2067 PMD 0 Oops: 0000 [#1] SMP So check PageHuge(page) after call migrate_pages() successfully. Signed-off-by: Jianguo Wu <wujianguo@huawei.com> Tested-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-19	mm/compaction: respect ignore_skip_hint in update_pageblock_skip	Joonsoo Kim	1	-0/+4
	update_pageblock_skip() only fits to compaction which tries to isolate by pageblock unit. If isolate_migratepages_range() is called by CMA, it try to isolate regardless of pageblock unit and it don't reference get_pageblock_skip() by ignore_skip_hint. We should also respect it on update_pageblock_skip() to prevent from setting the wrong information. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: Rafael Aquini <aquini@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: <stable@vger.kernel.org> [3.7+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-19	mm/mempolicy: correct putback method for isolate pages if failed	Joonsoo Kim	1	-1/+1
	queue_pages_range() isolates hugetlbfs pages and putback_lru_pages() can't handle these. We should change it to putback_movable_pages(). Naoya said that it is worth going into stable, because it can break in-use hugepage list. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Rafael Aquini <aquini@redhat.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: <stable@vger.kernel.org> [3.12.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-19	mm: add missing dependency in Kconfig	Sima Baymani	1	-1/+1
	Eliminate the following (rand)config warning by adding missing PROC_FS dependency: warning: (HWPOISON_INJECT && MEM_SOFT_DIRTY) selects PROC_PAGE_MONITOR which has unmet direct dependencies (PROC_FS && MMU) Signed-off-by: Sima Baymani <sima.baymani@gmail.com> Suggested-by: David Rientjes <rientjes@google.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-19	sh: always link in helper functions extracted from libgcc	Geert Uytterhoeven	1	-1/+1
	E.g. landisk_defconfig, which has CONFIG_NTFS_FS=m: ERROR: "__ashrdi3" [fs/ntfs/ntfs.ko] undefined! For "lib-y", if no symbols in a compilation unit are referenced by other units, the compilation unit will not be included in vmlinux. This breaks modules that do reference those symbols. Use "obj-y" instead to fix this. http://kisskb.ellerman.id.au/kisskb/buildresult/8838077/ This doesn't fix all cases. There are others, e.g. udivsi3. This is also not limited to sh, many architectures handle this in the same way. A simple solution is to unconditionally include all helper functions. A more complex solution is to make the choice of "lib-y" or "obj-y" depend on CONFIG_MODULES: obj-$(CONFIG_MODULES) += ... lib-y($CONFIG_MODULES) += ... Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Paul Mundt <lethal@linux-sh.org> Tested-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com> Reviewed-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-19	mm: page_alloc: exclude unreclaimable allocations from zone fairness policy	Johannes Weiner	1	-1/+2
	Dave Hansen noted a regression in a microbenchmark that loops around open() and close() on an 8-node NUMA machine and bisected it down to commit 81c0a2bb515f ("mm: page_alloc: fair zone allocator policy"). That change forces the slab allocations of the file descriptor to spread out to all 8 nodes, causing remote references in the page allocator and slab. The round-robin policy is only there to provide fairness among memory allocations that are reclaimed involuntarily based on pressure in each zone. It does not make sense to apply it to unreclaimable kernel allocations that are freed manually, in this case instantly after the allocation, and incur the remote reference costs twice for no reason. Only round-robin allocations that are usually freed through page reclaim or slab shrinking. Bisected by Dave Hansen. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-19	mm: numa: defer TLB flush for THP migration as long as possible	Mel Gorman	2	-7/+3
	THP migration can fail for a variety of reasons. Avoid flushing the TLB to deal with THP migration races until the copy is ready to start. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Alex Thorlton <athorlton@sgi.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-19	mm: numa: guarantee that tlb_flush_pending updates are visible before page ↵	Mel Gorman	1	-1/+6
	table updates According to documentation on barriers, stores issued before a LOCK can complete after the lock implying that it's possible tlb_flush_pending can be visible after a page table update. As per revised documentation, this patch adds a smp_mb__before_spinlock to guarantee the correct ordering. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-19	mm: fix TLB flush race between migration, and change_protection_range	Rik van Riel	8	-7/+69
	There are a few subtle races, between change_protection_range (used by mprotect and change_prot_numa) on one side, and NUMA page migration and compaction on the other side. The basic race is that there is a time window between when the PTE gets made non-present (PROT_NONE or NUMA), and the TLB is flushed. During that time, a CPU may continue writing to the page. This is fine most of the time, however compaction or the NUMA migration code may come in, and migrate the page away. When that happens, the CPU may continue writing, through the cached translation, to what is no longer the current memory location of the process. This only affects x86, which has a somewhat optimistic pte_accessible. All other architectures appear to be safe, and will either always flush, or flush whenever there is a valid mapping, even with no permissions (SPARC). The basic race looks like this: CPU A CPU B CPU C load TLB entry make entry PTE/PMD_NUMA fault on entry read/write old page start migrating page change PTE/PMD to new page read/write old page [] flush TLB reload TLB from new entry read/write new page lose data [] the old page may belong to a new user at this point! The obvious fix is to flush remote TLB entries, by making sure that pte_accessible aware of the fact that PROT_NONE and PROT_NUMA memory may still be accessible if there is a TLB flush pending for the mm. This should fix both NUMA migration and compaction. [mgorman@suse.de: fix build] Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Alex Thorlton <athorlton@sgi.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-19	mm: numa: avoid unnecessary disruption of NUMA hinting during migration	Mel Gorman	3	-6/+37
	do_huge_pmd_numa_page() handles the case where there is parallel THP migration. However, by the time it is checked the NUMA hinting information has already been disrupted. This patch adds an earlier check with some helpers. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Alex Thorlton <athorlton@sgi.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-19	mm: numa: clear numa hinting information on mprotect	Mel Gorman	2	-0/+4
	On a protection change it is no longer clear if the page should be still accessible. This patch clears the NUMA hinting fault bits on a protection change. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Alex Thorlton <athorlton@sgi.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-19	sched: numa: skip inaccessible VMAs	Mel Gorman	1	-0/+7
	Inaccessible VMA should not be trapping NUMA hint faults. Skip them. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Alex Thorlton <athorlton@sgi.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-19	mm: numa: avoid unnecessary work on the failure path	Mel Gorman	1	-1/+3
	If a PMD changes during a THP migration then migration aborts but the failure path is doing more work than is necessary. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Alex Thorlton <athorlton@sgi.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-19	mm: numa: ensure anon_vma is locked to prevent parallel THP splits	Mel Gorman	1	-0/+7
	The anon_vma lock prevents parallel THP splits and any associated complexity that arises when handling splits during THP migration. This patch checks if the lock was successfully acquired and bails from THP migration if it failed for any reason. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Alex Thorlton <athorlton@sgi.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-19	mm: numa: do not clear PTE for pte_numa update	Mel Gorman	1	-2/+7
	The TLB must be flushed if the PTE is updated but change_pte_range is clearing the PTE while marking PTEs pte_numa without necessarily flushing the TLB if it reinserts the same entry. Without the flush, it's conceivable that two processors have different TLBs for the same virtual address and at the very least it would generate spurious faults. This patch only unmaps the pages in change_pte_range for a full protection change. [riel@redhat.com: write pte_numa pte back to the page tables] Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Rik van Riel <riel@redhat.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Alex Thorlton <athorlton@sgi.com> Cc: Chegu Vinod <chegu_vinod@hp.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-19	mm: numa: do not clear PMD during PTE update scan	Mel Gorman	1	-1/+1
	If the PMD is flushed then a parallel fault in handle_mm_fault() will enter the pmd_none and do_huge_pmd_anonymous_page() path where it'll attempt to insert a huge zero page. This is wasteful so the patch avoids clearing the PMD when setting pmd_numa. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Alex Thorlton <athorlton@sgi.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-19	mm: clear pmd_numa before invalidating	Mel Gorman	1	-0/+3
	On x86, PMD entries are similar to _PAGE_PROTNONE protection and are handled as NUMA hinting faults. The following two page table protection bits are what defines them _PAGE_NUMA:set _PAGE_PRESENT:clear A PMD is considered present if any of the _PAGE_PRESENT, _PAGE_PROTNONE, _PAGE_PSE or _PAGE_NUMA bits are set. If pmdp_invalidate encounters a pmd_numa, it clears the present bit leaving _PAGE_NUMA which will be considered not present by the CPU but present by pmd_present. The existing caller of pmdp_invalidate should handle it but it's an inconsistent state for a PMD. This patch keeps the state consistent when calling pmdp_invalidate. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Alex Thorlton <athorlton@sgi.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-19	mm: numa: call MMU notifiers on THP migration	Mel Gorman	1	-8/+14
	MMU notifiers must be called on THP page migration or secondary MMUs will get very confused. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Alex Thorlton <athorlton@sgi.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-19	mm: numa: serialise parallel get_user_page against THP migration	Mel Gorman	3	-15/+60
	Base pages are unmapped and flushed from cache and TLB during normal page migration and replaced with a migration entry that causes any parallel NUMA hinting fault or gup to block until migration completes. THP does not unmap pages due to a lack of support for migration entries at a PMD level. This allows races with get_user_pages and get_user_pages_fast which commit 3f926ab945b6 ("mm: Close races between THP migration and PMD numa clearing") made worse by introducing a pmd_clear_flush(). This patch forces get_user_page (fast and normal) on a pmd_numa page to go through the slow get_user_page path where it will serialise against THP migration and properly account for the NUMA hinting fault. On the migration side the page table lock is taken for each PTE update. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Alex Thorlton <athorlton@sgi.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-19	kexec: migrate to reboot cpu	Vivek Goyal	3	-1/+3
	Commit 1b3a5d02ee07 ("reboot: move arch/x86 reboot= handling to generic kernel") moved reboot= handling to generic code. In the process it also removed the code in native_machine_shutdown() which are moving reboot process to reboot_cpu/cpu0. I guess that thought must have been that all reboot paths are calling migrate_to_reboot_cpu(), so we don't need this special handling. But kexec reboot path (kernel_kexec()) is not calling migrate_to_reboot_cpu() so above change broke kexec. Now reboot can happen on non-boot cpu and when INIT is sent in second kerneo to bring up BP, it brings down the machine. So start calling migrate_to_reboot_cpu() in kexec reboot path to avoid this problem. Bisected by WANG Chao. Reported-by: Matthew Whitehead <mwhitehe@redhat.com> Reported-by: Dave Young <dyoung@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Tested-by: Baoquan He <bhe@redhat.com> Tested-by: WANG Chao <chaowang@redhat.com> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-18	net_dma: mark broken	Dan Williams	1	-0/+1
	net_dma can cause data to be copied to a stale mapping if a copy-on-write fault occurs during dma. The application sees missing data. The following trace is triggered by modifying the kernel to WARN if it ever triggers copy-on-write on a page that is undergoing dma: WARNING: CPU: 24 PID: 2529 at lib/dma-debug.c:485 debug_dma_assert_idle+0xd2/0x120() ioatdma 0000:00:04.0: DMA-API: cpu touching an active dma mapped page [pfn=0x16bcd9] Modules linked in: iTCO_wdt iTCO_vendor_support ioatdma lpc_ich pcspkr dca CPU: 24 PID: 2529 Comm: linbug Tainted: G W 3.13.0-rc1+ #353 00000000000001e5 ffff88016f45f688 ffffffff81751041 ffff88017ab0ef70 ffff88016f45f6d8 ffff88016f45f6c8 ffffffff8104ed9c ffffffff810f3646 ffff8801768f4840 0000000000000282 ffff88016f6cca10 00007fa2bb699349 Call Trace: [<ffffffff81751041>] dump_stack+0x46/0x58 [<ffffffff8104ed9c>] warn_slowpath_common+0x8c/0xc0 [<ffffffff810f3646>] ? ftrace_pid_func+0x26/0x30 [<ffffffff8104ee86>] warn_slowpath_fmt+0x46/0x50 [<ffffffff8139c062>] debug_dma_assert_idle+0xd2/0x120 [<ffffffff81154a40>] do_wp_page+0xd0/0x790 [<ffffffff811582ac>] handle_mm_fault+0x51c/0xde0 [<ffffffff813830b9>] ? copy_user_enhanced_fast_string+0x9/0x20 [<ffffffff8175fc2c>] __do_page_fault+0x19c/0x530 [<ffffffff8175c196>] ? _raw_spin_lock_bh+0x16/0x40 [<ffffffff810f3539>] ? trace_clock_local+0x9/0x10 [<ffffffff810fa1f4>] ? rb_reserve_next_event+0x64/0x310 [<ffffffffa0014c00>] ? ioat2_dma_prep_memcpy_lock+0x60/0x130 [ioatdma] [<ffffffff8175ffce>] do_page_fault+0xe/0x10 [<ffffffff8175c862>] page_fault+0x22/0x30 [<ffffffff81643991>] ? __kfree_skb+0x51/0xd0 [<ffffffff813830b9>] ? copy_user_enhanced_fast_string+0x9/0x20 [<ffffffff81388ea2>] ? memcpy_toiovec+0x52/0xa0 [<ffffffff8164770f>] skb_copy_datagram_iovec+0x5f/0x2a0 [<ffffffff8169d0f4>] tcp_rcv_established+0x674/0x7f0 [<ffffffff816a68c5>] tcp_v4_do_rcv+0x2e5/0x4a0 [..] ---[ end trace e30e3b01191b7617 ]--- Mapped at: [<ffffffff8139c169>] debug_dma_map_page+0xb9/0x160 [<ffffffff8142bf47>] dma_async_memcpy_pg_to_pg+0x127/0x210 [<ffffffff8142cce9>] dma_memcpy_pg_to_iovec+0x119/0x1f0 [<ffffffff81669d3c>] dma_skb_copy_datagram_iovec+0x11c/0x2b0 [<ffffffff8169d1ca>] tcp_rcv_established+0x74a/0x7f0: ...the problem is that the receive path falls back to cpu-copy in several locations and this trace is just one of the areas. A few options were considered to fix this: 1/ sync all dma whenever a cpu copy branch is taken 2/ modify the page fault handler to hold off while dma is in-flight Option 1 adds yet more cpu overhead to an "offload" that struggles to compete with cpu-copy. Option 2 adds checks for behavior that is already documented as broken when using get_user_pages(). At a minimum a debug mode is warranted to catch and flag these violations of the dma-api vs get_user_pages(). Thanks to David for his reproducer. Cc: <stable@vger.kernel.org> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Vinod Koul <vinod.koul@intel.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Reported-by: David Whipple <whipple@securedatainnovations.ch> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2013-12-18	dma: pl330: ensure DMA descriptors are zero-initialised	Will Deacon	1	-4/+1
	I see the following splat with 3.13-rc1 when attempting to perform DMA: [ 253.004516] Alignment trap: not handling instruction e1902f9f at [<c0204b40>] [ 253.004583] Unhandled fault: alignment exception (0x221) at 0xdfdfdfd7 [ 253.004646] Internal error: : 221 [#1] PREEMPT SMP ARM [ 253.004691] Modules linked in: dmatest(+) [last unloaded: dmatest] [ 253.004798] CPU: 0 PID: 671 Comm: kthreadd Not tainted 3.13.0-rc1+ #2 [ 253.004864] task: df9b0900 ti: df03e000 task.ti: df03e000 [ 253.004937] PC is at dmaengine_unmap_put+0x14/0x34 [ 253.005010] LR is at pl330_tasklet+0x3c8/0x550 [ 253.005087] pc : [<c0204b44>] lr : [<c0207478>] psr: a00e0193 [ 253.005087] sp : df03fe48 ip : 00000000 fp : df03bf18 [ 253.005178] r10: bf00e108 r9 : 00000001 r8 : 00000000 [ 253.005245] r7 : df837040 r6 : dfb41800 r5 : df837048 r4 : df837000 [ 253.005316] r3 : dfdfdfcf r2 : dfb41f80 r1 : df837048 r0 : dfdfdfd7 [ 253.005384] Flags: NzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment kernel [ 253.005459] Control: 30c5387d Table: 9fb9ba80 DAC: fffffffd [ 253.005520] Process kthreadd (pid: 671, stack limit = 0xdf03e248) This is due to desc->txd.unmap containing garbage (uninitialised memory). Rather than add another dummy initialisation to _init_desc, instead ensure that the descriptors are zero-initialised during allocation and remove the dummy, per-field initialisation. Cc: Andriy Shevchenko <andriy.shevchenko@intel.com> Acked-by: Jassi Brar <jassisinghbrar@gmail.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Acked-by: Vinod Koul <vinod.koul@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2013-12-18	ALSA: hda - Add Dell headset detection quirk for one more laptop model	Hui Wang	1	-0/+1
	On the Dell machines with codec whose Subsystem Id is 0x10280640, no external microphone can be detected when plugging a 3-ring headset. Using ALC255_FIXUP_DELL1_MIC_NO_PRESENCE can fix this problem. The codec (Vendor ID: 0x10ec0255) on the machine belongs to alc_269 family. BugLink: https://bugs.launchpad.net/bugs/1260303 Cc: David Henningsson <david.henningsson@canonical.com> Cc: stable@vger.kernel.org Signed-off-by: Hui Wang <hui.wang@canonical.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2013-12-18	ASoC: wm8904: fix DSP mode B configuration	Bo Shen	1	-1/+1
	When wm8904 work in DSP mode B, we still need to configure it to work in DSP mode. Or else, it will work in Right Justified mode. Signed-off-by: Bo Shen <voice.shen@atmel.com> Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com> Signed-off-by: Mark Brown <broonie@linaro.org> Cc: stable@vger.kernel.org
2013-12-18	ASoC: wm_adsp: Add small delay while polling DSP RAM start	Charles Keepax	1	-3/+7
	Some devices are getting very close to the limit whilst polling the RAM start, this patch adds a small delay to this loop to give a longer startup timeout. Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com> Signed-off-by: Mark Brown <broonie@linaro.org> Cc: stable@vger.kernel.org
2013-12-18	KVM: PPC: Book3S HV: Don't drop low-order page address bits	Paul Mackerras	1	-0/+1
	Commit caaa4c804fae ("KVM: PPC: Book3S HV: Fix physical address calculations") unfortunately resulted in some low-order address bits getting dropped in the case where the guest is creating a 4k HPTE and the host page size is 64k. By getting the low-order bits from hva rather than gpa we miss out on bits 12 - 15 in this case, since hva is at page granularity. This puts the missing bits back in. Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-12-18	powerpc: book3s: kvm: Don't abuse host r2 in exit path	Aneesh Kumar K.V	3	-4/+5
	We don't use PACATOC for PR. Avoid updating HOST_R2 with PR KVM mode when both HV and PR are enabled in the kernel. Without this we get the below crash (qemu) Unable to handle kernel paging request for data at address 0xffffffffffff8310 Faulting instruction address: 0xc00000000001d5a4 cpu 0x2: Vector: 300 (Data Access) at [c0000001dc53aef0] pc: c00000000001d5a4: .vtime_delta.isra.1+0x34/0x1d0 lr: c00000000001d760: .vtime_account_system+0x20/0x60 sp: c0000001dc53b170 msr: 8000000000009032 dar: ffffffffffff8310 dsisr: 40000000 current = 0xc0000001d76c62d0 paca = 0xc00000000fef1100 softe: 0 irq_happened: 0x01 pid = 4472, comm = qemu-system-ppc enter ? for help [c0000001dc53b200] c00000000001d760 .vtime_account_system+0x20/0x60 [c0000001dc53b290] c00000000008d050 .kvmppc_handle_exit_pr+0x60/0xa50 [c0000001dc53b340] c00000000008f51c kvm_start_lightweight+0xb4/0xc4 [c0000001dc53b510] c00000000008cdf0 .kvmppc_vcpu_run_pr+0x150/0x2e0 [c0000001dc53b9e0] c00000000008341c .kvmppc_vcpu_run+0x2c/0x40 [c0000001dc53ba50] c000000000080af4 .kvm_arch_vcpu_ioctl_run+0x54/0x1b0 [c0000001dc53bae0] c00000000007b4c8 .kvm_vcpu_ioctl+0x478/0x730 [c0000001dc53bca0] c0000000002140cc .do_vfs_ioctl+0x4ac/0x770 [c0000001dc53bd80] c0000000002143e8 .SyS_ioctl+0x58/0xb0 [c0000001dc53be30] c000000000009e58 syscall_exit+0x0/0x98 Signed-off-by: Alexander Graf <agraf@suse.de>
2013-12-18	imx-drm: imx-drm-core: improve safety of imx_drm_add_crtc()	Russell King	1	-0/+9
	We must not add more CRTCs than we have declared to the vblank helpers, otherwise we overflow their arrays. Force failure if we exceed the number of CRTCs. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Acked-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-18	imx-drm: imx-drm-core: make imx_drm_crtc_register() safer	Russell King	1	-2/+3
	imx_drm_crtc_register() doesn't clean up the CRTC upon failure, which leaves the CRTC attached to the DRM device. Also, it does setup after attaching the CRTC to the DRM device. Fix this by reordering the function such that we do the setup before drm_crtc_init(): this fixes both issues. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Acked-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-18	imx-drm: imx-drm-core: use defined constant for number of CRTCs.	Russell King	1	-2/+2
	We have this definition, there's no reason not to use it. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Acked-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>