linux - linux

	Commit message (Collapse)	Author	Age	Files	Lines
*	xfs: align initial file allocations correctly	Dave Chinner	2013-12-11	1	-2/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The function xfs_bmap_isaeof() is used to indicate that an allocation is occurring at or past the end of file, and as such should be aligned to the underlying storage geometry if possible. Commit 27a3f8f ("xfs: introduce xfs_bmap_last_extent") changed the behaviour of this function for empty files - it turned off allocation alignment for this case accidentally. Hence large initial allocations from direct IO are not getting correctly aligned to the underlying geometry, and that is cause write performance to drop in alignment sensitive configurations. Fix it by considering allocation into empty files as requiring aligned allocation again. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
*	xfs: fix calculation of freed inode cluster blocks	Ben Myers	2013-12-11	1	-1/+1
\| \| \| \| \| \| \| \| \|	rec.ir_startino is an agino rather than an ino. Use the correct macro when dealing with it in xfs_difree. Signed-off-by: Ben Myers <bpm@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
*	MAINTAINERS: fix incorrect mail address of XFS maintainer	Namjae Jeon	2013-12-11	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	When I tried to send the patches to XFS Maintainers, I got returned mail included delivery fail message for Dave's mail. Maybe, Dave Chinner mail address is incorrect. I try to fix it correctly. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
*	xfs: xfs_dir2_block_to_sf temp buffer allocation fails	Dave Chinner	2013-12-11	1	-24/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we are using a large directory block size, and memory becomes fragmented, we can get memory allocation failures trying to kmem_alloc(64k) for a temporary buffer. However, there is not need for a directory buffer sized allocation, as the end result ends up in the inode literal area. This is, at most, slightly less than 2k of space, and hence we don't need an allocation larger than that fora temporary buffer. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
*	xfs: fix infinite loop by detaching the group/project hints from user dquot	Jie Liu	2013-12-09	1	-21/+50
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	xfs_quota(8) will hang up if trying to turn group/project quota off before the user quota is off, this could be 100% reproduced by: # mount -ouquota,gquota /dev/sda7 /xfs # mkdir /xfs/test # xfs_quota -xc 'off -g' /xfs <-- hangs up # echo w > /proc/sysrq-trigger # dmesg SysRq : Show Blocked State task PC stack pid father xfs_quota D 0000000000000000 0 27574 2551 0x00000000 [snip] Call Trace: [<ffffffff81aaa21d>] schedule+0xad/0xc0 [<ffffffff81aa327e>] schedule_timeout+0x35e/0x3c0 [<ffffffff8114b506>] ? mark_held_locks+0x176/0x1c0 [<ffffffff810ad6c0>] ? call_timer_fn+0x2c0/0x2c0 [<ffffffffa0c25380>] ? xfs_qm_shrink_count+0x30/0x30 [xfs] [<ffffffff81aa3306>] schedule_timeout_uninterruptible+0x26/0x30 [<ffffffffa0c26155>] xfs_qm_dquot_walk+0x235/0x260 [xfs] [<ffffffffa0c059d8>] ? xfs_perag_get+0x1d8/0x2d0 [xfs] [<ffffffffa0c05805>] ? xfs_perag_get+0x5/0x2d0 [xfs] [<ffffffffa0b7707e>] ? xfs_inode_ag_iterator+0xae/0xf0 [xfs] [<ffffffffa0c22280>] ? xfs_trans_free_dqinfo+0x50/0x50 [xfs] [<ffffffffa0b7709f>] ? xfs_inode_ag_iterator+0xcf/0xf0 [xfs] [<ffffffffa0c261e6>] xfs_qm_dqpurge_all+0x66/0xb0 [xfs] [<ffffffffa0c2497a>] xfs_qm_scall_quotaoff+0x20a/0x5f0 [xfs] [<ffffffffa0c2b8f6>] xfs_fs_set_xstate+0x136/0x180 [xfs] [<ffffffff8136cf7a>] do_quotactl+0x53a/0x6b0 [<ffffffff812fba4b>] ? iput+0x5b/0x90 [<ffffffff8136d257>] SyS_quotactl+0x167/0x1d0 [<ffffffff814cf2ee>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff81abcd19>] system_call_fastpath+0x16/0x1b It's fine if we turn user quota off at first, then turn off other kind of quotas if they are enabled since the group/project dquot refcount is decreased to zero once the user quota if off. Otherwise, those dquots refcount is non-zero due to the user dquot might refer to them as hint(s). Hence, above operation cause an infinite loop at xfs_qm_dquot_walk() while trying to purge dquot cache. This problem has been around since Linux 3.4, it was introduced by: [ b84a3a9675 xfs: remove the per-filesystem list of dquots ] Originally we will release the group dquot pointers because the user dquots maybe carrying around as a hint via xfs_qm_detach_gdquots(). However, with above change, there is no such work to be done before purging group/project dquot cache. In order to solve this problem, this patch introduces a special routine xfs_qm_dqpurge_hints(), and it would release the group/project dquot pointers the user dquots maybe carrying around as a hint, and then it will proceed to purge the user dquot cache if requested. Cc: stable@vger.kernel.org Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
*	xfs: fix assertion failure at xfs_setattr_nonsize	Jie Liu	2013-12-09	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For CRC enabled v5 super block, change a file's ownership can simply trigger an ASSERT failure at xfs_setattr_nonsize() if both group and project quota are enabled, i.e, [ 305.337609] XFS: Assertion failed: !XFS_IS_PQUOTA_ON(mp), file: fs/xfs/xfs_iops.c, line: 621 [ 305.339250] Kernel BUG at ffffffffa0a7fa32 [verbose debug info unavailable] [ 305.383939] Call Trace: [ 305.385536] [<ffffffffa0a7d95a>] xfs_setattr_nonsize+0x69a/0x720 [xfs] [ 305.387142] [<ffffffffa0a7dea9>] xfs_vn_setattr+0x29/0x70 [xfs] [ 305.388727] [<ffffffff811ca388>] notify_change+0x1a8/0x350 [ 305.390298] [<ffffffff811ac39d>] chown_common+0xfd/0x110 [ 305.391868] [<ffffffff811ad6bf>] SyS_fchownat+0xaf/0x110 [ 305.393440] [<ffffffff811ad760>] SyS_lchown+0x20/0x30 [ 305.394995] [<ffffffff8170f7dd>] system_call_fastpath+0x1a/0x1f [ 305.399870] RIP [<ffffffffa0a7fa32>] assfail+0x22/0x30 [xfs] This fix adjust the assertion to check if the super block support both quota inodes or not. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
*	xfs: add xfs_setattr_time	Christoph Hellwig	2013-12-07	1	-36/+30
\| \| \| \| \| \| \| \| \| \| \|	Split out a xfs_setattr_time helper to share code between truncate and regular setattr similar to xfs_setattr_mode. I might also have another caller growing for this in the near future. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
*	xfs: tiny xfs_setattr_mode cleanup	Christoph Hellwig	2013-12-07	1	-6/+4
\| \| \| \| \| \| \| \| \| \|	Remove the pointless tp argument, and properly align the local variable declarations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
*	xfs: fix false assertion at xfs_qm_vop_create_dqattach	Jie Liu	2013-12-06	1	-6/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	After the previous fix, there still has another ASSERT failure if turning off any type of quota while fsstress is running at the same time. Backtrace in this case: [ 50.867897] XFS: Assertion failed: XFS_IS_GQUOTA_ON(mp), file: fs/xfs/xfs_qm.c, line: 2118 [ 50.867924] ------------[ cut here ]------------ ... <snip> [ 50.867957] Kernel BUG at ffffffffa0b55a32 [verbose debug info unavailable] [ 50.867999] invalid opcode: 0000 [#1] SMP [ 50.869407] Call Trace: [ 50.869446] [<ffffffffa0bc408a>] xfs_qm_vop_create_dqattach+0x19a/0x2d0 [xfs] [ 50.869512] [<ffffffffa0b9cc45>] xfs_create+0x5c5/0x6a0 [xfs] [ 50.869564] [<ffffffffa0b5307c>] xfs_vn_mknod+0xac/0x1d0 [xfs] [ 50.869615] [<ffffffffa0b531d6>] xfs_vn_mkdir+0x16/0x20 [xfs] [ 50.869655] [<ffffffff811becd5>] vfs_mkdir+0x95/0x130 [ 50.869689] [<ffffffff811bf63a>] SyS_mkdirat+0xaa/0xe0 [ 50.869723] [<ffffffff811bf689>] SyS_mkdir+0x19/0x20 [ 50.869757] [<ffffffff8170f7dd>] system_call_fastpath+0x1a/0x1f [ 50.869793] Code: 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 <snip> [ 50.870003] RIP [<ffffffffa0b55a32>] assfail+0x22/0x30 [xfs] [ 50.870050] RSP <ffff88002941fd60> [ 50.879251] ---[ end trace c93a2b342341c65b ]--- We're hitting the ASSERT(XFS_IS_QUOTA_ON(mp)) in xfs_qm_vop_create_dqattach(), however the assertion itself is not right IMHO. While performing quota off, we firstly clear the XFS_QUOTA_ACTIVE bit(s) from struct xfs_mount without taking any special locks, see xfs_qm_scall_quotaoff(). Hence there is no guarantee that the desired quota is still active. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
*	xfs: integrate xfs_quota_priv header file to xfs_qm	Jie Liu	2013-12-06	2	-43/+17
\| \| \| \| \| \| \| \| \| \| \| \| \|	The xfs_quota_priv header file is only included by xfs_qm header and there is no much users for its contents, hence we can move those stuff to xfs_qm header file and kill it. This patch also remove an unused macro DQFLAGTO_TYPESTR. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
*	xfs: make quota metadata truncation behavior consistent to user space	Jie Liu	2013-12-06	1	-6/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In xfs_qm_scall_trunc_qfiles(), we ignore the error if failed to remove the users quota metadata and proceed to remove groups and projects if they are being there. However, in user space, the remove operation will break and return if failed to remove any kind of quota. Also for v5 super block, we can enabled both group and project quota at the same time, in this case the current error handling will cover the group error with projects but they might failed due to different reasons. It seems we'd better the error handling consistent to the user space and don't trying to remove another kind of quota metadata if the previous operation is failed. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
*	xfs: fix memory leak in xfs_dir2_node_removename	Mark Tinguely	2013-12-05	1	-13/+13
\| \| \| \| \| \| \| \| \| \|	Fix the leak of kernel memory in xfs_dir2_node_removename() when xfs_dir2_leafn_remove() returns an error code. Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
*	xfs: free the list of recovery items on error	Mark Tinguely	2013-12-05	1	-3/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Recovery builds a list of items on the transaction's r_itemq head. Normally these items are committed and freed. But in the event of a recovery error, these allocations are leaked. If the error occurs during item reordering, then reconstruct the r_itemq list before deleting the list to avoid leaking the entries that were on one of the temporary lists. Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
*	fs: fix iversion handling	Christoph Hellwig	2013-12-05	3	-7/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently notify_change directly updates i_version for size updates, which not only is counter to how all other fields are updated through struct iattr, but also breaks XFS, which need inode updates to happen under its own lock, and synchronized to the structure that gets written to the log. Remove the update in the common code, and it to btrfs and ext4, XFS already does a proper updaste internally and currently gets a double update with the existing code. IMHO this is 3.13 and -stable material and should go in through the XFS tree. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Acked-by: Jan Kara <jack@suse.cz> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Ben Myers <bpm@sgi.com>
*	xfs: growfs overruns AGFL buffer on V4 filesystems	Dave Chinner	2013-12-05	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This loop in xfs_growfs_data_private() is incorrect for V4 superblocks filesystems: for (bucket = 0; bucket < XFS_AGFL_SIZE(mp); bucket++) agfl->agfl_bno[bucket] = cpu_to_be32(NULLAGBLOCK); For V4 filesystems, we don't have a agfl header structure, and so XFS_AGFL_SIZE() returns an entire sector's worth of entries, which we then index from an offset into the sector. Hence: buffer overrun. This problem was introduced in 3.10 by commit 77c95bba ("xfs: add CRC checks to the AGFL") which changed the AGFL structure but failed to update the growfs code to handle the different structures. Fix it by using the correct offset into the buffer for both V4 and V5 filesystems. Cc: <stable@vger.kernel.org> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Ben Myers <bpm@sgi.com>
*	xfs: don't perform discard if the given range length is less than block size	Jie Liu	2013-12-04	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For discard operation, we should return EINVAL if the given range length is less than a block size, otherwise it will go through the file system to discard data blocks as the end range might be evaluated to -1, e.g, # fstrim -v -o 0 -l 100 /xfs7 /xfs7: 9811378176 bytes were trimmed This issue can be triggered via xfstests/generic/288. Also, it seems to get the request queue pointer via bdev_get_queue() instead of the hard code pointer dereference is not a bad thing. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
*	xfs: fix the comment explaining xfs_trans_dqlockedjoin	Christoph Hellwig	2013-12-04	1	-2/+2
\| \| \| \| \| \| \|	Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
*	xfs: underflow bug in xfs_attrlist_by_handle()	Dan Carpenter	2013-12-04	2	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we allocate less than sizeof(struct attrlist) then we end up corrupting memory or doing a ZERO_PTR_SIZE dereference. This can only be triggered with CAP_SYS_ADMIN. Reported-by: Nico Golde <nico@ngolde.de> Reported-by: Fabian Yamaguchi <fabs@goesec.de> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
*	xfs: remove unused FI_ flags	Christoph Hellwig	2013-12-04	1	-9/+0
\| \| \| \| \| \| \|	Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@redhat.com.> Signed-off-by: Ben Myers <bpm@sgi.com>
*	xfs: simplify xfs_setsize_buftarg callchain; remove unused arg	Eric Sandeen	2013-12-04	1	-18/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The "verbose" argument to xfs_setsize_buftarg_flags() has been unused since: ffe37436 xfs: stop using the page cache to back the buffer cache Remove it, and fold the function into xfs_setsize_buftarg() now that there's no need for different types of callers. Fix inconsistent comment spacing while we're at it. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
*	Linux 3.13-rc2v3.13-rc2	Linus Torvalds	2013-11-29	1	-1/+1
\|
*	Merge tag 'arm64-stable' of ↵	Linus Torvalds	2013-11-29	8	-66/+67
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64 Pull ARM64 fixes from Catalin Marinas: - Remove preempt_count modifications in the arm64 IRQ handling code since that's already dealt with in generic irq_enter/irq_exit - PTE_PROT_NONE bit moved higher up to avoid overlapping with the hardware bits (for PROT_NONE mappings which are pte_present) - Big-endian fixes for ptrace support - Asynchronous aborts unmasking while in the kernel - pgprot_writecombine() change to create Normal NonCacheable memory rather than Device GRE * tag 'arm64-stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64: arm64: Move PTE_PROT_NONE higher up arm64: Use Normal NonCacheable memory for writecombine arm64: debug: make aarch32 bkpt checking endian clean arm64: ptrace: fix compat registes get/set to be endian clean arm64: Unmask asynchronous aborts when in kernel mode arm64: dts: Reserve the memory used for secondary CPU release address arm64: let the core code deal with preempt_count
\| *	arm64: Move PTE_PROT_NONE higher up	Catalin Marinas	2013-11-29	1	-14/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	PTE_PROT_NONE means that a pte is present but does not have any read/write attributes. However, setting the memory type like pgprot_writecombine() is allowed and such bits overlap with PTE_PROT_NONE. This causes mmap/munmap issues in drivers that change the vma->vm_pg_prot on PROT_NONE mappings. This patch reverts the PTE_FILE/PTE_PROT_NONE shift in commit 59911ca4325d (ARM64: mm: Move PTE_PROT_NONE bit) and moves PTE_PROT_NONE together with the other software bits. Signed-off-by: Steve Capper <steve.capper@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Steve Capper <steve.capper@linaro.org> Cc: <stable@vger.kernel.org> # 3.11+
\| *	arm64: Use Normal NonCacheable memory for writecombine	Catalin Marinas	2013-11-29	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This provides better performance compared to Device GRE and also allows unaligned accesses. Such memory is intended to be used with standard RAM (e.g. framebuffers) and not I/O. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
\| *	arm64: debug: make aarch32 bkpt checking endian clean	Matthew Leach	2013-11-28	1	-8/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current breakpoint instruction checking code for A32 is not endian clean. Fix this with appropriate byte-swapping when retrieving instructions. Signed-off-by: Matthew Leach <matthew.leach@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
\| *	arm64: ptrace: fix compat registes get/set to be endian clean	Matthew Leach	2013-11-28	1	-21/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On a BE system the wrong half of the X registers is retrieved/written when attempting to get/set the value of aarch32 registers through ptrace. Ensure that types are the correct width so that the relevant casting occurs. Signed-off-by: Matthew Leach <matthew.leach@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
\| *	arm64: Unmask asynchronous aborts when in kernel mode	Catalin Marinas	2013-11-25	3	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The asynchronous aborts are generally fatal for the kernel but they can be masked via the pstate A bit. If a system error happens while in kernel mode, it won't be visible until returning to user space. This patch enables this kind of abort early to help identifying the cause. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
\| *	arm64: dts: Reserve the memory used for secondary CPU release address	Catalin Marinas	2013-11-25	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With the spin-table SMP booting method, secondary CPUs poll a location passed in the DT. The foundation-v8.dts file doesn't have this memory reserved and there is a risk of Linux using it before secondary CPUs are started. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
\| *	arm64: let the core code deal with preempt_count	Marc Zyngier	2013-11-25	1	-22/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit f27dde8deef3 (sched: Add NEED_RESCHED to the preempt_count) introduced the use of bit 31 in preempt_count for obscure scheduling purposes. This causes interrupts taken from EL0 to hit the (open coded) BUG when this flag is flipped while handling the interrupt (we compare the values before and after, and kill the kernel if they are different). The fix is to stop messing with the preempt count entirely, as this is already being dealt with in the generic code (irq_enter/irq_exit). Tested on a dual A53 FPGA running cyclictest. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* \|	Merge branch 'for-linus' of ↵	Linus Torvalds	2013-11-29	14	-88/+87
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Martin Schwidefsky: "One performance improvement and a few bug fixes. Two of the fixes deal with the clock related problems we have seen on recent kernels" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/mm: handle asce-type exceptions as normal page fault s390,time: revert direct ktime path for s390 clockevent device s390/time,vdso: convert to the new update_vsyscall interface s390/uaccess: add missing page table walk range check s390/mm: optimize copy_page s390/dasd: validate request size before building CCW/TCW request s390/signal: always restore saved runtime instrumentation psw bit
\| * \|	s390/mm: handle asce-type exceptions as normal page fault	Martin Schwidefsky	2013-11-25	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Git commit 9e34f2686bb088b211b6cac8772e1f644c6180f8 "s390/mm,tlb: tlb flush on page table upgrade fixup" removed the exception handler for the asce-type exception. This is incorrect as the user-copy with MVCOS can cause asce-type exceptions in the kernel if a user pointer is too large. Those need to be handled with do_no_context to branch to the fixup in the user-copy code. The simplest fix for this problem is to call do_dat_exception for asce-type excpetions, as there is no vma for the address the code will handle the exception correctly. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
\| * \|	s390,time: revert direct ktime path for s390 clockevent device	Martin Schwidefsky	2013-11-25	1	-15/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Git commit 4f37a68cdaf6dea833cfdded2a3e0c47c0f006da "s390: Use direct ktime path for s390 clockevent device" makes use of the CLOCK_EVT_FEAT_KTIME clockevent option to avoid the delta calculation with ktime_get() in clockevents_program_event and the get_tod_clock() in s390_next_event. This is based on the assumption that the difference between the internal ktime and the hardware clock is reflected in the wall_to_monotonic delta. But this is not true, the ntp corrections are applied via changes to the tk->mult multiplier and this is not reflected in wall_to_monotonic. In theory this could be solved by using the raw monotonic clock but it is simpler to switch back to the standard clock delta calculation. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
\| * \|	s390/time,vdso: convert to the new update_vsyscall interface	Martin Schwidefsky	2013-11-25	8	-45/+62
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Switch to the improved update_vsyscall interface that provides sub-nanosecond precision for gettimeofday and clock_gettime. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
\| * \|	s390/uaccess: add missing page table walk range check	Heiko Carstens	2013-11-25	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When translating a user space address, the address must be checked against the ASCE limit of the process. If the address is larger than the maximum address that is reachable with the ASCE, an ASCE type exception must be generated. The current code simply ignored the higher order bits. This resulted in an address wrap around in user space instead of an exception in user space. Cc: stable@vger.kernel.org # v3.9+ Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
\| * \|	s390/mm: optimize copy_page	Heiko Carstens	2013-11-20	1	-25/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Always use the mvcl instruction to copy a page instead of mvpg or a couple of mvc instructions. Copying a huge page is 25% faster this way. Also bypass caches when copying pages since only parts of a page will be used afterwards. Especially when copying a huge page this would kick everything out of the L1 and L2 data caches on a zEC12 machine. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
\| * \|	s390/dasd: validate request size before building CCW/TCW request	Stefan Weinhuber	2013-11-20	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	An I/O request that does not read or write full blocks cannot be translated into a correct CCW or TCW program and should be rejected right away. In particular the code that creates TCW requests will not notice this problem and create broken TCWs that will be rejected by the hardware. Signed-off-by: Stefan Weinhuber <wein@de.ibm.com> Reference-ID: RQM1956
\| * \|	s390/signal: always restore saved runtime instrumentation psw bit	Hendrik Brueckner	2013-11-20	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit "s390: fix handling of runtime instrumentation psw bit" (5ebf250dab) changed the behavior of setting the runtime instrumentation psw bit. This commit restores the original logic: 1. When returning from the signal handler, the runtime instrumentation psw bit is restored to its saved state. 2. If the runtime instrumentation psw bit is enabled during the signal handler, it is always turned off when leaving the signal handler. The saved state is restored as described in 1. That also implies that turning on runtime instrumentation in the signal handler is only effective while running in the signal context. Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
* \| \|	Merge branch 'i2c/for-current' of ↵	Linus Torvalds	2013-11-29	5	-13/+19
\|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "Some easy but needed fixes for i2c drivers since rc1" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: bcm2835: Linking platform nodes to adapter nodes i2c: omap: raw read and write endian fix i2c: i2c-bcm-kona: Fix module build i2c: i2c-diolan-u2c: different usb endpoints for DLN-2-U2C i2c: bcm-kona: remove duplicated include i2c: davinci: raw read and write endian fix
\| * \| \|	i2c: bcm2835: Linking platform nodes to adapter nodes	Florian Meier	2013-11-28	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In order to find I2C devices in the device tree, the platform nodes have to be known by the I2C core. This requires setting the dev.of_node parameter of the adapter. Signed-off-by: Florian Meier <florian.meier@koalo.de> Tested-by: Stephen Warren <swarren@wwwdotorg.org> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
\| * \| \|	i2c: omap: raw read and write endian fix	Victor Kamensky	2013-11-27	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	All OMAP IP blocks expect LE data, but CPU may operate in BE mode. Need to use endian neutral functions to read/write h/w registers. I.e instead of __raw_read[lw] and __raw_write[lw] functions code need to use read[lw]_relaxed and write[lw]_relaxed functions. If the first simply reads/writes register, the second will byteswap it if host operates in BE mode. Changes are trivial sed like replacement of __raw_xxx functions with xxx_relaxed variant. Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org> Signed-off-by: Taras Kondratiuk <taras.kondratiuk@linaro.org> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
\| * \| \|	i2c: i2c-bcm-kona: Fix module build	Tim Kryger	2013-11-26	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Correct a typo that prevented the driver from being built as a module. Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Tim Kryger <tim.kryger@linaro.org> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
\| * \| \|	i2c: i2c-diolan-u2c: different usb endpoints for DLN-2-U2C	Martin Vogt	2013-11-26	1	-5/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The previous diolan adapter uses other out/in endpoints than the current DLN-2-U2C in compatibility mode. They changed from 0x2/0x84 to 0x3/0x83. This patch gets the endpoints from the usb interface, instead of hardcode them in the driver. This was tested on a current DLN-2-U2C board. Signed-off-by: Martin Vogt <mvogt1@gmail.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
\| * \| \|	i2c: bcm-kona: remove duplicated include	Wei Yongjun	2013-11-26	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Reviewed-by: Markus Mayer <markus.mayer@linaro.org> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
\| * \| \|	i2c: davinci: raw read and write endian fix	Taras Kondratiuk	2013-11-26	1	-2/+2
\| \| \|/ \| \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I2C IP block expect LE data, but CPU may operate in BE mode. Need to use endian neutral functions to read/write h/w registers. I.e instead of __raw_read[lw] and __raw_write[lw] functions code need to use read[lw]_relaxed and write[lw]_relaxed functions. If the first simply reads/writes register, the second will byteswap it if host operates in BE mode. Changes are trivial sed like replacement of __raw_xxx functions with xxx_relaxed variant. Signed-off-by: Taras Kondratiuk <taras.kondratiuk@linaro.org> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
* \| \|	Merge branch 'for-3.13-fixes' of ↵	Linus Torvalds	2013-11-29	1	-13/+37
\|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fixes from Tejun Heo: "This contains one important fix. The NUMA support added a while back broke ordering guarantees on ordered workqueues. It was enforced by having single frontend interface with @max_active == 1 but the NUMA support puts multiple interfaces on unbound workqueues on NUMA machines thus breaking the ordered guarantee. This is fixed by disabling NUMA support on ordered workqueues. The above and a couple other patches were sitting in for-3.12-fixes but I forgot to push that out, so they ended up waiting a bit too long. My aplogies. Other fixes are minor" * 'for-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: fix pool ID allocation leakage and remove BUILD_BUG_ON() in init_workqueues workqueue: fix comment typo for __queue_work() workqueue: fix ordered workqueues in NUMA setups workqueue: swap set_cpus_allowed_ptr() and PF_NO_SETAFFINITY
\| * \| \|	workqueue: fix pool ID allocation leakage and remove BUILD_BUG_ON() in ↵	Li Bin	2013-11-23	1	-6/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	init_workqueues When one work starts execution, the high bits of work's data contain pool ID. It can represent a maximum of WORK_OFFQ_POOL_NONE. Pool ID is assigned WORK_OFFQ_POOL_NONE when the work being initialized indicating that no pool is associated and get_work_pool() uses it to check the associated pool. So if worker_pool_assign_id() assigns a ID greater than or equal WORK_OFFQ_POOL_NONE to a pool, it triggers leakage, and it may break the non-reentrance guarantee. This patch fix this issue by modifying the worker_pool_assign_id() function calling idr_alloc() by setting @end param WORK_OFFQ_POOL_NONE. Furthermore, in the current implementation, the BUILD_BUG_ON() in init_workqueues makes no sense. The number of worker pools needed cannot be determined at compile time, because the number of backing pools for UNBOUND workqueues is dynamic based on the assigned custom attributes. So remove it. tj: Minor comment and indentation updates. Signed-off-by: Li Bin <huawei.libin@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
\| * \| \|	workqueue: fix comment typo for __queue_work()	Li Bin	2013-11-23	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It seems the "dying" should be "draining" here. Signed-off-by: Li Bin <huawei.libin@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
\| * \| \|	workqueue: fix ordered workqueues in NUMA setups	Tejun Heo	2013-11-23	1	-2/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	An ordered workqueue implements execution ordering by using single pool_workqueue with max_active == 1. On a given pool_workqueue, work items are processed in FIFO order and limiting max_active to 1 enforces the queued work items to be processed one by one. Unfortunately, 4c16bd327c ("workqueue: implement NUMA affinity for unbound workqueues") accidentally broke this guarantee by applying NUMA affinity to ordered workqueues too. On NUMA setups, an ordered workqueue would end up with separate pool_workqueues for different nodes. Each pool_workqueue still limits max_active to 1 but multiple work items may be executed concurrently and out of order depending on which node they are queued to. Fix it by using dedicated ordered_wq_attrs[] when creating ordered workqueues. The new attrs match the unbound ones except that no_numa is always set thus forcing all NUMA nodes to share the default pool_workqueue. While at it, add sanity check in workqueue creation path which verifies that an ordered workqueues has only the default pool_workqueue. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Libin <huawei.libin@huawei.com> Cc: stable@vger.kernel.org Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
\| * \| \|	workqueue: swap set_cpus_allowed_ptr() and PF_NO_SETAFFINITY	Oleg Nesterov	2013-11-23	1	-4/+5
\| \|/ / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Move the setting of PF_NO_SETAFFINITY up before set_cpus_allowed() in create_worker(). Otherwise userland can change ->cpus_allowed in between. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* \| \|	Merge branch 'for-3.13-fixes' of ↵	Linus Torvalds	2013-11-29	5	-5/+6
\|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata Pull libata fixes from Tejun Heo: "libata device removal path was removing parent device node before its child, which is mostly harmless but triggers warning after recent sysfs changes. Rafael's patch fixes the order. Other than that, minor controller-specific fixes and device ID additions" * 'for-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: ATA: Fix port removal ordering ahci: add Marvell 9230 to the AHCI PCI device list ata: fix acpi_bus_get_device() return value check pata_arasan_cf: add missing clk_disable_unprepare() on error path ahci: add support for IBM Akebono platform device