linux - linux

	Commit message (Collapse)	Author	Age	Files	Lines
*	security: Fix setting of PF_SUPERPRIV by __capable()	David Howells	2008-08-14	11	-63/+137
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix the setting of PF_SUPERPRIV by __capable() as it could corrupt the flags the target process if that is not the current process and it is trying to change its own flags in a different way at the same time. __capable() is using neither atomic ops nor locking to protect t->flags. This patch removes __capable() and introduces has_capability() that doesn't set PF_SUPERPRIV on the process being queried. This patch further splits security_ptrace() in two: (1) security_ptrace_may_access(). This passes judgement on whether one process may access another only (PTRACE_MODE_ATTACH for ptrace() and PTRACE_MODE_READ for /proc), and takes a pointer to the child process. current is the parent. (2) security_ptrace_traceme(). This passes judgement on PTRACE_TRACEME only, and takes only a pointer to the parent process. current is the child. In Smack and commoncap, this uses has_capability() to determine whether the parent will be permitted to use PTRACE_ATTACH if normal checks fail. This does not set PF_SUPERPRIV. Two of the instances of __capable() actually only act on current, and so have been changed to calls to capable(). Of the places that were using __capable(): (1) The OOM killer calls __capable() thrice when weighing the killability of a process. All of these now use has_capability(). (2) cap_ptrace() and smack_ptrace() were using __capable() to check to see whether the parent was allowed to trace any process. As mentioned above, these have been split. For PTRACE_ATTACH and /proc, capable() is now used, and for PTRACE_TRACEME, has_capability() is used. (3) cap_safe_nice() only ever saw current, so now uses capable(). (4) smack_setprocattr() rejected accesses to tasks other than current just after calling __capable(), so the order of these two tests have been switched and capable() is used instead. (5) In smack_file_send_sigiotask(), we need to allow privileged processes to receive SIGIO on files they're manipulating. (6) In smack_task_wait(), we let a process wait for a privileged process, whether or not the process doing the waiting is privileged. I've tested this with the LTP SELinux and syscalls testscripts. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: Andrew G. Morgan <morgan@kernel.org> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: James Morris <jmorris@namei.org>
*	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6	Linus Torvalds	2008-08-14	8	-26/+153
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: padlock - fix VIA PadLock instruction usage with irq_ts_save/restore() crypto: hash - Add missing top-level functions crypto: hash - Fix digest size check for digest type crypto: tcrypt - Fix AEAD chunk testing crypto: talitos - Add handling for SEC 3.x treatment of link table
\| *	crypto: padlock - fix VIA PadLock instruction usage with irq_ts_save/restore()	Suresh Siddha	2008-08-13	4	-1/+76
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Wolfgang Walter reported this oops on his via C3 using padlock for AES-encryption: ################################################################## BUG: unable to handle kernel NULL pointer dereference at 000001f0 IP: [<c01028c5>] __switch_to+0x30/0x117 *pde = 00000000 Oops: 0002 [#1] PREEMPT Modules linked in: Pid: 2071, comm: sleep Not tainted (2.6.26 #11) EIP: 0060:[<c01028c5>] EFLAGS: 00010002 CPU: 0 EIP is at __switch_to+0x30/0x117 EAX: 00000000 EBX: c0493300 ECX: dc48dd00 EDX: c0493300 ESI: dc48dd00 EDI: c0493530 EBP: c04cff8c ESP: c04cff7c DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068 Process sleep (pid: 2071, ti=c04ce000 task=dc48dd00 task.ti=d2fe6000) Stack: dc48df30 c0493300 00000000 00000000 d2fe7f44 c03b5b43 c04cffc8 00000046 c0131856 0000005a dc472d3c c0493300 c0493470 d983ae00 00002696 00000000 c0239f54 00000000 c04c4000 c04cffd8 c01025fe c04f3740 00049800 c04cffe0 Call Trace: [<c03b5b43>] ? schedule+0x285/0x2ff [<c0131856>] ? pm_qos_requirement+0x3c/0x53 [<c0239f54>] ? acpi_processor_idle+0x0/0x434 [<c01025fe>] ? cpu_idle+0x73/0x7f [<c03a4dcd>] ? rest_init+0x61/0x63 ======================= Wolfgang also found out that adding kernel_fpu_begin() and kernel_fpu_end() around the padlock instructions fix the oops. Suresh wrote: These padlock instructions though don't use/touch SSE registers, but it behaves similar to other SSE instructions. For example, it might cause DNA faults when cr0.ts is set. While this is a spurious DNA trap, it might cause oops with the recent fpu code changes. This is the code sequence that is probably causing this problem: a) new app is getting exec'd and it is somewhere in between start_thread() and flush_old_exec() in the load_xyz_binary() b) At pont "a", task's fpu state (like TS_USEDFPU, used_math() etc) is cleared. c) Now we get an interrupt/softirq which starts using these encrypt/decrypt routines in the network stack. This generates a math fault (as cr0.ts is '1') which sets TS_USEDFPU and restores the math that is in the task's xstate. d) Return to exec code path, which does start_thread() which does free_thread_xstate() and sets xstate pointer to NULL while the TS_USEDFPU is still set. e) At the next context switch from the new exec'd task to another task, we have a scenarios where TS_USEDFPU is set but xstate pointer is null. This can cause an oops during unlazy_fpu() in __switch_to() Now: 1) This should happen with or with out pre-emption. Viro also encountered similar problem with out CONFIG_PREEMPT. 2) kernel_fpu_begin() and kernel_fpu_end() will fix this problem, because kernel_fpu_begin() will manually do a clts() and won't run in to the situation of setting TS_USEDFPU in step "c" above. 3) This was working before the fpu changes, because its a spurious math fault which doesn't corrupt any fpu/sse registers and the task's math state was always in an allocated state. With out the recent lazy fpu allocation changes, while we don't see oops, there is a possible race still present in older kernels(for example, while kernel is using kernel_fpu_begin() in some optimized clear/copy page and an interrupt/softirq happens which uses these padlock instructions generating DNA fault). This is the failing scenario that existed even before the lazy fpu allocation changes: 0. CPU's TS flag is set 1. kernel using FPU in some optimized copy routine and while doing kernel_fpu_begin() takes an interrupt just before doing clts() 2. Takes an interrupt and ipsec uses padlock instruction. And we take a DNA fault as TS flag is still set. 3. We handle the DNA fault and set TS_USEDFPU and clear cr0.ts 4. We complete the padlock routine 5. Go back to step-1, which resumes clts() in kernel_fpu_begin(), finishes the optimized copy routine and does kernel_fpu_end(). At this point, we have cr0.ts again set to '1' but the task's TS_USEFPU is stilll set and not cleared. 6. Now kernel resumes its user operation. And at the next context switch, kernel sees it has do a FP save as TS_USEDFPU is still set and then will do a unlazy_fpu() in __switch_to(). unlazy_fpu() will take a DNA fault, as cr0.ts is '1' and now, because we are in __switch_to(), math_state_restore() will get confused and will restore the next task's FP state and will save it in prev tasks's FP state. Remember, in __switch_to() we are already on the stack of the next task but take a DNA fault for the prev task. This causes the fpu leakage. Fix the padlock instruction usage by calling them inside the context of new routines irq_ts_save/restore(), which clear/restore cr0.ts manually in the interrupt context. This will not generate spurious DNA in the context of the interrupt which will fix the oops encountered and the possible FPU leakage issue. Reported-and-bisected-by: Wolfgang Walter <wolfgang.walter@stwm.de> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
\| *	crypto: hash - Add missing top-level functions	Herbert Xu	2008-08-13	1	-0/+18
\| \| \| \| \| \| \| \| \| \| \| \|	The top-level functions init/update/final were missing for ahash. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
\| *	crypto: hash - Fix digest size check for digest type	Herbert Xu	2008-08-13	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The changeset ca786dc738f4f583b57b1bba7a335b5e8233f4b0 crypto: hash - Fixed digest size check missed one spot for the digest type. This patch corrects that error. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
\| *	crypto: tcrypt - Fix AEAD chunk testing	Herbert Xu	2008-08-13	1	-9/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	My changeset 4b22f0ddb6564210c9ded7ba25b2a1007733e784 crypto: tcrpyt - Remove unnecessary kmap/kunmap calls introduced a typo that broke AEAD chunk testing. In particular, axbuf should really be xbuf. There is also an issue with testing the last segment when encrypting. The additional part produced by AEAD wasn't tested. Similarly, on decryption the additional part of the AEAD input is mistaken for corruption. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
\| *	crypto: talitos - Add handling for SEC 3.x treatment of link table	Lee Nipper	2008-08-13	1	-15/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Later SEC revision requires the link table (used for scatter/gather) to have an extra entry to account for the total length in descriptor [4], which contains cipher Input and ICV. This only applies to decrypt, not encrypt. Without this change, on 837x, a gather return/length error results when a decryption uses a link table to gather the fragments. This is observed by doing a ping with size of 1447 or larger with AES, or a ping with size 1455 or larger with 3des. So, add check for SEC compatible "fsl,3.0" for using extra link table entry. Signed-off-by: Lee Nipper <lee.nipper@freescale.com> Signed-off-by: Kim Phillips <kim.phillips@freescale.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* \|	Merge git://oss.sgi.com:8090/xfs/linux-2.6	Linus Torvalds	2008-08-14	62	-1326/+883
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* git://oss.sgi.com:8090/xfs/linux-2.6: (45 commits) [XFS] Fix use after free in xfs_log_done(). [XFS] Make xfs_bmap__count_leaves void. [XFS] Use KM_NOFS for debug trace buffers [XFS] use KM_MAYFAIL in xfs_mountfs [XFS] refactor xfs_mount_free [XFS] don't call xfs_freesb from xfs_unmountfs [XFS] xfs_unmountfs should return void [XFS] cleanup xfs_mountfs [XFS] move root inode IRELE into xfs_unmountfs [XFS] stop using file_update_time [XFS] optimize xfs_ichgtime [XFS] update timestamp in xfs_ialloc manually [XFS] remove the sema_t from XFS. [XFS] replace dquot flush semaphore with a completion [XFS] replace inode flush semaphore with a completion [XFS] extend completions to provide XFS object flush requirements [XFS] replace the XFS buf iodone semaphore with a completion [XFS] clean up stale references to semaphores [XFS] use get_unaligned_ helpers [XFS] Fix compile failure in xfs_buf_trace() ...
\| * \|	[XFS] Fix use after free in xfs_log_done().	Lachlan McIlroy	2008-08-13	1	-8/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The ticket allocation code got reworked in 2.6.26 and we now free tickets whereas before we used to cache them so the use-after-free went undetected. SGI-PV: 985525 SGI-Modid: xfs-linux-melb:xfs-kern:31877a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>
\| * \|	[XFS] Make xfs_bmap_*_count_leaves void.	Ruben Porras	2008-08-13	1	-19/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	xfs_bmap_count_leaves and xfs_bmap_disk_count_leaves always return always 0, make them void. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31844a Signed-off-by: Ruben Porras <ruben.porras@linworks.de> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| * \|	[XFS] Use KM_NOFS for debug trace buffers	Lachlan McIlroy	2008-08-13	6	-11/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use KM_NOFS to prevent recursion back into the filesystem which can cause deadlocks. In the case of xfs_iread() we hold the lock on the inode cluster buffer while allocating memory for the trace buffers. If we recurse back into XFS to flush data that may require a transaction to allocate extents which needs log space. This can deadlock with the xfsaild thread which can't push the tail of the log because it is trying to get the inode cluster buffer lock. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31838a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>
\| * \|	[XFS] use KM_MAYFAIL in xfs_mountfs	Christoph Hellwig	2008-08-13	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use KM_MAYFAIL for the m_perag allocation, we can deal with the error easily and blocking forever during mount is not a good idea either. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31837a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| * \|	[XFS] refactor xfs_mount_free	Christoph Hellwig	2008-08-13	1	-16/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	xfs_mount_free mostly frees the perag data, which is something that is duplicated in the mount error path. Move the XFS_QM_DONE call to the caller and remove the useless mutex_destroy/spinlock_destroy calls so that we can re-use it for the mount error path. Also rename it to xfs_free_perag to reflect what it does. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31836a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| * \|	[XFS] don't call xfs_freesb from xfs_unmountfs	Christoph Hellwig	2008-08-13	2	-3/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	xfs_readsb is called before xfs_mount so xfs_freesb should be called after xfs_unmountfs, too. This means it now happens after a few things during the of xfs_unmount which all have nothing to do with the superblock. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31835a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| * \|	[XFS] xfs_unmountfs should return void	Christoph Hellwig	2008-08-13	2	-8/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	xfs_unmounts can't and shouldn't return errors so declare it as returning void. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31833a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| * \|	[XFS] cleanup xfs_mountfs	Christoph Hellwig	2008-08-13	10	-50/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove all the useless flags and code keyed off it in xfs_mountfs. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31831a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| * \|	[XFS] move root inode IRELE into xfs_unmountfs	Christoph Hellwig	2008-08-13	2	-4/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The root inode is allocated in xfs_mountfs so it should be release in xfs_unmountfs. For the unmount case that means we do it after the the xfs_sync(mp, SYNC_WAIT \| SYNC_CLOSE) in the forced shutdown case and the dmapi unmount event. Note that both reference the rip variable which might be freed by that time in case inode flushing has kicked in, so strictly speaking this might count as a bug fix SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31830a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| * \|	[XFS] stop using file_update_time	Christoph Hellwig	2008-08-13	3	-45/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	xfs_ichtime updates the xfs_inode and Linux inode timestamps just fine, no need to call file_update_time and then copy the values over to the XFS inode. The only additional thing in file_update_time are checks not applicable to the write path. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31829a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>
\| * \|	[XFS] optimize xfs_ichgtime	Christoph Hellwig	2008-08-13	1	-6/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Port a little optmization from file_update_time to xfs_ichgtime, and only update the timestamp and mark the inode dirty if the timestamp actually changes in the timer tick resultion supported by the running kernel. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31827a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| * \|	[XFS] update timestamp in xfs_ialloc manually	Christoph Hellwig	2008-08-13	4	-21/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In xfs_ialloc we just want to set all timestamps to the current time. We don't need to mark the inode dirty like xfs_ichgtime does, and we don't need nor want the opimizations in xfs_ichgtime that I will introduce in the next patch. So just opencode the timestamp update in xfs_ialloc, and remove the new unused XFS_ICHGTIME_ACC case in xfs_ichgtime. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31825a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| * \|	[XFS] remove the sema_t from XFS.	David Chinner	2008-08-13	2	-53/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now that all users of the sema_t are gone from XFS we can finally kill it. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31823a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| * \|	[XFS] replace dquot flush semaphore with a completion	David Chinner	2008-08-13	4	-41/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use the new completion flush code to implement the dquot flush lock. Removes one of the final users of semaphores in the XFS code base. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31822a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| * \|	[XFS] replace inode flush semaphore with a completion	David Chinner	2008-08-13	4	-40/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use the new completion flush code to implement the inode flush lock. Removes one of the final users of semaphores in the XFS code base. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31817a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| * \|	[XFS] extend completions to provide XFS object flush requirements	David Chinner	2008-08-13	1	-0/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	XFS object flushing doesn't quite match existing completion semantics. It mixed exclusive access with completion. That is, we need to mark an object as being flushed before flushing it to disk, and then block any other attempt to flush it until the completion occurs. We do this but adding an extra count to the completion before we start using them. However, we still need to determine if there is a completion in progress, and allow no-blocking attempts fo completions to decrement the count. To do this we introduce: int try_wait_for_completion(struct completion x) returns a failure status if done == 0, otherwise decrements done to zero and returns a "started" status. This is provided to allow counted completions to begin safely while holding object locks in inverted order. int completion_done(struct completion x) returns 1 if there is no waiter, 0 if there is a waiter (i.e. a completion in progress). This replaces the use of semaphores for providing this exclusion and completion mechanism. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31816a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| * \|	[XFS] replace the XFS buf iodone semaphore with a completion	David Chinner	2008-08-13	4	-7/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The xfs_buf_t b_iodonesema is really just a semaphore that wants to be a completion. Change it to a completion and remove the last user of the sema_t from XFS. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31815a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| * \|	[XFS] clean up stale references to semaphores	David Chinner	2008-08-13	3	-42/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A lot of code has been converted away from semaphores, but there are still comments that reference semaphore behaviour. The log code is the worst offender. Update the comments to reflect what the code really does now. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31814a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| * \|	[XFS] use get_unaligned_* helpers	Harvey Harrison	2008-08-13	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31813a Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| * \|	[XFS] Fix compile failure in xfs_buf_trace()	Lachlan McIlroy	2008-08-13	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	SGI-PV: 957103 SGI-Modid: xfs-linux-melb:xfs-kern:31804a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| * \|	[XFS] Use the same btree_cur union member for alloc and inobt trees.	Christoph Hellwig	2008-08-13	3	-30/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The alloc and inobt btree use the same agbp/agno pair in the btree_cur union. Make them use the same bc_private.a union member so that code for these two short form btree implementations can be shared. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31788a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| * \|	[XFS] small cleanups in xfs_btree.c	Christoph Hellwig	2008-08-13	1	-57/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove unneeded xfs_btree_get_block forward declaration. Move xfs_btree_firstrec next to xfs_btree_lastrec. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31787a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| * \|	[XFS] sanitize xfs_initialize_vnode	Christoph Hellwig	2008-08-13	6	-126/+111
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Sanitize setting up the Linux indode. Setting up the xfs_inode <-> inode link is opencoded in xfs_iget_core now because that's the only place it needs to be done, xfs_initialize_vnode is renamed to xfs_setup_inode and loses all superflous paramaters. The check for I_NEW is removed because it always is true and the di_mode check moves into xfs_iget_core because it's only needed there. xfs_set_inodeops and xfs_revalidate_inode are merged into xfs_setup_inode and the whole things is moved into xfs_iops.c where it belongs. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31782a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| * \|	[XFS] kill bhv_vnode_t	Christoph Hellwig	2008-08-13	11	-38/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	All remaining bhv_vnode_t instance are in code that's more or less Linux specific. (Well, for xfs_acl.c that could be argued, but that code is on the removal list, too). So just do an s/bhv_vnode_t/struct inode/ over the whole tree. We can clean up variable naming and some useless helpers later. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31781a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| * \|	[XFS] remove some easy bhv_vnode_t instances	Christoph Hellwig	2008-08-13	6	-37/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In various places we can just move a VFS_I call into the argument list of called functions/macros instead of having a local bhv_vnode_t. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31776a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| * \|	[XFS] kill xfs_lock_dir_and_entry	Christoph Hellwig	2008-08-13	3	-133/+44
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When multiple inodes are locked in XFS it happens in order of the inode number, with the everything but the first inode trylocked if any of the previous inodes is in the AIL. Except for the sorting of the inodes this logic is implemented in xfs_lock_inodes, but also partially duplicated in xfs_lock_dir_and_entry in a particularly stupid way adds a lock roundtrip if the inode ordering is not optimal. This patch adds a new helper xfs_lock_two_inodes that takes two inodes and locks them in the most optimal way according to the above locking protocol and uses it for all places that want to lock two inodes. The only caller of xfs_lock_inodes is xfs_rename which might lock up to four inodes. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31772a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| * \|	[XFS] kill INDUCE_IO_ERROR	Christoph Hellwig	2008-08-13	3	-15/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	All the error injection is already enabled through ifdef DEBUG, so kill the never set second cpp symbol to activate it without the rest of the debugging infrastructure. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31771a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| * \|	[XFS] implement IHOLD/IRELE directly	Christoph Hellwig	2008-08-13	3	-37/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now that all direct calls to VN_HOLD/VN_RELE are gone we can implement IHOLD/IRELE directly. For the IHOLD case also replace igrab with a direct increment of i_count because we are guaranteed to already have a live and referenced inode by the VFS. Also remove the vn_hold statistic because it's been rather meaningless for some time with most references done by other callers. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31764a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| * \|	[XFS] remove remaining VN_HOLD calls	Christoph Hellwig	2008-08-13	3	-9/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use IHOLD(ip) instead of VN_HOLD(VFS_I(ip)). SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31765a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| * \|	[XFS] remove spurious VN_HOLD/VN_RELE calls from xfs_acl.c	Christoph Hellwig	2008-08-13	1	-6/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	All the ACL routines are called from inode operations which are guaranteed to have a referenced inode by the VFS, so there's no need for the ACL code to grab another temporary one. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31763a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| * \|	[XFS] kill vn_to_inode	Christoph Hellwig	2008-08-13	5	-28/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	bhv_vnode_t is just a typedef for struct inode, so there's no need for a helper to convert between the two. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31761a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| * \|	[XFS] Remove vn_from_inode()	Christoph Hellwig	2008-08-13	2	-7/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	bhv_vnode_t is just a typedef for struct inode, so there's no need for a helper to convert between the two. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31760a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| * \|	[XFS] remove shouting-indirection macros from xfs_trans.h	Eric Sandeen	2008-08-13	5	-59/+48
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31758a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| * \|	[XFS] convert xfs to use ERR_CAST	Eric Sandeen	2008-08-13	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Looks like somehow xfs got missed in the conversion that took place in e231c2ee64eb1c5cd3c63c31da9dac7d888dcf7f, "Convert ERR_PTR(PTR_ERR(p)) instances to ERR_CAST(p) <http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit diff;h=e231c2ee64eb1c5cd3c63c31da9dac7d888dcf7f>" SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31757a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| * \|	[XFS] remove INT_GET and friends	Eric Sandeen	2008-08-13	1	-68/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Thanks to hch's endian work, INT_GET etc are no longer used, and may as well be removed. INT_SET is still used in the acl code, though. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31756a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| * \|	[XFS] Move xfs_attr_rolltrans to xfs_trans_roll	Niv Sardi	2008-08-13	5	-79/+92
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Move it from the attr code to the transaction code and make the attr code call the new function. We rolltrans is really usefull whenever we want to use rolling transaction, should be generic, it isn't dependent on any part of the attr code anyway. We use this excuse to change all the: if ((error = xfs_attr_rolltrans())) calls into: error = xfs_trans_roll(); if (error) SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31729a Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| * \|	[XFS] don't leak m_fsname/m_rtname/m_logname	Christoph Hellwig	2008-08-13	2	-17/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a helper to free the m_fsname/m_rtname/m_logname allocations and use it properly for all mount failure cases. Also switch the allocations for these to kstrdup while we're at it. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31728a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| * \|	[XFS] Move attr log alloc size calculator to another function.	Niv Sardi	2008-08-13	2	-32/+49
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We will need that to be able to calculate the size of log we need for a specific attr (for Create+EA). The local flag is needed so that we can fail if we run into ENOSPC when trying to alloc blocks. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31727a Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| * \|	[XFS] Use KM_NOFS for incore inode extent tree allocation V2	David Chinner	2008-08-13	1	-9/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we allow incore extent tree allocations to recurse into the filesystem under memory pressure, new delayed allocations through xfs_iomap_write_delay() can deadlock on themselves if memory reclaim tries to write back dirty pages from that inode. It will deadlock in xfs_iomap_write_allocate() trying to take the ilock we already hold. This can also show up as complex ABBA deadlocks when multiple threads are triggering memory reclaim when trying to allocate extents. The main cause of this is the fact that delayed allocation is not done in a transaction, so KM_NOFS is not automatically added to the allocations to prevent this recursion. Mark all allocations done for the incore inode extent tree as KM_NOFS to ensure they never recurse back into the filesystem. Version 2: o KM_NOFS implies KM_SLEEP, so just use KM_NOFS SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31726a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| * \|	[XFS] XFS: Kill xfs_vtoi()	David Chinner	2008-08-13	4	-17/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	xfs_vtoi() is redundant and only unsed in small sections of code. Replace them with widely used XFS_I() inline and kill xfs_vtoi(). SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31725a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| * \|	[XFS] Kill shouty XFS_ITOV() macro	David Chinner	2008-08-13	13	-24/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Replace XFS_ITOV() with the new VFS_I() inline. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31724a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
\| * \|	[XFS] kill shouty XFS_ITOV_NULL macro	David Chinner	2008-08-13	6	-7/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Replace XFS_ITOV_NULL() with the new VFS_I() inline. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31722a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>