linux - linux

	Commit message (Collapse)	Author	Age	Files	Lines
*	compat: return -EFAULT on error in waitid()	Dan Carpenter	2013-02-22	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	The copy_to_user() call returns the number of bytes remaining but we want to return -EFAULT on error. Fixes "x32: fix waitid()" Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	bug.h, compiler.h: introduce compiletime_assert & BUILD_BUG_ON_MSG	Daniel Santos	2013-02-22	2	-16/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Introduce compiletime_assert to compiler.h, which moves the details of how to break a build and emit an error message for a specific compiler to the headers where these details should be. Following in the tradition of the POSIX assert macro, compiletime_assert creates a build-time error when the supplied condition is false. Next, we add BUILD_BUG_ON_MSG to bug.h which simply wraps compiletime_assert, inverting the logic, so that it fails when the condition is true, consistent with the language "build bug on." This macro allows you to specify the error message you want emitted when the supplied condition is true. Finally, we remove all other code from bug.h that mucks with these details (BUILD_BUG & BUILD_BUG_ON), and have them all call BUILD_BUG_ON_MSG. This not only reduces source code bloat, but also prevents the possibility of code being changed for one macro and not for the other (which was previously the case for BUILD_BUG and BUILD_BUG_ON). Since __compiletime_error_fallback is now only used in compiler.h, I'm considering it a private macro and removing the double negation that's now extraneous. [akpm@linux-foundation.org: checkpatch fixes] Signed-off-by: Daniel Santos <daniel.santos@pobox.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: David Rientjes <rientjes@google.com> Cc: Joe Perches <joe@perches.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	compiler.h, bug.h: prevent double error messages with BUILD_BUG{,_ON}	Daniel Santos	2013-02-22	2	-1/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Prior to the introduction of __attribute__((error("msg"))) in gcc 4.3, creating compile-time errors required a little trickery. BUILD_BUG{,_ON} uses this attribute when available to generate compile-time errors, but also uses the negative-sized array trick for older compilers, resulting in two error messages in some cases. The reason it's "some" cases is that as of gcc 4.4, the negative-sized array will not create an error in some situations, like inline functions. This patch replaces the negative-sized array code with the new __compiletime_error_fallback() macro which expands to the same thing unless the the error attribute is available, in which case it expands to do{}while(0), resulting in exactly one compile-time error on all versions of gcc. Note that we are not changing the negative-sized array code for the unoptimized version of BUILD_BUG_ON, since it has the potential to catch problems that would be disabled in later versions of gcc were __compiletime_error_fallback used. The reason is that that an unoptimized build can't always remove calls to an error-attributed function call (like we are using) that should effectively become dead code if it were optimized. However, using a negative-sized array with a similar value will not result in an false-positive (error). The only caveat being that it will also fail to catch valid conditions, which we should be expecting in an unoptimized build anyway. Signed-off-by: Daniel Santos <daniel.santos@pobox.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: David Rientjes <rientjes@google.com> Cc: Joe Perches <joe@perches.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	bug.h: make BUILD_BUG_ON generate compile-time error	Daniel Santos	2013-02-22	1	-13/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Negative sized arrays wont create a compile-time error in some cases starting with gcc 4.4 (e.g., inlined functions), but gcc 4.3 introduced the error function attribute that will. This patch modifies BUILD_BUG_ON to behave like BUILD_BUG already does, using the error function attribute so that you don't have to build the entire kernel to discover that you have a problem, and then enjoy trying to track it down from a link-time error. Also, we are only including asm/bug.h and then expecting that linux/compiler.h will eventually be included to define __linktime_error (used in BUILD_BUG_ON). This patch includes it directly for clarity and to avoid the possibility of changes in <arch>/*/include/asm/bug.h being changed or not including linux/compiler.h for some reason. Signed-off-by: Daniel Santos <daniel.santos@pobox.com> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Joe Perches <joe@perches.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	bug.h: prevent double evaulation of `condition' in BUILD_BUG_ON	Daniel Santos	2013-02-22	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When calling BUILD_BUG_ON in an optimized build using gcc 4.3 and later, the condition will be evaulated twice, possibily with side-effects. This patch eliminates that error. [akpm@linux-foundation.org: tweak code layout] Signed-off-by: Daniel Santos <daniel.santos@pobox.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: David Rientjes <rientjes@google.com> Cc: Joe Perches <joe@perches.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	bug.h: fix BUILD_BUG_ON macro in __CHECKER__	Daniel Santos	2013-02-22	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When __CHECKER__ is defined, we disable all of the BUILD_BUG.* macros. However, both BUILD_BUG_ON_NOT_POWER_OF_2 and BUILD_BUG_ON was evaluating to nothing in this case, and we want (0) since this is a function-like macro that will be followed by a semicolon. Signed-off-by: Daniel Santos <daniel.santos@pobox.com> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Joe Perches <joe@perches.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	compiler{,-gcc4}.h, bug.h: Remove duplicate macros	Daniel Santos	2013-02-22	3	-6/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	__linktime_error() does the same thing as __compiletime_error() and is only used in bug.h. Since the macro defines a function attribute that will cause a failure at compile-time (not link-time), it makes more sense to keep __compiletime_error(), which is also neatly mated with __compiletime_warning(). Signed-off-by: Daniel Santos <daniel.santos@pobox.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Andi Kleen <ak@linux.intel.com> Cc: Joe Perches <joe@perches.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	compiler-gcc{3,4}.h: Use GCC_VERSION macro	Daniel Santos	2013-02-22	2	-14/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Using GCC_VERSION reduces complexity, is easier to read and is GCC's recommended mechanism for doing version checks. (Just don't ask me why they didn't define it in the first place.) This also makes it easy to merge compiler-gcc{,3,4}.h should somebody want to. Signed-off-by: Daniel Santos <daniel.santos@pobox.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Andi Kleen <ak@linux.intel.com> Cc: Joe Perches <joe@perches.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	compiler-gcc.h: Add gcc-recommended GCC_VERSION macro	Daniel Santos	2013-02-22	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Throughout compiler*.h, many version checks are made. These can be simplified by using the macro that gcc's documentation recommends. However, my primary reason for adding this is that I need bug-check macros that are enabled at certain gcc versions and it's cleaner to use this macro than the tradition method: #if __GNUC__ > 4 \|\| (__GNUC__ == 4 && __GNUC_MINOR__ => 2) If you add patch level, it gets this ugly: #if __GNUC__ > 4 \|\| (__GNUC__ == 4 && (__GNUC_MINOR__ > 2 \|\| \ __GNUC_MINOR__ == 2 __GNUC_PATCHLEVEL__ >= 1)) As opposed to: #if GCC_VERSION >= 40201 While having separate headers for gcc 3 & 4 eliminates some of this verbosity, they can still be cleaned up by this. See also: http://gcc.gnu.org/onlinedocs/cpp/Common-Predefined-Macros.html Signed-off-by: Daniel Santos <daniel.santos@pobox.com> Acked-by: Borislav Petkov <bp@alien8.de> Acked-by: David Rientjes <rientjes@google.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Joe Perches <joe@perches.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	compiler-gcc4.h: Reorder macros based upon gcc ver	Daniel Santos	2013-02-22	1	-9/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This helps to keep the file from getting confusing, removes one duplicate version check and should encourage future editors to put new macros where they belong. Signed-off-by: Daniel Santos <daniel.santos@pobox.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Andi Kleen <ak@linux.intel.com> Cc: Joe Perches <joe@perches.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	device_cgroup: don't grab mutex in rcu callback	Jerry Snitselaar	2013-02-22	1	-9/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 103a197c0c4e ("security/device_cgroup: lock assert fails in dev_exception_clean()") grabs devcgroup_mutex to fix assert failure, but a mutex can't be grabbed in rcu callback. Since there shouldn't be any other references when css_free is called, mutex isn't needed for list cleanup in devcgroup_css_free(). Signed-off-by: Jerry Snitselaar <jerry.snitselaar@oracle.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Aristeu Rozanski <aris@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Merge tag 'please-pull-pstore' of ↵	Linus Torvalds	2013-02-21	6	-60/+192
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux Pull pstore patches from Tony Luck: "A few fixes to reduce places where pstore might hang a system in the crash path. Plus a new mountpoint (/sys/fs/pstore ... makes more sense then /dev/pstore)." Fix up trivial conflict in drivers/firmware/efivars.c * tag 'please-pull-pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux: pstore: Create a convenient mount point for pstore efi_pstore: Introducing workqueue updating sysfs efivars: Disable external interrupt while holding efivars->lock efi_pstore: Avoid deadlock in non-blocking paths pstore: Avoid deadlock in panic and emergency-restart path
\| *	pstore: Create a convenient mount point for pstore	Josh Boyer	2013-02-12	2	-6/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Using /dev/pstore as a mount point for the pstore filesystem is slightly awkward. We don't normally mount filesystems in /dev/ and the /dev/pstore file isn't created automatically by anything. While this method will still work, we can create a persistent mount point in sysfs. This will put pstore on par with things like cgroups and efivarfs. Signed-off-by: Josh Boyer <jwboyer@redhat.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Tony Luck <tony.luck@intel.com>
\| *	efi_pstore: Introducing workqueue updating sysfs	Seiji Aguchi	2013-02-12	2	-6/+82
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	[Problem] efi_pstore creates sysfs entries, which enable users to access to NVRAM, in a write callback. If a kernel panic happens in an interrupt context, it may fail because it could sleep due to dynamic memory allocations during creating sysfs entries. [Patch Description] This patch removes sysfs operations from a write callback by introducing a workqueue updating sysfs entries which is scheduled after the write callback is called. Also, the workqueue is kicked in a just oops case. A system will go down in other cases such as panic, clean shutdown and emergency restart. And we don't need to create sysfs entries because there is no chance for users to access to them. efi_pstore will be robust against a kernel panic in an interrupt context with this patch. Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Acked-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
\| *	efivars: Disable external interrupt while holding efivars->lock	Seiji Aguchi	2013-02-12	1	-42/+44
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	[Problem] There is a scenario which efi_pstore fails to log messages in a panic case. - CPUA holds an efi_var->lock in either efivarfs parts or efi_pstore with interrupt enabled. - CPUB panics and sends IPI to CPUA in smp_send_stop(). - CPUA stops with holding the lock. - CPUB kicks efi_pstore_write() via kmsg_dump(KSMG_DUMP_PANIC) but it returns without logging messages. [Patch Description] This patch disables an external interruption while holding efivars->lock as follows. In efi_pstore_write() and get_var_data(), spin_lock/spin_unlock is replaced by spin_lock_irqsave/spin_unlock_irqrestore because they may be called in an interrupt context. In other functions, they are replaced by spin_lock_irq/spin_unlock_irq. because they are all called from a process context. By applying this patch, we can avoid the problem above with a following senario. - CPUA holds an efi_var->lock with interrupt disabled. - CPUB panics and sends IPI to CPUA in smp_send_stop(). - CPUA receives the IPI after releasing the lock because it is disabling interrupt while holding the lock. - CPUB waits for one sec until CPUA releases the lock. - CPUB kicks efi_pstore_write() via kmsg_dump(KSMG_DUMP_PANIC) And it can hold the lock successfully. Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Acked-by: Mike Waychison <mikew@google.com> Acked-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
\| *	efi_pstore: Avoid deadlock in non-blocking paths	Seiji Aguchi	2013-01-11	1	-1/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	[Issue] There is a scenario which efi_pstore may hang up: - cpuA grabs efivars->lock - cpuB panics and calls smp_send_stop - smp_send_stop sends IRQ to cpuA - after 1 second, cpuB gives up on cpuA and sends an NMI instead - cpuA is now in an NMI handler while still holding efivars->lock - cpuB is deadlocked This case may happen if a firmware has a bug and cpuA is stuck talking with it. [Solution] This patch changes a spin_lock to a spin_trylock in non-blocking paths. and if the spin_lock has already taken by another cpu, it returns without accessing to a firmware to avoid the deadlock. Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Acked-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
\| *	pstore: Avoid deadlock in panic and emergency-restart path	Seiji Aguchi	2013-01-11	2	-6/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	[Issue] When pstore is in panic and emergency-restart paths, it may be blocked in those paths because it simply takes spin_lock. This is an example scenario which pstore may hang up in a panic path: - cpuA grabs psinfo->buf_lock - cpuB panics and calls smp_send_stop - smp_send_stop sends IRQ to cpuA - after 1 second, cpuB gives up on cpuA and sends an NMI instead - cpuA is now in an NMI handler while still holding buf_lock - cpuB is deadlocked This case may happen if a firmware has a bug and cpuA is stuck talking with it more than one second. Also, this is a similar scenario in an emergency-restart path: - cpuA grabs psinfo->buf_lock and stucks in a firmware - cpuB kicks emergency-restart via either sysrq-b or hangcheck timer. And then, cpuB is deadlocked by taking psinfo->buf_lock again. [Solution] This patch avoids the deadlocking issues in both panic and emergency_restart paths by introducing a function, is_non_blocking_path(), to check if a cpu can be blocked in current path. With this patch, pstore is not blocked even if another cpu has taken a spin_lock, in those paths by changing from spin_lock_irqsave to spin_trylock_irqsave. In addition, according to a comment of emergency_restart() in kernel/sys.c, spin_lock shouldn't be taken in an emergency_restart path to avoid deadlock. This patch fits the comment below. <snip> /** * emergency_restart - reboot the system * * Without shutting down any hardware or taking any locks * reboot the system. This is called when we know we are in * trouble so this is our best effort to reboot. This is * safe to call in interrupt context. */ void emergency_restart(void) <snip> Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Acked-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
* \|	Merge tag 'dlm-3.9' of ↵	Linus Torvalds	2013-02-21	2	-0/+18
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm update from David Teigland: "This includes a single patch to avoid excessive and unnecessary scanning of rsbs to free." * tag 'dlm-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: dlm: avoid scanning unchanged toss lists
\| * \|	dlm: avoid scanning unchanged toss lists	David Teigland	2013-01-07	2	-0/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Keep track of whether a toss list contains any shrinkable rsbs. If not, dlm_scand can avoid scanning the list for rsbs to shrink. Unnecessary scanning can otherwise waste a lot of time because the toss lists can contain a large number of rsbs that are non-shrinkable (directory records). Signed-off-by: David Teigland <teigland@redhat.com>
* \| \|	Merge tag 'nfs-for-3.9-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs	Linus Torvalds	2013-02-21	26	-260/+415
\|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull NFS client bugfixes from Trond Myklebust: - Fix an Oops in the pNFS layoutget code - Fix a number of NFSv4 and v4.1 state recovery deadlocks and hangs due to the interaction of the session drain lock and state management locks. - Remove task->tk_xprt, which was hiding a lot of RCU dereferencing bugs - Fix a long standing NFSv3 posix lock recovery bug. - Revert commit 324d003b0cd8 ("NFS: add nfs_sb_deactive_async to avoid deadlock"). It turned out that the root cause of the deadlock was due to interactions with the workqueues that have now been resolved. * tag 'nfs-for-3.9-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (22 commits) NLM: Ensure that we resend all pending blocking locks after a reclaim umount oops when remove blocklayoutdriver first sunrpc: silence build warning in gss_fill_context nfs: remove kfree() redundant null checks NFSv4.1: Don't decode skipped layoutgets NFSv4.1: Fix bulk recall and destroy of layouts NFSv4.1: Fix an ABBA locking issue with session and state serialisation NFSv4: Fix a reboot recovery race when opening a file NFSv4: Ensure delegation recall and byte range lock removal don't conflict NFSv4: Fix up the return values of nfs4_open_delegation_recall NFSv4.1: Don't lose locks when a server reboots during delegation return NFSv4.1: Prevent deadlocks between state recovery and file locking NFSv4: Allow the state manager to mark an open_owner as being recovered SUNRPC: Add missing static declaration to _gss_mech_get_by_name Revert "NFS: add nfs_sb_deactive_async to avoid deadlock" SUNRPC: Nuke the tk_xprt macro SUNRPC: Avoid RCU dereferences in the transport bind and connect code SUNRPC: Fix an RCU dereference in xprt_reserve SUNRPC: Pass pointers to struct rpc_xprt to the congestion window SUNRPC: Fix an RCU dereference in xs_local_rpcbind ...
\| * \| \|	NLM: Ensure that we resend all pending blocking locks after a reclaim	Trond Myklebust	2013-02-19	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, nlmclnt_lock will break out of the for(;;) loop when the reclaimer wakes up the blocking lock thread by setting nlm_lck_denied_grace_period. This causes the lock request to fail with an ENOLCK error. The intention was always to ensure that we resend the lock request after the grace period has expired. Reported-by: Wangyuan Zhang <Wangyuan.Zhang@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
\| * \| \|	umount oops when remove blocklayoutdriver first	fanchaoting	2013-02-17	2	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	now pnfs client uses block layout, maybe we can remove blocklayoutdriver first. if we umount later, it can cause oops in unset_pnfs_layoutdriver. because nfss->pnfs_curr_ld->clear_layoutdriver is invalid. reproduce it: modprobe blocklayoutdriver mount -t nfs4 -o minorversion=1 pnfsip:/ /mnt/ rmmod blocklayoutdriver umount /mnt then you can see following CPU 0 Pid: 17023, comm: umount.nfs4 Tainted: GF O 3.7.0-rc6-pnfs #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform RIP: 0010:[<ffffffffa04cfe6d>] [<ffffffffa04cfe6d>] unset_pnfs_layoutdriver+0x1d/0x70 [nfsv4] RSP: 0018:ffff8800022d9e48 EFLAGS: 00010286 RAX: ffffffffa04a1b00 RBX: ffff88000b013800 RCX: 0000000000000001 RDX: ffffffff81ae8ee0 RSI: ffff880001ee94b8 RDI: ffff88000b013800 RBP: ffff8800022d9e58 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff880001ee9400 R13: ffff8800105978c0 R14: 00007fff25846c08 R15: 0000000001bba550 FS: 00007f45ae7f0700(0000) GS:ffff880012c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffffffffa04a1b38 CR3: 0000000002c0c000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process umount.nfs4 (pid: 17023, threadinfo ffff8800022d8000, task ffff880006e48aa0) Stack: ffff8800105978c0 ffff88000b013800 ffff8800022d9e78 ffffffffa04cd0ce ffff8800022d9e78 ffff88000b013800 ffff8800022d9ea8 ffffffffa04755a7 ffff8800022d9ea8 ffff880002f96400 ffff88000b013800 ffff880002f96400 Call Trace: [<ffffffffa04cd0ce>] nfs4_destroy_server+0x1e/0x30 [nfsv4] [<ffffffffa04755a7>] nfs_free_server+0xb7/0x150 [nfs] [<ffffffffa047d4d5>] nfs_kill_super+0x35/0x40 [nfs] [<ffffffff81178d35>] deactivate_locked_super+0x45/0x70 [<ffffffff8117986a>] deactivate_super+0x4a/0x70 [<ffffffff81193ee2>] mntput_no_expire+0xd2/0x130 [<ffffffff81194d62>] sys_umount+0x72/0xe0 [<ffffffff8154af59>] system_call_fastpath+0x16/0x1b Code: 06 e1 b8 ea ff ff ff eb 9e 0f 1f 44 00 00 55 48 89 e5 53 48 83 ec 08 66 66 66 66 90 48 8b 87 80 03 00 00 48 89 fb 48 85 c0 74 29 <48> 8b 40 38 48 85 c0 74 02 ff d0 48 8b 03 3e ff 48 04 0f 94 c2 RIP [<ffffffffa04cfe6d>] unset_pnfs_layoutdriver+0x1d/0x70 [nfsv4] RSP <ffff8800022d9e48> CR2: ffffffffa04a1b38 ---[ end trace 29f75aaedda058bf ]--- Signed-off-by: fanchaoting<fanchaoting@cn.fujitsu.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
\| * \| \|	sunrpc: silence build warning in gss_fill_context	Jeff Layton	2013-02-17	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since commit 620038f6d23, gcc is throwing the following warning: CC [M] net/sunrpc/auth_gss/auth_gss.o In file included from include/linux/sunrpc/types.h:14:0, from include/linux/sunrpc/sched.h:14, from include/linux/sunrpc/clnt.h:18, from net/sunrpc/auth_gss/auth_gss.c:45: net/sunrpc/auth_gss/auth_gss.c: In function ‘gss_pipe_downcall’: include/linux/sunrpc/debug.h:45:10: warning: ‘timeout’ may be used uninitialized in this function [-Wmaybe-uninitialized] printk(KERN_DEFAULT args); \ ^ net/sunrpc/auth_gss/auth_gss.c:194:15: note: ‘timeout’ was declared here unsigned int timeout; ^ If simple_get_bytes returns an error, then we'll end up calling printk with an uninitialized timeout value. Reasonably harmless, but fairly simple to fix by removing the printout of the uninitialised parameters. Cc: Andy Adamson <andros@netapp.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> [Trond: just remove the parameters rather than initialising timeout] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| * \| \|	nfs: remove kfree() redundant null checks	Tim Gardner	2013-02-17	2	-4/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	smatch analysis: fs/nfs/getroot.c:130 nfs_get_root() info: redundant null check on name calling kfree() fs/nfs/unlink.c:272 nfs_async_unlink() info: redundant null check on devname_garbage calling kfree() Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: linux-nfs@vger.kernel.org Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| * \| \|	NFSv4.1: Don't decode skipped layoutgets	Weston Andros Adamson	2013-02-17	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	layoutget's prepare hook can call rpc_exit with status = NFS4_OK (0). Because of this, nfs4_proc_layoutget can't depend on a 0 status to mean that the RPC was successfully sent, received and parsed. To fix this, use the result's len member to see if parsing took place. This fixes the following OOPS -- calling xdr_init_decode() with a buffer length 0 doesn't set the stream's 'p' member and ends up using uninitialized memory in filelayout_decode_layout. BUG: unable to handle kernel paging request at 0000000000008050 IP: [<ffffffff81282e78>] memcpy+0x18/0x120 PGD 0 Oops: 0000 [#1] SMP last sysfs file: /sys/devices/pci0000:00/0000:00:11.0/0000:02:01.0/irq CPU 1 Modules linked in: nfs_layout_nfsv41_files nfs lockd fscache auth_rpcgss nfs_acl autofs4 sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_mirror dm_region_hash dm_log dm_mod ppdev parport_pc parport snd_ens1371 snd_rawmidi snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc e1000 microcode vmware_balloon i2c_piix4 i2c_core sg shpchp ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix mptspi mptscsih mptbase scsi_transport_spi [last unloaded: speedstep_lib] Pid: 1665, comm: flush-0:22 Not tainted 2.6.32-356-test-2 #2 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform RIP: 0010:[<ffffffff81282e78>] [<ffffffff81282e78>] memcpy+0x18/0x120 RSP: 0018:ffff88003dfab588 EFLAGS: 00010206 RAX: ffff88003dc42000 RBX: ffff88003dfab610 RCX: 0000000000000009 RDX: 000000003f807ff0 RSI: 0000000000008050 RDI: ffff88003dc42000 RBP: ffff88003dfab5b0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000080 R12: 0000000000000024 R13: ffff88003dc42000 R14: ffff88003f808030 R15: ffff88003dfab6a0 FS: 0000000000000000(0000) GS:ffff880003420000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000000000008050 CR3: 000000003bc92000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process flush-0:22 (pid: 1665, threadinfo ffff88003dfaa000, task ffff880037f77540) Stack: ffffffffa0398ac1 ffff8800397c5940 ffff88003dfab610 ffff88003dfab6a0 <d> ffff88003dfab5d0 ffff88003dfab680 ffffffffa01c150b ffffea0000d82e70 <d> 000000508116713b 0000000000000000 0000000000000000 0000000000000000 Call Trace: [<ffffffffa0398ac1>] ? xdr_inline_decode+0xb1/0x120 [sunrpc] [<ffffffffa01c150b>] filelayout_decode_layout+0xeb/0x350 [nfs_layout_nfsv41_files] [<ffffffffa01c17fc>] filelayout_alloc_lseg+0x8c/0x3c0 [nfs_layout_nfsv41_files] [<ffffffff8150e6ce>] ? __wait_on_bit+0x7e/0x90 Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
\| * \| \|	NFSv4.1: Fix bulk recall and destroy of layouts	Trond Myklebust	2013-02-14	3	-74/+144
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current code in pnfs_destroy_all_layouts() assumes that removing the layout from the server->layouts list is sufficient to make it invisible to other processes. This ignores the fact that most users access the layout through the nfs_inode->layout... There is further breakage due to lack of reference counting of the layouts, meaning that the whole thing Oopses at the drop of a hat. The code in initiate_bulk_draining() is almost correct, and can be used as a model for pnfs_destroy_all_layouts(), so move that code to pnfs.c, and refactor the code to allow us to choose between a single filesystem bulk recall, and a recall of all layouts. Also note that initiate_bulk_draining() currently calls iput() while holding locks. Fix that too. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
\| * \| \|	NFSv4.1: Fix an ABBA locking issue with session and state serialisation	Trond Myklebust	2013-02-12	1	-12/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Ensure that if nfs_wait_on_sequence() causes our rpc task to wait for an NFSv4 state serialisation lock, then we also drop the session slot. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
\| * \| \|	NFSv4: Fix a reboot recovery race when opening a file	Trond Myklebust	2013-02-11	1	-12/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If the server reboots after it has replied to our OPEN, but before we call nfs4_opendata_to_nfs4_state(), then the reboot recovery thread will not see a stateid for this open, and so will fail to recover it. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| * \| \|	NFSv4: Ensure delegation recall and byte range lock removal don't conflict	Trond Myklebust	2013-02-11	4	-2/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a mutex to the struct nfs4_state_owner to ensure that delegation recall doesn't conflict with byte range lock removal. Note that we nest the new mutex _outside_ the state manager reclaim protection (nfsi->rwsem) in order to avoid deadlocks. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| * \| \|	NFSv4: Fix up the return values of nfs4_open_delegation_recall	Trond Myklebust	2013-02-11	1	-14/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Adjust the return values so that they return EAGAIN to the caller in cases where we might want to retry the delegation recall after the state recovery has run. Note that we can't wait and retry in this routine, because the caller may be the state manager thread. If delegation recall fails due to a session or reboot related issue, also ensure that we mark the stateid as delegated so that nfs_delegation_claim_opens can find it again later. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| * \| \|	NFSv4.1: Don't lose locks when a server reboots during delegation return	Trond Myklebust	2013-02-11	3	-43/+108
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If the server reboots while we are converting a delegation into OPEN/LOCK stateids as part of a delegation return, the current code will simply exit with an error. This causes us to lose both delegation state and locking state (i.e. locking atomicity). Deal with this by exposing the delegation stateid during delegation return, so that we can recover the delegation, and then resume open/lock recovery. Note that not having to hold the nfs_inode->rwsem across the calls to nfs_delegation_claim_opens() also fixes a deadlock against the NFSv4.1 reboot recovery code. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| * \| \|	NFSv4.1: Prevent deadlocks between state recovery and file locking	Trond Myklebust	2013-02-11	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We currently have a deadlock in which the state recovery thread ends up blocking due to one of the locks which it is trying to recover holding the nfs_inode->rwsem. The situation is as follows: the state recovery thread is scheduled in order to recover from a reboot. It immediately drains the session, forcing all ordinary NFSv4.1 calls to nfs41_setup_sequence() to be put to sleep. This includes the file locking process that holds the nfs_inode->rwsem. When the thread gets to nfs4_reclaim_locks(), it tries to grab a write lock on nfs_inode->rwsem, and boom... Fix is to have the lock drop the nfs_inode->rwsem while it is doing RPC calls. We use a sequence lock in order to signal to the locking process whether or not a state recovery thread has run on that inode, in which case it should retry the lock. Reported-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| * \| \|	NFSv4: Allow the state manager to mark an open_owner as being recovered	Trond Myklebust	2013-02-11	2	-1/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds a seqcount_t lock for use by the state manager to signal that an open owner has been recovered. This mechanism will be used by the delegation, open and byte range lock code in order to figure out if they need to replay requests due to collisions with lock recovery. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| * \| \|	SUNRPC: Add missing static declaration to _gss_mech_get_by_name	Trond Myklebust	2013-02-01	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Ditto for _gss_mech_get_by_pseudoflavor. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| * \| \|	Revert "NFS: add nfs_sb_deactive_async to avoid deadlock"	Trond Myklebust	2013-02-01	5	-56/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit 324d003b0cd82151adbaecefef57b73f7959a469. The deadlock turned out to be caused by a workqueue limitation that has now been worked around in the RPC code (see comment in rpc_free_task). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| * \| \|	SUNRPC: Nuke the tk_xprt macro	Trond Myklebust	2013-02-01	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It is no longer in use Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| * \| \|	SUNRPC: Avoid RCU dereferences in the transport bind and connect code	Trond Myklebust	2013-02-01	2	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Avoid an RCU dereference by removing task->tk_xprt Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| * \| \|	SUNRPC: Fix an RCU dereference in xprt_reserve	Trond Myklebust	2013-02-01	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| * \| \|	SUNRPC: Pass pointers to struct rpc_xprt to the congestion window	Trond Myklebust	2013-02-01	3	-8/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Avoid access to task->tk_xprt Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| * \| \|	SUNRPC: Fix an RCU dereference in xs_local_rpcbind	Trond Myklebust	2013-02-01	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| * \| \|	SUNRPC: Pass a pointer to struct rpc_xprt to the connect callback	Trond Myklebust	2013-02-01	4	-6/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Avoid another RCU dereference by passing the pointer to struct rpc_xprt from the caller. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| * \| \|	SUNRPC: Eliminate task->tk_xprt accesses that bypass rcu_dereference()	Trond Myklebust	2013-02-01	7	-16/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	tk_xprt is just a shortcut for tk_client->cl_xprt, however cl_xprt is defined as an __rcu variable. Replace dereferences of tk_xprt with non-rcu dereferences where it is safe to do so. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* \| \| \|	Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw	Linus Torvalds	2013-02-21	23	-374/+375
\|\ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull GFS2 updates from Steven Whitehouse: "This is one of the smallest collections of patches for the merge window for some time. There are some clean ups relating to the transaction code and the shrinker, which are mostly in preparation for further development, but also make the code much easier to follow in these areas. There is a patch which allows the use of ->writepages even in the default ordered write mode for all writebacks. This results in sending larger i/os to the block layer, and a subsequent increase in performance. It also reduces the number of different i/o paths by one. There is also a bug fix reinstating the withdraw ack system which somehow got lost when the lock modules were merged into GFS2." * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw: GFS2: Reinstate withdraw ack system GFS2: Get a block reservation before resizing a file GFS2: Split glock lru processing into two parts GFS2: Use ->writepages for ordered writes GFS2: Clean up freeze code GFS2: Merge gfs2_attach_bufdata() into trans.c GFS2: Copy gfs2_trans_add_bh into new data/meta functions GFS2: Split gfs2_trans_add_bh() into two GFS2: Merge revoke adding functions GFS2: Separate LRU scanning from shrinker
\| * \| \| \|	GFS2: Reinstate withdraw ack system	Steven Whitehouse	2013-02-13	4	-1/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch reinstates the ack system which withdraw should be using. It appears to have been accidentally forgotten when the lock module was merged into GFS2, due to two different sysfs files having the same name. Reported-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| * \| \| \|	GFS2: Get a block reservation before resizing a file	Bob Peterson	2013-02-01	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch allocates a block reservation structure before growing or shrinking a file. Without this structure, the grow or shink code can reference the bad pointer. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| * \| \| \|	GFS2: Split glock lru processing into two parts	Steven Whitehouse	2013-02-01	1	-23/+67
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The intent here is to split the processing of the glock lru list into two parts, so that the selection of glocks and the disposal are separate functions. The plan is then, that further updates can then be made to these functions in the future to improve the selection of glocks and also the efficiency of glock disposal. The new feature which this patch brings is sorting the glocks to be disposed of into glock number (and thus also disk block number) order. Not all glocks will need i/o in order to dispose of them, but some will, and at least we'll generate mostly disk block order i/o now. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| * \| \| \|	GFS2: Use ->writepages for ordered writes	Steven Whitehouse	2013-01-29	8	-70/+79
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Instead of using a list of buffers to write ahead of the journal flush, this now uses a list of inodes and calls ->writepages via filemap_fdatawrite() in order to achieve the same thing. For most use cases this results in a shorter ordered write list, as well as much larger i/os being issued. The ordered write list is sorted by inode number before writing in order to retain the disk block ordering between inodes as per the previous code. The previous ordered write code used to conflict in its assumptions about how to write out the disk blocks with mpage_writepages() so that with this updated version we can also use mpage_writepages() for GFS2's ordered write, writepages implementation. So we will also send larger i/os from writeback too. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| * \| \| \|	GFS2: Clean up freeze code	Steven Whitehouse	2013-01-29	5	-78/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The freeze code has not been looked at a lot recently. Upstream has moved on, and this is an attempt to catch us back up again. There is a vfs level interface for the freeze code which can be called from our (obsolete, but kept for backward compatibility purposes) sysfs freeze interface. This means freezing this way vs. doing it from the ioctl should now work in identical fashion. As a result of this, the freeze function is only called once and we can drop our own special purpose code for counting the number of freezes. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| * \| \| \|	GFS2: Merge gfs2_attach_bufdata() into trans.c	Steven Whitehouse	2013-01-29	4	-56/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The locking in gfs2_attach_bufdata() was type specific (data/meta) which made the function rather confusing. This patch moves the core of gfs2_attach_bufdata() into trans.c renaming it gfs2_alloc_bufdata() and moving the locking into gfs2_trans_add_data()/gfs2_trans_add_meta() As a result all of the locking related to adding data and metadata to the journal is now in these two functions. This should help to clarify what is going on, and give us some opportunities to simplify in some cases. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| * \| \| \|	GFS2: Copy gfs2_trans_add_bh into new data/meta functions	Steven Whitehouse	2013-01-29	4	-84/+84
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch copies the body of gfs2_trans_add_bh into the two newly added gfs2_trans_add_data and gfs2_trans_add_meta functions. We can then move the .lo_add functions from lops.c into trans.c and call them directly. As a result of this, we no longer need to use the .lo_add functions at all, so that is removed from the log operations structure. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>