linux - linux

	Commit message (Collapse)	Author	Age	Files	Lines
*	percpu: fix DEBUG_PREEMPT per_cpu checking	Hugh Dickins	2008-02-23	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	2.6.25-rc1 percpu changes broke CONFIG_DEBUG_PREEMPT's per_cpu checking on several architectures. On s390, sparc64 and x86 it's been weakened to not checking at all; whereas on powerpc64 it's become too strict, issuing warnings from __raw_get_cpu_var in io_schedule and init_timer for example. Fix this by weakening powerpc's __my_cpu_offset to use the non-checking local_paca instead of get_paca (which itself contains such a check); and strengthening the generic my_cpu_offset to go the old slow way via smp_processor_id when CONFIG_DEBUG_PREEMPT (debug_smp_processor_id is where all the knowledge of what's correct when lives). Signed-off-by: Hugh Dickins <hugh@veritas.com> Reviewed-by: Mike Travis <travis@sgi.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	POWERPC: use generic per cpu	travis@sgi.com	2008-01-30	1	-19/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Powerpc has a way to determine the address of the per cpu area of the currently executing processor via the paca and the array of per cpu offsets is avoided by looking up the per cpu area from the remote paca's (copying x86_64). Cc: Paul Mackerras <paulus@samba.org> Cc: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Olof Johansson <olof@lixom.net> Tested-by: Geoff Levand <geoffrey.levand@am.sony.com>
*	modules: fold percpu_modcopy into module.c	travis@sgi.com	2008-01-30	1	-9/+0
\| \| \| \| \| \| \| \|	percpu_modcopy() is defined multiple times in arch files. However, the only user is module.c. Put a static definition into module.c and remove the definitions from the arch files. Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	percpu: move arch XX_PER_CPU_XX definitions into linux/percpu.h	travis@sgi.com	2008-01-30	1	-17/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Special consideration for IA64: Add the ability to specify arch specific per cpu flags - remove .data.percpu attribute from DEFINE_PER_CPU for non-smp case. The arch definitions are all the same. So move them into linux/percpu.h. We cannot move DECLARE_PER_CPU since some include files just include asm/percpu.h to avoid include recursion problems. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	[POWERPC] ppc64: support CONFIG_DEBUG_PREEMPT	Hugh Dickins	2007-10-03	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add CONFIG_DEBUG_PREEMPT support to ppc64: it was useful for testing get_paca() preemption. Cheat a little, just use debug_smp_processor_id() in the debug version of get_paca(): it contains all the right checks and reporting, though get_paca() doesn't really use smp_processor_id(). Use local_paca for what might have been called __raw_get_paca(). Silence harmless warnings from io.h and lparcfg.c with local_paca - it is okay for iseries_lparcfg_data to be referencing shared_proc with preemption enabled: all cpus should show the same value for shared_proc. Why do other architectures need TRACE_IRQFLAGS_SUPPORT for DEBUG_PREEMPT? I don't know, ppc64 appears to get along fine without it. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
*	define new percpu interface for shared data	Fenghua Yu	2007-07-19	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	per cpu data section contains two types of data. One set which is exclusively accessed by the local cpu and the other set which is per cpu, but also shared by remote cpus. In the current kernel, these two sets are not clearely separated out. This can potentially cause the same data cacheline shared between the two sets of data, which will result in unnecessary bouncing of the cacheline between cpus. One way to fix the problem is to cacheline align the remotely accessed per cpu data, both at the beginning and at the end. Because of the padding at both ends, this will likely cause some memory wastage and also the interface to achieve this is not clean. This patch: Moves the remotely accessed per cpu data (which is currently marked as ____cacheline_aligned_in_smp) into a different section, where all the data elements are cacheline aligned. And as such, this differentiates the local only data and remotely accessed data cleanly. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Christoph Lameter <clameter@sgi.com> Cc: <linux-arch@vger.kernel.org> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] lockdep: add per_cpu_offset()	Ingo Molnar	2006-07-04	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	Add the per_cpu_offset() generic method. (used by the lock validator) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] Define __raw_get_cpu_var and use it	Paul Mackerras	2006-06-25	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are several instances of per_cpu(foo, raw_smp_processor_id()), which is semantically equivalent to __get_cpu_var(foo) but without the warning that smp_processor_id() can give if CONFIG_DEBUG_PREEMPT is enabled. For those architectures with optimized per-cpu implementations, namely ia64, powerpc, s390, sparc64 and x86_64, per_cpu() turns into more and slower code than __get_cpu_var(), so it would be preferable to use __get_cpu_var on those platforms. This defines a __raw_get_cpu_var(x) macro which turns into per_cpu(x, raw_smp_processor_id()) on architectures that use the generic per-cpu implementation, and turns into __get_cpu_var(x) on the architectures that have an optimized per-cpu implementation. Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] for_each_possible_cpu: powerpc	KAMEZAWA Hiroyuki	2006-03-29	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	for_each_cpu() actually iterates across all possible CPUs. We've had mistakes in the past where people were using for_each_cpu() where they should have been iterating across only online or present CPUs. This is inefficient and possibly buggy. We're renaming for_each_cpu() to for_each_possible_cpu() to avoid this in the future. This patch replaces for_each_cpu with for_each_possible_cpu. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
*	[PATCH] more for_each_cpu() conversions	Andrew Morton	2006-03-23	1	-4/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When we stop allocating percpu memory for not-possible CPUs we must not touch the percpu data for not-possible CPUs at all. The correct way of doing this is to test cpu_possible() or to use for_each_cpu(). This patch is a kernel-wide sweep of all instances of NR_CPUS. I found very few instances of this bug, if any. But the patch converts lots of open-coded test to use the preferred helper macros. Cc: Mikael Starvik <starvik@axis.com> Cc: David Howells <dhowells@redhat.com> Acked-by: Kyle McMartin <kyle@parisc-linux.org> Cc: Anton Blanchard <anton@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: Andi Kleen <ak@muc.de> Cc: Christian Zankel <chris@zankel.net> Cc: Philippe Elie <phil.el@wanadoo.fr> Cc: Nathan Scott <nathans@sgi.com> Cc: Jens Axboe <axboe@suse.de> Cc: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] powerpc/64: per cpu data optimisations	Anton Blanchard	2006-01-11	1	-0/+56
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current ppc64 per cpu data implementation is quite slow. eg: lhz 11,18(13) /* smp_processor_id() / ld 9,.LC63-.LCTOC1(30) / per_cpu__variable_name / ld 8,.LC61-.LCTOC1(30) / __per_cpu_offset / sldi 11,11,3 / form index into __per_cpu_offset / mr 10,9 ldx 9,11,8 / __per_cpu_offset[smp_processor_id()] / ldx 0,10,9 / load per cpu data / 5 loads for something that is supposed to be fast, pretty awful. One reason for the large number of loads is that we have to synthesize 2 64bit constants (per_cpu__variable_name and __per_cpu_offset). By putting __per_cpu_offset into the paca we can avoid the 2 loads associated with it: ld 11,56(13) / paca->data_offset / ld 9,.LC59-.LCTOC1(30) / per_cpu__variable_name / ldx 0,9,11 / load per cpu data Longer term we can should be able to do even better than 3 loads. If per_cpu__variable_name wasnt a 64bit constant and paca->data_offset was in a register we could cut it down to one load. A suggestion from Rusty is to use gcc's __thread extension here. In order to do this we would need to free up r13 (the __thread register and where the paca currently is). So far Ive had a few unsuccessful attempts at doing that :) The patch also allocates per cpu memory node local on NUMA machines. This patch from Rusty has been sitting in my queue _forever_ but stalled when I hit the compiler bug. Sorry about that. Finally I also only allocate per cpu data for possible cpus, which comes straight out of the x86-64 port. On a pseries kernel (with NR_CPUS == 128) and 4 possible cpus we see some nice gains: total used free shared buffers cached Mem: 4012228 212860 3799368 0 0 162424 total used free shared buffers cached Mem: 4016200 212984 3803216 0 0 162424 A saving of 3.75MB. Quite nice for smaller machines. Note: we now have to be careful of per cpu users that touch data for !possible cpus. At this stage it might be worth making the NUMA and possible cpu optimisations generic, but per cpu init is done so early we have to be careful that all architectures have their possible map setup correctly. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
*	[PATCH] Move all the very similar files to asm-powerpc	Stephen Rothwell	2005-08-30	1	-0/+1
	They differed in either simple comments or in the protecting ifdefs. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>