path: root/arch
* x86, 64-bit: Clean up user address masking (Linus Torvalds, 2009-06-21; 4 files, -12/+4)
    The discussion about using "access_ok()" in get_user_pages_fast() (see
    commit 7f8189068726492950bf1a2dcfd9b51314560abf: "x86: don't use
    'access_ok()' as a range check in get_user_pages_fast()" for details
    and end result) made us notice that x86-64 was really being very
    sloppy about virtual address checking.

    So be way more careful and straightforward about masking x86-64
    virtual addresses:

     - All the VIRTUAL_MASK* variants now cover half of the address space;
       it's not like we can use the full mask on a signed integer, and the
       larger mask just invites mistakes when applying it to either half
       of the 48-bit address space.

     - /proc/kcore's kc_offset_to_vaddr() becomes a lot more obvious when
       it transforms a file offset into a (kernel-half) virtual address.

     - Unify/simplify the 32-bit and 64-bit USER_DS definition to be based
       on TASK_SIZE_MAX.

    This cleanup and more careful/obvious user virtual address checking
    also uncovered a buglet in the x86-64 implementation of
    strnlen_user(): it would do an "access_ok()" check on the whole
    potential area, even if the string itself was much shorter, and thus
    return an error even for valid strings. Our sloppy checking had hidden
    this.

    So this fixes strnlen_user() to do this properly, the same way we
    already handled user strings in strncpy_from_user(): by just checking
    the first byte, and then relying on fault handling for the rest. That
    always works, since we impose a guard page that cannot be mapped at
    the end of the user space address space (and even if we didn't, we'd
    have the address space hole).

    Acked-by: Ingo Molnar <mingo@elte.hu>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Nick Piggin <npiggin@suse.de>
    Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
    Cc: H. Peter Anvin <hpa@zytor.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
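    [Editor's sketch] The first-byte pattern described above is compact
    enough to illustrate; this is a minimal sketch, not the exact x86-64
    code, with __strnlen_user standing in for the arch's fault-handling
    inner loop:

        /*
         * Sketch only: validate just the first byte; the fault handler
         * and the user-space guard page bound the rest of the walk.
         */
        long strnlen_user(const char __user *s, long n)
        {
                /* Check that the start of the string is readable... */
                if (!access_ok(VERIFY_READ, s, 1))
                        return 0;
                /*
                 * ...and rely on fault handling beyond that: the guard
                 * page at the top of user space guarantees a fault
                 * before the scan can reach kernel addresses.
                 */
                return __strnlen_user(s, n);
        }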
* Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip (Linus Torvalds, 2009-06-20; 19 files, -397/+1075)
|\
    * 'perfcounters-fixes-for-linus' of
      git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (49 commits)
      perfcounter: Handle some IO return values
      perf_counter: Push perf_sample_data through the swcounter code
      perf_counter tools: Define and use our own u64, s64 etc. definitions
      perf_counter: Close race in perf_lock_task_context()
      perf_counter, x86: Improve interactions with fast-gup
      perf_counter: Simplify and fix task migration counting
      perf_counter tools: Add a data file header
      perf_counter: Update userspace callchain sampling uses
      perf_counter: Make callchain samples extensible
      perf report: Filter to parent set by default
      perf_counter tools: Handle lost events
      perf_counter: Add event overlow handling
      fs: Provide empty .set_page_dirty() aop for anon inodes
      perf_counter: tools: Makefile tweaks for 64-bit powerpc
      perf_counter: powerpc: Add processor back-end for MPC7450 family
      perf_counter: powerpc: Make powerpc perf_counter code safe for 32-bit kernels
      perf_counter: powerpc: Change how processor-specific back-ends get selected
      perf_counter: powerpc: Use unsigned long for register and constraint values
      perf_counter: powerpc: Enable use of software counters on 32-bit powerpc
      perf_counter tools: Add and use isprint()
      ...
| * perf_counter, x86: Improve interactions with fast-gup (Ingo Molnar, 2009-06-19; 2 files, -2/+7)
    Improve a few details in perfcounter call-chain recording that makes
    use of fast-GUP:

     - Use ACCESS_ONCE() to observe the pte value. ptes are fundamentally
       racy and can be changed on another CPU, so we have to be careful
       about how we access them. The PAE branch is already careful with
       read-barriers - but the non-PAE and 64-bit side needs an
       ACCESS_ONCE() to make sure the pte value is observed only once.

     - Make the checks a bit stricter so that we can feed it any kind of
       cra^H^H^H user-space input ;-)

    Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
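    [Editor's sketch] As an illustration of the ACCESS_ONCE() point, a
    condensed sketch of the non-PAE pte walk, simplified from
    gup_pte_range() (the special-bits mask and some checks are omitted):

        static int gup_pte_range(pmd_t pmd, unsigned long addr,
                                 unsigned long end, int write,
                                 struct page **pages, int *nr)
        {
                pte_t *ptep = pte_offset_map(&pmd, addr);

                do {
                        /* read the racy pte exactly once */
                        pte_t pte = ACCESS_ONCE(*ptep);
                        struct page *page;

                        if (!pte_present(pte) || (write && !pte_write(pte))) {
                                pte_unmap(ptep);
                                return 0;       /* fall back to slow gup */
                        }
                        page = pte_page(pte);
                        get_page(page);         /* pin it for the caller */
                        pages[(*nr)++] = page;
                } while (ptep++, addr += PAGE_SIZE, addr != end);
                pte_unmap(ptep - 1);

                return 1;
        }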
| * perf_counter: Make callchain samples extensible (Peter Zijlstra, 2009-06-19; 1 file, -23/+6)
    Before exposing upstream tools to a callchain-samples ABI, tidy it up
    to make it more extensible in the future:

    Use markers in the IP chain to denote context, and use the
    (u64)-1..-4095 range for these context markers because we use that
    range for ERR_PTR(), so these addresses are unlikely to be mapped.

    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
    LKML-Reference: <new-submission>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
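    [Editor's sketch] The marker scheme can be pictured as an enum of
    special "addresses"; the exact constants below are an assumption to
    check against include/linux/perf_counter.h, but all of them sit in
    the reserved (u64)-1..-4095 window:

        enum perf_callchain_context {
                PERF_CONTEXT_HV         = (__u64)-32,
                PERF_CONTEXT_KERNEL     = (__u64)-128,
                PERF_CONTEXT_USER       = (__u64)-512,
        };

        /*
         * A recorded chain then reads, e.g.:
         *   PERF_CONTEXT_KERNEL, <kernel IPs...>,
         *   PERF_CONTEXT_USER, <user IPs...>
         * so a consumer can skip or add contexts without an ABI break.
         */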
| * perf_counter: powerpc: Add processor back-end for MPC7450 family (Paul Mackerras, 2009-06-18; 3 files, -0/+420)
    This adds support for the performance monitor hardware on the MPC7450
    family of processors (7450, 7451, 7455, 7447/7457, 7447A, 7448), used
    in the later Apple G4 powermacs/powerbooks and other machines. These
    machines have 6 hardware counters with a unique set of events which
    can be counted on each counter, with some events being available on
    multiple counters.

    Raw event codes for these processors are (PMC << 8) + PMCSEL. If PMC
    is non-zero then the event is that selected by the given PMCSEL value
    for that PMC (hardware counter). If PMC is zero then the event
    selected is one of the low-numbered ones that are common to several
    PMCs. In this case PMCSEL must be <= 22 and the event is what that
    PMCSEL value would select on PMC1 (but it may be placed on any other
    PMC that has the same event for that PMCSEL value).

    For events that count cycles or occurrences that exceed a threshold,
    the threshold requested can be specified in the 0x3f000 bits of the
    raw event codes. If the event uses the threshold multiplier bit and
    that bit should be set, that is indicated with the 0x40000 bit of the
    raw event code.

    This fills in some of the generic cache events. Unfortunately there
    are quite a few blank spaces in the table, partly because these
    processors tend to count cache hits rather than cache accesses.

    Signed-off-by: Paul Mackerras <paulus@samba.org>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: linuxppc-dev@ozlabs.org
    Cc: benh@kernel.crashing.org
    LKML-Reference: <19000.55631.802122.696927@cargo.ozlabs.ibm.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
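    [Editor's sketch] The raw event format lends itself to a few
    field-extraction macros; the layout follows the commit text, with
    illustrative macro names rather than the back-end's actual ones:

        #define M7450_PMC(ev)         (((ev) >> 8) & 0xff)  /* 0 = any suitable PMC */
        #define M7450_PMCSEL(ev)      ((ev) & 0xff)         /* event select */
        #define M7450_THRESH(ev)      (((ev) >> 12) & 0x3f) /* 0x3f000 threshold bits */
        #define M7450_THRESH_MULT(ev) (((ev) >> 18) & 0x1)  /* 0x40000 multiplier bit */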
| * perf_counter: powerpc: Make powerpc perf_counter code safe for 32-bit kernels (Paul Mackerras, 2009-06-18; 1 file, -60/+133)
    This abstracts a few things in arch/powerpc/kernel/perf_counter.c
    that are specific to 64-bit kernels, and provides definitions for
    32-bit kernels. In particular:

     * Only 64-bit has MMCRA and the bits in it that give information
       about a PMU interrupt (sampled PR, HV, slot number etc.)
     * Only 64-bit has the lppaca and the lppaca->pmcregs_in_use field
     * Use of SDAR is confined to 64-bit for now
     * Only 64-bit has soft/lazy interrupt disable and therefore
       pseudo-NMIs (interrupts that occur while interrupts are
       soft-disabled)
     * Only 64-bit has PMC7 and PMC8
     * Only 64-bit has the MSR_HV bit.

    This also fixes the types used in a couple of places, where we were
    using long types for things that need to be 64-bit.

    Signed-off-by: Paul Mackerras <paulus@samba.org>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: linuxppc-dev@ozlabs.org
    Cc: benh@kernel.crashing.org
    LKML-Reference: <19000.55590.634126.876084@cargo.ozlabs.ibm.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * perf_counter: powerpc: Change how processor-specific back-ends get selected (Paul Mackerras, 2009-06-18; 8 files, -45/+96)
    At present, the powerpc generic (processor-independent) perf_counter
    code has a list of processor back-end modules, and at initialization
    it looks at the PVR (processor version register) and has a switch
    statement to select a suitable processor-specific back-end.

    This is going to become inconvenient as we add more
    processor-specific back-ends, so this inverts the order: now each
    back-end checks whether it applies to the current processor, and
    registers itself if so. Furthermore, instead of looking at the PVR,
    back-ends now check the cur_cpu_spec->oprofile_cpu_type string and
    match on that.

    Lastly, each back-end now specifies a name for itself so the core can
    print a nice message when a back-end registers itself.

    This doesn't provide any support for unregistering back-ends, but
    that wouldn't be hard to do and would allow back-ends to be modules.

    Signed-off-by: Paul Mackerras <paulus@samba.org>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: linuxppc-dev@ozlabs.org
    Cc: benh@kernel.crashing.org
    LKML-Reference: <19000.55529.762227.518531@cargo.ozlabs.ibm.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * perf_counter: powerpc: Use unsigned long for register and constraint values (Paul Mackerras, 2009-06-18; 8 files, -212/+229)
    This changes the powerpc perf_counter back-end to use unsigned long
    types for hardware register values and for the value/mask pairs used
    in checking whether a given set of events fit within the hardware
    constraints. This is in preparation for adding support for the PMU on
    some 32-bit powerpc processors.

    On 32-bit processors the hardware registers are only 32 bits wide,
    and the PMU structure is generally simpler, so 32 bits should be
    ample for expressing the hardware constraints. On 64-bit processors,
    unsigned long is 64 bits wide, so using unsigned long vs. u64
    (unsigned long long) makes no actual difference.

    This makes some other very minor changes: adjusting whitespace to
    line things up in initialized structures, and simplifying some code
    in hw_perf_disable().

    Signed-off-by: Paul Mackerras <paulus@samba.org>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: linuxppc-dev@ozlabs.org
    Cc: benh@kernel.crashing.org
    LKML-Reference: <19000.55473.26174.331511@cargo.ozlabs.ibm.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * perf_counter: powerpc: Enable use of software counters on 32-bit powerpc (Paul Mackerras, 2009-06-18; 6 files, -7/+51)
    This enables the perf_counter subsystem on 32-bit powerpc. Since we
    don't have any support for hardware counters on 32-bit powerpc yet,
    only software counters can be used.

    Besides selecting HAVE_PERF_COUNTERS for 32-bit powerpc as well as
    64-bit, the main thing this does is add an implementation of
    set_perf_counter_pending(). This needs to arrange for
    perf_counter_do_pending() to be called when interrupts are enabled.
    Rather than add code to local_irq_restore as 64-bit does, the 32-bit
    set_perf_counter_pending() generates an interrupt by setting the
    decrementer to 1 so that a decrementer interrupt will become pending
    in 1 or 2 timebase ticks (if a decrementer interrupt isn't already
    pending). When interrupts are enabled, timer_interrupt() will be
    called, and some new code in there calls perf_counter_do_pending().
    We use a per-cpu array of flags to indicate whether we need to call
    perf_counter_do_pending() or not.

    This introduces a couple of new Kconfig symbols: PPC_HAVE_PMU_SUPPORT,
    which is selected by processor families for which we have hardware
    PMU support (currently only PPC64), and PPC_PERF_CTRS, which enables
    the powerpc-specific perf_counter back-end.

    Signed-off-by: Paul Mackerras <paulus@samba.org>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: linuxppc-dev@ozlabs.org
    Cc: benh@kernel.crashing.org
    LKML-Reference: <19000.55404.103840.393470@cargo.ozlabs.ibm.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
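    [Editor's sketch] The decrementer trick fits in a few lines; a
    minimal sketch assuming a per-cpu u8 flag as the commit describes
    (the flag name here is illustrative):

        DEFINE_PER_CPU(u8, perf_counter_pending);

        static inline void set_perf_counter_pending(void)
        {
                get_cpu_var(perf_counter_pending) = 1;
                set_dec(1);     /* decrementer fires in 1-2 timebase ticks */
                put_cpu_var(perf_counter_pending);
        }

        /*
         * timer_interrupt() then clears the flag and calls
         * perf_counter_do_pending() now that interrupts are on again.
         */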
| * perf_counter: x86: Set the period in the intel overflow handler (Peter Zijlstra, 2009-06-17; 1 file, -0/+2)
    Commit 9e350de37ac960 ("perf_counter: Accurate period data") missed a
    spot, which caused all Intel-PMU samples to have a period of 0. This
    broke auto-freq sampling.

    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
    LKML-Reference: <new-submission>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * Merge branch 'linus' into perfcounters/core (Ingo Molnar, 2009-06-17; 1193 files, -18010/+71502)
| |\
    Conflicts:
        arch/x86/include/asm/kmap_types.h
        include/linux/mm.h
        include/asm-generic/kmap_types.h

    Merge reason: We crossed changes with kmap_types.h cleanups in
    mainline.

    Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | x86: Add NMI types for kmap_atomic, fix (Peter Zijlstra, 2009-06-15; 2 files, -6/+10)
    I just realized this has a kmap_atomic bug in it... The below would
    fix it, but it's complicating this code some more.

    Alternatively I would have to introduce something like
    pte_offset_map_irq() which would do the irq/nmi detection and leave
    the regular code paths alone. However, that would mean either
    duplicating the gup_fast() pagewalk or passing down a pte function
    pointer, which would only duplicate the gup_pte_range() bit; neither
    is really attractive...

    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    CC: Nick Piggin <npiggin@suse.de>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | perf_counter: Make set_perf_counter_pending() declaration common (Paul Mackerras, 2009-06-15; 3 files, -6/+2)
    At present, every architecture that supports perf_counters has to
    declare set_perf_counter_pending() in its arch-specific headers. This
    consolidates the declarations into a single declaration in one common
    place, include/linux/perf_counter.h. On powerpc, we continue to
    provide a static inline definition of set_perf_counter_pending() in
    the powerpc hw_irq.h.

    Also, this removes from the x86 perf_counter.h the unused null
    definitions of {test,clear}_perf_counter_pending.

    Reported-by: Mike Frysinger <vapier.adi@gmail.com>
    Signed-off-by: Paul Mackerras <paulus@samba.org>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: benh@kernel.crashing.org
    LKML-Reference: <18998.13388.920691.523227@cargo.ozlabs.ibm.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | perf_counter: powerpc: Fix two compile warnings (Paul Mackerras, 2009-06-15; 1 file, -1/+3)
    This fixes a couple of compile warnings that crept into the powerpc
    perf_counter code recently:

        CC      arch/powerpc/kernel/perf_counter.o
        arch/powerpc/kernel/perf_counter.c: In function 'record_and_restart':
        arch/powerpc/kernel/perf_counter.c:1016: warning: unused variable 'addr'
        arch/powerpc/kernel/perf_counter.c: In function 'hw_perf_counter_init':
        arch/powerpc/kernel/perf_counter.c:891: warning: 'ev' may be used
        uninitialized in this function

    Stephen Rothwell reported this against linux-next as well.

    Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
    Signed-off-by: Paul Mackerras <paulus@samba.org>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    LKML-Reference: <18998.12884.787039.22202@cargo.ozlabs.ibm.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | perf_counter: x86: Fix call-chain support to use NMI-safe methods (Peter Zijlstra, 2009-06-15; 1 file, -10/+39)
    __copy_from_user_inatomic() isn't NMI safe in that it can trigger the
    page fault handler, which is another trap whose return path invokes
    IRET, which will also close the NMI context.

    Therefore use a GUP based approach to copy the stack frames over.

    We tried an alternative solution as well: we used a forward ported
    version of Mathieu Desnoyers's "NMI safe INT3 and Page Fault" patch
    that modifies the exception return path to use an open-coded IRET
    with explicit stack unrolling and TF checking.

    This didn't work as it interacted with faulting user-space
    instructions, causing them not to restart properly, which corrupts
    user-space registers.

    Solving that would probably involve disassembling those instructions
    and backtracing the RIP. But even without that, the code was deemed
    too complex an addition to the already non-trivial x86 entry assembly
    code, so instead we went for this GUP based method that does a
    software walk of the pagetables.

    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Nick Piggin <npiggin@suse.de>
    Cc: Pekka Enberg <penberg@cs.helsinki.fi>
    Cc: Vegard Nossum <vegard.nossum@gmail.com>
    Cc: Jeremy Fitzhardinge <jeremy@goop.org>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
    LKML-Reference: <new-submission>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
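    [Editor's sketch] The GUP-based copy is compact enough to sketch;
    this follows the shape of the commit's helper (called
    copy_from_user_nmi() here, an assumed name), pinning each page with
    the new IRQ/NMI-safe __get_user_pages_fast() and mapping it through a
    context-appropriate kmap_atomic slot:

        static unsigned long
        copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
        {
                unsigned long offset, addr = (unsigned long)from;
                int type = in_nmi() ? KM_NMI : KM_IRQ0;
                unsigned long size, len = 0;
                struct page *page;
                void *map;

                do {
                        /* pin one user page without taking mmap_sem */
                        if (!__get_user_pages_fast(addr, 1, 0, &page))
                                break;

                        offset = addr & (PAGE_SIZE - 1);
                        size = min(PAGE_SIZE - offset, n - len);

                        map = kmap_atomic(page, type);
                        memcpy(to, map + offset, size);
                        kunmap_atomic(map, type);
                        put_page(page);

                        len  += size;
                        to   += size;
                        addr += size;
                } while (len < n);

                return len;
        }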
| * | x86: Add NMI types for kmap_atomic (Peter Zijlstra, 2009-06-15; 2 files, -3/+6)
    Two new kmap_atomic slots for NMI context. And teach pte_offset_map()
    about NMI context.

    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    CC: Nick Piggin <npiggin@suse.de>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | x86, mm: Add __get_user_pages_fast() (Peter Zijlstra, 2009-06-15; 1 file, -0/+56)
    Introduce a gup_fast() variant which is usable from IRQ/NMI context.

    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    CC: Nick Piggin <npiggin@suse.de>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
    LKML-Reference: <new-submission>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | perf_counter, x86: Fix kernel-space call-chains (Ingo Molnar, 2009-06-15; 1 file, -13/+9)
    Kernel-space call-chains were trimmed at the first entry because we
    never processed anything beyond the first stack context.

    Allow the backtrace to jump from NMI to IRQ stack then to task stack
    and finally user-space stack. Also calculate the stack and bp
    variables correctly so that the stack walker does not exit early.

    We can get deep traces as a result, visible in perf report -D output:

        0x32af0 [0xe0]: PERF_EVENT (IP, 5): 15134: 0xffffffff815225fd period: 1
        ... chain: u:2, k:22, nr:24
        .....  0: 0xffffffff815225fd
        .....  1: 0xffffffff810ac51c
        .....  2: 0xffffffff81018e29
        .....  3: 0xffffffff81523939
        .....  4: 0xffffffff81524b8f
        .....  5: 0xffffffff81524bd9
        .....  6: 0xffffffff8105e498
        .....  7: 0xffffffff8152315a
        .....  8: 0xffffffff81522c3a
        .....  9: 0xffffffff810d9b74
        ..... 10: 0xffffffff810dbeec
        ..... 11: 0xffffffff810dc3fb

    This is a 22-entry kernel-space chain. (We still only record reliable
    stack entries.)

    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
    LKML-Reference: <new-submission>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | perf_counter, x86: Fix call-chain walking (Ingo Molnar, 2009-06-14; 1 file, -2/+4)
    Fix the ptregs variant when we hit user-mode tasks.

    Cc: Frederic Weisbecker <fweisbec@gmail.com>
    Cc: Pekka Enberg <penberg@cs.helsinki.fi>
    Cc: Arjan van de Ven <arjan@infradead.org>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
    LKML-Reference: <new-submission>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | perf_counter, x86: Update AMD hw caching related event table (Jaswinder Singh Rajput, 2009-06-13; 1 file, -21/+15)
    All AMD models share the same hw caching related event table. Also
    complete the table with more events.

    Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
    Cc: Robert Richter <robert.richter@amd.com>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
    LKML-Reference: <1244835381.2802.2.camel@ht.satnam>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | perf_counter, x86: Check old-AMD performance monitoring support (Jaswinder Singh Rajput, 2009-06-13; 1 file, -0/+4)
    AMD supports performance monitoring starting from K7 (i.e. family 6),
    so disable it for earlier AMD CPUs.

    Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
    Cc: Robert Richter <robert.richter@amd.com>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
    LKML-Reference: <1244714289.6923.0.camel@ht.satnam>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
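    [Editor's sketch] The check itself is a one-liner in the AMD PMU init
    path; sketched here with the standard boot_cpu_data fields, condensed
    from the surrounding setup code:

        static int amd_pmu_init(void)
        {
                /* Performance monitoring exists from K7 (family 6) onwards: */
                if (boot_cpu_data.x86 < 6)
                        return -ENODEV;

                /* ... the rest of the AMD PMU setup ... */
                return 0;
        }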
* | | Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip (Linus Torvalds, 2009-06-20; 2 files, -3/+11)
|\ \ \
    * 'sched-fixes-for-linus' of
      git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
      sched: Fix out of scope variable access in sched_slice()
      sched: Hide runqueues from direct refer at source code level
      sched: Remove unneeded __ref tag
      sched, x86: Fix cpufreq + sched_clock() TSC scaling
| * | | sched, x86: Fix cpufreq + sched_clock() TSC scaling (Peter Zijlstra, 2009-06-17; 2 files, -3/+11)
    For frequency-dependent TSCs we only scale the cycles, we do not
    account for the discrepancy in absolute value.

    Our current formula is:

        time = cycles * mult

    (where mult is a function of the cpu-speed on variable tsc machines)

    Suppose our current cycle count is 10, and we have a multiplier of 5,
    then our time value would end up being 50. Now cpufreq comes along
    and changes the multiplier to say 3 or 7, which would result in our
    time being resp. 30 or 70.

    That means that we can observe random jumps in the time value due to
    frequency changes in both fwd and bwd direction.

    So what this patch does is change the formula to:

        time = cycles * frequency + offset

    and we calculate offset so that time_before == time_after, thereby
    ridding us of these jumps in time.

    [ Impact: fix/reduce sched_clock() jumps across frequency changing
      events ]

    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    LKML-Reference: <new-submission>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Chucked-on-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
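    [Editor's sketch] The offset calculation is two lines; a sketch with
    illustrative names, not the kernel's actual cyc2ns code. With the
    example above, cycles = 10 and mult going 5 -> 3, offset becomes 20
    and time stays at 50 across the switch:

        static u64 mult, offset;

        /* on a cpufreq transition, given the current cycle count: */
        static void rescale_sched_clock(u64 cycles_now, u64 new_mult)
        {
                u64 time_now = cycles_now * mult + offset;

                /* choose offset so the clock is continuous: */
                offset = time_now - cycles_now * new_mult;
                mult   = new_mult;
        }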
* | | | Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip (Linus Torvalds, 2009-06-20; 6 files, -4/+11)
|\ \ \ \
    * 'tracing-fixes-for-linus' of
      git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (24 commits)
      tracing/urgent: warn in case of ftrace_start_up inbalance
      tracing/urgent: fix unbalanced ftrace_start_up
      function-graph: add stack frame test
      function-graph: disable when both x86_32 and optimize for size are configured
      ring-buffer: have benchmark test print to trace buffer
      ring-buffer: do not grab locks in nmi
      ring-buffer: add locks around rb_per_cpu_empty
      ring-buffer: check for less than two in size allocation
      ring-buffer: remove useless compile check for buffer_page size
      ring-buffer: remove useless warn on check
      ring-buffer: use BUF_PAGE_HDR_SIZE in calculating index
      tracing: update sample event documentation
      tracing/filters: fix race between filter setting and module unload
      tracing/filters: free filter_string in destroy_preds()
      ring-buffer: use commit counters for commit pointer accounting
      ring-buffer: remove unused variable
      ring-buffer: have benchmark test handle discarded events
      ring-buffer: prevent adding write in discarded area
      tracing/filters: strloc should be unsigned short
      tracing/filters: operand can be negative
      ...

    Fix up kmemcheck-induced conflict in kernel/trace/ring_buffer.c
    manually
| * | | | function-graph: add stack frame test (Steven Rostedt, 2009-06-19; 6 files, -4/+11)
    In case gcc does something funny with the stack frames, or the return
    from function code, we would like to detect that.

    An arch may implement passing of a variable that is unique to the
    function and can be saved on entering a function and can be tested
    when exiting the function. Usually the frame pointer can be used for
    this purpose.

    This patch also implements this for x86, where it passes in the stack
    frame of the parent function and will test that frame on exit.

    There was a case in x86_32 with optimize for size (-Os) where, for a
    few functions, gcc would align the stack frame and place a copy of
    the return address into it. The function graph tracer modified the
    copy and not the actual return address. On return from the function,
    it did not go to the tracer hook, but returned to the parent. This
    broke the function graph tracer, because the return of the parent
    (where gcc did not do this funky manipulation) returned to the
    location that the child function was supposed to. This caused strange
    kernel crashes.

    This test detected the problem and pointed out where the issue was.

    This modifies the parameters of one of the functions that the arch
    specific code calls, so it includes changes to arch code to
    accommodate the new prototype.

    Note, I notice that the parisc arch implements its own
    push_return_trace. This is now a generic function and
    ftrace_push_return_trace should be used instead. This patch does not
    touch that code.

    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
    Cc: Frederic Weisbecker <fweisbec@gmail.com>
    Cc: Helge Deller <deller@gmx.de>
    Cc: Kyle McMartin <kyle@mcmartin.ca>
    Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
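    [Editor's sketch] Condensed sketch of the test itself: record the
    parent frame pointer when the shadow return entry is pushed, verify
    it when the entry is popped. The real tracer aborts tracing on a
    mismatch; this sketch just warns, and the helper names are
    illustrative:

        struct ftrace_ret_stack {
                unsigned long ret;      /* the real return address */
                unsigned long fp;       /* parent frame pointer at entry */
        };

        static void push_return(struct ftrace_ret_stack *s,
                                unsigned long ret, unsigned long frame_pointer)
        {
                s->ret = ret;
                s->fp  = frame_pointer;
        }

        static unsigned long pop_return(struct ftrace_ret_stack *s,
                                        unsigned long frame_pointer)
        {
                /* gcc moved/copied the return address if these differ */
                WARN_ONCE(s->fp != frame_pointer,
                          "ftrace: expected frame %lx, got %lx\n",
                          s->fp, frame_pointer);
                return s->ret;
        }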
* | | | | Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip (Linus Torvalds, 2009-06-20; 37 files, -597/+670)
|\ \ \ \ \
    * 'x86-fixes-for-linus' of
      git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (45 commits)
      x86, mce: fix error path in mce_create_device()
      x86: use zalloc_cpumask_var for mce_dev_initialized
      x86: fix duplicated sysfs attribute
      x86: de-assembler-ize asm/desc.h
      i386: fix/simplify espfix stack switching, move it into assembly
      i386: fix return to 16-bit stack from NMI handler
      x86, ioapic: Don't call disconnect_bsp_APIC if no APIC present
      x86: Remove duplicated #include's
      x86: msr.h linux/types.h is only required for __KERNEL__
      x86: nmi: Add Intel processor 0x6f4 to NMI perfctr1 workaround
      x86, mce: mce_intel.c needs <asm/apic.h>
      x86: apic/io_apic.c: dmar_msi_type should be static
      x86, io_apic.c: Work around compiler warning
      x86: mce: Don't touch THERMAL_APIC_VECTOR if no active APIC present
      x86: mce: Handle banks == 0 case in K7 quirk
      x86, boot: use .code16gcc instead of .code16
      x86: correct the conversion of EFI memory types
      x86: cap iomem_resource to addressable physical memory
      x86, mce: rename _64.c files which are no longer 64-bit-specific
      x86, mce: mce.h cleanup
      ...

    Manually fix up trivial conflict in arch/x86/mm/fault.c
| * \ \ \ \ Merge branch 'x86/mce3' into x86/urgent (Ingo Molnar, 2009-06-20; 16 files, -508/+528)
| |\ \ \ \ \
| | * | | | | x86, mce: fix error path in mce_create_device() (Hidetoshi Seto, 2009-06-18; 1 file, -5/+5)
    Don't skip removing mce_attrs on the error path reached from error2.

    Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
    Cc: Andi Kleen <andi@firstfloor.org>
    Cc: Huang Ying <ying.huang@intel.com>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| | * | | | | x86: use zalloc_cpumask_var for mce_dev_initialized (Yinghai Lu, 2009-06-18; 1 file, -1/+1)
    We need a cleared cpu_mask to record whether mce is initialized,
    especially when MAXSMP is used. Use zalloc_... instead.

    Signed-off-by: Yinghai Lu <yinghai@kernel.org>
    Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
    Cc: stable@kernel.org
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| | * | | | | x86: fix duplicated sysfs attribute (Yinghai Lu, 2009-06-18; 1 file, -1/+1)
    The sysfs attribute cmci_disabled was accidentally turned into a
    duplicate of ignore_ce, breaking all other attributes.

    Signed-off-by: Yinghai Lu <yinghai@kernel.org>
    Acked-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| | * | | | | Merge branch 'x86/urgent' into x86/mce3 (Ingo Molnar, 2009-06-17; 15 files, -24/+95)
| | |\ \ \ \ \
    Conflicts:
        arch/x86/kernel/cpu/mcheck/mce_intel.c

    Merge reason: merge with an urgent-branch MCE fix.

    Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | | | | x86, mce: mce_intel.c needs <asm/apic.h> (H. Peter Anvin, 2009-06-17; 1 file, -0/+1)
    mce_intel.c uses apic_write() and lapic_get_maxlvt(), and so it needs
    <asm/apic.h>.

    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    Cc: Andi Kleen <andi@firstfloor.org>
    Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
| | * | | | | | x86, mce: rename _64.c files which are no longer 64-bit-specific (Hidetoshi Seto, 2009-06-17; 3 files, -2/+2)
    Rename files that are no longer 64-bit specific:

        mce_amd_64.c   => mce_amd.c
        mce_intel_64.c => mce_intel.c

    Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| | * | | | | | x86, mce: mce.h cleanup (Hidetoshi Seto, 2009-06-17; 1 file, -11/+18)
    Reorder definitions:

     - static inline dummy mcheck_init() for !CONFIG_X86_MCE
     - gather defs for exception, threshold handler

    Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| | * | | | | | x86, mce: remove therm_throt.h (Hidetoshi Seto, 2009-06-17; 4 files, -14/+2)
    Now all symbols in the header are static. Remove the header.

    Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| | * | | | | | x86, mce: remove intel_set_thermal_handler() (Hidetoshi Seto, 2009-06-17; 2 files, -8/+2)
    ...and make intel_thermal_interrupt() static.

    Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| | * | | | | | x86, mce: squash mce_intel.c into therm_throt.c (Hidetoshi Seto, 2009-06-17; 3 files, -74/+67)
    Move intel_init_thermal() into therm_throt.c.

    Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| | * | | | | | x86, mce: unify smp_thermal_interrupt (Hidetoshi Seto, 2009-06-17; 4 files, -87/+43)
    Put common functions into therm_throt.c, modify Makefile:

        unexpected_thermal_interrupt
        intel_thermal_interrupt
        smp_thermal_interrupt
        intel_set_thermal_handler

    Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| | * | | | | | x86, mce: unify smp_thermal_interrupt, prepare (Hidetoshi Seto, 2009-06-17; 3 files, -17/+38)
    Bring them into the same shape.

    Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| | * | | | | | x86, mce: unify smp_thermal_interrupt, prepare mce_intel_64 (Hidetoshi Seto, 2009-06-17; 1 file, -6/+8)
    Break smp_thermal_interrupt() into two functions.

    Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| | * | | | | | x86, mce: unify smp_thermal_interrupt, prepare p4 (Hidetoshi Seto, 2009-06-17; 1 file, -6/+5)
    Remove unused argument regs from handlers, and use inc_irq_stat.

    Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| | * | | | | | x86, mce: make mce_disabled boolean (Hidetoshi Seto, 2009-06-17; 3 files, -17/+11)
    mce_disabled on 32-bit is a tristate variable [1,0,-1], while the
    64-bit version is boolean [0,1]. This patch makes mce_disabled always
    boolean, and uses mce_p5_enabled to indicate the third state instead.

    Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
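    [Editor's sketch] A rough sketch of the resulting split, assuming the
    commit's variable names (option parsing and the rest of the P5 setup
    omitted):

        int mce_disabled __read_mostly;         /* boolean now: 0 or 1 */
        int mce_p5_enabled __read_mostly;       /* carries the old third state */

        void intel_p5_mcheck_init(struct cpuinfo_x86 *c)
        {
                /* P5-style MCE stays off unless explicitly enabled */
                if (!mce_p5_enabled)
                        return;
                /* ... enable the MSR-less P5 machine check reporting ... */
        }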
| | * | | | | | x86, mce: unify mce.h (Hidetoshi Seto, 2009-06-17; 12 files, -59/+42)
    There are 2 headers:

        arch/x86/include/asm/mce.h
        arch/x86/kernel/cpu/mcheck/mce.h

    and in the latter small header:

        #include <asm/mce.h>

    This patch moves all contents of the latter header into the former,
    and fixes all files using the latter to include the former instead.

    Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| | * | | | | | x86, mce: sysfs entries for new mce options (Hidetoshi Seto, 2009-06-17; 1 file, -1/+83)
    Add sysfs interface for admins who want to tweak these options
    without rebooting the system.

    Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| | * | | | | | x86, mce: rename static variables around trigger (Hidetoshi Seto, 2009-06-17; 1 file, -13/+14)
    "trigger" is not a straightforward name for a variable that holds the
    name of the user mode helper program which is triggered by machine
    check events. This patch renames this variable and its kin to more
    recognizable names:

        trigger      => mce_helper
        trigger_argv => mce_helper_argv
        notify_user  => mce_need_notify

    No functional changes.

    Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| | * | | | | | x86, mce: add __read_mostly (Hidetoshi Seto, 2009-06-17; 1 file, -13/+13)
    Add __read_mostly to data written during setup.

    Suggested-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| | * | | | | | x86, mce: cleanup mce_start() (Hidetoshi Seto, 2009-06-17; 1 file, -35/+31)
    Simplify the interface of mce_start():

        - no_way_out = mce_start(no_way_out, &order);
        + order = mce_start(&no_way_out);

    Now Monarch and Subjects share the same exit (return) in the usual
    path.

    Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| | * | | | | | x86, mce: don't init timer if !mce_available (Hidetoshi Seto, 2009-06-17; 1 file, -2/+3)
    In mce_cpu_restart, mce_init_timer is called unconditionally. If
    !mce_available (e.g. mce is disabled), there is no useful work for
    the timer. Stop running it.

    Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| | * | | | | | x86, mce: fix a race condition about mce_callin and no_way_out (Huang Ying, 2009-06-17; 1 file, -2/+10)
    If one CPU has no_way_out == 1, all other CPUs should have
    no_way_out == 1. But although global_nwo is read after mce_callin,
    global_nwo is also updated after mce_callin. So it is possible that
    some CPU reads global_nwo before some other CPU updates it, leaving
    no_way_out == 1 on some CPUs while no_way_out == 0 on others.

    This patch fixes this race condition by moving the mce_callin update
    after the global_nwo update, with an smp_wmb() in between. An
    smp_rmb() is added between their reads too.

    Signed-off-by: Huang Ying <ying.huang@intel.com>
    Acked-by: Andi Kleen <ak@linux.intel.com>
    Acked-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
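    [Editor's sketch] Reduced to its ordering essentials, the fixed
    rendezvous looks roughly like this (a sketch of the mce_start() side
    described above, omitting the Monarch/Subject timeout logic):

        static atomic_t global_nwo, mce_callin;

        static int mce_start(int *no_way_out)
        {
                int order;

                atomic_add(*no_way_out, &global_nwo);
                /* global_nwo must be visible before we appear in mce_callin */
                smp_wmb();
                order = atomic_add_return(1, &mce_callin);

                /* ... wait here until all CPUs have checked in ... */

                /* pairs with the smp_wmb() above */
                smp_rmb();
                *no_way_out = atomic_read(&global_nwo);

                return order;
        }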
| * | | | | | | x86: de-assembler-ize asm/desc.h (Alexander van Heukelum, 2009-06-18; 4 files, -29/+0)
    asm/desc.h is included in three assembly files, but the only macro it
    defines, GET_DESC_BASE, is never used. This patch removes the
    includes, removes the macro GET_DESC_BASE and the ASSEMBLY guard from
    asm/desc.h.

    Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>