linux - linux

	Commit message (Collapse)	Author	Age	Files	Lines
*	Merge branch 'urgent' of git://amd64.org/linux/rric into perf/urgent	Ingo Molnar	2011-11-15	2	-5/+25
\|\
\| *	oprofile: Fix crash when unloading module (hr timer mode)	Robert Richter	2011-11-04	2	-5/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Oprofile may crash in a KVM guest while unlaoding modules. This happens if oprofile_arch_init() fails and oprofile switches to the hr timer mode as a fallback. In this case oprofile_arch_exit() is called, but it never was initialized properly which causes the crash. This patch fixes this. oprofile: using timer interrupt. BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 IP: [<ffffffff8123c226>] unregister_syscore_ops+0x41/0x58 PGD 41da3f067 PUD 41d80e067 PMD 0 Oops: 0002 [#1] PREEMPT SMP CPU 5 Modules linked in: oprofile(-) Pid: 2382, comm: modprobe Not tainted 3.1.0-rc7-00018-g709a39d #18 Advanced Micro Device Anaheim/Anaheim RIP: 0010:[<ffffffff8123c226>] [<ffffffff8123c226>] unregister_syscore_ops+0x41/0x58 RSP: 0018:ffff88041de1de98 EFLAGS: 00010296 RAX: 0000000000000000 RBX: ffffffffa00060e0 RCX: dead000000200200 RDX: 0000000000000000 RSI: dead000000100100 RDI: ffffffff8178c620 RBP: ffff88041de1dea8 R08: 0000000000000001 R09: 0000000000000082 R10: 0000000000000000 R11: ffff88041de1dde8 R12: 0000000000000080 R13: fffffffffffffff5 R14: 0000000000000001 R15: 0000000000610210 FS: 00007f9ae5bef700(0000) GS:ffff88042fd40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000008 CR3: 000000041ca44000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process modprobe (pid: 2382, threadinfo ffff88041de1c000, task ffff88042db6d040) Stack: ffff88041de1deb8 ffffffffa0006770 ffff88041de1deb8 ffffffffa000251e ffff88041de1dec8 ffffffffa00022c2 ffff88041de1ded8 ffffffffa0004993 ffff88041de1df78 ffffffff81073115 656c69666f72706f 0000000000610200 Call Trace: [<ffffffffa000251e>] op_nmi_exit+0x15/0x17 [oprofile] [<ffffffffa00022c2>] oprofile_arch_exit+0xe/0x10 [oprofile] [<ffffffffa0004993>] oprofile_exit+0x13/0x15 [oprofile] [<ffffffff81073115>] sys_delete_module+0x1c3/0x22f [<ffffffff811bf09e>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff8148070b>] system_call_fastpath+0x16/0x1b Code: 20 c6 78 81 e8 c5 cc 23 00 48 8b 13 48 8b 43 08 48 be 00 01 10 00 00 00 ad de 48 b9 00 02 20 00 00 00 ad de 48 c7 c7 20 c6 78 81 89 42 08 48 89 10 48 89 33 48 89 4b 08 e8 a6 c0 23 00 5a 5b RIP [<ffffffff8123c226>] unregister_syscore_ops+0x41/0x58 RSP <ffff88041de1de98> CR2: 0000000000000008 ---[ end trace 06d4e95b6aa3b437 ]--- CC: stable@kernel.org # 2.6.37+ Signed-off-by: Robert Richter <robert.richter@amd.com>
* \|	locking, oprofile: Annotate oprofilefs lock as raw	Thomas Gleixner	2011-09-13	3	-7/+7
\|/ \| \| \| \| \| \| \| \| \| \| \| \|	The oprofilefs_lock can be taken in atomic context (in profiling interrupts) and therefore cannot cannot be preempted on -rt - annotate it. In mainline this change documents the low level nature of the lock - otherwise there's no functional difference. Lockdep and Sparse checking will work as usual. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	atomic: use <linux/atomic.h>	Arun Sharma	2011-07-27	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This allows us to move duplicated code in <asm/atomic.h> (atomic_inc_not_zero() for now) to <linux/atomic.h> Signed-off-by: Arun Sharma <asharma@fb.com> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	perf: Remove the nmi parameter from the oprofile_perf backend	Will Deacon	2011-07-21	1	-1/+1
\| \| \| \| \| \| \| \| \|	In commit a8b0ca17b80e ("perf: Remove the nmi parameter from the swevent and overflow interface") one site was overlooked. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20110708173442.GB31972@e102144-lin.cambridge.arm.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	perf: Add context field to perf_event	Avi Kivity	2011-07-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The perf_event overflow handler does not receive any caller-derived argument, so many callers need to resort to looking up the perf_event in their local data structure. This is ugly and doesn't scale if a single callback services many perf_events. Fix by adding a context parameter to perf_event_create_kernel_counter() (and derived hardware breakpoints APIs) and storing it in the perf_event. The field can be accessed from the callback as event->overflow_handler_context. All callers are updated. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1309362157-6596-2-git-send-email-avi@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	oprofile: Fix locking dependency in sync_start()	Robert Richter	2011-05-31	1	-6/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes the A->B/B->A locking dependency, see the warning below. The function task_exit_notify() is called with (task_exit_notifier) .rwsem set and then calls sync_buffer() which locks buffer_mutex. In sync_start() the buffer_mutex was set to prevent notifier functions to be started before sync_start() is finished. But when registering the notifier, (task_exit_notifier).rwsem is locked too, but now in different order than in sync_buffer(). In theory this causes a locking dependency, what does not occur in practice since task_exit_notify() is always called after the notifier is registered which means the lock is already released. However, after checking the notifier functions it turned out the buffer_mutex in sync_start() is unnecessary. This is because sync_buffer() may be called from the notifiers even if sync_start() did not finish yet, the buffers are already allocated but empty. No need to protect this with the mutex. So we fix this theoretical locking dependency by removing buffer_mutex in sync_start(). This is similar to the implementation before commit: 750d857 oprofile: fix crash when accessing freed task structs which introduced the locking dependency. Lockdep warning: oprofiled/4447 is trying to acquire lock: (buffer_mutex){+.+...}, at: [<ffffffffa0000e55>] sync_buffer+0x31/0x3ec [oprofile] but task is already holding lock: ((task_exit_notifier).rwsem){++++..}, at: [<ffffffff81058026>] __blocking_notifier_call_chain+0x39/0x67 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 ((task_exit_notifier).rwsem){++++..}: [<ffffffff8106557f>] lock_acquire+0xf8/0x11e [<ffffffff81463a2b>] down_write+0x44/0x67 [<ffffffff810581c0>] blocking_notifier_chain_register+0x52/0x8b [<ffffffff8105a6ac>] profile_event_register+0x2d/0x2f [<ffffffffa00013c1>] sync_start+0x47/0xc6 [oprofile] [<ffffffffa00001bb>] oprofile_setup+0x60/0xa5 [oprofile] [<ffffffffa00014e3>] event_buffer_open+0x59/0x8c [oprofile] [<ffffffff810cd3b9>] __dentry_open+0x1eb/0x308 [<ffffffff810cd59d>] nameidata_to_filp+0x60/0x67 [<ffffffff810daad6>] do_last+0x5be/0x6b2 [<ffffffff810dbc33>] path_openat+0xc7/0x360 [<ffffffff810dbfc5>] do_filp_open+0x3d/0x8c [<ffffffff810ccfd2>] do_sys_open+0x110/0x1a9 [<ffffffff810cd09e>] sys_open+0x20/0x22 [<ffffffff8146ad4b>] system_call_fastpath+0x16/0x1b -> #0 (buffer_mutex){+.+...}: [<ffffffff81064dfb>] __lock_acquire+0x1085/0x1711 [<ffffffff8106557f>] lock_acquire+0xf8/0x11e [<ffffffff814634f0>] mutex_lock_nested+0x63/0x309 [<ffffffffa0000e55>] sync_buffer+0x31/0x3ec [oprofile] [<ffffffffa0001226>] task_exit_notify+0x16/0x1a [oprofile] [<ffffffff81467b96>] notifier_call_chain+0x37/0x63 [<ffffffff8105803d>] __blocking_notifier_call_chain+0x50/0x67 [<ffffffff81058068>] blocking_notifier_call_chain+0x14/0x16 [<ffffffff8105a718>] profile_task_exit+0x1a/0x1c [<ffffffff81039e8f>] do_exit+0x2a/0x6fc [<ffffffff8103a5e4>] do_group_exit+0x83/0xae [<ffffffff8103a626>] sys_exit_group+0x17/0x1b [<ffffffff8146ad4b>] system_call_fastpath+0x16/0x1b other info that might help us debug this: 1 lock held by oprofiled/4447: #0: ((task_exit_notifier).rwsem){++++..}, at: [<ffffffff81058026>] __blocking_notifier_call_chain+0x39/0x67 stack backtrace: Pid: 4447, comm: oprofiled Not tainted 2.6.39-00007-gcf4d8d4 #10 Call Trace: [<ffffffff81063193>] print_circular_bug+0xae/0xbc [<ffffffff81064dfb>] __lock_acquire+0x1085/0x1711 [<ffffffffa0000e55>] ? sync_buffer+0x31/0x3ec [oprofile] [<ffffffff8106557f>] lock_acquire+0xf8/0x11e [<ffffffffa0000e55>] ? sync_buffer+0x31/0x3ec [oprofile] [<ffffffff81062627>] ? mark_lock+0x42f/0x552 [<ffffffffa0000e55>] ? sync_buffer+0x31/0x3ec [oprofile] [<ffffffff814634f0>] mutex_lock_nested+0x63/0x309 [<ffffffffa0000e55>] ? sync_buffer+0x31/0x3ec [oprofile] [<ffffffffa0000e55>] sync_buffer+0x31/0x3ec [oprofile] [<ffffffff81058026>] ? __blocking_notifier_call_chain+0x39/0x67 [<ffffffff81058026>] ? __blocking_notifier_call_chain+0x39/0x67 [<ffffffffa0001226>] task_exit_notify+0x16/0x1a [oprofile] [<ffffffff81467b96>] notifier_call_chain+0x37/0x63 [<ffffffff8105803d>] __blocking_notifier_call_chain+0x50/0x67 [<ffffffff81058068>] blocking_notifier_call_chain+0x14/0x16 [<ffffffff8105a718>] profile_task_exit+0x1a/0x1c [<ffffffff81039e8f>] do_exit+0x2a/0x6fc [<ffffffff81465031>] ? retint_swapgs+0xe/0x13 [<ffffffff8103a5e4>] do_group_exit+0x83/0xae [<ffffffff8103a626>] sys_exit_group+0x17/0x1b [<ffffffff8146ad4b>] system_call_fastpath+0x16/0x1b Reported-by: Marcin Slusarz <marcin.slusarz@gmail.com> Cc: Carl Love <carll@us.ibm.com> Cc: <stable@kernel.org> # .36+ Signed-off-by: Robert Richter <robert.richter@amd.com>
*	oprofile: Free potentially owned tasks in case of errors	Robert Richter	2011-05-31	1	-4/+9
\| \| \| \| \| \| \| \| \|	After registering the task free notifier we possibly have tasks in our dying_tasks list. Free them after unregistering the notifier in case of an error. Cc: <stable@kernel.org> # .36+ Signed-off-by: Robert Richter <robert.richter@amd.com>
*	oprofile: Use linux/mutex.h	Anton Blanchard	2011-05-24	2	-2/+2
\| \| \| \| \| \| \| \|	The oprofile code is still including asm/mutex.h instead of linux/mutex.h. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Robert Richter <robert.richter@amd.com>
*	oprofile, s390: Rework hwsampler implementation	Robert Richter	2011-02-15	3	-46/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch is a rework of the hwsampler oprofile implementation that has been applied recently. Now there are less non-architectural changes. The only changes are: * introduction of oprofile_add_ext_hw_sample(), and * removal of section attributes of oprofile_timer_init/_exit(). To setup hwsampler for oprofile we need to modify start()/stop() callbacks and additional hwsampler control files in oprofilefs. We do not reinitialize the timer or hwsampler mode by restarting calling init/exit() anymore, instead hwsampler_running is used to switch the mode directly in oprofile_hwsampler_start/_stop(). For locking reasons there is also hwsampler_file that reflects the value in oprofilefs. The overall diffstat of the oprofile s390 hwsampler implemenation shows the low impact to non-architectural code: arch/Kconfig \| 3 + arch/s390/Kconfig \| 1 + arch/s390/oprofile/Makefile \| 2 +- arch/s390/oprofile/hwsampler.c \| 1256 ++++++++++++++++++++++++++++++++++ arch/s390/oprofile/hwsampler.h \| 113 +++ arch/s390/oprofile/hwsampler_files.c \| 162 +++++ arch/s390/oprofile/init.c \| 6 +- drivers/oprofile/cpu_buffer.c \| 24 +- drivers/oprofile/timer_int.c \| 4 +- include/linux/oprofile.h \| 7 + 10 files changed, 1567 insertions(+), 11 deletions(-) Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Robert Richter <robert.richter@amd.com>
*	oprofile, s390: Enhance OProfile to support System zs hardware sampling feature	Heinz Graalfs	2011-02-15	3	-3/+46
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	OProfile is enhanced to export all files for controlling System z's hardware sampling, and to invoke hwsampler exported functions to initialize and use System z's hardware sampling. The patch invokes hwsampler_setup() during oprofile init and exports following hwsampler files under oprofilefs if hwsampler's setup succeeded: A new directory for hardware sampling based files /dev/oprofile/hwsampling/ The userland daemon must explicitly write to the following files to disable (or enable) hardware based sampling /dev/oprofile/hwsampling/hwsampler to modify the actual sampling rate /dev/oprofile/hwsampling/hw_interval to modify the amount of sampling memory (measured in 4K pages) /dev/oprofile/hwsampling/hw_sdbt_blocks The following files are read only and show the possible minimum sampling rate /dev/oprofile/hwsampling/hw_min_interval the possible maximum sampling rate /dev/oprofile/hwsampling/hw_max_interval The patch splits the oprofile_timer_[init/exit] function so that it can be also called through user context (oprofilefs) to avoid kernel oops. Applied with following changes: * whitespace changes in Makefile and timer_int.c Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Maran Pakkirisamy <maranp@linux.vnet.ibm.com> Signed-off-by: Heinz Graalfs <graalfs@linux.vnet.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Robert Richter <robert.richter@amd.com>
*	oprofile: Introduce new oprofile sample add function ↵	Heinz Graalfs	2011-02-15	1	-7/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	(oprofile_add_ext_hw_sample) This patch introduces a new oprofile sample add function (oprofile_add_ext_hw_sample) that can also take task_struct as an argument, which is used by the hwsampler kernel module when copying hardware samples to OProfile buffers. Applied with following changes: * removed #include <linux/module.h> * whitespace changes * removed conditional compilation (CONFIG_HAVE_HWSAMPLER) * modified order of functions * fix missing function definition in header file Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Maran Pakkirisamy <maranp@linux.vnet.ibm.com> Signed-off-by: Heinz Graalfs <graalfs@linux.vnet.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Robert Richter <robert.richter@amd.com>
*	Merge branches 'perf-fixes-for-linus' and 'x86-fixes-for-linus' of ↵	Linus Torvalds	2010-10-30	4	-4/+22
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: jump label: Add work around to i386 gcc asm goto bug x86, ftrace: Use safe noops, drop trap test jump_label: Fix unaligned traps on sparc. jump label: Make arch_jump_label_text_poke_early() optional jump label: Fix error with preempt disable holding mutex oprofile: Remove deprecated use of flush_scheduled_work() oprofile: Fix the hang while taking the cpu offline jump label: Fix deadlock b/w jump_label_mutex vs. text_mutex jump label: Fix module __init section race * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Check irq_remapped instead of remapping_enabled in destroy_irq()
\| *	Merge branch 'tip/perf/jump-label-2' of ↵	Ingo Molnar	2010-10-30	2	-1/+11
\| \|\ \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/urgent
\| * \|	oprofile: Remove deprecated use of flush_scheduled_work()	Tejun Heo	2010-10-29	3	-4/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	flush_scheduled_work() is deprecated and scheduled to be removed. sync_stop() currently cancels cpu_buffer works inside buffer_mutex and flushes the system workqueue outside. Instead, split end_cpu_work() into two parts - stopping further work enqueues and flushing works - and do the former inside buffer_mutex and latter outside. For stable kernels v2.6.35.y and v2.6.36.y. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@kernel.org Signed-off-by: Robert Richter <robert.richter@amd.com>
\| * \|	oprofile: Fix the hang while taking the cpu offline	Santosh Shilimkar	2010-10-29	1	-0/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The kernel build with CONFIG_OPROFILE and CPU_HOTPLUG enabled. The oprofile is initialised using system timer in absence of hardware counters supports. Oprofile isn't started from userland. In this setup while doing a CPU offline the kernel hangs in infinite for loop inside lock_hrtimer_base() function This happens because as part of oprofile_cpu_notify(, it tries to stop an hrtimer which was never started. These per-cpu hrtimers are started when the oprfile is started. echo 1 > /dev/oprofile/enable This problem also existwhen the cpu is booted with maxcpus parameter set. When bringing the remaining cpus online the timers are started even if oprofile is not yet enabled. This patch fix this issue by adding a state variable so that these hrtimer start/stop is only attempted when oprofile is started For stable kernels v2.6.35.y and v2.6.36.y. Reported-by: Jan Sebastien <s-jan@ti.com> Tested-by: sricharan <r.sricharan@ti.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Cc: stable@kernel.org Signed-off-by: Robert Richter <robert.richter@amd.com>
* \| \|	convert get_sb_single() users	Al Viro	2010-10-29	1	-4/+4
\| \|/ \|/\| \| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* \|	fs: do not assign default i_ino in new_inode	Christoph Hellwig	2010-10-26	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Instead of always assigning an increasing inode number in new_inode move the call to assign it into those callers that actually need it. For now callers that need it is estimated conservatively, that is the call is added to all filesystems that do not assign an i_ino by themselves. For a few more filesystems we can avoid assigning any inode number given that they aren't user visible, and for others it could be done lazily when an inode number is actually needed, but that's left for later patches. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* \|	Merge branch 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl	Linus Torvalds	2010-10-22	2	-1/+10
\|\ \ \| \|/ \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl: vfs: make no_llseek the default vfs: don't use BKL in default_llseek llseek: automatically add .llseek fop libfs: use generic_file_llseek for simple_attr mac80211: disallow seeks in minstrel debug code lirc: make chardev nonseekable viotape: use noop_llseek raw: use explicit llseek file operations ibmasmfs: use generic_file_llseek spufs: use llseek in all file operations arm/omap: use generic_file_llseek in iommu_debug lkdtm: use generic_file_llseek in debugfs net/wireless: use generic_file_llseek in debugfs drm: use noop_llseek
\| *	llseek: automatically add .llseek fop	Arnd Bergmann	2010-10-15	2	-1/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	All file_operations should get a .llseek operation so we can make nonseekable_open the default for future file operations without a .llseek pointer. The three cases that we can automatically detect are no_llseek, seq_lseek and default_llseek. For cases where we can we can automatically prove that the file offset is always ignored, we use noop_llseek, which maintains the current behavior of not returning an error from a seek. New drivers should normally not use noop_llseek but instead use no_llseek and call nonseekable_open at open time. Existing drivers can be converted to do the same when the maintainer knows for certain that no user code relies on calling seek on the device file. The generated code is often incorrectly indented and right now contains comments that clarify for each added line why a specific variant was chosen. In the version that gets submitted upstream, the comments will be gone and I will manually fix the indentation, because there does not seem to be a way to do that using coccinelle. Some amount of new code is currently sitting in linux-next that should get the same modifications, which I will do at the end of the merge window. Many thanks to Julia Lawall for helping me learn to write a semantic patch that does all this. ===== begin semantic patch ===== // This adds an llseek= method to all file operations, // as a preparation for making no_llseek the default. // // The rules are // - use no_llseek explicitly if we do nonseekable_open // - use seq_lseek for sequential files // - use default_llseek if we know we access f_pos // - use noop_llseek if we know we don't access f_pos, // but we still want to allow users to call lseek // @ open1 exists @ identifier nested_open; @@ nested_open(...) { <+... nonseekable_open(...) ...+> } @ open exists@ identifier open_f; identifier i, f; identifier open1.nested_open; @@ int open_f(struct inode i, struct file f) { <+... ( nonseekable_open(...) \| nested_open(...) ) ...+> } @ read disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t read_f(struct file f, char p, size_t s, loff_t off) { <+... ( off = E \| off += E \| func(..., off, ...) \| E = off ) ...+> } @ read_no_fpos disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t read_f(struct file f, char p, size_t s, loff_t off) { ... when != off } @ write @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t write_f(struct file f, const char p, size_t s, loff_t off) { <+... ( off = E \| off += E \| func(..., off, ...) \| E = off ) ...+> } @ write_no_fpos @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t write_f(struct file f, const char p, size_t s, loff_t off) { ... when != off } @ fops0 @ identifier fops; @@ struct file_operations fops = { ... }; @ has_llseek depends on fops0 @ identifier fops0.fops; identifier llseek_f; @@ struct file_operations fops = { ... .llseek = llseek_f, ... }; @ has_read depends on fops0 @ identifier fops0.fops; identifier read_f; @@ struct file_operations fops = { ... .read = read_f, ... }; @ has_write depends on fops0 @ identifier fops0.fops; identifier write_f; @@ struct file_operations fops = { ... .write = write_f, ... }; @ has_open depends on fops0 @ identifier fops0.fops; identifier open_f; @@ struct file_operations fops = { ... .open = open_f, ... }; // use no_llseek if we call nonseekable_open //////////////////////////////////////////// @ nonseekable1 depends on !has_llseek && has_open @ identifier fops0.fops; identifier nso ~= "nonseekable_open"; @@ struct file_operations fops = { ... .open = nso, ... +.llseek = no_llseek, /* nonseekable / }; @ nonseekable2 depends on !has_llseek @ identifier fops0.fops; identifier open.open_f; @@ struct file_operations fops = { ... .open = open_f, ... +.llseek = no_llseek, / open uses nonseekable / }; // use seq_lseek for sequential files ///////////////////////////////////// @ seq depends on !has_llseek @ identifier fops0.fops; identifier sr ~= "seq_read"; @@ struct file_operations fops = { ... .read = sr, ... +.llseek = seq_lseek, / we have seq_read / }; // use default_llseek if there is a readdir /////////////////////////////////////////// @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier readdir_e; @@ // any other fop is used that changes pos struct file_operations fops = { ... .readdir = readdir_e, ... +.llseek = default_llseek, / readdir is present / }; // use default_llseek if at least one of read/write touches f_pos ///////////////////////////////////////////////////////////////// @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read.read_f; @@ // read fops use offset struct file_operations fops = { ... .read = read_f, ... +.llseek = default_llseek, / read accesses f_pos / }; @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, ... + .llseek = default_llseek, / write accesses f_pos / }; // Use noop_llseek if neither read nor write accesses f_pos /////////////////////////////////////////////////////////// @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; identifier write_no_fpos.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, .read = read_f, ... +.llseek = noop_llseek, / read and write both use no f_pos / }; @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write_no_fpos.write_f; @@ struct file_operations fops = { ... .write = write_f, ... +.llseek = noop_llseek, / write uses no f_pos / }; @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; @@ struct file_operations fops = { ... .read = read_f, ... +.llseek = noop_llseek, / read uses no f_pos / }; @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; @@ struct file_operations fops = { ... +.llseek = noop_llseek, / no read or write fn */ }; ===== End semantic patch ===== Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Julia Lawall <julia@diku.dk> Cc: Christoph Hellwig <hch@infradead.org>
* \|	oprofile: make !CONFIG_PM function stubs static inline	Robert Richter	2010-10-15	1	-2/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make !CONFIG_PM function stubs static inline and remove section attribute. Signed-off-by: Robert Richter <robert.richter@amd.com>
* \|	oprofile: fix linker errors	Anand Gadiyar	2010-10-15	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit e9677b3ce (oprofile, ARM: Use oprofile_arch_exit() to cleanup on failure) caused oprofile_perf_exit to be called in the cleanup path of oprofile_perf_init. The __exit tag for oprofile_perf_exit should therefore be dropped. The same has to be done for exit_driverfs as well, as this function is called from oprofile_perf_exit. Else, we get the following two linker errors. LD .tmp_vmlinux1 `oprofile_perf_exit' referenced in section `.init.text' of arch/arm/oprofile/built-in.o: defined in discarded section `.exit.text' of arch/arm/oprofile/built-in.o make: * [.tmp_vmlinux1] Error 1 LD .tmp_vmlinux1 `exit_driverfs' referenced in section `.text' of arch/arm/oprofile/built-in.o: defined in discarded section `.exit.text' of arch/arm/oprofile/built-in.o make: * [.tmp_vmlinux1] Error 1 Signed-off-by: Anand Gadiyar <gadiyar@ti.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Robert Richter <robert.richter@amd.com>
* \|	oprofile: include platform_device.h to fix build break	Anand Gadiyar	2010-10-15	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	oprofile_perf.c needs to include platform_device.h Otherwise we get the following build break. CC arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.o arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:192: warning: 'struct platform_device' declared inside parameter list arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:192: warning: its scope is only this definition or declaration, which is probably not what you want arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:201: warning: 'struct platform_device' declared inside parameter list arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:210: error: variable 'oprofile_driver' has initializer but incomplete type arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:211: error: unknown field 'driver' specified in initializer arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:211: error: extra brace group at end of initializer arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:211: error: (near initialization for 'oprofile_driver') arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:213: warning: excess elements in struct initializer arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:213: warning: (near initialization for 'oprofile_driver') arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:214: error: unknown field 'resume' specified in initializer arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:214: warning: excess elements in struct initializer arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:214: warning: (near initialization for 'oprofile_driver') arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:215: error: unknown field 'suspend' specified in initializer arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:215: warning: excess elements in struct initializer arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:215: warning: (near initialization for 'oprofile_driver') arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c: In function 'init_driverfs': Signed-off-by: Anand Gadiyar <gadiyar@ti.com> Cc: Matt Fleming <matt@console-pimps.org> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Robert Richter <robert.richter@amd.com>
* \|	Merge remote branch 'tip/perf/core' into oprofile/core	Robert Richter	2010-10-15	1	-1/+1
\|\\| \| \| \| \| \| \| \| \| \| \|	Conflicts: arch/arm/oprofile/common.c kernel/perf_event.c
* \|	oprofile: disable write access to oprofilefs while profiler is running	Robert Richter	2010-10-12	4	-20/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Oprofile counters are setup when profiling is disabled. Thus, writing to oprofilefs has no immediate effect. Changes are updated only after oprofile is reenabled. To keep userland and kernel states synchronized, we now allow configuration of oprofile only if profiling is disabled. In this case it checks if the profiler is running and then disables write access to oprofilefs by returning -EBUSY. The change should be backward compatible with current oprofile userland daemon. Acked-by: Maynard Johnson <maynardj@us.ibm.com> Cc: William Cohen <wcohen@redhat.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Robert Richter <robert.richter@amd.com>
* \|	Merge branch 'oprofile/perf' into oprofile/core	Robert Richter	2010-10-11	1	-0/+323
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Conflicts: arch/arm/oprofile/common.c Signed-off-by: Robert Richter <robert.richter@amd.com>
\| * \|	oprofile, ARM: Use oprofile_arch_exit() to cleanup on failure	Robert Richter	2010-10-11	1	-28/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is duplicate cleanup code in the init and exit functions. Now, oprofile_arch_exit() is also used if oprofile_arch_init() fails. Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Robert Richter <robert.richter@amd.com>
\| * \|	oprofile, ARM: Rework op_create_counter()	Robert Richter	2010-10-11	1	-10/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch simplifies op_create_counter(). Removing if/else if paths and return code variable by direct returning from function. Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Robert Richter <robert.richter@amd.com>
\| * \|	oprofile, ARM: Remove some goto statements	Robert Richter	2010-10-11	1	-4/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch removes some unnecessary goto statements. Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Robert Richter <robert.richter@amd.com>
\| * \|	oprofile, ARM: Release resources on failure	Robert Richter	2010-10-11	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes a resource leak on failure, where the oprofilefs and some counters may not released properly. Signed-off-by: Robert Richter <robert.richter@amd.com> Acked-by: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: <stable@kernel.org> # .35.x LKML-Reference: <20100929145225.GJ13563@erda.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
\| * \|	Merge branch 'oprofile/urgent' (early part) into oprofile/perf	Robert Richter	2010-10-11	2	-15/+14
\| \|\\|
\| * \|	oprofile: Abstract the perf-events backend	Matt Fleming	2010-10-11	1	-0/+326
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Move the perf-events backend from arch/arm/oprofile into drivers/oprofile so that the code can be shared between architectures. This allows each architecture to maintain only a single copy of the PMU accessor functions instead of one for both perf and OProfile. It also becomes possible for other architectures to delete much of their OProfile code in favour of the common code now available in drivers/oprofile/oprofile_perf.c. Signed-off-by: Matt Fleming <matt@console-pimps.org> Tested-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Robert Richter <robert.richter@amd.com>
* \| \|	oprofile: Remove duplicate code around __oprofilefs_create_file()	Robert Richter	2010-10-04	1	-32/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Removing duplicate code by assigning the inodes private data pointer in __oprofilefs_create_file(). Extending the function interface to pass the pointer. Signed-off-by: Robert Richter <robert.richter@amd.com>
* \| \|	Merge branch 'oprofile/urgent' into oprofile/core	Robert Richter	2010-10-01	2	-15/+14
\|\ \ \ \| \|/ / \|/\| / \| \|/ \| \| \| \| \| \|	Conflicts: arch/arm/oprofile/common.c Signed-off-by: Robert Richter <robert.richter@amd.com>
\| *	oprofile: fix crash when accessing freed task structs	Robert Richter	2010-08-25	2	-15/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes a crash during shutdown reported below. The crash is caused by accessing already freed task structs. The fix changes the order for registering and unregistering notifier callbacks. All notifiers must be initialized before buffers start working. To stop buffer synchronization we cancel all workqueues, unregister the notifier callback and then flush all buffers. After all of this we finally can free all tasks listed. This should avoid accessing freed tasks. On 22.07.10 01:14:40, Benjamin Herrenschmidt wrote: > So the initial observation is a spinlock bad magic followed by a crash > in the spinlock debug code: > > [ 1541.586531] BUG: spinlock bad magic on CPU#5, events/5/136 > [ 1541.597564] Unable to handle kernel paging request for data at address 0x6b6b6b6b6b6b6d03 > > Backtrace looks like: > > spin_bug+0x74/0xd4 > ._raw_spin_lock+0x48/0x184 > ._spin_lock+0x10/0x24 > .get_task_mm+0x28/0x8c > .sync_buffer+0x1b4/0x598 > .wq_sync_buffer+0xa0/0xdc > .worker_thread+0x1d8/0x2a8 > .kthread+0xa8/0xb4 > .kernel_thread+0x54/0x70 > > So we are accessing a freed task struct in the work queue when > processing the samples. Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: stable@kernel.org Signed-off-by: Robert Richter <robert.richter@amd.com>
* \|	oprofile: don't call arch exit code from init code on failure	Will Deacon	2010-08-31	1	-9/+2
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	oprofile_init calls oprofile_arch_init to initialise the architecture-specific backend code. If this backend code returns failure, oprofile_arch_exit is called immediately, making it difficult to allocate and free resources correctly. This patch removes the oprofile_arch_exit call from oprofile_init, meaning that all architectures must ensure that oprofile_arch_init cleans up any mess it's made before returning an error. As far as I can tell, this only affects the code for ARM. Cc: Robert Richter <robert.richter@amd.com> Cc: Matt Fleming <matt@console-pimps.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Robert Richter <robert.richter@amd.com>
*	oprofile: make event buffer nonseekable	Arnd Bergmann	2010-07-26	1	-1/+2
\| \| \| \| \| \| \| \| \| \|	The event buffer cannot deal with seeks, so we should forbid that outright. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Robert Richter <robert.richter@amd.com> Cc: oprofile-list@lists.sf.net Signed-off-by: Robert Richter <robert.richter@amd.com>
*	oprofile: protect from not being in an IRQ context	Phil Carmody	2010-05-03	1	-2/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	http://lkml.org/lkml/2010/4/27/285 Protect against dereferencing regs when it's NULL, and force a magic number into pc to prevent too deep processing. This approach permits the dropped samples to be tallied as invalid Instruction Pointer events. e.g. output from about 15mins at 10kHz sample rate: Nr. samples received: 2565380 Nr. samples lost invalid pc: 4 Signed-off-by: Phil Carmody <ext-phil.2.carmody@nokia.com> Signed-off-by: Robert Richter <robert.richter@amd.com>
*	Merge commit 'tip/tracing/core' into oprofile/core	Robert Richter	2010-04-23	1	-1/+1
\|\ \| \| \| \| \| \| \| \| \| \| \| \|	Conflicts: drivers/oprofile/cpu_buffer.c Signed-off-by: Robert Richter <robert.richter@amd.com>
\| *	Merge branch 'linus' into tracing/core	Ingo Molnar	2010-04-08	1	-0/+1
\| \|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Conflicts: include/linux/module.h kernel/module.c Semantic conflict: include/trace/events/module.h Merge reason: Resolve the conflict with upstream commit 5fbfb18 ("Fix up possibly racy module refcounting") Signed-off-by: Ingo Molnar <mingo@elte.hu>
\| * \|	ring-buffer: Add place holder recording of dropped events	Steven Rostedt	2010-04-01	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, when the ring buffer drops events, it does not record the fact that it did so. It does inform the writer that the event was dropped by returning a NULL event, but it does not put in any place holder where the event was dropped. This is not a trivial thing to add because the ring buffer mostly runs in overwrite (flight recorder) mode. That is, when the ring buffer is full, new data will overwrite old data. In a produce/consumer mode, where new data is simply dropped when the ring buffer is full, it is trivial to add the placeholder for dropped events. When there's more room to write new data, then a special event can be added to notify the reader about the dropped events. But in overwrite mode, any new write can overwrite events. A place holder can not be inserted into the ring buffer since there never may be room. A reader could also come in at anytime and miss the placeholder. Luckily, the way the ring buffer works, the read side can find out if events were lost or not, and how many events. Everytime a write takes place, if it overwrites the header page (the next read) it updates a "overrun" variable that keeps track of the number of lost events. When a reader swaps out a page from the ring buffer, it can record this number, perfom the swap, and then check to see if the number changed, and take the diff if it has, which would be the number of events dropped. This can be stored by the reader and returned to callers of the reader. Since the reader page swap will fail if the writer moved the head page since the time the reader page set up the swap, this gives room to record the overruns without worrying about races. If the reader sets up the pages, records the overrun, than performs the swap, if the swap succeeds, then the overrun variable has not been updated since the setup before the swap. For binary readers of the ring buffer, a flag is set in the header of each sub page (sub buffer) of the ring buffer. This flag is embedded in the size field of the data on the sub buffer, in the 31st bit (the size can be 32 or 64 bits depending on the architecture), but only 27 bits needs to be used for the actual size (less actually). We could add a new field in the sub buffer header to also record the number of events dropped since the last read, but this will change the format of the binary ring buffer a bit too much. Perhaps this change can be made if the information on the number of events dropped is considered important enough. Note, the notification of dropped events is only used by consuming reads or peeking at the ring buffer. Iterating over the ring buffer does not keep this information because the necessary data is only available when a page swap is made, and the iterator does not swap out pages. Cc: Robert Richter <robert.richter@amd.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: "Luis Claudio R. Goncalves" <lclaudio@uudg.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* \| \|	oprofile: remove double ring buffering	Andi Kleen	2010-04-23	1	-50/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	oprofile used a double buffer scheme for its cpu event buffer to avoid races on reading with the old locked ring buffer. But that is obsolete now with the new ring buffer, so simply use a single buffer. This greatly simplifies the code and avoids a lot of sample drops on large runs, especially with call graph. Based on suggestions from Steven Rostedt For stable kernels from v2.6.32, but not earlier. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: stable <stable@kernel.org> Signed-off-by: Robert Richter <robert.richter@amd.com>
* \| \|	Merge commit 'v2.6.34-rc5' into oprofile/core	Robert Richter	2010-04-23	1	-0/+1
\|\ \ \ \| \| \|/ \| \|/\|
\| * \|	include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵	Tejun Heo	2010-03-30	1	-0/+1
\| \|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
* /	oprofile: convert oprofile from timer_hook to hrtimer	Martin Schwidefsky	2010-03-02	3	-14/+79
\|/ \| \| \| \| \| \| \| \| \| \|	Oprofile is currently broken on systems running with NOHZ enabled. A maximum of 1 tick is accounted via the timer_hook if a cpu sleeps for a longer period of time. This does bad things to the percentages in the profiler output. To solve this problem convert oprofile to use a restarting hrtimer instead of the timer_hook. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Robert Richter <robert.richter@amd.com>
*	Merge branch 'for-linus' of ↵	Linus Torvalds	2009-12-14	3	-14/+13
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits) m68k: rename global variable vmalloc_end to m68k_vmalloc_end percpu: add missing per_cpu_ptr_to_phys() definition for UP percpu: Fix kdump failure if booted with percpu_alloc=page percpu: make misc percpu symbols unique percpu: make percpu symbols in ia64 unique percpu: make percpu symbols in powerpc unique percpu: make percpu symbols in x86 unique percpu: make percpu symbols in xen unique percpu: make percpu symbols in cpufreq unique percpu: make percpu symbols in oprofile unique percpu: make percpu symbols in tracer unique percpu: make percpu symbols under kernel/ and mm/ unique percpu: remove some sparse warnings percpu: make alloc_percpu() handle array types vmalloc: fix use of non-existent percpu variable in put_cpu_var() this_cpu: Use this_cpu_xx in trace_functions_graph.c this_cpu: Use this_cpu_xx for ftrace this_cpu: Use this_cpu_xx in nmi handling this_cpu: Use this_cpu operations in RCU this_cpu: Use this_cpu ops for VM statistics ... Fix up trivial (famous last words) global per-cpu naming conflicts in arch/x86/kvm/svm.c mm/slab.c
\| *	percpu: make percpu symbols in oprofile unique	Tejun Heo	2009-10-29	3	-14/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch updates percpu related symbols in oprofile such that percpu symbols are unique and don't clash with local symbols. This serves two purposes of decreasing the possibility of global percpu symbol collision and allowing dropping per_cpu__ prefix from percpu symbols. * drivers/oprofile/cpu_buffer.c: s/cpu_buffer/op_cpu_buffer/ Partly based on Rusty Russell's "alloc_percpu: rename percpu vars which cause name clashes" patch. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Robert Richter <robert.richter@amd.com> Cc: Rusty Russell <rusty@rustcorp.com.au>
* \|	oprofile: warn on freeing event buffer too early	Robert Richter	2009-10-09	1	-10/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A race shouldn't happen since all workqueues or handlers are canceled or flushed before the event buffer is freed. A warning is triggered now if the buffer is freed too early. Also, this patch adds some comments about event buffer protection, reworks some code and adds code to clear buffer_pos during alloc and free of the event buffer. Cc: David Rientjes <rientjes@google.com> Cc: Stephane Eranian <eranian@google.com> Signed-off-by: Robert Richter <robert.richter@amd.com>
* \|	oprofile: fix race condition in event_buffer free	David Rientjes	2009-10-09	1	-1/+13
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Looking at the 2.6.31-rc9 code, it appears there is a race condition in the event_buffer cleanup code path (shutdown). This could lead to kernel panic as some CPUs may be operating on the event buffer AFTER it has been freed. The attached patch solves the problem and makes sure CPUs check if the buffer is not NULL before they access it as some may have been spinning on the mutex while the buffer was being freed. The race may happen if the buffer is freed during pending reads. But it is not clear why there are races in add_event_entry() since all workqueues or handlers are canceled or flushed before the event buffer is freed. Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Robert Richter <robert.richter@amd.com>
*	cpumask: use zalloc_cpumask_var() where possible	Li Zefan	2009-09-24	1	-2/+1
\| \| \| \| \| \| \|	Remove open-coded zalloc_cpumask_var() and zalloc_cpumask_var_node(). Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>