summaryrefslogtreecommitdiffstats
path: root/include/trace/ftrace.h (follow)
Commit message (Collapse)AuthorAgeFilesLines
* tracing: Consolidate event trigger codeSteven Rostedt (Red Hat)2014-01-101-19/+4
| | | | | | | | | | | | | | | | | The event trigger code that checks for callback triggers before and after recording of an event has lots of flags checks. This code is duplicated throughout the ftrace events, kprobes and system calls. They all do the exact same checks against the event flags. Added helper functions ftrace_trigger_soft_disabled(), event_trigger_unlock_commit() and event_trigger_unlock_commit_regs() that consolidated the code and these are used instead. Link: http://lkml.kernel.org/r/20140106222703.5e7dbba2@gandalf.local.home Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* tracing: Add and use generic set_trigger_filter() implementationTom Zanussi2013-12-221-12/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a generic event_command.set_trigger_filter() op implementation and have the current set of trigger commands use it - this essentially gives them all support for filters. Syntactically, filters are supported by adding 'if <filter>' just after the command, in which case only events matching the filter will invoke the trigger. For example, to add a filter to an enable/disable_event command: echo 'enable_event:system:event if common_pid == 999' > \ .../othersys/otherevent/trigger The above command will only enable the system:event event if the common_pid field in the othersys:otherevent event is 999. As another example, to add a filter to a stacktrace command: echo 'stacktrace if common_pid == 999' > \ .../somesys/someevent/trigger The above command will only trigger a stacktrace if the common_pid field in the event is 999. The filter syntax is the same as that described in the 'Event filtering' section of Documentation/trace/events.txt. Because triggers can now use filters, the trigger-invoking logic needs to be moved in those cases - e.g. for ftrace_raw_event_calls, if a trigger has a filter associated with it, the trigger invocation now needs to happen after the { assign; } part of the call, in order for the trigger condition to be tested. There's still a SOFT_DISABLED-only check at the top of e.g. the ftrace_raw_events function, so when an event is soft disabled but not because of the presence of a trigger, the original SOFT_DISABLED behavior remains unchanged. There's also a bit of trickiness in that some triggers need to avoid being invoked while an event is currently in the process of being logged, since the trigger may itself log data into the trace buffer. Thus we make sure the current event is committed before invoking those triggers. To do that, we split the trigger invocation in two - the first part (event_triggers_call()) checks the filter using the current trace record; if a command has the post_trigger flag set, it sets a bit for itself in the return value, otherwise it directly invoks the trigger. Once all commands have been either invoked or set their return flag, event_triggers_call() returns. The current record is then either committed or discarded; if any commands have deferred their triggers, those commands are finally invoked following the close of the current event by event_triggers_post_call(). To simplify the above and make it more efficient, the TRIGGER_COND bit is introduced, which is set only if a soft-disabled trigger needs to use the log record for filter testing or needs to wait until the current log record is closed. The syscall event invocation code is also changed in analogous ways. Because event triggers need to be able to create and free filters, this also adds a couple external wrappers for the existing create_filter and free_filter functions, which are too generic to be made extern functions themselves. Link: http://lkml.kernel.org/r/7164930759d8719ef460357f143d995406e4eead.1382622043.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* tracing: Add basic event trigger frameworkTom Zanussi2013-12-211-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a 'trigger' file for each trace event, enabling 'trace event triggers' to be set for trace events. 'trace event triggers' are patterned after the existing 'ftrace function triggers' implementation except that triggers are written to per-event 'trigger' files instead of to a single file such as the 'set_ftrace_filter' used for ftrace function triggers. The implementation is meant to be entirely separate from ftrace function triggers, in order to keep the respective implementations relatively simple and to allow them to diverge. The event trigger functionality is built on top of SOFT_DISABLE functionality. It adds a TRIGGER_MODE bit to the ftrace_event_file flags which is checked when any trace event fires. Triggers set for a particular event need to be checked regardless of whether that event is actually enabled or not - getting an event to fire even if it's not enabled is what's already implemented by SOFT_DISABLE mode, so trigger mode directly reuses that. Event trigger essentially inherit the soft disable logic in __ftrace_event_enable_disable() while adding a bit of logic and trigger reference counting via tm_ref on top of that in a new trace_event_trigger_enable_disable() function. Because the base __ftrace_event_enable_disable() code now needs to be invoked from outside trace_events.c, a wrapper is also added for those usages. The triggers for an event are actually invoked via a new function, event_triggers_call(), and code is also added to invoke them for ftrace_raw_event calls as well as syscall events. The main part of the patch creates a new trace_events_trigger.c file to contain the trace event triggers implementation. The standard open, read, and release file operations are implemented here. The open() implementation sets up for the various open modes of the 'trigger' file. It creates and attaches the trigger iterator and sets up the command parser. If opened for reading set up the trigger seq_ops. The read() implementation parses the event trigger written to the 'trigger' file, looks up the trigger command, and passes it along to that event_command's func() implementation for command-specific processing. The release() implementation does whatever cleanup is needed to release the 'trigger' file, like releasing the parser and trigger iterator, etc. A couple of functions for event command registration and unregistration are added, along with a list to add them to and a mutex to protect them, as well as an (initially empty) registration function to add the set of commands that will be added by future commits, and call to it from the trace event initialization code. also added are a couple trigger-specific data structures needed for these implementations such as a trigger iterator and a struct for trigger-specific data. A couple structs consisting mostly of function meant to be implemented in command-specific ways, event_command and event_trigger_ops, are used by the generic event trigger command implementations. They're being put into trace.h alongside the other trace_event data structures and functions, in the expectation that they'll be needed in several trace_event-related files such as trace_events_trigger.c and trace_events.c. The event_command.func() function is meant to be called by the trigger parsing code in order to add a trigger instance to the corresponding event. It essentially coordinates adding a live trigger instance to the event, and arming the triggering the event. Every event_command func() implementation essentially does the same thing for any command: - choose ops - use the value of param to choose either a number or count version of event_trigger_ops specific to the command - do the register or unregister of those ops - associate a filter, if specified, with the triggering event The reg() and unreg() ops allow command-specific implementations for event_trigger_op registration and unregistration, and the get_trigger_ops() op allows command-specific event_trigger_ops selection to be parameterized. When a trigger instance is added, the reg() op essentially adds that trigger to the triggering event and arms it, while unreg() does the opposite. The set_filter() function is used to associate a filter with the trigger - if the command doesn't specify a set_filter() implementation, the command will ignore filters. Each command has an associated trigger_type, which serves double duty, both as a unique identifier for the command as well as a value that can be used for setting a trigger mode bit during trigger invocation. The signature of func() adds a pointer to the event_command struct, used to invoke those functions, along with a command_data param that can be passed to the reg/unreg functions. This allows func() implementations to use command-specific blobs and supports code re-use. The event_trigger_ops.func() command corrsponds to the trigger 'probe' function that gets called when the triggering event is actually invoked. The other functions are used to list the trigger when needed, along with a couple mundane book-keeping functions. This also moves event_file_data() into trace.h so it can be used outside of trace_events.c. Link: http://lkml.kernel.org/r/316d95061accdee070aac8e5750afba0192fa5b9.1382622043.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Idea-by: Steve Rostedt <rostedt@goodmis.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds2013-12-021-0/+7
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Misc kernel and tooling fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tools lib traceevent: Fix conversion of pointer to integer of different size perf/trace: Properly use u64 to hold event_id perf: Remove fragile swevent hlist optimization ftrace, perf: Avoid infinite event generation loop tools lib traceevent: Fix use of multiple options in processing field perf header: Fix possible memory leaks in process_group_desc() perf header: Fix bogus group name perf tools: Tag thread comm as overriden
| * ftrace, perf: Avoid infinite event generation loopPeter Zijlstra2013-11-191-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Vince's perf-trinity fuzzer found yet another 'interesting' problem. When we sample the irq_work_exit tracepoint with period==1 (or PERF_SAMPLE_PERIOD) and we add an fasync SIGNAL handler we create an infinite event generation loop: ,-> <IPI> | irq_work_exit() -> | trace_irq_work_exit() -> | ... | __perf_event_overflow() -> (due to fasync) | irq_work_queue() -> (irq_work_list must be empty) '--------- arch_irq_work_raise() Similar things can happen due to regular poll() wakeups if we exceed the ring-buffer wakeup watermark, or have an event_limit. To avoid this, dis-allow sampling this particular tracepoint. In order to achieve this, create a special perf_perm function pointer for each event and call this (when set) on trying to create a tracepoint perf event. [ roasted: use expr... to allow for ',' in your expression ] Reported-by: Vince Weaver <vincent.weaver@maine.edu> Tested-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Dave Jones <davej@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/20131114152304.GC5364@laptop.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | tracing: Allow events to have NULL stringsSteven Rostedt (Red Hat)2013-11-261-2/+3
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If an TRACE_EVENT() uses __assign_str() or __get_str on a NULL pointer then the following oops will happen: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<c127a17b>] strlen+0x10/0x1a *pde = 00000000 ^M Oops: 0000 [#1] PREEMPT SMP Modules linked in: CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.13.0-rc1-test+ #2 Hardware name: /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006^M task: f5cde9f0 ti: f5e5e000 task.ti: f5e5e000 EIP: 0060:[<c127a17b>] EFLAGS: 00210046 CPU: 1 EIP is at strlen+0x10/0x1a EAX: 00000000 EBX: c2472da8 ECX: ffffffff EDX: c2472da8 ESI: c1c5e5fc EDI: 00000000 EBP: f5e5fe84 ESP: f5e5fe80 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 CR0: 8005003b CR2: 00000000 CR3: 01f32000 CR4: 000007d0 Stack: f5f18b90 f5e5feb8 c10687a8 0759004f 00000005 00000005 00000005 00200046 00000002 00000000 c1082a93 f56c7e28 c2472da8 c1082a93 f5e5fee4 c106bc61^M 00000000 c1082a93 00000000 00000000 00000001 00200046 00200082 00000000 Call Trace: [<c10687a8>] ftrace_raw_event_lock+0x39/0xc0 [<c1082a93>] ? ktime_get+0x29/0x69 [<c1082a93>] ? ktime_get+0x29/0x69 [<c106bc61>] lock_release+0x57/0x1a5 [<c1082a93>] ? ktime_get+0x29/0x69 [<c10824dd>] read_seqcount_begin.constprop.7+0x4d/0x75 [<c1082a93>] ? ktime_get+0x29/0x69^M [<c1082a93>] ktime_get+0x29/0x69 [<c108a46a>] __tick_nohz_idle_enter+0x1e/0x426 [<c10690e8>] ? lock_release_holdtime.part.19+0x48/0x4d [<c10bc184>] ? time_hardirqs_off+0xe/0x28 [<c1068c82>] ? trace_hardirqs_off_caller+0x3f/0xaf [<c108a8cb>] tick_nohz_idle_enter+0x59/0x62 [<c1079242>] cpu_startup_entry+0x64/0x192 [<c102299c>] start_secondary+0x277/0x27c Code: 90 89 c6 89 d0 88 c4 ac 38 e0 74 09 84 c0 75 f7 be 01 00 00 00 89 f0 48 5e 5d c3 55 89 e5 57 66 66 66 66 90 83 c9 ff 89 c7 31 c0 <f2> ae f7 d1 8d 41 ff 5f 5d c3 55 89 e5 57 66 66 66 66 90 31 ff EIP: [<c127a17b>] strlen+0x10/0x1a SS:ESP 0068:f5e5fe80 CR2: 0000000000000000 ---[ end trace 01bc47bf519ec1b2 ]--- New tracepoints have been added that have allowed for NULL pointers being assigned to strings. To fix this, change the TRACE_EVENT() code to check for NULL and if it is, it will assign "(null)" to it instead (similar to what glibc printf does). Reported-by: Shuah Khan <shuah.kh@samsung.com> Reported-by: Jovi Zhangwei <jovi.zhangwei@gmail.com> Link: http://lkml.kernel.org/r/CAGdX0WFeEuy+DtpsJzyzn0343qEEjLX97+o1VREFkUEhndC+5Q@mail.gmail.com Link: http://lkml.kernel.org/r/528D6972.9010702@samsung.com Fixes: 9cbf117662e2 ("tracing/events: provide string with undefined size support") Cc: stable@vger.kernel.org # 2.6.31+ Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* tracing: Update event filters for multibufferTom Zanussi2013-11-051-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The trace event filters are still tied to event calls rather than event files, which means you don't get what you'd expect when using filters in the multibuffer case: Before: # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter bytes_alloc > 8192 # mkdir /sys/kernel/debug/tracing/instances/test1 # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter bytes_alloc > 2048 # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter bytes_alloc > 2048 Setting the filter in tracing/instances/test1/events shouldn't affect the same event in tracing/events as it does above. After: # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter bytes_alloc > 8192 # mkdir /sys/kernel/debug/tracing/instances/test1 # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter bytes_alloc > 8192 # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter bytes_alloc > 2048 We'd like to just move the filter directly from ftrace_event_call to ftrace_event_file, but there are a couple cases that don't yet have multibuffer support and therefore have to continue using the current event_call-based filters. For those cases, a new USE_CALL_FILTER bit is added to the event_call flags, whose main purpose is to keep the old behavior for those cases until they can be updated with multibuffer support; at that point, the USE_CALL_FILTER flag (and the new associated call_filter_check_discard() function) can go away. The multibuffer support also made filter_current_check_discard() redundant, so this change removes that function as well and replaces it with filter_check_discard() (or call_filter_check_discard() as appropriate). Link: http://lkml.kernel.org/r/f16e9ce4270c62f46b2e966119225e1c3cca7e60.1382620672.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* tracing/perf: Avoid perf_trace_buf_*() in perf_trace_##call() when possibleOleg Nesterov2013-08-141-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | perf_trace_buf_prepare() + perf_trace_buf_submit(task => NULL) make no sense if hlist_empty(head). Change perf_trace_##call() to check ->perf_events beforehand and do nothing if it is empty. This removes the overhead for tasks without events associated with them. For example, "perf record -e sched:sched_switch -p1" attaches the counter(s) to the single task, but every task in system will do perf_trace_buf_prepare/submit() just to realize that it was not attached to this event. However, we can only do this if __task == NULL, so we also add the __builtin_constant_p(__task) check. With this patch "perf bench sched pipe" shows approximately 4% improvement when "perf record -p1" runs in parallel, many thanks to Steven for the testing. Link: http://lkml.kernel.org/r/20130806160847.GA2746@redhat.com Tested-by: David Ahern <dsahern@gmail.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* tracing/perf: Reimplement TP_perf_assign() logicOleg Nesterov2013-08-141-8/+11
| | | | | | | | | | | | | | | | | | | | | | | | | The next patch tries to avoid the costly perf_trace_buf_* calls when possible but there is a problem. We can only do this if __task == NULL, perf_tp_event(task != NULL) has the additional code for this case. Unfortunately, TP_perf_assign/__perf_xxx which changes the default values of __count/__task variables for perf_trace_buf_submit() is called "too late", after we already did perf_trace_buf_prepare(), and the optimization above can't work. So this patch simply embeds __perf_xxx() into TP_ARGS(), this way DECLARE_EVENT_CLASS() can use the result of assignments hidden in "args" right after ftrace_get_offsets_##call() which is mostly trivial. This allows us to have the fast-path "__task != NULL" check at the start, see the next patch. Link: http://lkml.kernel.org/r/20130806160844.GA2739@redhat.com Tested-by: David Ahern <dsahern@gmail.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* tracing/perf: Expand TRACE_EVENT(sched_stat_runtime)Oleg Nesterov2013-08-141-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To simplify the review of the next patches: 1. We are going to reimplent __perf_task/counter and embedd them into TP_ARGS(). expand TRACE_EVENT(sched_stat_runtime) into DECLARE_EVENT_CLASS() + DEFINE_EVENT(), this way they can use different TP_ARGS's. 2. Change perf_trace_##call() macro to do perf_fetch_caller_regs() right before perf_trace_buf_prepare(). This way it evaluates TP_ARGS() asap, the next patch explores this fact. Note: after 87f44bbc perf_trace_buf_prepare() doesn't need "struct pt_regs *regs", perhaps it makes sense to remove this argument. And perhaps we can teach perf_trace_buf_submit() to accept regs == NULL and do fetch_caller_regs(CALLER_ADDR1) in this case. 3. Cosmetic, but the typecast from "void*" buys nothing. It just adds the noise, remove it. Link: http://lkml.kernel.org/r/20130806160841.GA2736@redhat.com Acked-by: Peter Zijlstra <peterz@infradead.org> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* tracing/perf: Move the PERF_MAX_TRACE_SIZE check into perf_trace_buf_prepare()Oleg Nesterov2013-07-191-4/+0
| | | | | | | | | | | | | | | | | | | | Every perf_trace_buf_prepare() caller does WARN_ONCE(size > PERF_MAX_TRACE_SIZE, message) and "message" is almost the same. Shift this WARN_ONCE() into perf_trace_buf_prepare(). This changes the meaning of _ONCE, but I think this is fine. - 4947014 2932448 10104832 17984294 1126b26 vmlinux + 4948422 2932448 10104832 17985702 11270a6 vmlinux on my build. Link: http://lkml.kernel.org/r/20130617170211.GA19813@redhat.com Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* tracing: Add DEFINE_EVENT_FN() macroSteven Rostedt2013-06-211-0/+4
| | | | | | | | | | | | | | | | | | | | | | Each TRACE_EVENT() adds several helper functions. If two or more trace events share the same structure and print format, they can also share most of these helper functions and save a lot of space from duplicate code. This is why the DECLARE_EVENT_CLASS() and DEFINE_EVENT() were created. Some events require a trigger to be called at registering and unregistering of the event and to do so they use TRACE_EVENT_FN(). If multiple events require a trigger, they currently have no choice but to use TRACE_EVENT_FN() as there's no DEFINE_EVENT_FN() available. This unfortunately causes a lot of wasted duplicate code created. By adding a DEFINE_EVENT_FN(), these events can still use a DECLARE_EVENT_CLASS() and then define their own triggers. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/51C3236C.8030508@hds.com Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
* tracing: Remove obsolete macro guard _TRACE_PROFILE_INITzhangwei(Jovi)2013-04-131-2/+0
| | | | | | | | | | The macro _TRACE_PROFILE_INIT was removed a long time ago, but an "#undef" guard was left behind. Remove it. Link: http://lkml.kernel.org/r/514684EE.6000805@huawei.com Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* tracing: Add a way to soft disable trace eventsSteven Rostedt (Red Hat)2013-03-151-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to let triggers enable or disable events, we need a 'soft' method for doing so. For example, if a function probe is added that lets a user enable or disable events when a function is called, that change must be done without taking locks or a mutex, and definitely it can't sleep. But the full enabling of a tracepoint is expensive. By adding a 'SOFT_DISABLE' flag, and converting the flags to be updated without the protection of a mutex (using set/clear_bit()), this soft disable flag can be used to allow critical sections to enable or disable events from being traced (after the event has been placed into "SOFT_MODE"). Some caveats though: The comm recorder (to map pids with a comm) can not be soft disabled (yet). If you disable an event with with a "soft" disable and wait a while before reading the trace, the comm cache may be replaced and you'll get a bunch of <...> for comms in the trace. Reading the "enable" file for an event that is disabled will now give you "0*" where the '*' denotes that the tracepoint is still active but the event itself is "disabled". [ fixed _BIT used in & operation : thanks to Dan Carpenter and smatch ] Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* tracing: Fix some section mismatch warningsLi Zefan2013-03-151-1/+1
| | | | | | | | | | | | As we've added __init annotation to field-defining functions, we should add __refdata annotation to event_call variables, which reference those functions. Link: http://lkml.kernel.org/r/51343C1F.2050502@huawei.com Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* tracing: Annotate event field-defining functions with __initLi Zefan2013-03-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | Those functions are called either during kernel boot or module init. Before: $ dmesg | grep 'Freeing unused kernel memory' Freeing unused kernel memory: 1208k freed Freeing unused kernel memory: 1360k freed Freeing unused kernel memory: 1960k freed After: $ dmesg | grep 'Freeing unused kernel memory' Freeing unused kernel memory: 1236k freed Freeing unused kernel memory: 1388k freed Freeing unused kernel memory: 1960k freed Link: http://lkml.kernel.org/r/5125877D.5000201@huawei.com Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* tracing: Add a helper function for event print functionsLi Zefan2013-03-151-17/+6
| | | | | | | | | | | | | | | Move duplicate code in event print functions to a helper function. This shrinks the size of the kernel by ~13K. text data bss dec hex filename 6596137 1743966 10138672 18478775 119f6b7 vmlinux.o.old 6583002 1743849 10138672 18465523 119c2f3 vmlinux.o.new Link: http://lkml.kernel.org/r/51258746.2060304@huawei.com Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* tracing: Pass the ftrace_file to the buffer lock reserve codeSteven Rostedt2013-03-151-4/+5
| | | | | | | | | | | | | | Pass the struct ftrace_event_file *ftrace_file to the trace_event_buffer_lock_reserve() (new function that replaces the trace_current_buffer_lock_reserver()). The ftrace_file holds a pointer to the trace_array that is in use. In the case of multiple buffers with different trace_arrays, this allows different events to be recorded into different buffers. Also fixed some of the stale comments in include/trace/ftrace.h Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* tracing: Separate out trace events from global variablesSteven Rostedt2013-03-151-1/+2
| | | | | | | | | | | | | | | | The trace events for ftrace are all defined via global variables. The arrays of events and event systems are linked to a global list. This prevents multiple users of the event system (what to enable and what not to). By adding descriptors to represent the event/file relation, as well as to which trace_array descriptor they are associated with, allows for more than one set of events to be defined. Once the trace events files have a link between the trace event and the trace_array they are associated with, we can create multiple trace_arrays that can record separate events in separate buffers. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* tracing: Kill unused and puzzled sample code in ftrace.hShan Wei2012-11-131-73/+0
| | | | | | | | | | | | | | | | | | | | | When doing per-cpu helper optimizing work, find that this code is so puzzled. 1. It's mark as comment text, maybe a sample function for guidelines or a todo work. 2. But, this sample code is odd where struct perf_trace_buf is nonexistent. commit ce71b9 delete struct perf_trace_buf definition. Author: Frederic Weisbecker <fweisbec@gmail.com> Date: Sun Nov 22 05:26:55 2009 +0100 tracing: Use the perf recursion protection from trace event Is it necessary to keep there? just compile test. Link: http://lkml.kernel.org/r/50949FC9.6050202@gmail.com Signed-off-by: Shan Wei <davidshan@tencent.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* tracing: Use irq_work for wake ups and remove *_nowake_*() functionsSteven Rostedt2012-11-021-2/+1
| | | | | | | | | | | | | | | | | | Have the ring buffer commit function use the irq_work infrastructure to wake up any waiters waiting on the ring buffer for new data. The irq_work was created for such a purpose, where doing the actual wake up at the time of adding data is too dangerous, as an event or function trace may be in the midst of the work queue locks and cause deadlocks. The irq_work will either delay the action to the next timer interrupt, or trigger an IPI to itself forcing an interrupt to do the work (in a safe location). With irq_work, all ring buffer commits can safely do wakeups, removing the need for the ring buffer commit "nowake" variants, which were used by events and function tracing. All commits can now safely use the normal commit, and the "nowake" variants can be removed. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* perf/trace: Add ability to set a target task for eventsAndrew Vagin2012-07-311-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | A few events are interesting not only for a current task. For example, sched_stat_* events are interesting for a task which wakes up. For this reason, it will be good if such events will be delivered to a target task too. Now a target task can be set by using __perf_task(). The original idea and a draft patch belongs to Peter Zijlstra. I need these events for profiling sleep times. sched_switch is used for getting callchains and sched_stat_* is used for getting time periods. These events are combined in user space, then it can be analyzed by perf tools. Inspired-by: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Arun Sharma <asharma@fb.com> Signed-off-by: Andrew Vagin <avagin@openvz.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1342016098-213063-1-git-send-email-avagin@openvz.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* tracing/kvm: Use __print_hex() for kvm_emulate_insn tracepointNamhyung Kim2012-06-281-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kvm_emulate_insn tracepoint used __print_insn() for printing its instructions. However it makes the format of the event hard to parse as it reveals TP internals. Fortunately, kernel provides __print_hex for almost same purpose, we can use it instead of open coding it. The user-space can be changed to parse it later. That means raw kernel tracing will not be affected by this change: # cd /sys/kernel/debug/tracing/ # cat events/kvm/kvm_emulate_insn/format name: kvm_emulate_insn ID: 29 format: ... print fmt: "%x:%llx:%s (%s)%s", REC->csbase, REC->rip, __print_hex(REC->insn, REC->len), \ __print_symbolic(REC->flags, { 0, "real" }, { (1 << 0) | (1 << 1), "vm16" }, \ { (1 << 0), "prot16" }, { (1 << 0) | (1 << 2), "prot32" }, { (1 << 0) | (1 << 3), "prot64" }), \ REC->failed ? " failed" : "" # echo 1 > events/kvm/kvm_emulate_insn/enable # cat trace # tracer: nop # # entries-in-buffer/entries-written: 2183/2183 #P:12 # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / delay # TASK-PID CPU# |||| TIMESTAMP FUNCTION # | | | |||| | | qemu-kvm-1782 [002] ...1 140.931636: kvm_emulate_insn: 0:c102fa25:89 10 (prot32) qemu-kvm-1781 [004] ...1 140.931637: kvm_emulate_insn: 0:c102fa25:89 10 (prot32) Link: http://lkml.kernel.org/n/tip-wfw6y3b9ugtey8snaow9nmg5@git.kernel.org Link: http://lkml.kernel.org/r/1340757701-10711-2-git-send-email-namhyung@kernel.org Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@kernel.org> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: kvm@vger.kernel.org Acked-by: Avi Kivity <avi@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* perf: Fix counter of ftrace eventsAndrew Vagin2011-10-041-0/+3
| | | | | | | | | | | | | | | | | | Each event adds some points to its counters. By default it adds 1, and a number of points may be transmited in event's parameters. E.g. sched:sched_stat_runtime adds how long process has been running. But this functionality was broken by v2.6.31-rc5-392-gf413cdb and now the event's parameters doesn't affect on a number of points. TP_perf_assign isn't defined, so __perf_count(c) isn't executed and __count is always equal to 1. Signed-off-by: Andrew Vagin <avagin@openvz.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1317052535-1765247-2-git-send-email-avagin@openvz.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
* tracing: Add __print_symbolic_u64 to avoid warnings on 32bit machineliubo2011-05-261-0/+13
| | | | | | | | | | Filesystem, like Btrfs, has some "ULL" macros, and when these macros are passed to tracepoints'__print_symbolic(), there will be 64->32 truncate WARNINGS during compiling on 32bit box. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Link: http://lkml.kernel.org/r/4DACE6E0.7000507@cn.fujitsu.com Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* tracing: Replace trace_event struct array with pointer arraySteven Rostedt2011-02-031-11/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the trace_event structures are placed in the _ftrace_events section, and at link time, the linker makes one large array of all the trace_event structures. On boot up, this array is read (much like the initcall sections) and the events are processed. The problem is that there is no guarantee that gcc will place complex structures nicely together in an array format. Two structures in the same file may be placed awkwardly, because gcc has no clue that they are suppose to be in an array. A hack was used previous to force the alignment to 4, to pack the structures together. But this caused alignment issues with other architectures (sparc). Instead of packing the structures into an array, the structures' addresses are now put into the _ftrace_event section. As pointers are always the natural alignment, gcc should always pack them tightly together (otherwise initcall, extable, etc would also fail). By having the pointers to the structures in the section, we can still iterate the trace_events without causing unnecessary alignment problems with other architectures, or depending on the current behaviour of gcc that will likely change in the future just to tick us kernel developers off a little more. The _ftrace_event section is also moved into the .init.data section as it is now only needed at boot up. Suggested-by: David Miller <davem@davemloft.net> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* tracing/events: Show real number in array fieldsSteven Rostedt2010-11-191-4/+10
| | | | | | | | | | | | | | | | | | | | | Currently we have in something like the sched_switch event: field:char prev_comm[TASK_COMM_LEN]; offset:12; size:16; signed:1; When a userspace tool such as perf tries to parse this, the TASK_COMM_LEN is meaningless. This is done because the TRACE_EVENT() macro simply uses a #len to show the string of the length. When the length is an enum, we get a string that means nothing for tools. By adding a static buffer and a mutex to protect it, we can store the string into that buffer with snprintf and show the actual number. Now we get: field:char prev_comm[16]; offset:12; size:16; signed:1; Something much more useful. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* tracing: Allow syscall trace events for non privileged usersFrederic Weisbecker2010-11-181-6/+1
| | | | | | | | | | | | | | | As for the raw syscalls events, individual syscall events won't leak system wide information on task bound tracing. Allow non privileged users to use them in such workflow. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Jason Baron <jbaron@redhat.com>
* tracing: New macro to set up initial event flags valueFrederic Weisbecker2010-11-181-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | This introduces the new TRACE_EVENT_FLAGS() macro in order to set up initial event flags value. This macro must simply follow the definition of a trace event and take the event name and the flag value as parameters: TRACE_EVENT(my_event, ..... .... ); TRACE_EVENT_FLAGS(my_event, 1) This will set up 1 as the initial my_event->flags value. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Jason Baron <jbaron@redhat.com>
* tracing: Drop cpparg() macroFrederic Weisbecker2010-08-021-5/+2
| | | | | | | | | | Drop the cpparg() macro that wraps CPP parameters. We already have the PARAM() macro for that, no need to have several versions. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Li Zefan <lizf@cn.fujitsu.com>
* tracing: Reduce latency and remove percpu trace_seqLai Jiangshan2010-07-211-9/+3
| | | | | | | | | | | | | | | __print_flags() and __print_symbolic() use percpu trace_seq: 1) Its memory is allocated at compile time, it wastes memory if we don't use tracing. 2) It is percpu data and it wastes more memory for multi-cpus system. 3) It disables preemption when it executes its core routine "trace_seq_printf(s, "%s: ", #call);" and introduces latency. So we move this trace_seq to struct trace_iterator. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> LKML-Reference: <4C078350.7090106@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* tracing: Use class->reg() for all registering of eventsSteven Rostedt2010-06-291-0/+2
| | | | | | | | | | | | | | | | | | | | | | | Because kprobes and syscalls need special processing to register events, the class->reg() method was created to handle the differences. But instead of creating a default ->reg for perf and ftrace events, the code was scattered with: if (class->reg) class->reg(); else default_reg(); This is messy and can also lead to bugs. This patch cleans up this code and creates a default reg() entry for the events allowing for the code to directly call the class->reg() without the condition. Reported-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* Merge branch 'perf/core' of ↵Ingo Molnar2010-06-091-1/+1
|\ | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/core
| * perf: Drop the skip argument from perf_arch_fetch_regs_callerFrederic Weisbecker2010-06-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Drop this argument now that we always want to rewind only to the state of the first caller. It means frame pointers are not necessary anymore to reliably get the source of an event. But this also means we need this helper to be a macro now, as an inline function is not an option since we need to know when to provide a default implentation. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: David Miller <davem@davemloft.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf_events, trace: Fix probe unregister racePeter Zijlstra2010-05-311-1/+1
|/ | | | | | | | | | | | | tracepoint_probe_unregister() does not synchronize against the probe callbacks, so do that explicitly. This properly serializes the callbacks and the free of the data used therein. Also, use this_cpu_ptr() where possible. Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1274438476.1674.1702.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Merge branch 'perf-core-for-linus' of ↵Linus Torvalds2010-05-281-158/+91
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (61 commits) tracing: Add __used annotation to event variable perf, trace: Fix !x86 build bug perf report: Support multiple events on the TUI perf annotate: Fix up usage of the build id cache x86/mmiotrace: Remove redundant instruction prefix checks perf annotate: Add TUI interface perf tui: Remove annotate from popup menu after failure perf report: Don't start the TUI if -D is used perf: Fix getline undeclared perf: Optimize perf_tp_event_match() perf: Remove more code from the fastpath perf: Optimize the !vmalloc backed buffer perf: Optimize perf_output_copy() perf: Fix wakeup storm for RO mmap()s perf-record: Share per-cpu buffers perf-record: Remove -M perf: Ensure that IOC_OUTPUT isn't used to create multi-writer buffers perf, trace: Optimize tracepoints by using per-tracepoint-per-cpu hlist to track events perf, trace: Optimize tracepoints by removing IRQ-disable from perf/tracepoint interaction perf tui: Allow disabling the TUI on a per command basis in ~/.perfconfig ...
| * tracing: Add __used annotation to event variableSteven Rostedt2010-05-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The TRACE_EVENT() macros automate creation of trace events. To automate initialization, the set up variables are loaded in a special section that is read on boot up. GCC is not aware that these static variables are used and will complain about them if we do not inform GCC that they are indeed used. One of the declarations of the event element was missing a __used annotation. This patch adds it. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * Merge branch 'perf/core' of ↵Steven Rostedt2010-05-211-10/+10
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip into trace/tip/tracing/core-7 Conflicts: include/linux/ftrace_event.h include/trace/ftrace.h kernel/trace/trace_event_perf.c kernel/trace/trace_kprobe.c kernel/trace/trace_syscalls.c Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| | * perf, trace: Optimize tracepoints by using per-tracepoint-per-cpu hlist to ↵Peter Zijlstra2010-05-211-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | track events Avoid the swevent hash-table by using per-tracepoint hlists. Also, avoid conditionals on the fast path by ordering with probe unregister so that we should never get on the callback path without the data being there. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <20100521090710.473188012@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * perf, trace: Optimize tracepoints by removing IRQ-disable from ↵Peter Zijlstra2010-05-211-10/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | perf/tracepoint interaction Improves performance. Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1274259525.5605.10352.camel@twins> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * Merge branch 'perf/urgent' of ↵Ingo Molnar2010-05-201-14/+19
| | |\ | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/core
| | * | perf/ftrace: Optimize perf/tracepoint interaction for single eventsPeter Zijlstra2010-05-181-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we've got but a single event per tracepoint there is no reason to try and multiplex it so don't. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Ingo Molnar <mingo@elte.hu> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | Merge branch 'perf/core' of ↵Steven Rostedt2010-05-181-6/+7
| |\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip into trace/tip/tracing/core-6 Conflicts: include/trace/ftrace.h kernel/trace/trace_kprobe.c Acked-by: Masami Hiramatsu <mhiramat@redhat.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | tracing: Remove duplicate id information in event structureSteven Rostedt2010-05-141-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that the trace_event structure is embedded in the ftrace_event_call structure, there is no need for the ftrace_event_call id field. The id field is the same as the trace_event type field. Removing the id and re-arranging the structure brings down the tracepoint footprint by another 5K. text data bss dec hex filename 4913961 1088356 861512 6863829 68bbd5 vmlinux.orig 4895024 1023812 861512 6780348 6775bc vmlinux.print 4894944 1018052 861512 6774508 675eec vmlinux.id Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Masami Hiramatsu <mhiramat@redhat.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | tracing: Move print functions into event classSteven Rostedt2010-05-141-26/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, every event has its own trace_event structure. This is fine since the structure is needed anyway. But the print function structure (trace_event_functions) is now separate. Since the output of the trace event is done by the class (with the exception of events defined by DEFINE_EVENT_PRINT), it makes sense to have the class define the print functions that all events in the class can use. This makes a bigger deal with the syscall events since all syscall events use the same class. The savings here is another 30K. text data bss dec hex filename 4913961 1088356 861512 6863829 68bbd5 vmlinux.orig 4900382 1048964 861512 6810858 67ecea vmlinux.init 4900446 1049028 861512 6810986 67ed6a vmlinux.preprint 4895024 1023812 861512 6780348 6775bc vmlinux.print To accomplish this, and to let the class know what event is being printed, the event structure is embedded in the ftrace_event_call structure. This should not be an issues since the event structure was created for each event anyway. Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Masami Hiramatsu <mhiramat@redhat.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | tracing: Allow events to share their print functionsSteven Rostedt2010-05-141-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Multiple events may use the same method to print their data. Instead of having all events have a pointer to their print funtions, the trace_event structure now points to a trace_event_functions structure that will hold the way to print ouf the event. The event itself is now passed to the print function to let the print function know what kind of event it should print. This opens the door to consolidating the way several events print their output. text data bss dec hex filename 4913961 1088356 861512 6863829 68bbd5 vmlinux.orig 4900382 1048964 861512 6810858 67ecea vmlinux.init 4900446 1049028 861512 6810986 67ed6a vmlinux.preprint This change slightly increases the size but is needed for the next change. v3: Fix the branch tracer events to handle this change. v2: Fix the new function graph tracer event calls to handle this change. Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Masami Hiramatsu <mhiramat@redhat.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | tracing: Move raw_init from events to classSteven Rostedt2010-05-141-5/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The raw_init function pointer in the event is used to initialize various kinds of events. The type of initialization needed is usually classed to the kind of event it is. Two events with the same class will always have the same initialization function, so it makes sense to move this to the class structure. Perhaps even making a special system structure would work since the initialization is the same for all events within a system. But since there's no system structure (yet), this will just move it to the class. text data bss dec hex filename 4913961 1088356 861512 6863829 68bbd5 vmlinux.orig 4900375 1053380 861512 6815267 67fe23 vmlinux.fields 4900382 1048964 861512 6810858 67ecea vmlinux.init The text grew very slightly, but this is a constant growth that happened with the changing of the C files that call the init code. The bigger savings is the data which will be saved the more events share a class. Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Masami Hiramatsu <mhiramat@redhat.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | tracing: Move fields from event to class structureSteven Rostedt2010-05-141-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the defined fields from the event to the class structure. Since the fields of the event are defined by the class they belong to, it makes sense to have the class hold the information instead of the individual events. The events of the same class would just hold duplicate information. After this change the size of the kernel dropped another 3K: text data bss dec hex filename 4913961 1088356 861512 6863829 68bbd5 vmlinux.orig 4900252 1057412 861512 6819176 680d68 vmlinux.regs 4900375 1053380 861512 6815267 67fe23 vmlinux.fields Although the text increased, this was mainly due to the C files having to adapt to the change. This is a constant increase, where new tracepoints will not increase the Text. But the big drop is in the data size (as well as needed allocations to hold the fields). This will give even more savings as more tracepoints are created. Note, if just TRACE_EVENT()s are used and not DECLARE_EVENT_CLASS() with several DEFINE_EVENT()s, then the savings will be lost. But we are pushing developers to consolidate events with DEFINE_EVENT() so this should not be an issue. The kprobes define a unique class to every new event, but are dynamic so it should not be a issue. The syscalls however have a single class but the fields for the individual events are different. The syscalls use a metadata to define the fields. I moved the fields list from the event to the metadata and added a "get_fields()" function to the class. This function is used to find the fields. For normal events and kprobes, get_fields() just returns a pointer to the fields list_head in the class. For syscall events, it returns the fields list_head in the metadata for the event. v2: Fixed the syscall fields. The syscall metadata needs a list of fields for both enter and exit. Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | tracing: Remove per event trace registeringSteven Rostedt2010-05-141-95/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes the register functions of TRACE_EVENT() to enable and disable tracepoints. The registering of a event is now down directly in the trace_events.c file. The tracepoint_probe_register() is now called directly. The prototypes are no longer type checked, but this should not be an issue since the tracepoints are created automatically by the macros. If a prototype is incorrect in the TRACE_EVENT() macro, then other macros will catch it. The trace_event_class structure now holds the probes to be called by the callbacks. This removes needing to have each event have a separate pointer for the probe. To handle kprobes and syscalls, since they register probes in a different manner, a "reg" field is added to the ftrace_event_class structure. If the "reg" field is assigned, then it will be called for enabling and disabling of the probe for either ftrace or perf. To let the reg function know what is happening, a new enum (trace_reg) is created that has the type of control that is needed. With this new rework, the 82 kernel events and 618 syscall events has their footprint dramatically lowered: text data bss dec hex filename 4913961 1088356 861512 6863829 68bbd5 vmlinux.orig 4914025 1088868 861512 6864405 68be15 vmlinux.class 4918492 1084612 861512 6864616 68bee8 vmlinux.tracepoint 4900252 1057412 861512 6819176 680d68 vmlinux.regs The size went from 6863829 to 6819176, that's a total of 44K in savings. With tracepoints being continuously added, this is critical that the footprint becomes minimal. v5: Added #ifdef CONFIG_PERF_EVENTS around a reference to perf specific structure in trace_events.c. v4: Fixed trace self tests to check probe because regfunc no longer exists. v3: Updated to handle void *data in beginning of probe parameters. Also added the tracepoint: check_trace_callback_type_##call(). v2: Changed the callback probes to pass void * and typecast the value within the function. Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Masami Hiramatsu <mhiramat@redhat.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | tracing: Let tracepoints have data passed to tracepoint callbacksSteven Rostedt2010-05-141-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds data to be passed to tracepoint callbacks. The created functions from DECLARE_TRACE() now need a mandatory data parameter. For example: DECLARE_TRACE(mytracepoint, int value, value) Will create the register function: int register_trace_mytracepoint((void(*)(void *data, int value))probe, void *data); As the first argument, all callbacks (probes) must take a (void *data) parameter. So a callback for the above tracepoint will look like: void myprobe(void *data, int value) { } The callback may choose to ignore the data parameter. This change allows callbacks to register a private data pointer along with the function probe. void mycallback(void *data, int value); register_trace_mytracepoint(mycallback, mydata); Then the mycallback() will receive the "mydata" as the first parameter before the args. A more detailed example: DECLARE_TRACE(mytracepoint, TP_PROTO(int status), TP_ARGS(status)); /* In the C file */ DEFINE_TRACE(mytracepoint, TP_PROTO(int status), TP_ARGS(status)); [...] trace_mytracepoint(status); /* In a file registering this tracepoint */ int my_callback(void *data, int status) { struct my_struct my_data = data; [...] } [...] my_data = kmalloc(sizeof(*my_data), GFP_KERNEL); init_my_data(my_data); register_trace_mytracepoint(my_callback, my_data); The same callback can also be registered to the same tracepoint as long as the data registered is different. Note, the data must also be used to unregister the callback: unregister_trace_mytracepoint(my_callback, my_data); Because of the data parameter, tracepoints declared this way can not have no args. That is: DECLARE_TRACE(mytracepoint, TP_PROTO(void), TP_ARGS()); will cause an error. If no arguments are needed, a new macro can be used instead: DECLARE_TRACE_NOARGS(mytracepoint); Since there are no arguments, the proto and args fields are left out. This is part of a series to make the tracepoint footprint smaller: text data bss dec hex filename 4913961 1088356 861512 6863829 68bbd5 vmlinux.orig 4914025 1088868 861512 6864405 68be15 vmlinux.class 4918492 1084612 861512 6864616 68bee8 vmlinux.tracepoint Again, this patch also increases the size of the kernel, but lays the ground work for decreasing it. v5: Fixed net/core/drop_monitor.c to handle these updates. v4: Moved the DECLARE_TRACE() DECLARE_TRACE_NOARGS out of the #ifdef CONFIG_TRACE_POINTS, since the two are the same in both cases. The __DECLARE_TRACE() is what changes. Thanks to Frederic Weisbecker for pointing this out. v3: Made all register_* functions require data to be passed and all callbacks to take a void * parameter as its first argument. This makes the calling functions comply with C standards. Also added more comments to the modifications of DECLARE_TRACE(). v2: Made the DECLARE_TRACE() have the ability to pass arguments and added a new DECLARE_TRACE_NOARGS() for tracepoints that do not need any arguments. Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Masami Hiramatsu <mhiramat@redhat.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>