path: root/tools/perf
Commit message | Author | Age | Files | Lines
* perf tools: Add time-based utility functions | David Ahern | 2016-12-01 | 3 | -0/+98
  Add a function to parse a user time string of the form <start>,<stop>, where start and stop are times in sec.nsec format. Both start and stop times are optional. Add a function to determine whether a sample time is within a given time window of interest.

  Signed-off-by: David Ahern <dsahern@gmail.com>
  Acked-by: Namhyung Kim <namhyung@kernel.org>
  Cc: Jiri Olsa <jolsa@kernel.org>
  Cc: Peter Zijlstra <peterz@infradead.org>
  Link: http://lkml.kernel.org/r/1480439746-42695-2-git-send-email-dsahern@gmail.com
  Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
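A minimal sketch of the parsing and window check described in this commit, written in Python for illustration rather than taken from the perf sources. The function names, the inclusive bounds, and the handling of a missing fraction are all assumptions.

```python
from typing import Optional, Tuple

NSEC_PER_SEC = 1_000_000_000


def parse_nsec_time(text: str) -> int:
    """Convert a 'sec.nsec' string (fraction optional) to nanoseconds."""
    sec, _, frac = text.strip().partition(".")
    # Pad or truncate the fractional part to exactly nine digits.
    frac = (frac + "0" * 9)[:9]
    return int(sec or "0") * NSEC_PER_SEC + int(frac)


def parse_time_window(spec: str) -> Tuple[Optional[int], Optional[int]]:
    """Parse '<start>,<stop>'; an empty side means 'unbounded'."""
    start_s, _, stop_s = spec.partition(",")
    start = parse_nsec_time(start_s) if start_s.strip() else None
    stop = parse_nsec_time(stop_s) if stop_s.strip() else None
    if start is not None and stop is not None and stop < start:
        raise ValueError("stop time is before start time")
    return start, stop


def in_window(t: int, start: Optional[int], stop: Optional[int]) -> bool:
    """True if timestamp t (ns) lies inside the window (bounds inclusive, by assumption)."""
    if start is not None and t < start:
        return False
    if stop is not None and t > stop:
        return False
    return True
```

Storing the times as integer nanoseconds avoids floating-point rounding when comparing sample timestamps against the window edges.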
* perf script: Add option to stop printing callchain | David Ahern | 2016-11-29 | 5 | -2/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow user to specify list of symbols which cause the dump of callchains to stop at that symbol. Committer notes: Testing it: # perf record -ag usleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 1.177 MB perf.data (33 samples) ] # # # Without it: # # perf script swapper 0 [000] 9693.370039: 1 cycles:ppp: 2072ad x86_pmu_enable (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 3a29d7 perf_pmu_enable.part.90 (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 3a713a ctx_resched (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 3a76c1 __perf_event_enable (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 3a0390 event_function (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 3a1cff remote_function (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 326978 flush_smp_call_function_queue (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 327413 generic_smp_call_function_single_interrupt (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 249b37 smp_call_function_single_interrupt (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) a04b2c call_function_single_interrupt (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 889427 cpuidle_enter (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 2e534a call_cpuidle (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 2e5730 cpu_startup_entry (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 9f5167 rest_init (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 137ffeb start_kernel ([kernel.vmlinux].init.text) 137f2ca x86_64_start_reservations ([kernel.vmlinux].init.text) 137f419 x86_64_start_kernel ([kernel.vmlinux].init.text) swapper 0 [000] 9693.370044: 1 cycles:ppp: 20ca1b intel_pmu_handle_irq 
(/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 205b0c perf_event_nmi_handler (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 22a14a nmi_handle (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 22a6b3 default_do_nmi (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 22a83c do_nmi (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) a03fb1 end_repeat_nmi (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 3a29d7 perf_pmu_enable.part.90 (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 3a713a ctx_resched (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 3a76c1 __perf_event_enable (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 3a0390 event_function (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 3a1cff remote_function (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 326978 flush_smp_call_function_queue (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 327413 generic_smp_call_function_single_interrupt (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 249b37 smp_call_function_single_interrupt (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) a04b2c call_function_single_interrupt (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 889427 cpuidle_enter (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 2e534a call_cpuidle (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 2e5730 cpu_startup_entry (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 9f5167 rest_init (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 137ffeb start_kernel ([kernel.vmlinux].init.text) 137f2ca x86_64_start_reservations ([kernel.vmlinux].init.text) # # # Using it to see just what are the calls from the 'remote_function' function: # # perf script --stop-bt remote_function swapper 0 [000] 9693.370039: 1 cycles:ppp: 2072ad x86_pmu_enable (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 3a29d7 perf_pmu_enable.part.90 
(/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 3a713a ctx_resched (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 3a76c1 __perf_event_enable (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 3a0390 event_function (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 3a1cff remote_function (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) swapper 0 [000] 9693.370044: 1 cycles:ppp: 20ca1b intel_pmu_handle_irq (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 205b0c perf_event_nmi_handler (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 22a14a nmi_handle (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 22a6b3 default_do_nmi (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 22a83c do_nmi (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) a03fb1 end_repeat_nmi (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 3a29d7 perf_pmu_enable.part.90 (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 3a713a ctx_resched (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 3a76c1 __perf_event_enable (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 3a0390 event_function (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) 3a1cff remote_function (/usr/lib/debug/lib/modules/4.8.8-300.fc25.x86_64/vmlinux) Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1480104021-36275-1-git-send-email-dsahern@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
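The effect of --stop-bt in the output above can be sketched as follows. This is an illustrative Python model, not the perf implementation; the function name is hypothetical, and the choice to keep the stop symbol itself as the last printed frame is inferred from the sample output, where remote_function is still shown.

```python
def truncate_callchain(frames, stop_symbols):
    """Return frames up to and including the first stop symbol.

    frames is ordered innermost first, as perf script prints them.
    """
    printed = []
    for sym in frames:
        printed.append(sym)
        if sym in stop_symbols:
            break  # everything deeper in the chain is suppressed
    return printed
```

If no frame matches a stop symbol, the callchain is returned unchanged.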
* perf kmem stat: Track memory freed | David Ahern | 2016-11-29 | 1 | -1/+11
  Track freed memory as well as allocations and show the net in the summary.

  Committer notes:

  Testing it:

    # perf kmem record usleep 1
    [ perf record: Woken up 0 times to write data ]
    [ perf record: Captured and wrote 1.626 MB perf.data (4208 samples) ]
    [root@jouet ~]# perf kmem stat --slab

    SUMMARY (SLAB allocator)
    ========================
    Total bytes requested: 234,011
    Total bytes allocated: 234,504
    Total bytes freed: 213,328 <------
    Net total bytes allocated: 21,176
    Total bytes wasted on internal fragmentation: 493
    Internal fragmentation: 0.210231%
    Cross CPU allocations: 4/1,963
    #

  Signed-off-by: David Ahern <dsahern@gmail.com>
  Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  Cc: David Ahern <dsahern@gmail.com>
  Cc: Jiri Olsa <jolsa@kernel.org>
  Cc: Namhyung Kim <namhyung@kernel.org>
  Cc: Peter Zijlstra <peterz@infradead.org>
  Link: http://lkml.kernel.org/r/1480110133-37039-1-git-send-email-dsahern@gmail.com
  Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
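The figures in this summary are related by simple arithmetic, which the following Python sketch reproduces. The function and key names are hypothetical; the relationships (net = allocated - freed, wasted = allocated - requested, fragmentation = wasted / allocated) are inferred from the numbers shown, not read from the perf source.

```python
def slab_summary(requested: int, allocated: int, freed: int) -> dict:
    """Derive the net and fragmentation figures from the three raw totals."""
    wasted = allocated - requested          # internal fragmentation in bytes
    return {
        "net": allocated - freed,           # bytes still allocated
        "wasted": wasted,
        "frag_pct": wasted / allocated * 100.0,
    }


summary = slab_summary(requested=234_011, allocated=234_504, freed=213_328)
```

With the totals from the run above, this yields a net of 21,176 bytes and 493 wasted bytes, matching the report.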
* perf test: Remove "test" and similar strings from test descriptions | Arnaldo Carvalho de Melo | 2016-11-29 | 4 | -59/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Having "test" in almost all test descriptions is redundant, simplify it removing and rewriting tests with such descriptions. End result: # perf test 1: vmlinux symtab matches kallsyms : Ok 2: Detect openat syscall event : Ok 3: Detect openat syscall event on all cpus : Ok 4: Read samples using the mmap interface : Ok 5: Parse event definition strings : Ok 6: PERF_RECORD_* events & perf_sample fields : Ok 7: Parse perf pmu format : Ok 8: DSO data read : Ok 9: DSO data cache : Ok 10: DSO data reopen : Ok 11: Roundtrip evsel->name : Ok 12: Parse sched tracepoints fields : Ok 13: syscalls:sys_enter_openat event fields : Ok 14: Setup struct perf_event_attr : Ok 15: Match and link multiple hists : Ok 16: 'import perf' in python : Ok 17: Breakpoint overflow signal handler : Ok 18: Breakpoint overflow sampling : Ok 19: Number of exit events of a simple workload : Ok 20: Software clock events period values : Ok 21: Object code reading : Ok 22: Sample parsing : Ok 23: Use a dummy software event to keep tracking: Ok 24: Parse with no sample_id_all bit set : Ok 25: Filter hist entries : Ok 26: Lookup mmap thread : Ok 27: Share thread mg : Ok 28: Sort output of hist entries : Ok 29: Cumulate child hist entries : Ok 30: Track with sched_switch : Ok 31: Filter fds with revents mask in a fdarray : Ok 32: Add fd to a fdarray, making it autogrow : Ok 33: kmod_path__parse : Ok 34: Thread map : Ok 35: LLVM search and compile : 35.1: Basic BPF llvm compile : Ok 35.2: kbuild searching : Ok 35.3: Compile source for BPF prologue generation: Ok 35.4: Compile source for BPF relocation : Ok 36: Session topology : Ok 37: BPF filter : 37.1: Basic BPF filtering : Ok 37.2: BPF prologue generation : Ok 37.3: BPF relocation checker : Ok 38: Synthesize thread map : Ok 39: Synthesize cpu map : Ok 40: Synthesize stat config : Ok 41: Synthesize 
stat : Ok 42: Synthesize stat round : Ok 43: Synthesize attr update : Ok 44: Event times : Ok 45: Read backward ring buffer : Ok 46: Print cpu map : Ok 47: Probe SDT events : Ok 48: is_printable_array : Ok 49: Print bitmap : Ok 50: perf hooks : Ok 51: x86 rdpmc : Ok 52: Convert perf time to TSC : Ok 53: DWARF unwind : Ok 54: x86 instruction decoder - new instructions : Ok 55: Intel cqm nmi context read : Skip # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-rx2lbfcrrio2yx1fxcljqy0e@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Introduce perf hooks | Wang Nan | 2016-11-29 | 9 | -0/+187
  Perf hooks allow hooking user code at perf events. They can be used for manipulating BPF maps, taking snapshots and reporting results. This patch introduces two perf hook points: record_start and record_end.

  To guard against buggy user actions, a SIGSEGV signal handler is introduced into 'perf record'. It turns off the perf hook if it causes a segfault and reports an error to help debugging.

  A test case for perf hooks is introduced.

  Test result:

    $ ./buildperf/perf test -v hook
    50: Test perf hooks :
    --- start ---
    test child forked, pid 10311
    SIGSEGV is observed as expected, try to recover.
    Fatal error (SEGFAULT) in perf hook 'test'
    test child finished with 0
    ---- end ----
    Test perf hooks: Ok

  Signed-off-by: Wang Nan <wangnan0@huawei.com>
  Cc: Alexei Starovoitov <ast@fb.com>
  Cc: He Kuang <hekuang@huawei.com>
  Cc: Jiri Olsa <jolsa@kernel.org>
  Cc: Joe Stringer <joe@ovn.org>
  Cc: Zefan Li <lizefan@huawei.com>
  Cc: pi3orama@163.com
  Link: http://lkml.kernel.org/r/20161126070354.141764-5-wangnan0@huawei.com
  Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf trace: Update tid/pid filtering option to leverage symbol_conf | David Ahern | 2016-11-25 | 1 | -40/+9
  Leverage pid/tid filtering done by symbol_conf hooks.

  Signed-off-by: David Ahern <dsahern@gmail.com>
  Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  Link: http://lkml.kernel.org/r/1480091392-35645-1-git-send-email-dsa@cumulusnetworks.com
  Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf sched timehist: Handle cpu migration events | David Ahern | 2016-11-25 | 2 | -2/+99
  Add handlers for the sched:sched_migrate_task event. The total number of migrations is added to the summary display, and -M/--migrations can be used to show migration events.

  Signed-off-by: David Ahern <dsahern@gmail.com>
  Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  Cc: Namhyung Kim <namhyung@kernel.org>
  Link: http://lkml.kernel.org/r/1480091321-35591-1-git-send-email-dsa@cumulusnetworks.com
  Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf annotate: Show invalid jump offset in error message | Arnaldo Carvalho de Melo | 2016-11-25 | 1 | -2/+4
  To help in debugging when the wrong offset is being used, like in:

    │13d98: ↓ jne 13dd1 <lzma_lzma_preset@@XZ_5.0+0x28e1>

  That is the full line from objdump, and it seems what should be used is 13dd1, not 28e1.

  Cc: Adrian Hunter <adrian.hunter@intel.com>
  Cc: David Ahern <dsahern@gmail.com>
  Cc: Jiri Olsa <jolsa@kernel.org>
  Cc: Namhyung Kim <namhyung@kernel.org>
  Cc: Wang Nan <wangnan0@huawei.com>
  Link: http://lkml.kernel.org/n/tip-4nc0marsgst1ft6inmvqber7@git.kernel.org
  Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf ui helpline: Provide a printf variant | Arnaldo Carvalho de Melo | 2016-11-25 | 2 | -0/+11
  To print some values, like in the annotation code with invalid jump offsets.

  Cc: Adrian Hunter <adrian.hunter@intel.com>
  Cc: David Ahern <dsahern@gmail.com>
  Cc: Jiri Olsa <jolsa@kernel.org>
  Cc: Namhyung Kim <namhyung@kernel.org>
  Cc: Wang Nan <wangnan0@huawei.com>
  Link: http://lkml.kernel.org/n/tip-1vk0g5twas2ioswn1mmvnvwq@git.kernel.org
  Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Add missing struct definition in probe_event.h | Wang Nan | 2016-11-25 | 1 | -0/+2
  Commit 0b3c2264ae30 ("perf symbols: Fix kallsyms perf test on ppc64le") refers to struct symbol in probe_event.h, but forgets to include its definition. GCC complains about it unless that definition happens to be provided, by sheer luck, by some other header included before probe_event.h.

  Signed-off-by: Wang Nan <wangnan0@huawei.com>
  Cc: Alexei Starovoitov <ast@fb.com>
  Cc: He Kuang <hekuang@huawei.com>
  Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
  Cc: Zefan Li <lizefan@huawei.com>
  Cc: pi3orama@163.com
  Link: http://lkml.kernel.org/r/20161115040617.69788-4-wangnan0@huawei.com
  Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf record: Fix segfault when running with suid and kptr_restrict is 1 | Wang Nan | 2016-11-25 | 1 | -1/+1
  Before this patch, perf segfaults if kptr_restrict is set to 1 and perf is owned by root with suid set:

    $ whoami
    wangnan
    $ ls -l ./perf
    -rwsr-xr-x 1 root root 19781908 Sep 21 19:29 /home/wangnan/perf
    $ cat /proc/sys/kernel/kptr_restrict
    1
    $ cat /proc/sys/kernel/perf_event_paranoid
    -1
    $ ./perf record -a
    Segmentation fault (core dumped)
    $

  The reason is that perf assumes it is allowed to read kptr from /proc/kallsyms when euid is root, but in fact the kernel doesn't allow reading kptr when euid and uid do not match:

    $ cp /bin/cat .
    $ sudo chown root:root ./cat
    $ sudo chmod u+s ./cat
    $ cat /proc/kallsyms | grep do_fork
    0000000000000000 T _do_fork <--- kptr is hidden even euid is root
    $ sudo cat /proc/kallsyms | grep do_fork
    ffffffff81080230 T _do_fork

  See lib/vsprintf.c for the kernel side code.

  This patch fixes this problem by checking both uid and euid.

  Signed-off-by: Wang Nan <wangnan0@huawei.com>
  Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  Cc: Alexei Starovoitov <ast@fb.com>
  Cc: He Kuang <hekuang@huawei.com>
  Cc: Zefan Li <lizefan@huawei.com>
  Cc: pi3orama@163.com
  Link: http://lkml.kernel.org/r/20161115040617.69788-3-wangnan0@huawei.com
  Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
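The decision this fix describes, checking both uid and euid, can be modelled roughly as below. This is a simplified Python sketch under stated assumptions: the function name is hypothetical, and the real kernel check in lib/vsprintf.c also involves capabilities and group ids, which are ignored here.

```python
def kptr_hidden(kptr_restrict: int, uid: int, euid: int) -> bool:
    """Guess whether /proc/kallsyms addresses will read as zeros.

    Simplified model: level 0 never hides, level 2 always hides, and
    level 1 hides unless both the real and effective uid are root.
    """
    if kptr_restrict == 0:
        return False
    if kptr_restrict >= 2:
        return True
    return not (uid == 0 and euid == 0)
```

The suid case from the transcript corresponds to a non-root uid with euid 0 at level 1, where checking euid alone would wrongly conclude that addresses are readable.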
* perf tools: Fix kernel version error in ubuntu | Wang Nan | 2016-11-25 | 1 | -2/+53
  On Ubuntu the internal kernel version code is different from what can be retrieved from uname:

    $ uname -r
    4.4.0-47-generic
    $ cat /lib/modules/`uname -r`/build/include/generated/uapi/linux/version.h
    #define LINUX_VERSION_CODE 263192
    #define KERNEL_VERSION(a,b,c) (((a) << 16) + ((b) << 8) + (c))
    $ cat /lib/modules/`uname -r`/build/include/generated/utsrelease.h
    #define UTS_RELEASE "4.4.0-47-generic"
    #define UTS_UBUNTU_RELEASE_ABI 47
    $ cat /proc/version_signature
    Ubuntu 4.4.0-47.68-generic 4.4.24

  The macro LINUX_VERSION_CODE is set to 4.4.24 (263192 == 0x40418), but `uname -r` reports 4.4.0. This mismatch causes the LINUX_VERSION_CODE macro passed to the BPF script to carry an incorrect value, resulting in an obscure failure in BPF loading:

    $ sudo ./buildperf/perf record -e ./tools/perf/tests/bpf-script-example.c ls
    event syntax error: './tools/perf/tests/bpf-script-example.c'
                         \___ Failed to load program for unknown reason

  According to the Ubuntu documentation (https://wiki.ubuntu.com/Kernel/FAQ), the correct kernel version can be retrieved through /proc/version_signature, which is Ubuntu specific.

  This patch checks for the existence of /proc/version_signature, and returns the version number by parsing this file instead of uname. The version string is untouched (the value returned from uname) because `uname -r` is required to be consistent with the path of the kbuild directory in /lib/modules.

  Signed-off-by: Wang Nan <wangnan0@huawei.com>
  Cc: Alexei Starovoitov <ast@fb.com>
  Cc: He Kuang <hekuang@huawei.com>
  Cc: Zefan Li <lizefan@huawei.com>
  Cc: pi3orama@163.com
  Link: http://lkml.kernel.org/r/20161115040617.69788-2-wangnan0@huawei.com
  Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
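The version-code arithmetic in this message can be checked with a short Python sketch. The KERNEL_VERSION formula is the macro quoted above; the helper names and the assumption that the upstream version is the last whitespace-separated field of /proc/version_signature are illustrative, based only on the sample line shown.

```python
def kernel_version(a: int, b: int, c: int) -> int:
    """Python equivalent of the KERNEL_VERSION(a,b,c) macro."""
    return (a << 16) + (b << 8) + c


def version_code_from_signature(line: str) -> int:
    """Derive the version code from a /proc/version_signature line.

    Assumes the last field is the upstream version, e.g. '4.4.24'.
    """
    major, minor, patch = (int(x) for x in line.split()[-1].split("."))
    return kernel_version(major, minor, patch)


code = version_code_from_signature("Ubuntu 4.4.0-47.68-generic 4.4.24")
```

For the sample line this gives 263192 (0x40418), whereas the 4.4.0 reported by `uname -r` would encode to 263168.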
* perf sched timehist: Enlarge max stack depth by 2 | Namhyung Kim | 2016-11-25 | 1 | -1/+1
  When callchains are recorded, they always contain 2 scheduler functions (__schedule + schedule or __schedule + preempt_schedule), which are ignored. So collect 2 more functions to show the expected number of callchain entries to the user.

  Committer Notes:

  Example of final result, using the same perf.data file as in the previous cset comment, but this time redirecting the output of 'perf sched timehist' to a file instead of copy'n'pasting from xterm:

    [root@jouet experimental]# perf sched timehist > /tmp/bla
    [root@jouet experimental]# cat /tmp/bla
    time cpu task name wait time sch delay run time
    [tid/pid] (msec) (msec) (msec)
    -------- ---- -------------------- ------ ------ -----
    6.494998 [01] <idle> 0.000 0.000 0.000
    6.495027 [02] perf[519] 0.000 0.000 0.000 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_sys_poll <- sys_poll
    6.495096 [03] <idle> 0.000 0.000 0.000
    6.495100 [03] rcuos/0[9] 0.000 0.005 0.003 rcu_nocb_kthread <- kthread <- ret_from_fork
    6.495113 [01] perf[520] 0.000 0.008 0.114 preempt_schedule_common <- _cond_resched <- wait_for_completion <- stop_one_cpu <- sched_exec <- do_execveat_common.isra.35
    6.495121 [00] <idle> 0.000 0.000 0.000
    6.495129 [01] migration/1[17] 0.000 0.003 0.016 smpboot_thread_fn <- kthread <- ret_from_fork
    6.496085 [02] <idle> 0.000 0.000 1.057
    6.496096 [02] kworker/u16:1[31169] 0.000 0.004 0.011 worker_thread <- kthread <- ret_from_fork
    6.496096 [03] <idle> 0.003 0.000 0.996
    6.496169 [02] <idle> 0.011 0.000 0.072
    6.496171 [00] ls[520] 0.008 0.000 1.049 do_exit <- do_group_exit <- [unknown] <- entry_SYSCALL_64_fastpath
    6.496172 [03] gnome-terminal-[4391] 0.000 0.003 0.076 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_sys_poll <- sys_poll

  Signed-off-by: Namhyung Kim <namhyung@kernel.org>
  Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  Cc: Andi Kleen <andi@firstfloor.org>
  Cc: David Ahern <dsahern@gmail.com>
  Cc: Jiri Olsa <jolsa@kernel.org>
  Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
  Cc: Stephane Eranian <eranian@google.com>
  Link: http://lkml.kernel.org/r/20161124011114.7102-3-namhyung@kernel.org
  Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf sched timehist: Mark schedule function in callchains | Namhyung Kim | 2016-11-25 | 2 | -0/+22
  The sched_switch event is always captured from the scheduler functions, so it would be good to omit them from the callchain. This patch marks the functions to be omitted by a later patch.

  Committer notes:

  Testing it:

  Before:

    [root@jouet experimental]# perf sched record -g ls
    Dockerfile perf.data x-mips64
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 1.355 MB perf.data (29 samples) ]
    [root@jouet experimental]# perf sched timehist
    time cpu task name wait time sch delay run time
    [tid/pid] (msec) (msec) (msec)
    ----------- ----- ----------------- ------ ------ ------
    6.494998 [001] <idle> 0.000 0.000 0.000
    6.495027 [002] perf[519] 0.000 0.000 0.000 __schedule <- schedule <- schedule_hrtimeout_range_clock <- schedule_hrtimeou
    6.495096 [003] <idle> 0.000 0.000 0.000
    6.495100 [003] rcuos/0[9] 0.000 0.005 0.003 __schedule <- schedule <- rcu_nocb_kthread <- kthread <- ret_from_fork
    6.495113 [001] perf[520] 0.000 0.008 0.114 __schedule <- preempt_schedule_common <- _cond_resched <- wait_for_completion
    6.495121 [000] <idle> 0.000 0.000 0.000
    6.495129 [001] migration/1[17] 0.000 0.003 0.016 __schedule <- schedule <- smpboot_thread_fn <- kthread <- ret_from_fork
    6.496085 [002] <idle> 0.000 0.000 1.057
    6.496096 [002] kworker/u16:1[31169] 0.000 0.004 0.011 __schedule <- schedule <- worker_thread <- kthread <- ret_from_fork
    6.496096 [003] <idle> 0.003 0.000 0.996
    6.496169 [002] <idle> 0.011 0.000 0.072
    6.496171 [000] ls[520] 0.008 0.000 1.049 __schedule <- schedule <- do_exit <- do_group_exit <- [unknown]
    6.496172 [003] gnome-terminal-[4391] 0.000 0.003 0.076 __schedule <- schedule <- schedule_hrtimeout_range_clock <- schedule_hrtimeo

  After:

    [root@jouet experimental]# perf sched timehist
    time cpu task name wait time sch delay run time
    [tid/pid] (msec) (msec) (msec)
    ----------- ----- ----------------- ----- ----- ------
    6.494998 [001] <idle> 0.000 0.000 0.000
    6.495027 [002] perf[519] 0.000 0.000 0.000 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_t
    6.495096 [003] <idle> 0.000 0.000 0.000
    6.495100 [003] rcuos/0[9] 0.000 0.005 0.003 rcu_nocb_kthread <- kthread <- ret_from_fork
    6.495113 [001] perf[520] 0.000 0.008 0.114 preempt_schedule_common <- _cond_resched <- wait_for_completion <- stop_one_c
    6.495121 [000] <idle> 0.000 0.000 0.000
    6.495129 [001] migration/1[17] 0.000 0.003 0.016 smpboot_thread_fn <- kthread <- ret_from_fork
    6.496085 [002] <idle> 0.000 0.000 1.057
    6.496096 [002] kworker/u16:1[31169] 0.000 0.004 0.011 worker_thread <- kthread <- ret_from_fork
    6.496096 [003] <idle> 0.003 0.000 0.996
    6.496169 [002] <idle> 0.011 0.000 0.072
    6.496171 [000] ls[520] 0.008 0.000 1.049 do_exit <- do_group_exit <- [unknown]
    6.496172 [003] gnome-terminal-[4391] 0.000 0.003 0.076 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_
    [root@jouet experimental]#

  Signed-off-by: Namhyung Kim <namhyung@kernel.org>
  Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  Cc: Andi Kleen <andi@firstfloor.org>
  Cc: David Ahern <dsahern@gmail.com>
  Cc: Jiri Olsa <jolsa@kernel.org>
  Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
  Cc: Stephane Eranian <eranian@google.com>
  Link: http://lkml.kernel.org/r/20161124011114.7102-1-namhyung@kernel.org
  Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf callchain: Add option to skip ignore symbol when printing callchains | Namhyung Kim | 2016-11-25 | 3 | -2/+9
  For tracepoint events, callchains always contain certain functions. Sometimes it'd be better to skip those functions as they have no value.

  Signed-off-by: Namhyung Kim <namhyung@kernel.org>
  Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  Cc: Andi Kleen <andi@firstfloor.org>
  Cc: David Ahern <dsahern@gmail.com>
  Cc: Jiri Olsa <jolsa@kernel.org>
  Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
  Cc: Stephane Eranian <eranian@google.com>
  Link: http://lkml.kernel.org/r/20161124011114.7102-2-namhyung@kernel.org
  Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
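Skipping marked symbols while printing can be sketched in Python as below. This is an illustration, not the perf code: the function names are hypothetical, and the set of ignored symbols is an assumption taken from the scheduler functions named in the timehist commits in this log.

```python
# Assumed set of symbols flagged as "ignore" for sched tracepoints.
IGNORED = frozenset({"__schedule", "schedule", "preempt_schedule"})


def printable_callchain(frames, ignored=IGNORED):
    """Drop frames whose symbol is marked as ignored, keeping order."""
    return [sym for sym in frames if sym not in ignored]


def format_callchain(frames):
    """Join frames the way 'perf sched timehist' displays them."""
    return " <- ".join(frames)
```

Applied to the chain "__schedule <- schedule <- rcu_nocb_kthread <- kthread <- ret_from_fork", this leaves "rcu_nocb_kthread <- kthread <- ret_from_fork", while preempt_schedule_common is kept because it is not in the ignored set.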
* perf annotate: Initial PowerPC support | Ravi Bangoria | 2016-11-25 | 2 | -0/+63
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Support the PowerPC architecture using the ins_ops association method. Committer notes: Testing it with a perf.data file collected on a PowerPC machine and cross-annotated on a x86_64 workstation, using the associated vmlinux file: $ perf report -i perf.data.f22vm.powerdev --vmlinux vmlinux.powerpc .ktime_get vmlinux.powerpc │ clrldi r9,r28,63 8.57 │ ┌──bne e0 <- TUI cursor positioned here │54:│ lwsync 2.86 │ │ std r2,40(r1) │ │ ld r9,144(r31) │ │ ld r3,136(r31) │ │ ld r30,184(r31) │ │ ld r10,0(r9) │ │ mtctr r10 │ │ ld r2,8(r9) 8.57 │ │→ bctrl │ │ ld r2,40(r1) │ │ ld r10,160(r31) │ │ ld r5,152(r31) │ │ lwz r7,168(r31) │ │ ld r9,176(r31) 8.57 │ │ lwz r6,172(r31) │ │ lwsync 2.86 │ │ lwz r8,128(r31) │ │ cmpw cr7,r8,r28 2.86 │ │↑ bne 48 │ │ subf r10,r10,r3 │ │ mr r3,r29 │ │ and r10,r10,r5 2.86 │ │ mulld r10,r10,r7 │ │ add r9,r10,r9 │ │ srd r9,r9,r6 │ │ add r9,r9,r30 │ │ std r9,0(r29) │ │ addi r1,r1,144 │ │ ld r0,16(r1) │ │ ld r28,-32(r1) │ │ ld r29,-24(r1) │ │ ld r30,-16(r1) │ │ mtlr r0 │ │ ld r31,-8(r1) │ │← blr 5.71 │e0:└─→mr r1,r1 11.43 │ mr r2,r2 11.43 │ lwz r28,128(r31) Press 'h' for help on key bindings $ perf report -i perf.data.f22vm.powerdev --header-only # ======== # captured on: Thu Nov 24 12:40:38 2016 # hostname : pdev-f22-qemu # os release : 4.4.10-200.fc22.ppc64 # perf version : 4.9.rc1.g6298ce # arch : ppc64 # nrcpus online : 48 # nrcpus avail : 48 # cpudesc : POWER7 (architected), altivec supported # cpuid : 74,513 # total memory : 4158976 kB # cmdline : /home/ravi/Workspace/linux/tools/perf/perf record -a # event : name = cycles:ppp, , size = 112, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|CPU|PERIOD, disabled = 1, inherit = 1, mmap = 1, comm = 1, freq = 1, task = 1, precise_ip = 3, sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1 # 
HEADER_CPU_TOPOLOGY info available, use -I to display # HEADER_NUMA_TOPOLOGY info available, use -I to display # pmu mappings: cpu = 4, software = 1, tracepoint = 2, breakpoint = 5 # missing features: HEADER_TRACING_DATA HEADER_BRANCH_STACK HEADER_GROUP_DESC HEADER_AUXTRACE HEADER_STAT HEADER_CACHE # ======== # $ Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Kim Phillips <kim.phillips@arm.com> Link: http://lkml.kernel.org/n/tip-tbjnp40ddoxxl474uvhwi6g4@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf annotate: Improve support for ARM | Arnaldo Carvalho de Melo | 2016-11-25 | 2 | -96/+60
  Use arch->init() to set up regular expressions that associate ins_ops with ARM instructions, ditching the old table, which has instructions not present on ARM.

  Take advantage of having an arch->init() to hide more ARM-specific details, such as the objdump details, from the common code.

  The regular expressions come from a patch written by Kim Phillips.

  Reviewed-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
  Cc: Adrian Hunter <adrian.hunter@intel.com>
  Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
  Cc: Chris Riyder <chris.ryder@arm.com>
  Cc: David Ahern <dsahern@gmail.com>
  Cc: Jiri Olsa <jolsa@kernel.org>
  Cc: Kim Phillips <kim.phillips@arm.com>
  Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
  Cc: Masami Hiramatsu <mhiramat@kernel.org>
  Cc: Namhyung Kim <namhyung@kernel.org>
  Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
  Cc: Pawel Moll <pawel.moll@arm.com>
  Cc: Peter Zijlstra <peterz@infradead.org>
  Cc: Russell King <rmk+kernel@arm.linux.org.uk>
  Cc: Taeung Song <treeze.taeung@gmail.com>
  Cc: Wang Nan <wangnan0@huawei.com>
  Link: http://lkml.kernel.org/n/tip-77m7lufz9ajjimkrebtg5ead@git.kernel.org
  Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf annotate: Allow arches to have a init routine and a priv area | Arnaldo Carvalho de Melo | 2016-11-25 | 1 | -0/+11
  Arches like ARM will want to use regular expressions when deciding what instructions to associate with what ins_ops, provide infrastructure for that.

  Reviewed-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
  Cc: Adrian Hunter <adrian.hunter@intel.com>
  Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
  Cc: Chris Riyder <chris.ryder@arm.com>
  Cc: David Ahern <dsahern@gmail.com>
  Cc: Jiri Olsa <jolsa@kernel.org>
  Cc: Kim Phillips <kim.phillips@arm.com>
  Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
  Cc: Masami Hiramatsu <mhiramat@kernel.org>
  Cc: Namhyung Kim <namhyung@kernel.org>
  Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
  Cc: Pawel Moll <pawel.moll@arm.com>
  Cc: Peter Zijlstra <peterz@infradead.org>
  Cc: Russell King <rmk+kernel@arm.linux.org.uk>
  Cc: Taeung Song <treeze.taeung@gmail.com>
  Cc: Wang Nan <wangnan0@huawei.com>
  Link: http://lkml.kernel.org/n/tip-7dmnk9el2ipu3nxog092k9z5@git.kernel.org
  Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf annotate: Introduce alternative method of keeping instructions table | Arnaldo Carvalho de Melo | 2016-11-25 | 1 | -1/+62
  Some arches may want to dynamically populate the table using regular expressions on the instruction names to associate them with a set of parsing/formatting/etc functions (struct ins_ops), so provide a fallback for when the ins__find() method fails.

  That fallback will be able to resize arch->instructions, setting arch->nr_instructions appropriately. Helper functions are provided to associate an ins_ops with an instruction name, growing arch->instructions if needed, and to re-sort it. All the arch-specific callback needs to do is decide whether the missing instruction should be added to arch->instructions with an ins_ops association.

  Reviewed-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
  Cc: Adrian Hunter <adrian.hunter@intel.com>
  Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
  Cc: Chris Riyder <chris.ryder@arm.com>
  Cc: David Ahern <dsahern@gmail.com>
  Cc: Jiri Olsa <jolsa@kernel.org>
  Cc: Kim Phillips <kim.phillips@arm.com>
  Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
  Cc: Masami Hiramatsu <mhiramat@kernel.org>
  Cc: Namhyung Kim <namhyung@kernel.org>
  Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
  Cc: Pawel Moll <pawel.moll@arm.com>
  Cc: Peter Zijlstra <peterz@infradead.org>
  Cc: Russell King <rmk+kernel@arm.linux.org.uk>
  Cc: Taeung Song <treeze.taeung@gmail.com>
  Cc: Wang Nan <wangnan0@huawei.com>
  Link: http://lkml.kernel.org/n/tip-auu13yradxf7g5dgtpnzt97a@git.kernel.org
  Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
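The lookup-with-fallback idea described here can be sketched in Python as follows. This is a loose model, not the perf data structures: the class, the string stand-ins for ins_ops, and the two sample patterns are all hypothetical.

```python
import bisect
import re


class InsTable:
    """Sorted instruction table with a regex-based fallback on lookup miss."""

    def __init__(self, patterns):
        self.names = []   # sorted instruction names
        self.ops = []     # ops entry for each name, kept in the same order
        self.patterns = [(re.compile(p), o) for p, o in patterns]

    def _associate(self, name, ops):
        # Insert while keeping both lists sorted by name.
        i = bisect.bisect_left(self.names, name)
        self.names.insert(i, name)
        self.ops.insert(i, ops)

    def find(self, name):
        # Fast path: binary search in the sorted table.
        i = bisect.bisect_left(self.names, name)
        if i < len(self.names) and self.names[i] == name:
            return self.ops[i]
        # Fallback: let the arch patterns decide, then remember the result.
        for regex, ops in self.patterns:
            if regex.fullmatch(name):
                self._associate(name, ops)
                return ops
        return None


# Hypothetical patterns; "call_ops" and "jump_ops" stand in for struct ins_ops.
table = InsTable([(r"bl", "call_ops"), (r"b(eq|ne)?", "jump_ops")])
```

The table starts empty and grows only with instructions actually seen, so later lookups of the same name take the binary-search path.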
* perf annotate: Remove duplicate 'name' field from disasm_line | Arnaldo Carvalho de Melo | 2016-11-25 | 3 | -57/+51
  The disasm_line::name field is always equal to ins::name, being used just to locate the instruction's ins_ops from the per-arch instructions table.

  Eliminate this duplication: remove that field, make ins__find() return an ins_ops, store it in disasm_line::ins.ops, and keep in disasm_line::ins.name what was in disasm_line::name. This way we end up not keeping a reference to entries in the per-arch instructions table.

  This in turn will help support multiple ways to manage the per-arch instructions table. For instance, it allows re-sorting that array, in which case entries would move after references to their addresses were made. The same problem is avoided when the array is grown with realloc.

  So architectures simply keeping a constant array will work as well as architectures building the table using regular expressions or other logic that involves re-sorting the table.

  Reviewed-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
  Cc: Adrian Hunter <adrian.hunter@intel.com>
  Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
  Cc: Chris Riyder <chris.ryder@arm.com>
  Cc: David Ahern <dsahern@gmail.com>
  Cc: Jiri Olsa <jolsa@kernel.org>
  Cc: Kim Phillips <kim.phillips@arm.com>
  Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
  Cc: Masami Hiramatsu <mhiramat@kernel.org>
  Cc: Namhyung Kim <namhyung@kernel.org>
  Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
  Cc: Pawel Moll <pawel.moll@arm.com>
  Cc: Peter Zijlstra <peterz@infradead.org>
  Cc: Russell King <rmk+kernel@arm.linux.org.uk>
  Cc: Taeung Song <treeze.taeung@gmail.com>
  Cc: Wang Nan <wangnan0@huawei.com>
  Link: http://lkml.kernel.org/n/tip-vr899azvabnw9gtuepuqfd9t@git.kernel.org
  Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* Merge tag 'perf-core-for-mingo-20161123' of ↵Ingo Molnar2016-11-2418-175/+1370
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: New tool: - 'perf sched timehist' provides an analysis of scheduling events. Example usage: perf sched record -- sleep 1 perf sched timehist By default it shows the individual schedule events, including the wait time (time between sched-out and next sched-in events for the task), the task scheduling delay (time between wakeup and actually running) and run time for the task: time cpu task name wait time sch delay run time [tid/pid] (msec) (msec) (msec) -------- ------ ---------------- --------- --------- -------- 1.874569 [0011] gcc[31949] 0.014 0.000 1.148 1.874591 [0010] gcc[31951] 0.000 0.000 0.024 1.874603 [0010] migration/10[59] 3.350 0.004 0.011 1.874604 [0011] <idle> 1.148 0.000 0.035 1.874723 [0005] <idle> 0.016 0.000 1.383 1.874746 [0005] gcc[31949] 0.153 0.078 0.022 ... Times are in msec.usec. (David Ahern, Namhyung Kim) Improvements: - Make 'perf c2c report' support -f/--force, to allow skipping the ownership check for root users, for instance, just like the other tools (Jiri Olsa) - Allow sorting cachelines by total number of HITMs, in addition to local and remote numbers (Jiri Olsa) Fixes: - Make sure errors aren't suppressed by the TUI reset at the end of a 'perf c2c report' session (Jiri Olsa) Infrastructure changes: - Initial work on having the annotate code better support multiple architectures, including the ability to cross-annotate, i.e. 
to annotate perf.data files collected on an ARM system on a x86_64 workstation (Arnaldo Carvalho de Melo, Ravi Bangoria, Kim Phillips) - Use USECS_PER_SEC instead of hard coded number in libtraceevent (Steven Rostedt) - Add retrieval of preempt count and latency flags in libtraceevent (Steven Rostedt) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * perf sched timehist: Add -V/--cpu-visual optionDavid Ahern2016-11-232-2/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The -V option provides a visual aid for sched switches by cpu: $ perf sched timehist -V time cpu 0123456789abc task name b/n time sch delay run time [tid/pid] (msec) (msec) (msec) --------------- ------ ------------- -------------------- --------- --------- --------- ... 2412598.429696 [0009] i <idle> 0.000 0.000 0.000 2412598.429767 [0002] s perf[7219] 0.000 0.000 0.000 2412598.429783 [0009] s perf[7220] 0.000 0.006 0.087 2412598.429794 [0010] i <idle> 0.000 0.000 0.000 2412598.429795 [0009] s migration/9[53] 0.000 0.003 0.011 2412598.430370 [0010] s sleep[7220] 0.011 0.000 0.576 2412598.432584 [0003] i <idle> 0.000 0.000 0.000 ... Committer notes: 'i' marks idle time, 's' are scheduler events. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/20161116060634.28477-8-namhyung@kernel.org [ Add documentation based on above commit message ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf sched timehist: Add call graph optionsDavid Ahern2016-11-232-6/+89
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If callchains were recorded they are appended to the line with a default stack depth of 5: 1.874569 [0011] gcc[31949] 0.014 0.000 1.148 wait_for_completion_killable <- do_fork <- sys_vfork <- stub_vfork <- __vfork 1.874591 [0010] gcc[31951] 0.000 0.000 0.024 __cond_resched <- _cond_resched <- wait_for_completion <- stop_one_cpu <- sched_exec 1.874603 [0010] migration/10[59] 3.350 0.004 0.011 smpboot_thread_fn <- kthread <- ret_from_fork 1.874604 [0011] <idle> 1.148 0.000 0.035 cpu_startup_entry <- start_secondary 1.874723 [0005] <idle> 0.016 0.000 1.383 cpu_startup_entry <- start_secondary 1.874746 [0005] gcc[31949] 0.153 0.078 0.022 do_wait sys_wait4 <- system_call_fastpath <- __GI___waitpid --no-call-graph can be used to not show the callchains. --max-stack is used to control the number of frames shown (default of 5). -x/--excl options can be used to collapse redundant callchains to get more relevant data on screen. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/20161116060634.28477-7-namhyung@kernel.org [ Add documentation based on above commit message ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf sched timehist: Add -w/--wakeups optionDavid Ahern2016-11-232-4/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The -w option is to show wakeup events with timehist. $ perf sched timehist -w time cpu task name b/n time sch delay run time [tid/pid] (msec) (msec) (msec) --------------- ------ -------------------- --------- --------- --------- 2412598.429689 [0002] perf[7219] awakened: perf[7220] 2412598.429696 [0009] <idle> 0.000 0.000 0.000 2412598.429767 [0002] perf[7219] 0.000 0.000 0.000 2412598.429780 [0009] perf[7220] awakened: migration/9[53] ... Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/20161116060634.28477-6-namhyung@kernel.org [ Add documentation based on above commit message ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf sched timehist: Add summary optionsDavid Ahern2016-11-231-6/+160
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The -s/--summary option is to show process runtime statistics. And the -S/--with-summary option is to show the stats with the normal output. $ perf sched timehist -s Runtime summary comm parent sched-in run-time min-run avg-run max-run stddev (count) (msec) (msec) (msec) (msec) % --------------------------------------------------------------------------------------------------------- ksoftirqd/0[3] 2 2 0.011 0.004 0.005 0.006 14.87 rcu_preempt[7] 2 11 0.071 0.002 0.006 0.017 20.23 watchdog/0[11] 2 1 0.002 0.002 0.002 0.002 0.00 watchdog/1[12] 2 1 0.004 0.004 0.004 0.004 0.00 ... Terminated tasks: sleep[7220] 7219 3 0.770 0.087 0.256 0.576 62.28 Idle stats: CPU 0 idle for 2352.006 msec CPU 1 idle for 2764.497 msec CPU 2 idle for 2998.229 msec CPU 3 idle for 2967.800 msec Total number of unique tasks: 52 Total number of context switches: 2532 Total run time (msec): 218.036 Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/20161116060634.28477-5-namhyung@kernel.org [ Add documentation from last commit, so that docs comes with the cset that introduces the feature ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf sched timehist: Introduce timehist commandDavid Ahern2016-11-232-7/+637
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'perf sched timehist' provides an analysis of scheduling events. Example usage: perf sched record -- sleep 1 perf sched timehist By default it shows the individual schedule events, including the wait time (time between sched-out and next sched-in events for the task), the task scheduling delay (time between wakeup and actually running) and run time for the task: time cpu task name wait time sch delay run time [tid/pid] (msec) (msec) (msec) -------------- ------ -------------------- --------- --------- --------- 79371.874569 [0011] gcc[31949] 0.014 0.000 1.148 79371.874591 [0010] gcc[31951] 0.000 0.000 0.024 79371.874603 [0010] migration/10[59] 3.350 0.004 0.011 79371.874604 [0011] <idle> 1.148 0.000 0.035 79371.874723 [0005] <idle> 0.016 0.000 1.383 79371.874746 [0005] gcc[31949] 0.153 0.078 0.022 ... Times are in msec.usec. Committer note: Add above explanation as the 'perf sched timehist' entry for 'man perf-sched'. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/20161116060634.28477-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
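The three per-task columns defined above can be derived from four timestamps. The following Python sketch only restates those definitions; the function names and the sample timestamps are hypothetical, and it is not how perf itself stores or computes the values.

```python
def timehist_row(prev_sched_out, wakeup, sched_in, sched_out):
    """Return (wait time, sch delay, run time) for one task, in microseconds.

    All arguments are timestamps in microseconds.
    """
    wait_time = sched_in - prev_sched_out  # sched-out to next sched-in
    sch_delay = sched_in - wakeup          # wakeup to actually running
    run_time = sched_out - sched_in        # time spent on the CPU
    return wait_time, sch_delay, run_time


def usec_to_msec_str(usec):
    """Format microseconds in the msec.usec style used by the table."""
    return f"{usec // 1000}.{usec % 1000:03d}"
```

With made-up timestamps of 1000, 1075, 1153 and 1175 usec, this yields 0.153, 0.078 and 0.022, the same shape as the last `gcc[31949]` row above.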
| * perf evsel: Support printing callchains with arrowsNamhyung Kim2016-11-232-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The EVSEL__PRINT_CALLCHAIN_ARROW option can be used to print callchains with arrows for readability. It will be used by the 'sched timehist' command, like below: __schedule <- schedule <- schedule_timeout <- rcu_gp_kthread <- kthread <- ret_from_fork __schedule <- schedule <- schedule_timeout <- rcu_gp_kthread <- kthread <- ret_from_fork __schedule <- schedule <- worker_thread <- kthread <- ret_from_fork Suggested-and-Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/20161116060634.28477-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf symbols: Print symbol offsets conditionallyNamhyung Kim2016-11-233-8/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __symbol__fprintf_symname_offs() always shows symbol offsets, so there is no difference between 'perf script -F ip,sym' and 'perf script -F ip,sym,symoff'. I don't think that is the desired behavior. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/20161116060634.28477-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf c2c: Support cascading optionsJiri Olsa2016-11-231-12/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for cascading options added by Namhyung in: commit 369a2478973a ("tools lib subcmd: Support cascading options") This way the report and record commands share options with the c2c command, which avoids duplicating option definitions. For now it's just the 'v' option. Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1479764011-10732-7-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf c2c report: Display total HITMs on defaultJiri Olsa2016-11-232-7/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we display the cacheline list sorted on remote HITMs by default. The problem is that remote HITMs might not always be counted, in which case 'perf c2c report' displays an empty output. Thus it's more convenient to display and sort the cacheline list based on the total number of HITMs, which gives the best chance to see data in the default report run. Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1479764011-10732-6-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf c2c report: Add struct c2c_stats::tot_hitm fieldJiri Olsa2016-11-232-2/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Count total number of HITMs in a special field. This will ease up addition of total HITM sorting into c2c report in the following patch. Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1479764011-10732-5-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf c2c report: Add -f/--force optionJiri Olsa2016-11-232-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adding -f/--force option to go through ownership validation: $ sudo perf c2c report File perf.data not owned by current user or root (use -f to override) $ $ sudo perf c2c report -f < c2c report output > $ Signed-off-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1479764011-10732-4-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf c2c report: Setup browser after opening perf.dataJiri Olsa2016-11-231-7/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Because of the early browser switch we won't get possible error messages, as it will clear the screen right after showing the message, e.g.: Before: $ sudo perf c2c report -d lcl $ After: $ sudo perf c2c report -d lcl File perf.data not owned by current user or root (use -f to override) $ $ ls -la perf.data -rw-------. 1 acme acme 26648 Nov 22 15:11 perf.data $ Signed-off-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1479764011-10732-3-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf tools: Show event fd in debug outputJiri Olsa2016-11-231-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is useful for debug to see file descriptors for each event. Before: $ perf stat -vvv -e cycles,cache-misses ls ... sys_perf_event_open: pid 12146 cpu -1 group_fd -1 flags 0x8 ... sys_perf_event_open: pid 12146 cpu -1 group_fd 3 flags 0x8 sys_perf_event_open failed, error -13 Now: $ perf stat -vvv -e cycles,cache-misses ls ... sys_perf_event_open: pid 12858 cpu -1 group_fd -1 flags 0x8 = 3 ... sys_perf_event_open: pid 12858 cpu -1 group_fd 3 flags 0x8 sys_perf_event_open failed, error -13 Signed-off-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1479764011-10732-2-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf annotate: Add per arch instructions annotate handlersArnaldo Carvalho de Melo2016-11-173-106/+198
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Another step in supporting cross annotation. The arch specific tables are put in: tools/perf/arch/$ARCH/annotation/instructions.c which, so far, just plug instructions to a bunch of parsers/formatters, but may have more as the need arises. This is an alternative implementation to a previous attempt made by Ravi Bangoria. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Chris Riyder <chris.ryder@arm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kim Phillips <kim.phillips@arm.com> Cc: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Pawel Moll <pawel.moll@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Taeung Song <treeze.taeung@gmail.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-g3wt282lfa51j4qd0813e3az@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf annotate: Allow arches to specify functions to skipArnaldo Carvalho de Melo2016-11-171-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is to cope with an ARM specific kludge introduced in the original patch supporting ARM annotation, cfef25b8daf7 ("perf annotate: ARM support"), which made functions with a '+' in their names be skipped when processing call instructions. With this patchkit it should be possible to collect a perf.data file on an ARM machine and then annotate it on an x86 workstation and have those ARM kludges used. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Chris Riyder <chris.ryder@arm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kim Phillips <kim.phillips@arm.com> Cc: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Pawel Moll <pawel.moll@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Taeung Song <treeze.taeung@gmail.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-2fi3sy7q3sssdi7m7cbe07gy@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf annotate: Start supporting cross arch annotationArnaldo Carvalho de Melo2016-11-175-23/+103
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce a 'struct arch', where arch specific stuff will live, starting with objdump's choice of comment delimitation character, that is '#' in x86 while a ';' in arm. This has some bits and pieces from a patch submitted by Ravi. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Chris Riyder <chris.ryder@arm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kim Phillips <kim.phillips@arm.com> Cc: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Pawel Moll <pawel.moll@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Taeung Song <treeze.taeung@gmail.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-f337tzjjcl8vtapgvjxmhrbx@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | Merge branch 'linus' into perf/core, to pick up fixesIngo Molnar2016-11-242-16/+29
|\ \ | |/ |/| | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * perf hists: Fix column length on --hierarchyNamhyung Kim2016-11-091-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Markus reported weird behavior in perf top --hierarchy regarding the column length. Looking at the code, I found a dubious piece of code that contributes to the symptoms. When the --hierarchy option is used, the last column length might be inaccurate since the length update is skipped on leaf entries. I cannot remember why it did that; it looks like a leftover from a previous version during development. Anyway, updating the column length often is not harmful, so let's move the code out. Reported-and-Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: 1a3906a7e6b9 ("perf hists: Resort hist entries with hierarchy") Link: http://lkml.kernel.org/r/20161108130833.9263-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf hists browser: Fix column indentation on --hierarchyNamhyung Kim2016-11-091-6/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When horizontal scrolling is used in hierarchy mode, the rightmost column has unnecessary indentation. Actually, it is needed only if some of the left (overhead) columns are shown. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20161108130833.9263-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf hists browser: Show folded sign properly on --hierarchyNamhyung Kim2016-11-091-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When horizontal scrolling is used in hierarchy mode, the folded sign disappears at the rightmost column. Committer note: To test it, run 'perf top --hierarchy', see the '+' symbol at the first column, then press the right arrow key: the '+' symbol will disappear. This patch fixes that. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20161108130833.9263-3-namhyung@kernel.org [ Move 'width -= 2' invariant to right after the if/else ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf hists browser: Fix indentation of folded sign on --hierarchyNamhyung Kim2016-11-091-3/+3
| | | | | | | | | | | | | | | | | | | | | | It should indent 2 spaces for folded sign and a whitespace. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20161108130833.9263-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf hist browser: Fix hierarchy column countsNamhyung Kim2016-11-091-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The perf report/top TUI supports horizontal scrolling using the LEFT and RIGHT keys. But it calculates the number of columns incorrectly when hierarchy mode is enabled, so that repeatedly pressing the RIGHT key can make the output disappear. In hierarchy mode, all sort keys are collapsed into a single column, so this needs to be taken into account when calculating the number of columns. Reported-and-Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20161024162110.17918-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf report: Show branch info in callchain entry for browser modeJin Yao2016-11-141-2/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the branch is 100% predicted then "predicted" is hidden. Similarly, if there is no branch TSX abort, "abort" is hidden. In that case only cycles are shown (cycles are supported on the Skylake platform; on older platforms the value would be 0). If there are no iterations, "iterations" is hidden. Signed-off-by: Yao Jin <yao.jin@linux.intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Linux-kernel@vger.kernel.org Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/1477876794-30749-6-git-send-email-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf report: Show branch info in callchain entry for stdio modeJin Yao2016-11-141-4/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the branch is 100% predicted then "predicted" is hidden. Similarly, if there is no branch TSX abort, "abort" is hidden. In that case only cycles are shown (cycles are supported on the Skylake platform; on older platforms the value would be 0). If there are no iterations, "iterations" is hidden. For example: |--29.93%--main div.c:39 (predicted:50.6%, cycles:1, iterations:18) | main div.c:44 (predicted:50.6%, cycles:1) | | | --22.69%--main div.c:42 (cycles:2, iterations:17) | compute_flag div.c:28 (cycles:2) | | | --10.52%--compute_flag div.c:27 (cycles:1) | rand rand.c:28 (cycles:1) | rand rand.c:28 (cycles:1) | __random random.c:298 (cycles:1) | __random random.c:297 (cycles:1) | __random random.c:295 (cycles:1) | __random random.c:295 (cycles:1) | __random random.c:295 (cycles:1) | __random random.c:295 (cycles:6) Signed-off-by: Yao Jin <yao.jin@linux.intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Linux-kernel@vger.kernel.org Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/1477876794-30749-5-git-send-email-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf report: Calculate and return the branch flag countingJin Yao2016-11-142-1/+202
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Create branch counters in each callchain list entry. Each counter tracks one branch flag. For example, predicted_count counts all the *predicted* branches. The counters get updated by processing the callchain cursor nodes. This patch also provides functions to retrieve or print the values of the counters in the callchain list. Besides counting branch flags, it also counts and returns the average number of iterations. Signed-off-by: Yao Jin <yao.jin@linux.intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Linux-kernel@vger.kernel.org Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/1477876794-30749-4-git-send-email-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
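A rough Python model of such per-entry counters, together with the hiding rules used when they are printed, might look like this. Only `predicted_count` is named in the commit message; every other field name, the `describe()` method, the integer averaging and the position of "abort" in the output are assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class BranchCounters:
    """Hypothetical per-callchain-entry counters, one per branch flag."""
    branch_count: int = 0
    predicted_count: int = 0
    abort_count: int = 0
    cycles_count: int = 0
    iter_count: int = 0      # sum of loop iterations over samples
    samples_count: int = 0   # samples that contributed iterations

    def add_branch(self, predicted, abort, cycles):
        """Update the counters from one branch cursor node."""
        self.branch_count += 1
        self.predicted_count += int(predicted)
        self.abort_count += int(abort)
        self.cycles_count += cycles

    def add_iterations(self, iterations):
        self.iter_count += iterations
        self.samples_count += 1

    def describe(self):
        """Format the counters, hiding the fields that carry no information."""
        if self.branch_count == 0:
            return ""
        parts = []
        if self.predicted_count < self.branch_count:   # hidden when 100% predicted
            pct = 100.0 * self.predicted_count / self.branch_count
            parts.append(f"predicted:{pct:.1f}%")
        if self.abort_count:                           # hidden when no aborts
            pct = 100.0 * self.abort_count / self.branch_count
            parts.append(f"abort:{pct:.1f}%")
        parts.append(f"cycles:{self.cycles_count // self.branch_count}")
        if self.iter_count and self.samples_count:     # hidden when no iterations
            parts.append(f"iterations:{self.iter_count // self.samples_count}")
        return "(" + ", ".join(parts) + ")"
```

An entry with two branches, one of them predicted, one cycle each and 18 iterations in one sample is described as `(predicted:50.0%, cycles:1, iterations:18)`.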
* | perf report: Create a symbol_conf flag for showing branch flag countingJin Yao2016-11-142-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Create a new flag show_branchflag_count in symbol_conf. The flag controls whether the branch flag counting information is shown. It depends on whether the perf.data has branch data and whether the user chooses the "branch-history" option on the perf report command line. Signed-off-by: Yao Jin <yao.jin@linux.intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Linux-kernel@vger.kernel.org Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/1477876794-30749-3-git-send-email-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf report: Add branch flag to callchain cursor nodeJin Yao2016-11-143-18/+86
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since the branch ip has been added to the call stack for easier browsing, this patch adds more branch information: a flag indicating whether the ip is a branch, and the branch flags themselves. Then we know whether a cursor node represents a branch and which branch flags it has. The branch history code has a loop detection pass that removes loops. It is useful to know how many loops were removed, so that in later steps we can compute the average number of iterations. For example: Before remove_loops(): entry0: from = 0x100, to = 0x200 entry1: from = 0x300, to = 0x250 entry2: from = 0x300, to = 0x250 entry3: from = 0x300, to = 0x250 entry4: from = 0x700, to = 0x800 After remove_loops(): entry0: from = 0x100, to = 0x200 entry1: from = 0x300, to = 0x250 entry2: from = 0x700, to = 0x800 The original entry2 and entry3 are removed, so the number of iterations for (from = 0x300, to = 0x250) is equal to the removed number + 1 (2 + 1). iterations = removed number + 1; average iterations = Sum(iterations) / number of samples. This formula ignores other cases, for example iterations that cross multiple buffers or a buffer that contains 2+ loops, because in practice it's good enough. Signed-off-by: Yao Jin <yao.jin@linux.intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Linux-kernel@vger.kernel.org Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/n/1477876794-30749-2-git-send-email-yao.jin@linux.intel.com [ Renamed 'iter' to 'nr_loop_iter' for clarity ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
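The iteration arithmetic from the example above can be reproduced with a small Python sketch. Note the simplification: `collapse_repeats()` only merges consecutive identical (from, to) entries, which is enough for this example but is not a model of perf's actual remove_loops() pass; all function names here are hypothetical.

```python
def collapse_repeats(entries):
    """Collapse runs of identical (from, to) entries.

    Returns a list of ((from, to), removed) pairs, where 'removed' is the
    number of duplicate entries dropped after the one that was kept.
    """
    kept = []
    for entry in entries:
        if kept and kept[-1][0] == entry:
            kept[-1][1] += 1          # one more duplicate removed
        else:
            kept.append([entry, 0])
    return [(entry, removed) for entry, removed in kept]


def iterations(removed):
    """iterations = removed number + 1, for entries where a loop was removed."""
    return removed + 1 if removed else 0


def average_iterations(per_sample_iterations):
    """average iterations = Sum(iterations) / number of samples."""
    return sum(per_sample_iterations) / len(per_sample_iterations)
```

For the five entries listed above, the middle entry is kept once with two removals, giving 2 + 1 = 3 iterations.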
* | perf config: Mark where are config items from (user or system)Taeung Song2016-11-143-3/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To write config items to a particular config file, we need to know which file each config section and item came from. The current setting functionality of perf-config regenerates a config file by overwriting it with the collected config items. For example, when collecting config items from user and system config files (i.e. ~/.perfconfig and $(sysconf)/perfconfig), perf_config_set can contain both user and system config items. So we need to know where each value came from to avoid merging user and system config items into the user config file. Signed-off-by: Taeung Song <treeze.taeung@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Nambong Ha <over3025@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Cc: Wookje Kwon <aweee0@gmail.com> Link: http://lkml.kernel.org/r/1478241862-31230-7-git-send-email-treeze.taeung@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
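The need for an origin mark can be illustrated with a Python sketch in which every collected value is stored together with the file it came from. The function names, the "user"/"system" tags and the assumption that user values override system values are illustrative only, not perf's data structures.

```python
def collect(system_items, user_items):
    """Merge both files into one set, remembering where each value came from."""
    merged = {}
    for key, value in system_items.items():
        merged[key] = (value, "system")
    for key, value in user_items.items():
        merged[key] = (value, "user")   # assumed: user config overrides system config
    return merged


def items_for(merged, origin):
    """Select only the items that belong in the config file of the given origin."""
    return {key: value for key, (value, o) in merged.items() if o == origin}
```

When the user config file is regenerated, only the items tagged "user" are written back, so system-wide settings are not copied into ~/.perfconfig.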
* | perf config: Add support setting variables in a config fileTaeung Song2016-11-144-7/+88
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add setting feature that can add config variables with their values to a config file (i.e. user or system config file) or modify config key-value pairs in a config file. For the syntax examples: perf config [<file-option>] [section.name[=value] ...] e.g. You can set the ui.show-headers to false with # perf config ui.show-headers=false If you want to add or modify several config items, you can do like # perf config annotate.show_nr_jumps=false kmem.default=slab Committer notes: Testing it: $ perf config -l top.children=true report.children=false $ $ perf config top.children=false $ perf config -l top.children=false report.children=false $ $ perf config kmem.default=slab $ perf config -l top.children=false report.children=false kmem.default=slab $ Signed-off-by: Taeung Song <treeze.taeung@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Nambong Ha <over3025@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Cc: Wookje Kwon <aweee0@gmail.com> Link: http://lkml.kernel.org/r/1478241862-31230-5-git-send-email-treeze.taeung@gmail.com [ Combined patch with docs update with this one ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
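How `section.name=value` arguments map onto a sectioned config can be sketched in Python as below. This is a toy model that assumes the key is split at its first dot; it does not reproduce perf-config's actual parsing or file writing, and all function names are hypothetical.

```python
def parse_assignment(arg):
    """Split 'section.name[=value]' into (section, name, value or None)."""
    key, has_value, value = arg.partition("=")
    section, dot, name = key.partition(".")
    if not (section and dot and name):
        raise ValueError(f"expected section.name[=value], got {arg!r}")
    return section, name, (value if has_value else None)


def apply_assignments(config, args):
    """Add or modify key-value pairs in a dict-of-dicts config."""
    for arg in args:
        section, name, value = parse_assignment(arg)
        if value is not None:
            config.setdefault(section, {})[name] = value
    return config


def listing(config):
    """Render the config in the 'section.name=value' style of 'perf config -l'."""
    return [f"{s}.{n}={v}" for s, items in config.items() for n, v in items.items()]
```

Replaying the committer's test on this model, setting `top.children=false` and then `kmem.default=slab` modifies the existing item and appends the new section.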