summaryrefslogtreecommitdiffstats
path: root/tools/perf/util/callchain.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
* perf maps: Hide maps internalsIan Rogers2024-02-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | Move the struct into the C file. Add maps__equal to work around exposing the struct for reference count checking. Add accessors for the unwind_libunwind_ops. Move maps_list_node to its only use in symbol.c. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Artem Savkov <asavkov@redhat.com> Cc: bpf@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240210031746.4057262-6-irogers@google.com
* perf callchain: Fix spelling mistake "statisitcs" -> "statistics"Colin Ian King2023-10-281-2/+2
| | | | | | | | | There are a couple of spelling mistakes in perror messages. Fix them. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Cc: kernel-janitors@vger.kernel.org Link: https://lore.kernel.org/r/20231027084633.1167530-1-colin.i.king@gmail.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
* perf mem_info: Add and use map_symbol__exit and addr_map_symbol__exitIan Rogers2023-10-251-18/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix leak where mem_info__put wouldn't release the maps/map as used by perf mem. Add exit functions and use elsewhere that the maps and map are released. Signed-off-by: Ian Rogers <irogers@google.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: liuwenyu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20231024222353.3024098-12-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
* perf callchain: Make brtype_stat in callchain_list optionalIan Rogers2023-10-251-8/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | struct callchain_list is 352bytes in size, 232 of which are brtype_stat. brtype_stat is only used for certain callchain_list items so make it optional, allocating when necessary. So that printing doesn't need to deal with an optional brtype_stat, pass an empty/zero version. Before: ``` struct callchain_list { u64 ip; /* 0 8 */ struct map_symbol ms; /* 8 24 */ struct { _Bool unfolded; /* 32 1 */ _Bool has_children; /* 33 1 */ }; /* 32 2 */ /* XXX 6 bytes hole, try to pack */ u64 branch_count; /* 40 8 */ u64 from_count; /* 48 8 */ u64 predicted_count; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ u64 abort_count; /* 64 8 */ u64 cycles_count; /* 72 8 */ u64 iter_count; /* 80 8 */ u64 iter_cycles; /* 88 8 */ struct branch_type_stat brtype_stat; /* 96 232 */ /* --- cacheline 5 boundary (320 bytes) was 8 bytes ago --- */ const char * srcline; /* 328 8 */ struct list_head list; /* 336 16 */ /* size: 352, cachelines: 6, members: 13 */ /* sum members: 346, holes: 1, sum holes: 6 */ /* last cacheline: 32 bytes */ }; ``` After: ``` struct callchain_list { u64 ip; /* 0 8 */ struct map_symbol ms; /* 8 24 */ struct { _Bool unfolded; /* 32 1 */ _Bool has_children; /* 33 1 */ }; /* 32 2 */ /* XXX 6 bytes hole, try to pack */ u64 branch_count; /* 40 8 */ u64 from_count; /* 48 8 */ u64 predicted_count; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ u64 abort_count; /* 64 8 */ u64 cycles_count; /* 72 8 */ u64 iter_count; /* 80 8 */ u64 iter_cycles; /* 88 8 */ struct branch_type_stat * brtype_stat; /* 96 8 */ const char * srcline; /* 104 8 */ struct list_head list; /* 112 16 */ /* size: 128, cachelines: 2, members: 13 */ /* sum members: 122, holes: 1, sum holes: 6 */ }; ``` Signed-off-by: Ian Rogers <irogers@google.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: liuwenyu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20231024222353.3024098-10-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
* perf callchain: Make display use of branch_type_stat constIan Rogers2023-10-251-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Display code doesn't modify the branch_type_stat so switch uses to const. This is done to aid refactoring struct callchain_list where current the branch_type_stat is embedded even if not used. Signed-off-by: Ian Rogers <irogers@google.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: liuwenyu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20231024222353.3024098-9-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
* libperf rc_check: Add RC_CHK_EQUALIan Rogers2023-10-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Comparing pointers with reference count checking is tricky to avoid a SEGV. Add a convenience macro to simplify and use. Signed-off-by: Ian Rogers <irogers@google.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: liuwenyu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20231024222353.3024098-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
* perf callchain: Use pthread keys for tls callchain_cursorIan Rogers2023-06-121-2/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pthread keys are more portable than __thread and allow the association of a destructor with the key. Use the destructor to clean up TLS callchain cursors to aid understanding memory leaks. Committer notes: Had to fixup a series of unconverted places and also check for the return of get_tls_callchain_cursor() as it may fail and return NULL. In that unlikely case we now either print something to a file, if the caller was expecting to print a callchain, or return an error code to state that resolving the callchain isn't possible. In some cases this was made easier because thread__resolve_callchain() already can fail for other reasons, so this new one (cursor == NULL) can be added and the callers don't have to explicitely check for this new condition. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brian Robbins <brianrob@linux.microsoft.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Ye Xingchen <ye.xingchen@zte.com.cn> Cc: Yuan Can <yuancan@huawei.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230608232823.4027869-25-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf map/maps/thread: Changes to reference countingIan Rogers2023-06-121-6/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix missed reference count gets and puts as detected with leak sanitizer and reference count checking. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brian Robbins <brianrob@linux.microsoft.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Ye Xingchen <ye.xingchen@zte.com.cn> Cc: Yuan Can <yuancan@huawei.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230608232823.4027869-21-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf map: Changes to reference countingIan Rogers2023-04-071-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a pointer to a map exists do a get, when that pointer is overwritten or freed, put the map. This avoids issues with gets and puts being inconsistently used causing, use after puts, etc. For example, the map in struct addr_location is changed to hold a reference count. Reference count checking and address sanitizer were used to identify issues. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Hao Luo <haoluo@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miaoqian Lin <linmq006@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com> Cc: Song Liu <song@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Stephen Brennan <stephen.s.brennan@oracle.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Yury Norov <yury.norov@gmail.com> Link: https://lore.kernel.org/r/20230404205954.2245628-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* tools: Rename __fallthrough to fallthroughLiam Howlett2023-04-071-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rename the fallthrough attribute to better align with the kernel version. Copy the definition from include/linux/compiler_attributes.h including the #else clause. Adding the #else clause allows the tools compiler.h header to drop the check for a definition entirely and keeps both definitions together. Change any __fallthrough statements to fallthrough anywhere it was used within perf. This allows other tools to use the same key word as the kernel. Committer notes: Did some missing conversions to: builtin-list.c Also included gtk.h before the 'fallthrough' definition in: tools/perf/ui/gtk/hists.c tools/perf/ui/gtk/helpline.c tools/perf/ui/gtk/browser.c As it is the arg name for a macro in glib.h: /var/home/acme/git/perf-tools-next/tools/include/linux/compiler-gcc.h:16:55: error: missing binary operator before token "(" 16 | # define fallthrough __attribute__((__fallthrough__)) | ^ /usr/include/glib-2.0/glib/gmacros.h:637:28: note: in expansion of macro ‘fallthrough’ 637 | #if g_macro__has_attribute(fallthrough) Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Liam Howlett <Liam.Howlett@oracle.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Tom Rix <trix@redhat.com> Cc: linux-sparse@vger.kernel.org <linux-sparse@vger.kernel.org> Cc: llvm@lists.linux.dev <llvm@lists.linux.dev> Link: https://lore.kernel.org/r/20221125154947.2163498-1-Liam.Howlett@oracle.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf map: Add accessor for dsoIan Rogers2023-04-041-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Later changes will add reference count checking for struct map, with dso being the most frequently accessed variable. Add an accessor so that the reference count check is only necessary in one place. Additional changes: - add a dso variable to avoid repeated map__dso calls. - in builtin-mem.c dump_raw_samples, code only partially tested for dso == NULL. Make the possibility of NULL consistent. - in thread.c thread__memcpy fix use of spaces and use tabs. Committer notes: Did missing conversions on these files: tools/perf/arch/powerpc/util/skip-callchain-idx.c tools/perf/arch/powerpc/util/sym-handling.c tools/perf/ui/browsers/hists.c tools/perf/ui/gtk/annotate.c tools/perf/util/cs-etm.c tools/perf/util/thread.c tools/perf/util/unwind-libunwind-local.c tools/perf/util/unwind-libunwind.c Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Hao Luo <haoluo@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miaoqian Lin <linmq006@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com> Cc: Song Liu <song@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Stephen Brennan <stephen.s.brennan@oracle.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Yury Norov <yury.norov@gmail.com> Link: https://lore.kernel.org/r/20230320212248.1175731-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf maps: Add functions to access mapsIan Rogers2023-04-041-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce functions to access struct maps. These functions reduce the number of places reference counting is necessary. While tidying APIs do some small const-ification, in particlar to unwind_libunwind_ops. Committer notes: Fixed up tools/perf/util/unwind-libunwind.c: - return ops->get_entries(cb, arg, thread, data, max_stack); + return ops->get_entries(cb, arg, thread, data, max_stack, best_effort); Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Hao Luo <haoluo@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miaoqian Lin <linmq006@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com> Cc: Song Liu <song@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Stephen Brennan <stephen.s.brennan@oracle.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Yury Norov <yury.norov@gmail.com> Link: https://lore.kernel.org/r/20230320212248.1175731-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf callchain: Remove unneeded 'result' variableye xingchen2022-10-041-10/+2
| | | | | | | | | | | | | | Return the value scnprintf() directly instead of storing it in a redundant variable. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn> Cc: Alexandre Truong <alexandre.truong@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf record: Allow to specify max stack depth of fp callchainNamhyung Kim2022-07-121-6/+12
| | | | | | | | | | | | | | | | | | Currently it has no interface to specify the max stack depth for perf record. Extend the command line parameter to accept a number after 'fp' to specify the depth like '--call-graph fp,32'. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20220615163222.1275500-7-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf maps: Use a pointer for kmapsIan Rogers2022-02-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | struct maps is reference counted, using a pointer is more idiomatic. Committer notes: Delay: maps = machine__kernel_maps(&vmlinux); To after: machine__init(&vmlinux, "", HOST_KERNEL_ID); To avoid this on f34: In file included from /var/home/acme/git/perf/tools/perf/util/build-id.h:10, from /var/home/acme/git/perf/tools/perf/util/dso.h:13, from tests/vmlinux-kallsyms.c:8: In function ‘machine__kernel_maps’, inlined from ‘test__vmlinux_matches_kallsyms’ at tests/vmlinux-kallsyms.c:122:22: /var/home/acme/git/perf/tools/perf/util/machine.h:86:23: error: ‘vmlinux.kmaps’ is used uninitialized [-Werror=uninitialized] 86 | return machine->kmaps; | ~~~~~~~^~~~~~~ tests/vmlinux-kallsyms.c: In function ‘test__vmlinux_matches_kallsyms’: tests/vmlinux-kallsyms.c:121:34: note: ‘vmlinux’ declared here 121 | struct machine kallsyms, vmlinux; | ^~~~~~~ cc1: all warnings being treated as errors Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: André Almeida <andrealmeid@collabora.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Hao Luo <haoluo@google.com> Cc: James Clark <james.clark@arm.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miaoqian Lin <linmq006@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com> Cc: Song Liu <song@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Stephen Brennan <stephen.s.brennan@oracle.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Yury Norov <yury.norov@gmail.com> Link: http://lore.kernel.org/lkml/20220211103415.2737789-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf callchain: Enable dwarf_callchain_users on arm64Alexandre Truong2021-12-211-1/+13
| | | | | | | | | | | | | | | | | | | | Enable dwarf_callchain_users on arm64 which will be needed to do a DWARF unwind in order to get the caller of the leaf frame. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Alexandre Truong <alexandre.truong@arm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20211217154521.80603-5-german.gomez@arm.com Signed-off-by: German Gomez <german.gomez@arm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Fix various typos in commentsIngo Molnar2021-03-231-1/+1
| | | | | | | | | | | Fix ~124 single-word typos and a few spelling errors in the perf tooling code, accumulated over the years. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210321113734.GA248990@gmail.com Link: http://lore.kernel.org/lkml/20210323160915.GA61903@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf streams: Report hot streamsJin Yao2020-10-141-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | We show the streams separately. They are divided into different sections. 1. "Matched hot streams" 2. "Hot streams in old perf data only" 3. "Hot streams in new perf data only". For each stream, we report the cycles and hot percent (hits%). For example, cycles: 2, hits: 4.08% -------------------------- main div.c:42 compute_flag div.c:28 Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20201009022845.13141-7-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf streams: Calculate the sum of total streams hitsJin Yao2020-10-141-0/+32
| | | | | | | | | | | | | | | We have used callchain_node->hit to measure the hot level of one stream. This patch calculates the sum of hits of total streams. Thus in next patch, we can use following formula to report hot percent for one stream. hot percent = callchain_node->hit / sum of total hits Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20201009022845.13141-6-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf streams: Compare two streamsJin Yao2020-10-141-0/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | Stream is the branch history which is aggregated by the branch records from perf samples. Now we support the callchain as stream. If the callchain entries of one stream are fully matched with the callchain entries of another stream, we think two streams are matched. For example, cycles: 1, hits: 26.80% cycles: 1, hits: 27.30% ----------------------- ----------------------- main div.c:39 main div.c:39 main div.c:44 main div.c:44 Above two streams are matched (we don't consider the case that source code is changed). The matching logic is, compare the chain string first. If it's not matched, fallback to dso address comparison. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20201009022845.13141-4-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf callchain: Setup callchain properly in pipe modeJiri Olsa2020-05-281-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Callchains are automatically initialized by checking on event's sample_type. For pipe mode we need to put this check into attr event code. Moving the callchains setup code into callchain_param_setup function and calling it from attr event process code. This enables pipe output having callchains, like: # perf record -g -e 'raw_syscalls:sys_enter' true | perf script # perf record -g -e 'raw_syscalls:sys_enter' true | perf report Committer notes: We still need the next patch for the above output to work. Reported-by: Paul Khuong <pvk@pvk.ca> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200507095024.2789147-5-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf map_symbol: Rename ms->mg to ms->mapsArnaldo Carvalho de Melo2019-11-261-1/+1
| | | | | | | | | | | One more step on the merge of 'struct maps' with 'struct map_groups'. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-61rra2wg392rhvdgw421wzpt@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf addr_location: Rename al->mg to al->mapsArnaldo Carvalho de Melo2019-11-261-4/+4
| | | | | | | | | | | One more step on the merge of 'struct maps' with 'struct map_groups'. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-foo95pyyp3bhocbt7yd8qrvq@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Add a 'struct map_groups' pointer to 'struct map_symbol'Arnaldo Carvalho de Melo2019-11-121-0/+1
| | | | | | | | | | | | And fill it whenever we setup a a 'struct map_symbol', now we need to use it, next cset. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-fzwfcnddenz1o7uj1fzw3g46@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf callchain: Use 'struct map_symbol' in 'struct callchain_cursor_node'Arnaldo Carvalho de Melo2019-11-121-17/+16
| | | | | | | | | | | To ease passing around map+symbol, just like done for other parts of the tree recently. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Add map_groups to 'struct addr_location'Arnaldo Carvalho de Melo2019-11-121-3/+3
| | | | | | | | | | | | | | From there we can get al->mg->machine, so replace that field with the more useful 'struct map_groups' that for now we're obtaining from al->map->groups, and that is one thing getting into the way of maps being fully shareable. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-4qdducrm32tgrjupcp0kjh1e@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Move event synthesizing routines to separate headerArnaldo Carvalho de Melo2019-09-201-0/+1
| | | | | | | | | | | | | Those are the only routines using the perf_event__handler_t typedef and are all related, so move to a separate header to reduce the header dependency tree, lots of places were getting event.h and even stdio.h, limits.h indirectly, so fix those as well. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-yvx9u1mf7baq6cu1abfhbqgs@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf dsos: Move the dsos struct and its methods to separate source filesArnaldo Carvalho de Melo2019-09-011-0/+1
| | | | | | | | | | | | | So that we can reduce the header dependency tree further, in the process noticed that lots of places were getting even things like build-id routines and 'struct perf_tool' definition indirectly, so fix all those too. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-ti0btma9ow5ndrytyoqdk62j@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf debug: Remove needless include directives from debug.hArnaldo Carvalho de Melo2019-09-011-0/+1
| | | | | | | | | | | | All we need there is a forward declaration for 'union perf_event', so remove it from there and add missing header directives in places using things from this indirect include. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-7ftk0ztstqub1tirjj8o8xbl@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Remove needless perf.h include directive from headersArnaldo Carvalho de Melo2019-08-291-0/+1
| | | | | | | | | | | Its not needed there, add it to the places that need it and were getting it via those headers. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-5yulx1u16vyd0zmrbg1tjhju@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf sort: Remove needless headers from sort.h, provide fwd struct declsArnaldo Carvalho de Melo2019-08-261-0/+1
| | | | | | | | | | | Reducing the includes hell a bit more, speeding up the build and avoiding needless rebuilds when just one of those files gets updated. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-u63el2vqsovsmnhebx1rcixo@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf evsel: Rename struct perf_evsel to struct evselJiri Olsa2019-07-291-1/+1
| | | | | | | | | | | | | | | | | | | Rename struct perf_evsel to struct evsel, so we don't have a name clash when we add struct perf_evsel in libperf. Committer notes: Added fixes for arm64, provided by Jiri. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190721112506.12306-5-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Use list_del_init() more thorouglyArnaldo Carvalho de Melo2019-07-091-5/+5
| | | | | | | | | | | | To allow for destructors to check if they're operating on a object still in a list, and to avoid going from use after free list entries into still valid, or even also other already removed from list entries. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-deh17ub44atyox3j90e6rksu@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* tools lib: Adopt zalloc()/zfree() from tools/perfArnaldo Carvalho de Melo2019-07-091-1/+1
| | | | | | | | | | Eroding a bit more the tools/perf/util/util.h hodpodge header. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-natazosyn9rwjka25tvcnyi0@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf hist: Remove symbol.h from hist.h, just fwd decls are neededArnaldo Carvalho de Melo2019-02-061-0/+1
| | | | | | | | | | To reduce the includes dependencies. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-cmvg5ght75mmfg1efeyna9rn@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* pref tools: Add missing map.h includesArnaldo Carvalho de Melo2019-02-061-0/+1
| | | | | | | | | | | | Lots of places get the map.h file indirectly, and since we're going to remove it from machine.h, then those need to include it directly, do it now, before we remove that dep. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-ob8jehdjda8h5jsrv9dqj9tf@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf callchain: Uninline callchain_cursor_reset() to remove map.h dependencyArnaldo Carvalho de Melo2019-02-061-0/+15
| | | | | | | | | | | | | That was the only thing that made including map.h in callchain.h a requiriment, so uninline it and just add a 'struct map' forward declaration. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-7fjz4hvv1bpzqaeriku44fn4@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf report: Fix wrong iteration count in --branch-historyJin Yao2019-01-041-12/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | By calculating the removed loops, we can get the iteration count. But the iteration count could be reported incorrectly, reporting impossibly high counts. That's because previous code uses the number of removed LBR entries for the iteration count. That's not good. Fix this by increasing the iteration count when a loop is detected. When matching the chain, the iteration count would be added up, finally we need to compute the average value when printing out. For example, $ perf report --branch-history --stdio --no-children Before: ---f2 +0 | |--33.62%--f1 +9 (cycles:1) | f1 +0 | main +22 (cycles:1) | main +17 | main +38 (cycles:1) | main +27 | f1 +26 (cycles:1) | f1 +24 | f2 +27 (cycles:7) | f2 +0 | f1 +19 (cycles:1) | f1 +14 | f2 +27 (cycles:11) | f2 +0 | f1 +9 (cycles:1 iter:2968 avg_cycles:3) | f1 +0 | main +22 (cycles:1 iter:2968 avg_cycles:3) | main +17 | main +38 (cycles:1 iter:2968 avg_cycles:3) 2968 is an impossible high iteration count and avg_cycles is too small. After: ---f2 +0 | |--33.62%--f1 +9 (cycles:1) | f1 +0 | main +22 (cycles:1) | main +17 | main +38 (cycles:1) | main +27 | f1 +26 (cycles:1) | f1 +24 | f2 +27 (cycles:7) | f2 +0 | f1 +19 (cycles:1) | f1 +14 | f2 +27 (cycles:11) | f2 +0 | f1 +9 (cycles:1 iter:1 avg_cycles:23) | f1 +0 | main +22 (cycles:1 iter:1 avg_cycles:23) | main +17 | main +38 (cycles:1 iter:1 avg_cycles:23) avg_cycles:23 is the average cycles of this iteration. Fixes: c4ee06251d42 ("perf report: Calculate the average cycles of iterations") Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1546582230-17507-1-git-send-email-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf unwind: Do not look just at the global callchain_param.record_modeArnaldo Carvalho de Melo2018-01-171-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When setting up DWARF callchains on specific events, without using 'record' or 'trace' --call-graph, but instead doing it like: perf trace -e cycles/call-graph=dwarf/ The unwind__prepare_access() call in thread__insert_map() when we process PERF_RECORD_MMAP(2) metadata events were not being performed, precluding us from using per-event DWARF callchains, handling them just when we asked for all events to be DWARF, using "--call-graph dwarf". We do it in the PERF_RECORD_MMAP because we have to look at one of the executable maps to figure out the executable type (64-bit, 32-bit) of the DSO laid out in that mmap. Also to look at the architecture where the perf.data file was recorded. All this probably should be deferred to when we process a sample for some thread that has callchains, so that we do this processing only for the threads with samples, not for all of them. For now, fix using DWARF on specific events. Before: # perf trace --no-syscalls -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1 PING ::1(::1) 56 data bytes 64 bytes from ::1: icmp_seq=1 ttl=64 time=0.048 ms --- ::1 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 0.048/0.048/0.048/0.000 ms 0.000 probe_libc:inet_pton:(7fe9597bb350)) Problem processing probe_libc:inet_pton callchain, skipping... # After: # perf trace --no-syscalls -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1 PING ::1(::1) 56 data bytes 64 bytes from ::1: icmp_seq=1 ttl=64 time=0.060 ms --- ::1 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 0.060/0.060/0.060/0.000 ms 0.000 probe_libc:inet_pton:(7fd4aa930350)) __inet_pton (inlined) gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so) __GI_getaddrinfo (inlined) [0xffffaa804e51af3f] (/usr/bin/ping) __libc_start_main (/usr/lib64/libc-2.26.so) [0xffffaa804e51b379] (/usr/bin/ping) # # perf trace --call-graph=dwarf --no-syscalls -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1 PING ::1(::1) 56 data bytes 64 bytes from ::1: icmp_seq=1 ttl=64 time=0.057 ms --- ::1 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 0.057/0.057/0.057/0.000 ms 0.000 probe_libc:inet_pton:(7f9363b9e350)) __inet_pton (inlined) gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so) __GI_getaddrinfo (inlined) [0xffffa9e8a14e0f3f] (/usr/bin/ping) __libc_start_main (/usr/lib64/libc-2.26.so) [0xffffa9e8a14e1379] (/usr/bin/ping) # # perf trace --call-graph=fp --no-syscalls -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1 PING ::1(::1) 56 data bytes 64 bytes from ::1: icmp_seq=1 ttl=64 time=0.077 ms --- ::1 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 0.077/0.077/0.077/0.000 ms 0.000 probe_libc:inet_pton:(7f4947e1c350)) __inet_pton (inlined) gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so) __GI_getaddrinfo (inlined) [0xffffaa716d88ef3f] (/usr/bin/ping) __libc_start_main (/usr/lib64/libc-2.26.so) [0xffffaa716d88f379] (/usr/bin/ping) # # perf trace --no-syscalls -e probe_libc:inet_pton/call-graph=fp/ ping -6 -c 1 ::1 PING ::1(::1) 56 data bytes 64 bytes from ::1: icmp_seq=1 ttl=64 time=0.078 ms --- ::1 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 0.078/0.078/0.078/0.000 ms 0.000 probe_libc:inet_pton:(7fa157696350)) __GI___inet_pton (/usr/lib64/libc-2.26.so) getaddrinfo (/usr/lib64/libc-2.26.so) [0xffffa9ba39c74f40] (/usr/bin/ping) # Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Hendrick Brueckner <brueckner@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Thomas Richter <tmricht@linux.vnet.ibm.com> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/r/20180116182650.GE16107@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* Merge branch 'linus' into perf/core, to fix conflictsIngo Molnar2017-11-071-0/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: tools/perf/arch/arm/annotate/instructions.c tools/perf/arch/arm64/annotate/instructions.c tools/perf/arch/powerpc/annotate/instructions.c tools/perf/arch/s390/annotate/instructions.c tools/perf/arch/x86/tests/intel-cqm.c tools/perf/ui/tui/progress.c tools/perf/util/zlib.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman2017-11-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | perf callchain: Fix double mapping al->addr for children without self periodNamhyung Kim2017-10-311-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Milian Wolff found a problem he described in [1] and that for him would get fixed: "Note how most of the large offset values are now gone. Most notably, we get proper srcline resolution for the random.h and complex headers." Then Namhyung found the root cause: "I looked into it and found a bug handling cumulative (children) entries. For children entries that have no self period, the al->addr (so he->ip) ends up having an doubly-mapped address. It seems to be there from the beginning but only affects entries that have no srclines - finding srcline itself is done using a different address but it will show the invalid address if no srcline was found. I think we should fix the commit c7405d85d7a3 ("perf tools: Update cpumode for each cumulative entry")." [1] https://lkml.kernel.org/r/20171018185350.14893-7-milian.wolff@kdab.com Reported-by: Milian Wolff <milian.wolff@kdab.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Milian Wolff <milian.wolff@kdab.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: kernel-team@lge.com Fixes: c7405d85d7a3 ("perf tools: Update cpumode for each cumulative entry") Link: https://lkml.kernel.org/r/20171020051533.GA2746@sejong Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf report: Use srcline from callchain for hist entriesMilian Wolff2017-10-251-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This also removes the symbol name from the srcline column, more on this below. This ensures we use the correct srcline, which could originate from a potentially inlined function. The hist entries used to query for the srcline based purely on the IP, which leads to wrong results for inlined entries. Before: ~~~~~ perf report --inline -s srcline -g none --stdio ... # Children Self Source:Line # ........ ........ .................................................................................................................................. # 94.23% 0.00% __libc_start_main+18446603487898210537 94.23% 0.00% _start+41 44.58% 0.00% main+100 44.58% 0.00% std::_Norm_helper<true>::_S_do_it<double>+100 44.58% 0.00% std::__complex_abs+100 44.58% 0.00% std::abs<double>+100 44.58% 0.00% std::norm<double>+100 36.01% 0.00% hypot+18446603487892193300 25.81% 0.00% main+41 25.81% 0.00% std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+41 25.81% 0.00% std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+41 25.75% 25.75% random.h:143 18.39% 0.00% main+57 18.39% 0.00% std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+57 18.39% 0.00% std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+57 13.80% 13.80% random.tcc:3330 5.64% 0.00% ??:0 4.13% 4.13% __hypot_finite+163 4.13% 0.00% __hypot_finite+18446603487892193443 ... ~~~~~ After: ~~~~~ perf report --inline -s srcline -g none --stdio ... # Children Self Source:Line # ........ ........ ........................................... # 94.30% 1.19% main.cpp:39 94.23% 0.00% __libc_start_main+18446603487898210537 94.23% 0.00% _start+41 48.44% 1.70% random.h:1823 48.44% 0.00% random.h:1814 46.74% 2.53% random.h:185 44.68% 0.10% complex:589 44.68% 0.00% complex:597 44.68% 0.00% complex:654 44.68% 0.00% complex:664 40.61% 13.80% random.tcc:3330 36.01% 0.00% hypot+18446603487892193300 26.81% 0.00% random.h:151 26.81% 0.00% random.h:332 25.75% 25.75% random.h:143 5.64% 0.00% ??:0 4.13% 4.13% __hypot_finite+163 4.13% 0.00% __hypot_finite+18446603487892193443 ... ~~~~~ Note that this change removes the symbol from the source:line hist column. If this information is desired, users should explicitly query for it if needed. I.e. run this command instead: ~~~~~ perf report --inline -s sym,srcline -g none --stdio ... # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 1K of event 'cycles:uppp' # Event count (approx.): 1381229476 # # Children Self Symbol Source:Line # ........ ........ ................................................................................................................................... ........................................... # 94.30% 1.19% [.] main main.cpp:39 94.23% 0.00% [.] __libc_start_main __libc_start_main+18446603487898210537 94.23% 0.00% [.] _start _start+41 48.44% 0.00% [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) random.h:1814 48.44% 0.00% [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) random.h:1823 46.74% 0.00% [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined) random.h:185 44.68% 0.00% [.] std::_Norm_helper<true>::_S_do_it<double> (inlined) complex:654 44.68% 0.00% [.] std::__complex_abs (inlined) complex:589 44.68% 0.00% [.] std::abs<double> (inlined) complex:597 44.68% 0.00% [.] std::norm<double> (inlined) complex:664 39.80% 13.59% [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3330 36.01% 0.00% [.] hypot hypot+18446603487892193300 26.81% 0.00% [.] std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined) random.h:151 26.81% 0.00% [.] std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined) random.h:332 25.75% 0.00% [.] std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined) random.h:143 25.19% 25.19% [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:143 4.13% 4.13% [.] __hypot_finite __hypot_finite+163 4.13% 0.00% [.] __hypot_finite __hypot_finite+18446603487892193443 ... ~~~~~ Compared to the old behavior, this reduces duplication in the output. Before we used to print the symbol name in the srcline column even when the sym column was explicitly requested. I.e. the output was: ~~~~~ perf report --inline -s sym,srcline -g none --stdio ... # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 1K of event 'cycles:uppp' # Event count (approx.): 1381229476 # # Children Self Symbol Source:Line # ........ ........ ................................................................................................................................... .................................................................................................................................. # 94.23% 0.00% [.] __libc_start_main __libc_start_main+18446603487898210537 94.23% 0.00% [.] _start _start+41 44.58% 0.00% [.] main main+100 44.58% 0.00% [.] std::_Norm_helper<true>::_S_do_it<double> (inlined) std::_Norm_helper<true>::_S_do_it<double>+100 44.58% 0.00% [.] std::__complex_abs (inlined) std::__complex_abs+100 44.58% 0.00% [.] std::abs<double> (inlined) std::abs<double>+100 44.58% 0.00% [.] std::norm<double> (inlined) std::norm<double>+100 36.01% 0.00% [.] hypot hypot+18446603487892193300 25.81% 0.00% [.] main main+41 25.81% 0.00% [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined) std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+41 25.81% 0.00% [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+41 25.69% 25.69% [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:143 18.39% 0.00% [.] main main+57 18.39% 0.00% [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined) std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+57 18.39% 0.00% [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+57 13.80% 13.80% [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3330 4.13% 4.13% [.] __hypot_finite __hypot_finite+163 4.13% 0.00% [.] __hypot_finite __hypot_finite+18446603487892193443 ... ~~~~~ Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20171019113836.5548-5-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf report: Properly handle branch count in match_chain()Milian Wolff2017-10-251-62/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some of the code paths I introduced before returned too early without running the code to handle a node's branch count. By refactoring match_chain to only have one exit point, this can be remedied. Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1707691.qaJ269GSZW@agathebauer Link: http://lkml.kernel.org/r/20171018185350.14893-2-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf callchain: Compare symbol name for inlined frames when matchingMilian Wolff2017-10-241-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The fake symbols we create for inlined frames will represent different functions but can use the symbol start address. This leads to issues when different inline branches all lead to the same function. Before: ~~~~~ $ perf report -s sym -i perf.inlining.data --inline --stdio -g function ... --38.86%--_start __libc_start_main main | --37.57%--std::norm<double> (inlined) std::_Norm_helper<true>::_S_do_it<double> (inlined) | --36.36%--std::abs<double> (inlined) std::__complex_abs (inlined) | --12.24%--std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined) std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined) std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined) ~~~~~ Note that this backtrace representation is completely bogus. Complex abs does not call the linear congruential engine! It is just a side-effect of a longer inlined stack being appended to a shorter, different inlined stack, both of which originate in the same function (main). This patch fixes the issue: ~~~~~ $ perf report -s sym -i perf.inlining.data --inline --stdio -g function ... --38.86%--_start __libc_start_main main | |--35.59%--std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) | std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) | | | --34.37%--std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined) | std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) | | | --12.24%--std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined) | std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined) | std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined) | --1.99%--std::norm<double> (inlined) std::_Norm_helper<true>::_S_do_it<double> (inlined) std::abs<double> (inlined) std::__complex_abs (inlined) ~~~~~ Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/20171009203310.17362-10-milian.wolff@kdab.com Cc: Arnaldo Carvalho de Melo <acme@redhat.com> [ Fix up conflict with c1fbc0cf81f1 ("perf callchain: Compare dsos (as well) for CCKEY_FUNCTION"), remove unneeded hunk ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf callchain: Mark inlined frames in output by " (inlined)" suffixMilian Wolff2017-10-241-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The original patch that introduced inline frame output in the various browsers used this suffix already. The new centralized approach that uses fake symbols for inlined frames was missing this approach so far. Instead of changing the symbol name itself, we only print the suffix where needed. This allows us to efficiently lookup the symbol for a given name without first having to append the suffix before the lookup. Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/20171009203310.17362-8-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf report: Fall-back to function name comparison for -g srclineMilian Wolff2017-10-241-8/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a callchain entry has no srcline available, we ended up comparing the instruction pointer. I consider this to be not too useful. Rather, I think we should group the entries by function name, which this patch adds. For people who want to split the data on the IP boundary, using `-g address` is the correct choice. Before: ~~~~~ 100.00% 38.86% [.] main | |--61.14%--main inlining.cpp:14 | std::norm<double> complex:664 | std::_Norm_helper<true>::_S_do_it<double> complex:654 | std::abs<double> complex:597 | std::__complex_abs complex:589 | | | |--56.03%--hypot | | | | | |--8.45%--__hypot_finite | | | | | |--7.62%--__hypot_finite | | | | | |--2.29%--__hypot_finite | | | | | |--2.24%--__hypot_finite | | | | | |--2.06%--__hypot_finite | | | | | |--1.81%--__hypot_finite ... ~~~~~ After: ~~~~~ 100.00% 38.86% [.] main | |--61.14%--main inlining.cpp:14 | std::norm<double> complex:664 | std::_Norm_helper<true>::_S_do_it<double> complex:654 | std::abs<double> complex:597 | std::__complex_abs complex:589 | | | |--60.29%--hypot | | | | | --56.03%--__hypot_finite | | | --0.85%--cabs ~~~~~ Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/20171009203310.17362-7-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf callchain: Store srcline in callchain_cursor_nodeMilian Wolff2017-10-241-22/+9
|/ | | | | | | | | | | | | | | | | | | | | | | This is mostly a preparation to enable the creation of full callchain nodes for inline frames. Such frames will reference the IP of the non-inlined frame, but hold the symbol and srcline for an inlined location. As such, we won't be able to query the srcline on-demand based on the IP alone. Instead, we will leverage the functionality provided by this patch here, and store the srcline for the inlined nodes in the new srcline member of callchain_cursor_node. Note that this patch on its own leaks the srcline, as there is no free_callchain_cursor_node or similar. A future patch will add caching of the srcline and handle deletion properly. Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/20171009203310.17362-3-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf callchain: Compare dsos (as well) for CCKEY_FUNCTIONRavi Bangoria2017-10-051-1/+5
| | | | | | | | | | | | | | | | | | | | | | Two functions from different binaries can have same start address. Thus, comparing only start address in match_chain() leads to inconsistent callchains. Fix this by adding a check for dsos as well. Ex, https://www.spinics.net/lists/linux-perf-users/msg04067.html Reported-by: Alexander Pozdneev <pozdneyev@gmail.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Krister Johansen <kjlx@templeofstupid.com> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yao Jin <yao.jin@linux.intel.com> Cc: zhangmengting@huawei.com Link: http://lkml.kernel.org/r/20171005091234.5874-1-ravi.bangoria@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf report: Fix debug messages with --call-graph optionMengting Zhang2017-09-251-14/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With --call-graph option, perf report can display call chains using type, min percent threshold, optional print limit and order. And the default call-graph parameter is 'graph,0.5,caller,function,percent'. Before this patch, 'perf report --call-graph' shows incorrect debug messages as below: # perf report --call-graph Invalid callchain mode: 0.5 Invalid callchain order: 0.5 Invalid callchain sort key: 0.5 Invalid callchain config key: 0.5 Invalid callchain mode: caller Invalid callchain mode: function Invalid callchain order: function Invalid callchain mode: percent Invalid callchain order: percent Invalid callchain sort key: percent That is because in function __parse_callchain_report_opt(),each field of the call-graph parameter is passed to parse_callchain_{mode,order, sort_key,value} in turn until it meets the matching value. For example, the order field "caller" is passed to parse_callchain_mode() firstly and obviously it doesn't match any mode field. Therefore parse_callchain_mode() will shows the debug message "Invalid callchain mode: caller", which could confuse users. The patch fixes this issue by moving the warning out of the function parse_callchain_{mode,order,sort_key,value}. Signed-off-by: Mengting Zhang <zhangmengting@huawei.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Krister Johansen <kjlx@templeofstupid.com> Cc: Li Bin <huawei.libin@huawei.com> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/1506154694-39691-1-git-send-email-zhangmengting@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>