perf record: Enable off-cpu analysis with BPF

Add --off-cpu option to enable the off-cpu profiling with BPF. It'd use a bpf_output event and rename it to "offcpu-time". Samples will be synthesized at the end of the record session using data from a BPF map which contains the aggregated off-cpu time at context switches. So it needs root privilege to get the off-cpu profiling. Each sample will have a separate user stacktrace so it will skip kernel threads. The sample ip will be set from the stacktrace and other sample data will be updated accordingly. Currently it only handles some basic sample types. The sample timestamp is set to a dummy value just not to bother with other events during the sorting. So it has a very big initial value and increase it on processing each samples. Good thing is that it can be used together with regular profiling like cpu cycles. If you don't want to that, you can use a dummy event to enable off-cpu profiling only. Example output: $ sudo perf record --off-cpu perf bench sched messaging -l 1000 $ sudo perf report --stdio --call-graph=no # Total Lost Samples: 0 # # Samples: 41K of event 'cycles' # Event count (approx.): 42137343851 ... # Samples: 1K of event 'offcpu-time' # Event count (approx.): 587990831640 # # Children Self Command Shared Object Symbol # ........ ........ ............... .................. ......................... # 81.66% 0.00% sched-messaging libc-2.33.so [.] __libc_start_main 81.66% 0.00% sched-messaging perf [.] cmd_bench 81.66% 0.00% sched-messaging perf [.] main 81.66% 0.00% sched-messaging perf [.] run_builtin 81.43% 0.00% sched-messaging perf [.] bench_sched_messaging 40.86% 40.86% sched-messaging libpthread-2.33.so [.] __read 37.66% 37.66% sched-messaging libpthread-2.33.so [.] __write 2.91% 2.91% sched-messaging libc-2.33.so [.] __poll ... As you can see it spent most of off-cpu time in read and write in bench_sched_messaging(). The --call-graph=no was added just to make the output concise here. It uses perf hooks facility to control BPF program during the record session rather than adding new BPF/off-cpu specific calls. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Blake Jones <blakejones@google.com> Cc: Hao Luo <haoluo@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20220518224725.742882-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
author: Namhyung Kim <namhyung@kernel.org> 2022-05-19 00:47:21 +0200
committer: Arnaldo Carvalho de Melo <acme@redhat.com> 2022-05-26 17:36:57 +0200
commit: edc41a1099c2d08ccfd4ed7d59688501e3749015 (patch)
tree: 30b5f519a0987aa8ad07440587bf933b643278a2 /tools/perf/Documentation
parent: perf report: Do not extend sample type of bpf-output event (diff)
download: linux-edc41a1099c2d08ccfd4ed7d59688501e3749015.tar.xz
linux-edc41a1099c2d08ccfd4ed7d59688501e3749015.zip
1 files changed, 10 insertions, 0 deletions
diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 465be4e62a17..b4e9ef7edfef 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -758,6 +758,16 @@ include::intel-hybrid.txt[]
 	If the URLs is not specified, the value of DEBUGINFOD_URLS
 	system environment variable is used.
 
+--off-cpu::
+	Enable off-cpu profiling with BPF.  The BPF program will collect
+	task scheduling information with (user) stacktrace and save them
+	as sample data of a software event named "offcpu-time".  The
+	sample period will have the time the task slept in nanoseconds.
+
+	Note that BPF can collect stack traces using frame pointer ("fp")
+	only, as of now.  So the applications built without the frame
+	pointer might see bogus addresses.
+
 SEE ALSO
 --------
 linkperf:perf-stat[1], linkperf:perf-list[1], linkperf:perf-intel-pt[1]
author	Namhyung Kim <namhyung@kernel.org>	2022-05-19 00:47:21 +0200
committer	Arnaldo Carvalho de Melo <acme@redhat.com>	2022-05-26 17:36:57 +0200
commit	edc41a1099c2d08ccfd4ed7d59688501e3749015 (patch)
tree	30b5f519a0987aa8ad07440587bf933b643278a2 /tools/perf/Documentation
parent	perf report: Do not extend sample type of bpf-output event (diff)
download	linux-edc41a1099c2d08ccfd4ed7d59688501e3749015.tar.xz linux-edc41a1099c2d08ccfd4ed7d59688501e3749015.zip