author | Steinar H. Gunderson <sesse@google.com> | 2022-03-22 09:24:52 +0100 |
---|---|---|
committer | Arnaldo Carvalho de Melo <acme@redhat.com> | 2023-02-17 15:02:44 +0100 |
commit | 7e55b95651d88e60368087c243525a0d97d43d3d (patch) | |
tree | 9011e1069c510ed8655695a0dab84d09647e3a7a /tools/perf/util/auxtrace.h | |
parent | perf c2c: Add report option to show false sharing in adjacent cachelines (diff) | |
perf intel-pt: Synthesize cycle events
There is no good reason why we cannot synthesize "cycle" events from
Intel PT just as we can synthesize "instruction" events, in particular
when CYC packets are available. This enables using PT to get much
more accurate cycle profiles than regular sampling (record -e cycles)
when the work lasts for very short periods (<10 ms). Thus, add support
for this, based off of the existing IPC calculation framework. The new
option to --itrace is "y" (for cYcles), as c was taken for calls. Cycle
and instruction events can be synthesized together, and are by default.

The only real caveat is that CYC packets are only emitted whenever some
other packet is, which in practice is when a branch instruction is
encountered (and not even all branches). Thus, even with no subsampling
(e.g. --itrace=y0ns), it is impossible to get more accuracy than a
single basic block, and all cycles spent executing that block will get
attributed to the branch instruction that ends the packet. Thus, one
cannot know whether the cycles came from e.g. a specific load, a
mispredicted branch, or something else. When subsampling (which is the
default), the cycle events will get smeared out even more, but will
still be generally useful to attribute cycle counts to functions.

Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Steinar H. Gunderson <sesse@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220322082452.1429091-1-sesse@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Diffstat (limited to 'tools/perf/util/auxtrace.h')
-rw-r--r-- | tools/perf/util/auxtrace.h | 7 |
1 files changed, 6 insertions, 1 deletions
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 2cf63d377831..29eb82dff574 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -71,6 +71,9 @@ enum itrace_period_type {
  * @inject: indicates the event (not just the sample) must be fully synthesized
  *          because 'perf inject' will write it out
  * @instructions: whether to synthesize 'instructions' events
+ * @cycles: whether to synthesize 'cycles' events
+ *          (not fully accurate, since CYC packets are only emitted
+ *          together with other events, such as branches)
  * @branches: whether to synthesize 'branches' events
  *            (branch misses only for Arm SPE)
  * @transactions: whether to synthesize events for transactions
@@ -119,6 +122,7 @@ struct itrace_synth_opts {
 	bool default_no_sample;
 	bool inject;
 	bool instructions;
+	bool cycles;
 	bool branches;
 	bool transactions;
 	bool ptwrites;
@@ -643,6 +647,7 @@ bool auxtrace__evsel_is_auxtrace(struct perf_session *session,
 
 #define ITRACE_HELP \
 " i[period]: synthesize instructions events\n" \
+" y[period]: synthesize cycles events (same period as i)\n" \
 " b: synthesize branches events (branch misses for Arm SPE)\n" \
 " c: synthesize branches events (calls only)\n" \
 " r: synthesize branches events (returns only)\n" \
@@ -674,7 +679,7 @@ bool auxtrace__evsel_is_auxtrace(struct perf_session *session,
 " A: approximate IPC\n" \
 " Z: prefer to ignore timestamps (so-called \"timeless\" decoding)\n" \
 " PERIOD[ns|us|ms|i|t]: specify period to sample stream\n" \
-" concatenate multiple options. Default is ibxwpe or cewp\n"
+" concatenate multiple options. Default is iybxwpe or cewp\n"
 
 static inline
 void itrace_synth_opts__set_time_range(struct itrace_synth_opts *opts,