Diffstat (limited to 'tools/perf/Documentation')
-rw-r--r-- | tools/perf/Documentation/itrace.txt | 3
-rw-r--r-- | tools/perf/Documentation/perf-bench.txt | 2
-rw-r--r-- | tools/perf/Documentation/perf-c2c.txt | 16
-rw-r--r-- | tools/perf/Documentation/perf-intel-pt.txt | 66
-rw-r--r-- | tools/perf/Documentation/perf-list.txt | 2
-rw-r--r-- | tools/perf/Documentation/perf-lock.txt | 11
-rw-r--r-- | tools/perf/Documentation/perf-mem.txt | 7
-rw-r--r-- | tools/perf/Documentation/perf-probe.txt | 2
-rw-r--r-- | tools/perf/Documentation/perf-report.txt | 4
-rw-r--r-- | tools/perf/Documentation/perf-script-perl.txt | 2
-rw-r--r-- | tools/perf/Documentation/perf-script-python.txt | 4
-rw-r--r-- | tools/perf/Documentation/perf-script.txt | 7
-rw-r--r-- | tools/perf/Documentation/perf-test.txt | 3
-rw-r--r-- | tools/perf/Documentation/perf-top.txt | 2
14 files changed, 104 insertions, 27 deletions
diff --git a/tools/perf/Documentation/itrace.txt b/tools/perf/Documentation/itrace.txt index 0916bbfe64cb..a97f95825b14 100644 --- a/tools/perf/Documentation/itrace.txt +++ b/tools/perf/Documentation/itrace.txt @@ -1,4 +1,5 @@ i synthesize instructions events + y synthesize cycles events b synthesize branches events (branch misses for Arm SPE) c synthesize branches events (calls only) r synthesize branches events (returns only) @@ -25,7 +26,7 @@ A approximate IPC Z prefer to ignore timestamps (so-called "timeless" decoding) - The default is all events i.e. the same as --itrace=ibxwpe, + The default is all events i.e. the same as --itrace=iybxwpe, except for perf script where it is --itrace=ce In addition, the period (default 100000, except for perf script where it is 1) diff --git a/tools/perf/Documentation/perf-bench.txt b/tools/perf/Documentation/perf-bench.txt index a0529c7fa5ef..f04f0eaded98 100644 --- a/tools/perf/Documentation/perf-bench.txt +++ b/tools/perf/Documentation/perf-bench.txt @@ -18,7 +18,7 @@ COMMON OPTIONS -------------- -r:: --repeat=:: -Specify amount of times to repeat the run (default 10). +Specify number of times to repeat the run (default 10). -f:: --format=:: diff --git a/tools/perf/Documentation/perf-c2c.txt b/tools/perf/Documentation/perf-c2c.txt index 5c5eb2def83e..856f0dfb8e5a 100644 --- a/tools/perf/Documentation/perf-c2c.txt +++ b/tools/perf/Documentation/perf-c2c.txt @@ -22,7 +22,11 @@ you to track down the cacheline contentions. On Intel, the tool is based on load latency and precise store facility events provided by Intel CPUs. On PowerPC, the tool uses random instruction sampling with thresholding feature. On AMD, the tool uses IBS op pmu (due to hardware -limitations, perf c2c is not supported on Zen3 cpus). +limitations, perf c2c is not supported on Zen3 cpus). On Arm64 it uses SPE to +sample load and store operations, therefore hardware and kernel support is +required. See linkperf:perf-arm-spe[1] for a setup guide. 
Due to the +statistical nature of Arm SPE sampling, not every memory operation will be +sampled. These events provide: - memory address of the access @@ -121,11 +125,17 @@ REPORT OPTIONS perf c2c record --call-graph lbr. Disabled by default. In common cases with call stack overflows, it can recreate better call stacks than the default lbr call stack - output. But this approach is not full proof. There can be cases + output. But this approach is not foolproof. There can be cases where it creates incorrect call stacks from incorrect matches. The known limitations include exception handing such as setjmp/longjmp will have calls/returns not match. +--double-cl:: + Group the detection of shared cacheline events into double cacheline + granularity. Some architectures have an Adjacent Cacheline Prefetch + feature, which causes cacheline sharing to behave like the cacheline + size is doubled. + C2C RECORD ---------- The perf c2c record command setup options related to HITM cacheline analysis @@ -333,4 +343,4 @@ Check Joe's blog on c2c tool for detailed use case explanation: SEE ALSO -------- -linkperf:perf-record[1], linkperf:perf-mem[1] +linkperf:perf-record[1], linkperf:perf-mem[1], linkperf:perf-arm-spe[1] diff --git a/tools/perf/Documentation/perf-intel-pt.txt b/tools/perf/Documentation/perf-intel-pt.txt index 7b6ccd2fa3bf..4c90cc176f81 100644 --- a/tools/perf/Documentation/perf-intel-pt.txt +++ b/tools/perf/Documentation/perf-intel-pt.txt @@ -101,12 +101,12 @@ data is available you can use the 'perf script' tool with all itrace sampling options, which will list all the samples. 
perf record -e intel_pt//u ls - perf script --itrace=ibxwpe + perf script --itrace=iybxwpe An interesting field that is not printed by default is 'flags' which can be displayed as follows: - perf script --itrace=ibxwpe -F+flags + perf script --itrace=iybxwpe -F+flags The flags are "bcrosyiABExghDt" which stand for branch, call, return, conditional, system, asynchronous, interrupt, transaction abort, trace begin, trace end, @@ -147,16 +147,17 @@ displayed as follows: There are two ways that instructions-per-cycle (IPC) can be calculated depending on the recording. -If the 'cyc' config term (see config terms section below) was used, then IPC is -calculated using the cycle count from CYC packets, otherwise MTC packets are -used - refer to the 'mtc' config term. When MTC is used, however, the values -are less accurate because the timing is less accurate. +If the 'cyc' config term (see config terms section below) was used, then IPC +and cycle events are calculated using the cycle count from CYC packets, otherwise +MTC packets are used - refer to the 'mtc' config term. When MTC is used, however, +the values are less accurate because the timing is less accurate. Because Intel PT does not update the cycle count on every branch or instruction, the values will often be zero. When there are values, they will be the number of instructions and number of cycles since the last update, and thus represent -the average IPC since the last IPC for that event type. Note IPC for "branches" -events is calculated separately from IPC for "instructions" events. +the average IPC cycle count since the last IPC for that event type. +Note IPC for "branches" events is calculated separately from IPC for "instructions" +events. Even with the 'cyc' config term, it is possible to produce IPC information for every change of timestamp, but at the expense of accuracy. 
That is selected by @@ -900,11 +901,12 @@ Having no option is the same as which, in turn, is the same as - --itrace=cepwx + --itrace=cepwxy The letters are: i synthesize "instructions" events + y synthesize "cycles" events b synthesize "branches" events x synthesize "transactions" events w synthesize "ptwrite" events @@ -927,6 +929,16 @@ The letters are: "Instructions" events look like they were recorded by "perf record -e instructions". +"Cycles" events look like they were recorded by "perf record -e cycles" +(ie., the default). Note that even with CYC packets enabled and no sampling, +these are not fully accurate, since CYC packets are not emitted for each +instruction, only when some other event (like an indirect branch, or a +TNT packet representing multiple branches) happens causes a packet to +be emitted. Thus, it is more effective for attributing cycles to functions +(and possibly basic blocks) than to individual instructions, although it +is not even perfect for functions (although it becomes better if the noretcomp +option is active). + "Branches" events look like they were recorded by "perf record -e branches". "c" and "r" can be combined to get calls and returns. @@ -934,9 +946,9 @@ and "r" can be combined to get calls and returns. 'flags' field can be used in perf script to determine whether the event is a transaction start, commit or abort. -Note that "instructions", "branches" and "transactions" events depend on code -flow packets which can be disabled by using the config term "branch=0". Refer -to the config terms section above. +Note that "instructions", "cycles", "branches" and "transactions" events +depend on code flow packets which can be disabled by using the config term +"branch=0". Refer to the config terms section above. "ptwrite" events record the payload of the ptwrite instruction and whether "fup_on_ptw" was used. 
"ptwrite" events depend on PTWRITE packets which are @@ -1821,6 +1833,36 @@ Can be compiled and traced: $ +Pipe mode +--------- +Pipe mode is a problem for Intel PT and possibly other auxtrace users. +It's not recommended to use a pipe as data output with Intel PT because +of the following reason. + +Essentially the auxtrace buffers do not behave like the regular perf +event buffers. That is because the head and tail are updated by +software, but in the auxtrace case the data is written by hardware. +So the head and tail do not get updated as data is written. + +In the Intel PT case, the head and tail are updated only when the trace +is disabled by software, for example: + - full-trace, system wide : when buffer passes watermark + - full-trace, not system-wide : when buffer passes watermark or + context switches + - snapshot mode : as above but also when a snapshot is made + - sample mode : as above but also when a sample is made + +That means finished-round ordering doesn't work. An auxtrace buffer +can turn up that has data that extends back in time, possibly to the +very beginning of tracing. + +For a perf.data file, that problem is solved by going through the trace +and queuing up the auxtrace buffers in advance. + +For pipe mode, the order of events and timestamps can presumably +be messed up. + + EXAMPLE ------- diff --git a/tools/perf/Documentation/perf-list.txt b/tools/perf/Documentation/perf-list.txt index c5a3cb0f57c7..d5f78e125efe 100644 --- a/tools/perf/Documentation/perf-list.txt +++ b/tools/perf/Documentation/perf-list.txt @@ -232,7 +232,7 @@ This can be overridden by setting the kernel.perf_event_paranoid sysctl to -1, which allows non root to use these events. For accessing trace point events perf needs to have read access to -/sys/kernel/debug/tracing, even when perf_event_paranoid is in a relaxed +/sys/kernel/tracing, even when perf_event_paranoid is in a relaxed setting. 
TRACING diff --git a/tools/perf/Documentation/perf-lock.txt b/tools/perf/Documentation/perf-lock.txt index 0f9f720e599d..37aae194a2a1 100644 --- a/tools/perf/Documentation/perf-lock.txt +++ b/tools/perf/Documentation/perf-lock.txt @@ -172,6 +172,11 @@ CONTENTION OPTIONS --lock-addr:: Show lock contention stat by address +-o:: +--lock-owner:: + Show lock contention stat by owners. Implies --threads and + requires --use-bpf. + -Y:: --type-filter=<value>:: Show lock contention only for given lock types (comma separated list). @@ -187,6 +192,12 @@ CONTENTION OPTIONS --lock-filter=<value>:: Show lock contention only for given lock addresses or names (comma separated list). +-S:: +--callstack-filter=<value>:: + Show lock contention only if the callstack contains the given string. + Note that it matches the substring so 'rq' would match both 'raw_spin_rq_lock' + and 'irq_enter_rcu'. + SEE ALSO -------- diff --git a/tools/perf/Documentation/perf-mem.txt b/tools/perf/Documentation/perf-mem.txt index 005c95580b1e..19862572e3f2 100644 --- a/tools/perf/Documentation/perf-mem.txt +++ b/tools/perf/Documentation/perf-mem.txt @@ -23,6 +23,11 @@ Note that on Intel systems the memory latency reported is the use-latency, not the pure load (or store latency). Use latency includes any pipeline queueing delays in addition to the memory subsystem latency. +On Arm64 this uses SPE to sample load and store operations, therefore hardware +and kernel support is required. See linkperf:perf-arm-spe[1] for a setup guide. +Due to the statistical nature of SPE sampling, not every memory operation will +be sampled. + OPTIONS ------- <command>...:: @@ -93,4 +98,4 @@ all perf record options. 
SEE ALSO -------- -linkperf:perf-record[1], linkperf:perf-report[1] +linkperf:perf-record[1], linkperf:perf-report[1], linkperf:perf-arm-spe[1] diff --git a/tools/perf/Documentation/perf-probe.txt b/tools/perf/Documentation/perf-probe.txt index 7f8e8ba3a787..5c43a6edc0e5 100644 --- a/tools/perf/Documentation/perf-probe.txt +++ b/tools/perf/Documentation/perf-probe.txt @@ -222,7 +222,7 @@ probe syntax, 'SRC' means the source file path, 'ALN' is start line number, and 'ALN2' is end line number in the file. It is also possible to specify how many lines to show by using 'NUM'. Moreover, 'FUNC@SRC' combination is good for searching a specific function when several functions share same name. -So, "source.c:100-120" shows lines between 100th to l20th in source.c file. And "func:10+20" shows 20 lines from 10th line of func function. +So, "source.c:100-120" shows lines between 100th to 120th in source.c file. And "func:10+20" shows 20 lines from 10th line of func function. LAZY MATCHING ------------- diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt index 4fa509b15948..c242e8da6b1a 100644 --- a/tools/perf/Documentation/perf-report.txt +++ b/tools/perf/Documentation/perf-report.txt @@ -115,6 +115,8 @@ OPTIONS - p_stage_cyc: On powerpc, this presents the number of cycles spent in a pipeline stage. And currently supported only on powerpc. - addr: (Full) virtual address of the sampled instruction + - retire_lat: On X86, this reports pipeline stall of this instruction compared + to the previous instruction in cycles. And currently supported only on X86 By default, comm, dso and symbol keys are used. (i.e. --sort comm,dso,symbol) @@ -507,7 +509,7 @@ include::itrace.txt[] perf record --call-graph lbr. Disabled by default. In common cases with call stack overflows, it can recreate better call stacks than the default lbr call stack - output. But this approach is not full proof. There can be cases + output. 
But this approach is not foolproof. There can be cases where it creates incorrect call stacks from incorrect matches. The known limitations include exception handing such as setjmp/longjmp will have calls/returns not match. diff --git a/tools/perf/Documentation/perf-script-perl.txt b/tools/perf/Documentation/perf-script-perl.txt index fa4f39d305a7..5b479f5e62ff 100644 --- a/tools/perf/Documentation/perf-script-perl.txt +++ b/tools/perf/Documentation/perf-script-perl.txt @@ -55,7 +55,7 @@ Traces meant to be processed using a script should be recorded with the above option: -a to enable system-wide collection. The format file for the sched_wakeup event defines the following fields -(see /sys/kernel/debug/tracing/events/sched/sched_wakeup/format): +(see /sys/kernel/tracing/events/sched/sched_wakeup/format): ---- format: diff --git a/tools/perf/Documentation/perf-script-python.txt b/tools/perf/Documentation/perf-script-python.txt index cf4b7f4b625a..6a8581012e16 100644 --- a/tools/perf/Documentation/perf-script-python.txt +++ b/tools/perf/Documentation/perf-script-python.txt @@ -319,7 +319,7 @@ So those are the essential steps in writing and running a script. The process can be generalized to any tracepoint or set of tracepoints you're interested in - basically find the tracepoint(s) you're interested in by looking at the list of available events shown by -'perf list' and/or look in /sys/kernel/debug/tracing/events/ for +'perf list' and/or look in /sys/kernel/tracing/events/ for detailed event and field info, record the corresponding trace data using 'perf record', passing it the list of interesting events, generate a skeleton script using 'perf script -g python' and modify the @@ -449,7 +449,7 @@ Traces meant to be processed using a script should be recorded with the above option: -a to enable system-wide collection. 
The format file for the sched_wakeup event defines the following fields -(see /sys/kernel/debug/tracing/events/sched/sched_wakeup/format): +(see /sys/kernel/tracing/events/sched/sched_wakeup/format): ---- format: diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt index 68e37de5fae4..777a0d8ba7d1 100644 --- a/tools/perf/Documentation/perf-script.txt +++ b/tools/perf/Documentation/perf-script.txt @@ -134,7 +134,7 @@ OPTIONS srcline, period, iregs, uregs, brstack, brstacksym, flags, bpf-output, brstackinsn, brstackinsnlen, brstackoff, callindent, insn, insnlen, synth, phys_addr, metric, misc, srccode, ipc, data_page_size, code_page_size, ins_lat, - machine_pid, vcpu. + machine_pid, vcpu, cgroup, retire_lat. Field list can be prepended with the type, trace, sw or hw, to indicate to which event type the field list applies. e.g., -F sw:comm,tid,time,ip,sym and -F trace:time,cpu,trace @@ -231,6 +231,9 @@ OPTIONS perf inject to insert a perf.data file recorded inside a virtual machine into a perf.data file recorded on the host at the same time. + The cgroup fields requires sample having the cgroup id which is saved + when "--all-cgroups" option is passed to 'perf record'. + Finally, a user may not set fields to none for all event types. i.e., -F "" is not allowed. @@ -502,7 +505,7 @@ include::itrace.txt[] perf record --call-graph lbr. Disabled by default. In common cases with call stack overflows, it can recreate better call stacks than the default lbr call stack - output. But this approach is not full proof. There can be cases + output. But this approach is not foolproof. There can be cases where it creates incorrect call stacks from incorrect matches. The known limitations include exception handing such as setjmp/longjmp will have calls/returns not match. 
diff --git a/tools/perf/Documentation/perf-test.txt b/tools/perf/Documentation/perf-test.txt index b329c65d7f40..951a2f262872 100644 --- a/tools/perf/Documentation/perf-test.txt +++ b/tools/perf/Documentation/perf-test.txt @@ -34,3 +34,6 @@ OPTIONS -F:: --dont-fork:: Do not fork child for each test, run all tests within single process. + +--dso:: + Specify a DSO for the "Symbols" test. diff --git a/tools/perf/Documentation/perf-top.txt b/tools/perf/Documentation/perf-top.txt index e534d709cc5a..c60e615b7183 100644 --- a/tools/perf/Documentation/perf-top.txt +++ b/tools/perf/Documentation/perf-top.txt @@ -334,7 +334,7 @@ use '-e e1 -e e2 -G foo,foo' or just use '-e e1 -e e2 -G foo'. callgraph. The option must be used with --call-graph lbr recording. Disabled by default. In common cases with call stack overflows, it can recreate better call stacks than the default lbr call stack - output. But this approach is not full proof. There can be cases + output. But this approach is not foolproof. There can be cases where it creates incorrect call stacks from incorrect matches. The known limitations include exception handing such as setjmp/longjmp will have calls/returns not match.
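
Note on the perf-lock.txt hunk: the new --callstack-filter text says the filter matches a substring, so 'rq' matches both 'raw_spin_rq_lock' and 'irq_enter_rcu'. The sketch below only illustrates that matching rule in plain Python. It is not perf code: the helper name `callstack_matches` and the sample callstacks are hypothetical, and how perf itself walks and compares callstack entries is assumed rather than taken from the patch.

```python
def callstack_matches(callstack, needle):
    """Return True if any symbol in the callstack contains `needle`.

    Hypothetical helper: models the documented substring behaviour of
    --callstack-filter, not perf's actual implementation.
    """
    return any(needle in symbol for symbol in callstack)


# Made-up callstacks for illustration; only the two 'rq' symbol names
# come from the documentation text in the patch.
sample_stacks = {
    "stack_a": ["raw_spin_rq_lock", "schedule"],
    "stack_b": ["irq_enter_rcu", "common_interrupt"],
    "stack_c": ["mutex_lock", "do_work"],
}

# Collect the stacks that a filter string of 'rq' would keep.
kept = [name for name, stack in sample_stacks.items()
        if callstack_matches(stack, "rq")]
print(kept)
```

Because the match is by substring rather than whole symbol, a short filter string can select more callstacks than intended; a longer string such as 'raw_spin_rq' narrows the result.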