Diffstat (limited to 'Documentation/trace')
-rw-r--r--  Documentation/trace/ftrace.rst          | 132
-rw-r--r--  Documentation/trace/timerlat-tracer.rst |  78
2 files changed, 210 insertions, 0 deletions
diff --git a/Documentation/trace/ftrace.rst b/Documentation/trace/ftrace.rst
index 027437b745a0..f606c5bd1c0d 100644
--- a/Documentation/trace/ftrace.rst
+++ b/Documentation/trace/ftrace.rst
@@ -324,6 +324,12 @@ of ftrace. Here is a list of some of the key files:
         "set_graph_function", or "set_graph_notrace".
         (See the section "dynamic ftrace" below for more details.)
 
+  available_filter_functions_addrs:
+
+        Similar to available_filter_functions, but with address displayed
+        for each function. The displayed address is the patch-site address
+        and can differ from /proc/kallsyms address.
+
   dyn_ftrace_total_info:
 
         This file is for debugging purposes. The number of functions that
@@ -1359,6 +1365,19 @@ Options for function_graph tracer:
         only a closing curly bracket "}" is displayed for
         the return of a function.
 
+  funcgraph-retval
+        When set, the return value of each traced function
+        will be printed after an equal sign "=". By default
+        this is off.
+
+  funcgraph-retval-hex
+        When set, the return value will always be printed
+        in hexadecimal format. If the option is not set and
+        the return value is an error code, it will be printed
+        in signed decimal format; otherwise it will also be
+        printed in hexadecimal format. By default, this option
+        is off.
+
   sleep-time
         When running function graph tracer, to include
         the time a task schedules out in its function.
@@ -2704,6 +2723,119 @@ It is default disabled.
     0)   1.757 us    |          } /* kmem_cache_free() */
     0)   2.861 us    |        } /* putname() */
 
+The return value of each traced function can be displayed after
+an equal sign "=". When encountering system call failures, it
+can be very helpful to quickly locate the function that first
+returns an error code.
+
+        - hide: echo nofuncgraph-retval > trace_options
+        - show: echo funcgraph-retval > trace_options
+
+  Example with funcgraph-retval::
+
+    1)               |    cgroup_migrate() {
+    1)   0.651 us    |      cgroup_migrate_add_task(); /* = 0xffff93fcfd346c00 */
+    1)               |      cgroup_migrate_execute() {
+    1)               |        cpu_cgroup_can_attach() {
+    1)               |          cgroup_taskset_first() {
+    1)   0.732 us    |            cgroup_taskset_next(); /* = 0xffff93fc8fb20000 */
+    1)   1.232 us    |          } /* cgroup_taskset_first = 0xffff93fc8fb20000 */
+    1)   0.380 us    |          sched_rt_can_attach(); /* = 0x0 */
+    1)   2.335 us    |        } /* cpu_cgroup_can_attach = -22 */
+    1)   4.369 us    |      } /* cgroup_migrate_execute = -22 */
+    1)   7.143 us    |    } /* cgroup_migrate = -22 */
+
+The above example shows that the function cpu_cgroup_can_attach
+was the first to return the error code -22, so we can read the code
+of this function to find the root cause.
+
+When the option funcgraph-retval-hex is not set, the return value can
+be displayed in a smart way. Specifically, if it is an error code,
+it will be printed in signed decimal format, otherwise it will
+be printed in hexadecimal format.
+
+        - smart: echo nofuncgraph-retval-hex > trace_options
+        - hexadecimal: echo funcgraph-retval-hex > trace_options
+
+  Example with funcgraph-retval-hex::
+
+    1)               |    cgroup_migrate() {
+    1)   0.651 us    |      cgroup_migrate_add_task(); /* = 0xffff93fcfd346c00 */
+    1)               |      cgroup_migrate_execute() {
+    1)               |        cpu_cgroup_can_attach() {
+    1)               |          cgroup_taskset_first() {
+    1)   0.732 us    |            cgroup_taskset_next(); /* = 0xffff93fc8fb20000 */
+    1)   1.232 us    |          } /* cgroup_taskset_first = 0xffff93fc8fb20000 */
+    1)   0.380 us    |          sched_rt_can_attach(); /* = 0x0 */
+    1)   2.335 us    |        } /* cpu_cgroup_can_attach = 0xffffffea */
+    1)   4.369 us    |      } /* cgroup_migrate_execute = 0xffffffea */
+    1)   7.143 us    |    } /* cgroup_migrate = 0xffffffea */
+
+At present, there are some limitations when using the funcgraph-retval
+option, and these limitations will be eliminated in the future:
+
+- Even if the function return type is void, a return value will still
+  be printed, and you can just ignore it.
+
+- Even if return values are stored in multiple registers, only the
+  value contained in the first register will be recorded and printed.
+  To illustrate, in the x86 architecture, eax and edx are used to store
+  a 64-bit return value, with the lower 32 bits saved in eax and the
+  upper 32 bits saved in edx. However, only the value stored in eax
+  will be recorded and printed.
+
+- In certain procedure call standards, such as arm64's AAPCS64, when a
+  type is smaller than a GPR, it is the responsibility of the consumer
+  to perform the narrowing, and the upper bits may contain UNKNOWN values.
+  Therefore, it is advisable to check the code for such cases. For instance,
+  when using a u8 in a 64-bit GPR, bits [63:8] may contain arbitrary values,
+  especially when larger types are truncated, whether explicitly or implicitly.
+  Here are some specific cases to illustrate this point:
+
+  **Case One**:
+
+  The function narrow_to_u8 is defined as follows::
+
+        u8 narrow_to_u8(u64 val)
+        {
+                // implicitly truncated
+                return val;
+        }
+
+  It may be compiled to::
+
+        narrow_to_u8:
+                < ... ftrace instrumentation ... >
+                RET
+
+  If you pass 0x123456789abcdef to this function and want to narrow it,
+  it may be recorded as 0x123456789abcdef instead of 0xef.
+
+  **Case Two**:
+
+  The function error_if_not_4g_aligned is defined as follows::
+
+        int error_if_not_4g_aligned(u64 val)
+        {
+                if (val & GENMASK(31, 0))
+                        return -EINVAL;
+
+                return 0;
+        }
+
+  It could be compiled to::
+
+        error_if_not_4g_aligned:
+                CBNZ    w0, .Lnot_aligned
+                RET                     // bits [31:0] are zero, bits
+                                        // [63:32] are UNKNOWN
+        .Lnot_aligned:
+                MOV     x0, #-EINVAL
+                RET
+
+  When passing 0x2_0000_0000 to it, the return value may be recorded as
+  0x2_0000_0000 instead of 0.
+
 You can put some comments on specific functions by using trace_printk()
 For example, if you want to put a comment inside the __might_sleep()
 function, you just have to include
diff --git a/Documentation/trace/timerlat-tracer.rst b/Documentation/trace/timerlat-tracer.rst
index db17df312bc8..53a56823e903 100644
--- a/Documentation/trace/timerlat-tracer.rst
+++ b/Documentation/trace/timerlat-tracer.rst
@@ -180,3 +180,81 @@ dummy_load_1ms_pd_init, which had the following code (on purpose)::
                return 0;
 
        }
+
+User-space interface
+---------------------------
+
+Timerlat allows user-space threads to use timerlat infrastructure to
+measure scheduling latency. This interface is accessible via a per-CPU
+file descriptor inside $tracing_dir/osnoise/per_cpu/cpu$ID/timerlat_fd.
+
+This interface is accessible under the following conditions:
+
+ - timerlat tracer is enabled
+ - osnoise workload option is set to NO_OSNOISE_WORKLOAD
+ - The user-space thread is affined to a single processor
+ - The thread opens the file associated with its single processor
+ - Only one thread can access the file at a time
+
+The open() syscall will fail if any of these conditions are not met.
+After opening the file descriptor, the user space can read from it.
+
+The read() system call will run timerlat code that will arm the
+timer in the future and wait for it as the regular kernel thread does.
+
+When the timer IRQ fires, the timerlat IRQ will execute, report the
+IRQ latency and wake up the thread waiting in the read. The thread will be
+scheduled and report the thread latency via tracer - as for the kernel
+thread.
+
+The difference from the in-kernel timerlat is that, instead of re-arming
+the timer, timerlat will return to the read() system call. At this point,
+the user can run any code.
+
+If the application rereads the timerlat file descriptor, the tracer
+will report the return from user-space latency, which is the total
+latency. If this is the end of the work, it can be interpreted as the
+response time for the request.
+
+After reporting the total latency, timerlat will restart the cycle, arm
+a timer, and go to sleep for the following activation.
+
+If at any time one of the conditions is broken, e.g., the thread migrates
+while in user space, or the timerlat tracer is disabled, the SIGKILL
+signal will be sent to the user-space thread.
+
+Here is a basic example of user-space code for timerlat::
+
+ int main(void)
+ {
+        char buffer[1024];
+        int timerlat_fd;
+        int retval;
+        long cpu = 0;   /* place in CPU 0 */
+        cpu_set_t set;
+
+        CPU_ZERO(&set);
+        CPU_SET(cpu, &set);
+
+        if (sched_setaffinity(gettid(), sizeof(set), &set) == -1)
+                return 1;
+
+        snprintf(buffer, sizeof(buffer),
+                "/sys/kernel/tracing/osnoise/per_cpu/cpu%ld/timerlat_fd",
+                cpu);
+
+        timerlat_fd = open(buffer, O_RDONLY);
+        if (timerlat_fd < 0) {
+                printf("error opening %s: %s\n", buffer, strerror(errno));
+                exit(1);
+        }
+
+        for (;;) {
+                retval = read(timerlat_fd, buffer, 1024);
+                if (retval < 0)
+                        break;
+        }
+
+        close(timerlat_fd);
+        exit(0);
+ }
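
Note on the user-space timerlat example: as added by the patch, it omits its #include lines. The following is a minimal, self-contained sketch of the same open/read cycle, assuming Linux with glibc. The helper names timerlat_fd_path(), pin_to_cpu() and timerlat_run() are hypothetical and are not part of the patch. The sketch passes pid 0 to sched_setaffinity(), which refers to the calling thread, in place of gettid(), and it takes the tracing directory as a parameter instead of hard-coding it.

```c
#define _GNU_SOURCE             /* needed for CPU_ZERO()/CPU_SET() in <sched.h> */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Build the per-CPU timerlat_fd path under tracing_dir (hypothetical helper).
 * Returns 0 on success, -1 if the path does not fit in buf.
 */
static int timerlat_fd_path(char *buf, size_t len, const char *tracing_dir, long cpu)
{
	int n;

	assert(buf != NULL && len > 0);
	n = snprintf(buf, len, "%s/osnoise/per_cpu/cpu%ld/timerlat_fd",
		     tracing_dir, cpu);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Pin the calling thread to a single CPU; pid 0 means the calling thread. */
static int pin_to_cpu(long cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return sched_setaffinity(0, sizeof(set), &set);
}

/*
 * Run up to `cycles` timerlat activations on `cpu` (hypothetical helper).
 * Returns the number of completed read() calls, or -1 on a setup failure.
 */
static long timerlat_run(const char *tracing_dir, long cpu, long cycles)
{
	char path[256];
	char buf[1024];
	long done = 0;
	int fd;

	if (pin_to_cpu(cpu) == -1)
		return -1;
	if (timerlat_fd_path(path, sizeof(path), tracing_dir, cpu) == -1)
		return -1;

	fd = open(path, O_RDONLY);
	if (fd < 0) {
		fprintf(stderr, "error opening %s: %s\n", path, strerror(errno));
		return -1;
	}

	while (done < cycles) {
		/* Each read() arms the timer and sleeps until it fires. */
		if (read(fd, buf, sizeof(buf)) < 0)
			break;
		/* The application's own work would run here. */
		done++;
	}

	close(fd);
	return done;
}
```

A caller could invoke timerlat_run("/sys/kernel/tracing", 0, 1000) from main(), with the timerlat tracer enabled and NO_OSNOISE_WORKLOAD set as the documentation text requires. Bounding the loop with a cycle count is a choice made for this sketch only; the patch's example loops until read() fails.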