linux - linux

	Commit message (Collapse)	Author	Files	Lines
2024-11-13	samples/bpf: Remove unused variables in tc_l2_redirect_kern.c	Zhu Jun	1	-4/+0
	These variables are never referenced in the code, just remove them. Signed-off-by: Zhu Jun <zhujun2@cmss.chinamobile.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241111062312.3541-1-zhujun2@cmss.chinamobile.com
2024-11-13	bpftool: Cast variable `var` to long long	Luo Yifan	1	-1/+1
	When the SIGNED condition is met, the variable `var` should be cast to `long long` instead of `unsigned long long`. Signed-off-by: Luo Yifan <luoyifan@cmss.chinamobile.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Quentin Monnet <qmo@kernel.org> Link: https://lore.kernel.org/bpf/20241112073701.283362-1-luoyifan@cmss.chinamobile.com
2024-11-13	bpf, x86: Propagate tailcall info only for subprogs	Leon Hwang	1	-1/+1
	In x64 JIT, propagate tailcall info only for subprogs, not for helpers or kfuncs. Acked-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20241107134529.8602-2-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-11-13	bpf: Add kernel symbol for struct_ops trampoline	Xu Kuohai	4	-5/+89
	Without kernel symbols for struct_ops trampoline, the unwinder may produce unexpected stacktraces. For example, the x86 ORC and FP unwinders check if an IP is in kernel text by verifying the presence of the IP's kernel symbol. When a struct_ops trampoline address is encountered, the unwinder stops due to the absence of symbol, resulting in an incomplete stacktrace that consists only of direct and indirect child functions called from the trampoline. The arm64 unwinder is another example. While the arm64 unwinder can proceed across a struct_ops trampoline address, the corresponding symbol name is displayed as "unknown", which is confusing. Thus, add kernel symbol for struct_ops trampoline. The name is bpf__<struct_ops_name>_<member_name>, where <struct_ops_name> is the type name of the struct_ops, and <member_name> is the name of the member that the trampoline is linked to. Below is a comparison of stacktraces captured on x86 by perf record, before and after this patch. Before: ffffffff8116545d __lock_acquire+0xad ([kernel.kallsyms]) ffffffff81167fcc lock_acquire+0xcc ([kernel.kallsyms]) ffffffff813088f4 __bpf_prog_enter+0x34 ([kernel.kallsyms]) After: ffffffff811656bd __lock_acquire+0x30d ([kernel.kallsyms]) ffffffff81167fcc lock_acquire+0xcc ([kernel.kallsyms]) ffffffff81309024 __bpf_prog_enter+0x34 ([kernel.kallsyms]) ffffffffc000d7e9 bpf__tcp_congestion_ops_cong_avoid+0x3e ([kernel.kallsyms]) ffffffff81f250a5 tcp_ack+0x10d5 ([kernel.kallsyms]) ffffffff81f27c66 tcp_rcv_established+0x3b6 ([kernel.kallsyms]) ffffffff81f3ad03 tcp_v4_do_rcv+0x193 ([kernel.kallsyms]) ffffffff81d65a18 __release_sock+0xd8 ([kernel.kallsyms]) ffffffff81d65af4 release_sock+0x34 ([kernel.kallsyms]) ffffffff81f15c4b tcp_sendmsg+0x3b ([kernel.kallsyms]) ffffffff81f663d7 inet_sendmsg+0x47 ([kernel.kallsyms]) ffffffff81d5ab40 sock_write_iter+0x160 ([kernel.kallsyms]) ffffffff8149c67b vfs_write+0x3fb ([kernel.kallsyms]) ffffffff8149caf6 ksys_write+0xc6 ([kernel.kallsyms]) ffffffff8149cb5d __x64_sys_write+0x1d ([kernel.kallsyms]) ffffffff81009200 x64_sys_call+0x1d30 ([kernel.kallsyms]) ffffffff82232d28 do_syscall_64+0x68 ([kernel.kallsyms]) ffffffff8240012f entry_SYSCALL_64_after_hwframe+0x76 ([kernel.kallsyms]) Fixes: 85d33df357b6 ("bpf: Introduce BPF_MAP_TYPE_STRUCT_OPS") Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20241112145849.3436772-4-xukuohai@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-11-13	bpf: Use function pointers count as struct_ops links count	Xu Kuohai	1	-10/+25
	Only function pointers in a struct_ops structure can be linked to bpf progs, so set the links count to the function pointers count, instead of the total members count in the structure. Suggested-by: Martin KaFai Lau <martin.lau@linux.dev> Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Link: https://lore.kernel.org/r/20241112145849.3436772-3-xukuohai@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-11-13	bpf: Remove unused member rcu from bpf_struct_ops_map	Xu Kuohai	1	-1/+0
	The rcu member in bpf_struct_ops_map is not used after commit b671c2067a04 ("bpf: Retire the struct_ops map kvalue->refcnt.") Remove it. Suggested-by: Martin KaFai Lau <martin.lau@linux.dev> Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Link: https://lore.kernel.org/r/20241112145849.3436772-2-xukuohai@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-11-13	selftests/bpf: Add struct_ops prog private stack tests	Yonghong Song	6	-0/+389
	Add three tests for struct_ops using private stack. ./test_progs -t struct_ops_private_stack #336/1 struct_ops_private_stack/private_stack:OK #336/2 struct_ops_private_stack/private_stack_fail:OK #336/3 struct_ops_private_stack/private_stack_recur:OK #336 struct_ops_private_stack:OK The following is a snippet of a struct_ops check_member() implementation: u32 moff = __btf_member_bit_offset(t, member) / 8; switch (moff) { case offsetof(struct bpf_testmod_ops3, test_1): prog->aux->priv_stack_requested = true; prog->aux->recursion_detected = test_1_recursion_detected; fallthrough; default: break; } return 0; The first test is with nested two different callback functions where the first prog has more than 512 byte stack size (including subprogs) with private stack enabled. The second test is a negative test where the second prog has more than 512 byte stack size without private stack enabled. The third test is the same callback function recursing itself. At run time, the jit trampoline recursion check kicks in to prevent the recursion. The recursion_detected() callback function is implemented by the bpf_testmod, the following message in dmesg bpf_testmod: oh no, recursing into test_1, recursion_misses 1 demonstrates the callback function is indeed triggered when recursion miss happens. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20241112163938.2225528-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-11-13	bpf: Support private stack for struct_ops progs	Yonghong Song	4	-1/+13
	For struct_ops progs, whether a particular prog uses private stack depends on prog->aux->priv_stack_requested setting before actual insn-level verification for that prog. One particular implementation is to piggyback on struct_ops->check_member(). The next patch has an example for this. The struct_ops->check_member() sets prog->aux->priv_stack_requested to be true which enables private stack usage. The struct_ops prog follows the same rule as kprobe/tracing progs after function bpf_enable_priv_stack(). For example, even a struct_ops prog requests private stack, it could still use normal kernel stack if the stack size is small (< 64 bytes). Similar to tracing progs, nested same cpu same prog run will be skipped. A field (recursion_detected()) is added to bpf_prog_aux structure. If bpf_prog->aux->recursion_detected is implemented by the struct_ops subsystem and nested same cpu/prog happens, the function will be triggered to report an error, collect related info, etc. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20241112163933.2224962-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-11-13	selftests/bpf: Add tracing prog private stack tests	Yonghong Song	2	-0/+274
	Some private stack tests are added including: - main prog only with stack size greater than BPF_PSTACK_MIN_SIZE. - main prog only with stack size smaller than BPF_PSTACK_MIN_SIZE. - prog with one subprog having MAX_BPF_STACK stack size and another subprog having non-zero small stack size. - prog with callback function. - prog with exception in main prog or subprog. - prog with async callback without nesting - prog with async callback with possible nesting Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20241112163927.2224750-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-11-13	bpf, x86: Support private stack in jit	Yonghong Song	2	-0/+137
	Private stack is allocated in function bpf_int_jit_compile() with alignment 8. Private stack allocation size includes the stack size determined by verifier and additional space to protect stack overflow and underflow. See below an illustration: ---> memory address increasing [8 bytes to protect overflow] [normal stack] [8 bytes to protect underflow] If overflow/underflow is detected, kernel messages will be emited in dmesg like BPF private stack overflow/underflow detected for prog Fx BPF Private stack overflow/underflow detected for prog bpf_prog_a41699c234a1567a_subprog1x Those messages are generated when I made some changes to jitted code to intentially cause overflow for some progs. For the jited prog, The x86 register 9 (X86_REG_R9) is used to replace bpf frame register (BPF_REG_10). The private stack is used per subprog per cpu. The X86_REG_R9 is saved and restored around every func call (not including tailcall) to maintain correctness of X86_REG_R9. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20241112163922.2224385-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-11-13	bpf, x86: Avoid repeated usage of bpf_prog->aux->stack_depth	Yonghong Song	1	-4/+7
	Refactor the code to avoid repeated usage of bpf_prog->aux->stack_depth in do_jit() func. If the private stack is used, the stack_depth will be 0 for that prog. Refactoring make it easy to adjust stack_depth. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20241112163917.2224189-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-11-13	bpf: Enable private stack for eligible subprogs	Yonghong Song	2	-0/+12
	If private stack is used by any subprog, set that subprog prog->aux->jits_use_priv_stack to be true so later jit can allocate private stack for that subprog properly. Also set env->prog->aux->jits_use_priv_stack to be true if any subprog uses private stack. This is a use case for a single main prog (no subprogs) to use private stack, and also a use case for later struct-ops progs where env->prog->aux->jits_use_priv_stack will enable recursion check if any subprog uses private stack. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20241112163912.2224007-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-11-13	bpf: Find eligible subprogs for private stack support	Yonghong Song	4	-10/+99
	Private stack will be allocated with percpu allocator in jit time. To avoid complexity at runtime, only one copy of private stack is available per cpu per prog. So runtime recursion check is necessary to avoid stack corruption. Current private stack only supports kprobe/perf_event/tp/raw_tp which has recursion check in the kernel, and prog types that use bpf trampoline recursion check. For trampoline related prog types, currently only tracing progs have recursion checking. To avoid complexity, all async_cb subprogs use normal kernel stack including those subprogs used by both main prog subtree and async_cb subtree. Any prog having tail call also uses kernel stack. To avoid jit penalty with private stack support, a subprog stack size threshold is set such that only if the stack size is no less than the threshold, private stack is supported. The current threshold is 64 bytes. This avoids jit penality if the stack usage is small. A useless 'continue' is also removed from a loop in func check_max_stack_depth(). Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20241112163907.2223839-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-11-12	selftests/bpf: update send_signal to lower perf evemts frequency	Eduard Zingerman	1	-1/+2
	Similar to commit [1] sample perf events less often in test_send_signal_nmi(). This should reduce perf events throttling. [1] 7015843afcaf ("selftests/bpf: Fix send_signal test with nested CONFIG_PARAVIRT") Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20241112110906.3045278-5-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-11-12	selftests/bpf: allow send_signal test to timeout	Eduard Zingerman	1	-13/+19
	The following invocation: $ t1=send_signal/send_signal_perf_thread_remote \ t2=send_signal/send_signal_nmi_thread_remote \ ./test_progs -t $t1,$t2 Leads to send_signal_nmi_thread_remote to be stuck on a line 180: /* wait for result */ err = read(pipe_c2p[0], buf, 1); In this test case: - perf event PERF_COUNT_HW_CPU_CYCLES is created for parent process; - BPF program is attached to perf event, and sends a signal to child process when event occurs; - parent program burns some CPU in busy loop and calls read() to get notification from child that it received a signal. The perf event is declared with .sample_period = 1. This forces perf to throttle events, and under some unclear conditions the event does not always occur while parent is in busy loop. After parent enters read() system call CPU cycles event won't be generated for parent anymore. Thus, if perf event had not occurred already the test is stuck. This commit updates the parent to wait for notification with a timeout, doing several iterations of busy loop + read_with_timeout(). Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20241112110906.3045278-4-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-11-12	selftests/bpf: add read_with_timeout() utility function	Eduard Zingerman	3	-0/+29
	int read_with_timeout(int fd, char *buf, size_t count, long usec) As a regular read(2), but allows to specify a timeout in micro-seconds. Returns -EAGAIN on timeout. Implemented using select(). Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20241112110906.3045278-3-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-11-12	selftests/bpf: watchdog timer for test_progs	Eduard Zingerman	4	-6/+116
	This commit provides a watchdog timer that sets a limit of how long a single sub-test could run: - if sub-test runs for 10 seconds, the name of the test is printed (currently the name of the test is printed only after it finishes); - if sub-test runs for 120 seconds, the running thread is terminated with SIGSEGV (to trigger crash_handler() and get a stack trace). Specifically: - the timer is armed on each call to run_one_test(); - re-armed at each call to test__start_subtest(); - is stopped when exiting run_one_test(). Default timeout could be overridden using '-w' or '--watchdog-timeout' options. Value 0 can be used to turn the timer off. Here is an example execution: $ ./ssh-exec.sh ./test_progs -w 5 -t \ send_signal/send_signal_perf_thread_remote,send_signal/send_signal_nmi_thread_remote WATCHDOG: test case send_signal/send_signal_nmi_thread_remote executes for 5 seconds, terminating with SIGSEGV Caught signal #11! Stack trace: ./test_progs(crash_handler+0x1f)[0x9049ef] /lib64/libc.so.6(+0x40d00)[0x7f1f1184fd00] /lib64/libc.so.6(read+0x4a)[0x7f1f1191cc4a] ./test_progs[0x720dd3] ./test_progs[0x71ef7a] ./test_progs(test_send_signal+0x1db)[0x71edeb] ./test_progs[0x9066c5] ./test_progs(main+0x5ed)[0x9054ad] /lib64/libc.so.6(+0x2a088)[0x7f1f11839088] /lib64/libc.so.6(__libc_start_main+0x8b)[0x7f1f1183914b] ./test_progs(_start+0x25)[0x527385] #292 send_signal:FAIL test_send_signal_common:PASS:reading pipe 0 nsec test_send_signal_common:PASS:reading pipe error: size 0 0 nsec test_send_signal_common:PASS:incorrect result 0 nsec test_send_signal_common:PASS:pipe_write 0 nsec test_send_signal_common:PASS:setpriority 0 nsec Timer is implemented using timer_{create,start} librt API. Internally librt uses pthreads for SIGEV_THREAD timers, so this change adds a background timer thread to the test process. Because of this a few checks in tests 'bpf_iter' and 'iters' need an update to account for an extra thread. For parallelized scenario the watchdog is also created for each worker fork. If one of the workers gets stuck, it would be terminated by a watchdog. In theory, this might lead to a scenario when all worker threads are exhausted, however this should not be a problem for server_main(), as it would exit with some of the tests not run. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20241112110906.3045278-2-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-11-12	libbpf: Stringify errno in log messages in the remaining code	Mykyta Yatsenko	6	-53/+56
	Convert numeric error codes into the string representations in log messages in the rest of libbpf source files. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241111212919.368971-5-mykyta.yatsenko5@gmail.com
2024-11-12	libbpf: Stringify errno in log messages in btf*.c	Mykyta Yatsenko	2	-13/+16
	Convert numeric error codes into the string representations in log messages in btf.c and btf_dump.c. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241111212919.368971-4-mykyta.yatsenko5@gmail.com
2024-11-12	libbpf: Stringify errno in log messages in libbpf.c	Mykyta Yatsenko	1	-200/+156
	Convert numeric error codes into the string representations in log messages in libbpf.c. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241111212919.368971-3-mykyta.yatsenko5@gmail.com
2024-11-12	libbpf: Introduce errstr() for stringifying errno	Mykyta Yatsenko	2	-0/+78
	Add function errstr(int err) that allows converting numeric error codes into string representations. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241111212919.368971-2-mykyta.yatsenko5@gmail.com
2024-11-12	bpf: Replace the document for PTR_TO_BTF_ID_OR_NULL	Menglong Dong	1	-4/+4
	Commit c25b2ae13603 ("bpf: Replace PTR_TO_XXX_OR_NULL with PTR_TO_XXX \| PTR_MAYBE_NULL") moved the fields around and misplaced the documentation for "PTR_TO_BTF_ID_OR_NULL". So, let's replace it in the proper place. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241111124911.1436911-1-dongml2@chinatelecom.cn
2024-11-12	tools/bpf: Fix the wrong format specifier in bpf_jit_disasm	Luo Yifan	1	-1/+1
	There is a static checker warning that the %d in format string is mismatched with the corresponding argument type, which could result in incorrect printed data. This patch fixes it. Signed-off-by: Luo Yifan <luoyifan@cmss.chinamobile.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241111021004.272293-1-luoyifan@cmss.chinamobile.com
2024-11-12	kbuild,bpf: Pass make jobs' value to pahole	Florian Schmaus	1	-2/+4
	Pass the value of make's -j/--jobs argument to pahole, to avoid out of memory errors and make pahole respect the "jobs" value of make. On systems with little memory but many cores, invoking pahole using -j without argument potentially creates too many pahole instances, causing an out-of-memory situation. Instead, we should pass make's "jobs" value as an argument to pahole's -j, which is likely configured to be (much) lower than the actual core count on such systems. If make was invoked without -j, either via cmdline or MAKEFLAGS, then JOBS will be simply empty, resulting in the existing behavior, as expected. Signed-off-by: Florian Schmaus <flo@geekplace.eu> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Link: https://lore.kernel.org/bpf/20241102100452.793970-1-flo@geekplace.eu
2024-11-11	bpf: Drop special callback reference handling	Kumar Kartikeya Dwivedi	3	-39/+11
	Logic to prevent callbacks from acquiring new references for the program (i.e. leaving acquired references), and releasing caller references (i.e. those acquired in parent frames) was introduced in commit 9d9d00ac29d0 ("bpf: Fix reference state management for synchronous callbacks"). This was necessary because back then, the verifier simulated each callback once (that could potentially be executed N times, where N can be zero). This meant that callbacks that left lingering resources or cleared caller resources could do it more than once, operating on undefined state or leaking memory. With the fixes to callback verification in commit ab5cfac139ab ("bpf: verify callbacks as if they are called unknown number of times"), all of this extra logic is no longer necessary. Hence, drop it as part of this commit. Cc: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241109231430.2475236-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-11-11	bpf: Refactor active lock management	Kumar Kartikeya Dwivedi	2	-67/+132
	When bpf_spin_lock was introduced originally, there was deliberation on whether to use an array of lock IDs, but since bpf_spin_lock is limited to holding a single lock at any given time, we've been using a single ID to identify the held lock. In preparation for introducing spin locks that can be taken multiple times, introduce support for acquiring multiple lock IDs. For this purpose, reuse the acquired_refs array and store both lock and pointer references. We tag the entry with REF_TYPE_PTR or REF_TYPE_LOCK to disambiguate and find the relevant entry. The ptr field is used to track the map_ptr or btf (for bpf_obj_new allocations) to ensure locks can be matched with protected fields within the same "allocation", i.e. bpf_obj_new object or map value. The struct active_lock is changed to an int as the state is part of the acquired_refs array, and we only need active_lock as a cheap way of detecting lock presence. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241109231430.2475236-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-11-11	selftests/bpf: skip the timer_lockup test for single-CPU nodes	Viktor Malik	1	-0/+6
	The timer_lockup test needs 2 CPUs to work, on single-CPU nodes it fails to set thread affinity to CPU 1 since it doesn't exist: # ./test_progs -t timer_lockup test_timer_lockup:PASS:timer_lockup__open_and_load 0 nsec test_timer_lockup:PASS:pthread_create thread1 0 nsec test_timer_lockup:PASS:pthread_create thread2 0 nsec timer_lockup_thread:PASS:cpu affinity 0 nsec timer_lockup_thread:FAIL:cpu affinity unexpected error: 22 (errno 0) test_timer_lockup:PASS: 0 nsec #406 timer_lockup:FAIL Skip the test if only 1 CPU is available. Signed-off-by: Viktor Malik <vmalik@redhat.com> Fixes: 50bd5a0c658d1 ("selftests/bpf: Add timer lockup selftest") Tested-by: Philo Lu <lulie@linux.alibaba.com> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241107115231.75200-1-vmalik@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-11-11	selftests/bpf: Test the update operations for htab of maps	Hou Tao	2	-1/+161
	Add test cases to verify the following four update operations on htab of maps don't trigger lockdep warning: (1) add then delete (2) add, overwrite, then delete (3) add, then lookup_and_delete (4) add two elements, then lookup_and_delete_batch Test cases are added for pre-allocated and non-preallocated htab of maps respectively. Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20241106063542.357743-4-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-11-11	selftests/bpf: Move ENOTSUPP from bpf_util.h	Hou Tao	6	-20/+3
	Moving the definition of ENOTSUPP into bpf_util.h to remove the duplicated definitions in multiple files. Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20241106063542.357743-3-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-11-11	bpf: Call free_htab_elem() after htab_unlock_bucket()	Hou Tao	1	-17/+39
	For htab of maps, when the map is removed from the htab, it may hold the last reference of the map. bpf_map_fd_put_ptr() will invoke bpf_map_free_id() to free the id of the removed map element. However, bpf_map_fd_put_ptr() is invoked while holding a bucket lock (raw_spin_lock_t), and bpf_map_free_id() attempts to acquire map_idr_lock (spinlock_t), triggering the following lockdep warning: ============================= [ BUG: Invalid wait context ] 6.11.0-rc4+ #49 Not tainted ----------------------------- test_maps/4881 is trying to lock: ffffffff84884578 (map_idr_lock){+...}-{3:3}, at: bpf_map_free_id.part.0+0x21/0x70 other info that might help us debug this: context-{5:5} 2 locks held by test_maps/4881: #0: ffffffff846caf60 (rcu_read_lock){....}-{1:3}, at: bpf_fd_htab_map_update_elem+0xf9/0x270 #1: ffff888149ced148 (&htab->lockdep_key#2){....}-{2:2}, at: htab_map_update_elem+0x178/0xa80 stack backtrace: CPU: 0 UID: 0 PID: 4881 Comm: test_maps Not tainted 6.11.0-rc4+ #49 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ... Call Trace: <TASK> dump_stack_lvl+0x6e/0xb0 dump_stack+0x10/0x20 __lock_acquire+0x73e/0x36c0 lock_acquire+0x182/0x450 _raw_spin_lock_irqsave+0x43/0x70 bpf_map_free_id.part.0+0x21/0x70 bpf_map_put+0xcf/0x110 bpf_map_fd_put_ptr+0x9a/0xb0 free_htab_elem+0x69/0xe0 htab_map_update_elem+0x50f/0xa80 bpf_fd_htab_map_update_elem+0x131/0x270 htab_map_update_elem+0x50f/0xa80 bpf_fd_htab_map_update_elem+0x131/0x270 bpf_map_update_value+0x266/0x380 __sys_bpf+0x21bb/0x36b0 __x64_sys_bpf+0x45/0x60 x64_sys_call+0x1b2a/0x20d0 do_syscall_64+0x5d/0x100 entry_SYSCALL_64_after_hwframe+0x76/0x7e One way to fix the lockdep warning is using raw_spinlock_t for map_idr_lock as well. However, bpf_map_alloc_id() invokes idr_alloc_cyclic() after acquiring map_idr_lock, it will trigger a similar lockdep warning because the slab's lock (s->cpu_slab->lock) is still a spinlock. Instead of changing map_idr_lock's type, fix the issue by invoking htab_put_fd_value() after htab_unlock_bucket(). However, only deferring the invocation of htab_put_fd_value() is not enough, because the old map pointers in htab of maps can not be saved during batched deletion. Therefore, also defer the invocation of free_htab_elem(), so these to-be-freed elements could be linked together similar to lru map. There are four callers for ->map_fd_put_ptr: (1) alloc_htab_elem() (through htab_put_fd_value()) It invokes ->map_fd_put_ptr() under a raw_spinlock_t. The invocation of htab_put_fd_value() can not simply move after htab_unlock_bucket(), because the old element has already been stashed in htab->extra_elems. It may be reused immediately after htab_unlock_bucket() and the invocation of htab_put_fd_value() after htab_unlock_bucket() may release the newly-added element incorrectly. Therefore, saving the map pointer of the old element for htab of maps before unlocking the bucket and releasing the map_ptr after unlock. Beside the map pointer in the old element, should do the same thing for the special fields in the old element as well. (2) free_htab_elem() (through htab_put_fd_value()) Its caller includes __htab_map_lookup_and_delete_elem(), htab_map_delete_elem() and __htab_map_lookup_and_delete_batch(). For htab_map_delete_elem(), simply invoke free_htab_elem() after htab_unlock_bucket(). For __htab_map_lookup_and_delete_batch(), just like lru map, linking the to-be-freed element into node_to_free list and invoking free_htab_elem() for these element after unlock. It is safe to reuse batch_flink as the link for node_to_free, because these elements have been removed from the hash llist. Because htab of maps doesn't support lookup_and_delete operation, __htab_map_lookup_and_delete_elem() doesn't have the problem, so kept it as is. (3) fd_htab_map_free() It invokes ->map_fd_put_ptr without raw_spinlock_t. (4) bpf_fd_htab_map_update_elem() It invokes ->map_fd_put_ptr without raw_spinlock_t. After moving free_htab_elem() outside htab bucket lock scope, using pcpu_freelist_push() instead of __pcpu_freelist_push() to disable the irq before freeing elements, and protecting the invocations of bpf_mem_cache_free() with migrate_{disable\|enable} pair. Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20241106063542.357743-2-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-11-11	selftests/bpf: Add threads to consumer test	Jiri Olsa	1	-18/+80
	With recent uprobe fix [1] the sync time after unregistering uprobe is much longer and prolongs the consumer test which creates and destroys hundreds of uprobes. This change adds 16 threads (which fits the test logic) and speeds up the test. Before the change: # perf stat --null ./test_progs -t uprobe_multi_test/consumers #421/9 uprobe_multi_test/consumers:OK #421 uprobe_multi_test:OK Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED Performance counter stats for './test_progs -t uprobe_multi_test/consumers': 28.818778973 seconds time elapsed 0.745518000 seconds user 0.919186000 seconds sys After the change: # perf stat --null ./test_progs -t uprobe_multi_test/consumers 2>&1 #421/9 uprobe_multi_test/consumers:OK #421 uprobe_multi_test:OK Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED Performance counter stats for './test_progs -t uprobe_multi_test/consumers': 3.504790814 seconds time elapsed 0.012141000 seconds user 0.751760000 seconds sys [1] commit 87195a1ee332 ("uprobes: switch to RCU Tasks Trace flavor for better performance") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241108134544.480660-14-jolsa@kernel.org
2024-11-11	selftests/bpf: Add uprobe sessions to consumer test	Jiri Olsa	2	-24/+52
	Adding uprobe session consumers to the consumer test, so we get the session into the test mix. In addition scaling down the test to have just 1 uprobe and 1 uretprobe, otherwise the test time grows and is unsuitable for CI even with threads. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241108134544.480660-13-jolsa@kernel.org
2024-11-11	selftests/bpf: Add uprobe session single consumer test	Jiri Olsa	2	-0/+77
	Testing that the session ret_handler bypass works on single uprobe with multiple consumers, each with different session ignore return value. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241108134544.480660-12-jolsa@kernel.org
2024-11-11	selftests/bpf: Add kprobe session verifier test for return value	Jiri Olsa	2	-0/+33
	Making sure kprobe.session program can return only [0,1] values. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241108134544.480660-11-jolsa@kernel.org
2024-11-11	selftests/bpf: Add uprobe session verifier test for return value	Jiri Olsa	2	-0/+33
	Making sure uprobe.session program can return only [0,1] values. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241108134544.480660-10-jolsa@kernel.org
2024-11-11	selftests/bpf: Add uprobe session recursive test	Jiri Olsa	2	-0/+101
	Adding uprobe session test that verifies the cookie value is stored properly when single uprobe-ed function is executed recursively. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241108134544.480660-9-jolsa@kernel.org
2024-11-11	selftests/bpf: Add uprobe session cookie test	Jiri Olsa	2	-0/+79
	Adding uprobe session test that verifies the cookie value get properly propagated from entry to return program. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241108134544.480660-8-jolsa@kernel.org
2024-11-11	selftests/bpf: Add uprobe session test	Jiri Olsa	2	-0/+118
	Adding uprobe session test and testing that the entry program return value controls execution of the return probe program. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241108134544.480660-7-jolsa@kernel.org
2024-11-11	libbpf: Add support for uprobe multi session attach	Jiri Olsa	3	-3/+20
	Adding support to attach program in uprobe session mode with bpf_program__attach_uprobe_multi function. Adding session bool to bpf_uprobe_multi_opts struct that allows to load and attach the bpf program via uprobe session. the attachment to create uprobe multi session. Also adding new program loader section that allows: SEC("uprobe.session/bpf_fentry_test*") and loads/attaches uprobe program as uprobe session. Adding sleepable hook (uprobe.session.s) as well. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241108134544.480660-6-jolsa@kernel.org
2024-11-11	bpf: Add support for uprobe multi session context	Jiri Olsa	1	-10/+18
	Placing bpf_session_run_ctx layer in between bpf_run_ctx and bpf_uprobe_multi_run_ctx, so the session data can be retrieved from uprobe_multi link. Plus granting session kfuncs access to uprobe session programs. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241108134544.480660-5-jolsa@kernel.org
2024-11-11	bpf: Add support for uprobe multi session attach	Jiri Olsa	6	-11/+38
	Adding support to attach BPF program for entry and return probe of the same function. This is common use case which at the moment requires to create two uprobe multi links. Adding new BPF_TRACE_UPROBE_SESSION attach type that instructs kernel to attach single link program to both entry and exit probe. It's possible to control execution of the BPF program on return probe simply by returning zero or non zero from the entry BPF program execution to execute or not the BPF program on return probe respectively. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241108134544.480660-4-jolsa@kernel.org
2024-11-11	bpf: Force uprobe bpf program to always return 0	Jiri Olsa	1	-3/+2
	As suggested by Andrii make uprobe multi bpf programs to always return 0, so they can't force uprobe removal. Keeping the int return type for uprobe_prog_run, because it will be used in following session changes. Fixes: 89ae89f53d20 ("bpf: Add multi uprobe link") Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241108134544.480660-3-jolsa@kernel.org
2024-11-11	bpf: Allow return values 0 and 1 for kprobe session	Jiri Olsa	1	-0/+9
	The kprobe session program can return only 0 or 1, instruct verifier to check for that. Fixes: 535a3692ba72 ("bpf: Add support for kprobe session attach") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241108134544.480660-2-jolsa@kernel.org
2024-11-11	selftests/bpf: Fix uprobe consumer test (again)	Jiri Olsa	1	-6/+8
	The new uprobe changes bring some new behaviour that we need to reflect in the consumer test. Now pending uprobe instance in the kernel can survive longer and thus might call uretprobe consumer callbacks in some situations in which, previously, such callback would be omitted. We now need to take that into account in uprobe-multi consumer tests. The idea being that uretprobe under test either stayed from before to after (uret_stays + test_bit) or uretprobe instance survived and we have uretprobe active in after (uret_survives + test_bit). uret_survives just states that uretprobe survives if there are any uretprobes both before and after (overlapping or not, doesn't matter) and uprobe was attached before. Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241107094337.3848210-1-jolsa@kernel.org
2024-11-11	bpf: Remove trailing whitespace in verifier.rst	Abhinav Saxena	1	-2/+2
	Remove trailing whitespace in Documentation/bpf/verifier.rst. Signed-off-by: Abhinav Saxena <xandfury@gmail.com> Link: https://lore.kernel.org/r/20241107063708.106340-2-xandfury@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-11-11	selftests/bpf: Allow building with extra flags	Viktor Malik	1	-11/+23
	In order to specify extra compilation or linking flags to BPF selftests, it is possible to set EXTRA_CFLAGS and EXTRA_LDFLAGS from the command line. The problem is that they are not propagated to sub-make calls (runqslower, bpftool, libbpf) and in the better case are not applied, in the worse case cause the entire build fail. Propagate EXTRA_CFLAGS and EXTRA_LDFLAGS to the sub-makes. This, for instance, allows to build selftests as PIE with $ make EXTRA_CFLAGS='-fPIE' EXTRA_LDFLAGS='-pie' Without this change, the command would fail because libbpf.a would not be built with -fPIE and other PIE binaries would not link against it. The only problem is that we have to explicitly provide empty EXTRA_CFLAGS='' and EXTRA_LDFLAGS='' to the builds of kernel modules as we don't want to build modules with flags used for userspace (the above example would fail as kernel doesn't support PIE). Signed-off-by: Viktor Malik <vmalik@redhat.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-11-04	selftests/bpf: Add tests for raw_tp null handling	Kumar Kartikeya Dwivedi	4	-0/+67
	Ensure that trusted PTR_TO_BTF_ID accesses perform PROBE_MEM handling in raw_tp program. Without the previous fix, this selftest crashes the kernel due to a NULL-pointer dereference. Also ensure that dead code elimination does not kick in for checks on the pointer. Reviewed-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241104171959.2938862-4-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-11-04	selftests/bpf: Clean up open-coded gettid syscall invocations	Kumar Kartikeya Dwivedi	12	-22/+33
	Availability of the gettid definition across glibc versions supported by BPF selftests is not certain. Currently, all users in the tree open-code syscall to gettid. Convert them to a common macro definition. Reviewed-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241104171959.2938862-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-11-04	bpf: Mark raw_tp arguments with PTR_MAYBE_NULL	Kumar Kartikeya Dwivedi	4	-9/+87
	Arguments to a raw tracepoint are tagged as trusted, which carries the semantics that the pointer will be non-NULL. However, in certain cases, a raw tracepoint argument may end up being NULL. More context about this issue is available in [0]. Thus, there is a discrepancy between the reality, that raw_tp arguments can actually be NULL, and the verifier's knowledge, that they are never NULL, causing explicit NULL checks to be deleted, and accesses to such pointers potentially crashing the kernel. To fix this, mark raw_tp arguments as PTR_MAYBE_NULL, and then special case the dereference and pointer arithmetic to permit it, and allow passing them into helpers/kfuncs; these exceptions are made for raw_tp programs only. Ensure that we don't do this when ref_obj_id > 0, as in that case this is an acquired object and doesn't need such adjustment. The reason we do mask_raw_tp_trusted_reg logic is because other will recheck in places whether the register is a trusted_reg, and then consider our register as untrusted when detecting the presence of the PTR_MAYBE_NULL flag. To allow safe dereference, we enable PROBE_MEM marking when we see loads into trusted pointers with PTR_MAYBE_NULL. While trusted raw_tp arguments can also be passed into helpers or kfuncs where such broken assumption may cause issues, a future patch set will tackle their case separately, as PTR_TO_BTF_ID (without PTR_TRUSTED) can already be passed into helpers and causes similar problems. Thus, they are left alone for now. It is possible that these checks also permit passing non-raw_tp args that are trusted PTR_TO_BTF_ID with null marking. In such a case, allowing dereference when pointer is NULL expands allowed behavior, so won't regress existing programs, and the case of passing these into helpers is the same as above and will be dealt with later. Also update the failure case in tp_btf_nullable selftest to capture the new behavior, as the verifier will no longer cause an error when directly dereference a raw tracepoint argument marked as __nullable. [0]: https://lore.kernel.org/bpf/ZrCZS6nisraEqehw@jlelli-thinkpadt14gen4.remote.csb Reviewed-by: Jiri Olsa <jolsa@kernel.org> Reported-by: Juri Lelli <juri.lelli@redhat.com> Tested-by: Juri Lelli <juri.lelli@redhat.com> Fixes: 3f00c5239344 ("bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241104171959.2938862-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-11-04	bpf: Move btf_type_is_struct_ptr() under CONFIG_BPF_SYSCALL	Alistair Francis	1	-11/+10
	The static inline btf_type_is_struct_ptr() function calls btf_type_skip_modifiers() which is guarded by CONFIG_BPF_SYSCALL. btf_type_is_struct_ptr() is also only called by CONFIG_BPF_SYSCALL ifdef code, so let's only expose btf_type_is_struct_ptr() if CONFIG_BPF_SYSCALL is defined. Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Link: https://lore.kernel.org/r/20241104060300.421403-1-alistair.francis@wdc.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>