summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* bpf, arm64: Add bpf trampoline for arm64Xu Kuohai2022-07-111-3/+382
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is arm64 version of commit fec56f5890d9 ("bpf: Introduce BPF trampoline"). A bpf trampoline converts native calling convention to bpf calling convention and is used to implement various bpf features, such as fentry, fexit, fmod_ret and struct_ops. This patch does essentially the same thing that bpf trampoline does on x86. Tested on Raspberry Pi 4B and qemu: #18 /1 bpf_tcp_ca/dctcp:OK #18 /2 bpf_tcp_ca/cubic:OK #18 /3 bpf_tcp_ca/invalid_license:OK #18 /4 bpf_tcp_ca/dctcp_fallback:OK #18 /5 bpf_tcp_ca/rel_setsockopt:OK #18 bpf_tcp_ca:OK #51 /1 dummy_st_ops/dummy_st_ops_attach:OK #51 /2 dummy_st_ops/dummy_init_ret_value:OK #51 /3 dummy_st_ops/dummy_init_ptr_arg:OK #51 /4 dummy_st_ops/dummy_multiple_args:OK #51 dummy_st_ops:OK #57 /1 fexit_bpf2bpf/target_no_callees:OK #57 /2 fexit_bpf2bpf/target_yes_callees:OK #57 /3 fexit_bpf2bpf/func_replace:OK #57 /4 fexit_bpf2bpf/func_replace_verify:OK #57 /5 fexit_bpf2bpf/func_sockmap_update:OK #57 /6 fexit_bpf2bpf/func_replace_return_code:OK #57 /7 fexit_bpf2bpf/func_map_prog_compatibility:OK #57 /8 fexit_bpf2bpf/func_replace_multi:OK #57 /9 fexit_bpf2bpf/fmod_ret_freplace:OK #57 fexit_bpf2bpf:OK #237 xdp_bpf2bpf:OK Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: KP Singh <kpsingh@kernel.org> Link: https://lore.kernel.org/bpf/20220711150823.2128542-5-xukuohai@huawei.com
* bpf, arm64: Implement bpf_arch_text_poke() for arm64Xu Kuohai2022-07-112-14/+322
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement bpf_arch_text_poke() for arm64, so bpf prog or bpf trampoline can be patched with it. When the target address is NULL, the original instruction is patched to a NOP. When the target address and the source address are within the branch range, the original instruction is patched to a bl instruction to the target address directly. To support attaching bpf trampoline to both regular kernel function and bpf prog, we follow the ftrace patchsite way for bpf prog. That is, two instructions are inserted at the beginning of bpf prog, the first one saves the return address to x9, and the second is a nop which will be patched to a bl instruction when a bpf trampoline is attached. However, when a bpf trampoline is attached to bpf prog, the distance between target address and source address may exceed 128MB, the maximum branch range, because bpf trampoline and bpf prog are allocated separately with vmalloc. So long jump should be handled. When a bpf prog is constructed, a plt pointing to empty trampoline dummy_tramp is placed at the end: bpf_prog: mov x9, lr nop // patchsite ... ret plt: ldr x10, target br x10 target: .quad dummy_tramp // plt target This is also the state when no trampoline is attached. When a short-jump bpf trampoline is attached, the patchsite is patched to a bl instruction to the trampoline directly: bpf_prog: mov x9, lr bl <short-jump bpf trampoline address> // patchsite ... ret plt: ldr x10, target br x10 target: .quad dummy_tramp // plt target When a long-jump bpf trampoline is attached, the plt target is filled with the trampoline address and the patchsite is patched to a bl instruction to the plt: bpf_prog: mov x9, lr bl plt // patchsite ... ret plt: ldr x10, target br x10 target: .quad <long-jump bpf trampoline address> dummy_tramp is used to prevent another CPU from jumping to an unknown location during the patching process, making the patching process easier. The patching process is as follows: 1. when neither the old address or the new address is a long jump, the patchsite is replaced with a bl to the new address, or nop if the new address is NULL; 2. when the old address is not long jump but the new one is, the branch target address is written to plt first, then the patchsite is replaced with a bl instruction to the plt; 3. when the old address is long jump but the new one is not, the address of dummy_tramp is written to plt first, then the patchsite is replaced with a bl to the new address, or a nop if the new address is NULL; 4. when both the old address and the new address are long jump, the new address is written to plt and the patchsite is not changed. Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Reviewed-by: KP Singh <kpsingh@kernel.org> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20220711150823.2128542-4-xukuohai@huawei.com
* arm64: Add LDR (literal) instructionXu Kuohai2022-07-112-4/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | Add LDR (literal) instruction to load data from address relative to PC. This instruction will be used to implement long jump from bpf prog to bpf trampoline in the follow-up patch. The instruction encoding: 3 2 2 2 0 0 0 7 6 4 5 0 +-----+-------+---+-----+-------------------------------------+--------+ | 0 x | 0 1 1 | 0 | 0 0 | imm19 | Rt | +-----+-------+---+-----+-------------------------------------+--------+ for 32-bit, variant x == 0; for 64-bit, x == 1. branch_imm_common() is used to check the distance between pc and target address, since it's reused by this patch and LDR (literal) is not a branch instruction, rename it to label_imm_common(). Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/bpf/20220711150823.2128542-3-xukuohai@huawei.com
* bpf: Remove is_valid_bpf_tramp_flags()Xu Kuohai2022-07-113-20/+6
| | | | | | | | | | | | | | | | | | | | Before generating bpf trampoline, x86 calls is_valid_bpf_tramp_flags() to check the input flags. This check is architecture independent. So, to be consistent with x86, arm64 should also do this check before generating bpf trampoline. However, the BPF_TRAMP_F_XXX flags are not used by user code and the flags argument is almost constant at compile time, so this run time check is a bit redundant. Remove is_valid_bpf_tramp_flags() and add some comments to the usage of BPF_TRAMP_F_XXX flags, as suggested by Alexei. Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20220711150823.2128542-2-xukuohai@huawei.com
* skmsg: Fix invalid last sg check in sk_msg_recvmsg()Liu Jian2022-07-111-2/+2
| | | | | | | | | | | | | | | | | | | | | | In sk_psock_skb_ingress_enqueue function, if the linear area + nr_frags + frag_list of the SKB has NR_MSG_FRAG_IDS blocks in total, skb_to_sgvec will return NR_MSG_FRAG_IDS, then msg->sg.end will be set to NR_MSG_FRAG_IDS, and in addition, (NR_MSG_FRAG_IDS - 1) is set to the last SG of msg. Recv the msg in sk_msg_recvmsg, when i is (NR_MSG_FRAG_IDS - 1), the sk_msg_iter_var_next(i) will change i to 0 (not NR_MSG_FRAG_IDS), the judgment condition "msg_rx->sg.start==msg_rx->sg.end" and "i != msg_rx->sg.end" can not work. As a result, the processed msg cannot be deleted from ingress_msg list. But the length of all the sge of the msg has changed to 0. Then the next recvmsg syscall will process the msg repeatedly, because the length of sge is 0, the -EFAULT error is always returned. Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: Liu Jian <liujian56@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20220628123616.186950-1-liujian56@huawei.com
* fddi/skfp: fix repeated words in commentsJilin Yuan2022-07-111-1/+1
| | | | | | | Delete the redundant word 'test'. Signed-off-by: Jilin Yuan <yuanjilin@cdjrlc.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ethernet/via: fix repeated words in commentsJilin Yuan2022-07-111-1/+1
| | | | | | | Delete the redundant word 'driver'. Signed-off-by: Jilin Yuan <yuanjilin@cdjrlc.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Find dst with sk's xfrm policy not ctl_sksewookseo2022-07-114-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we set XFRM security policy by calling setsockopt with option IPV6_XFRM_POLICY, the policy will be stored in 'sock_policy' in 'sock' struct. However tcp_v6_send_response doesn't look up dst_entry with the actual socket but looks up with tcp control socket. This may cause a problem that a RST packet is sent without ESP encryption & peer's TCP socket can't receive it. This patch will make the function look up dest_entry with actual socket, if the socket has XFRM policy(sock_policy), so that the TCP response packet via this function can be encrypted, & aligned on the encrypted TCP socket. Tested: We encountered this problem when a TCP socket which is encrypted in ESP transport mode encryption, receives challenge ACK at SYN_SENT state. After receiving challenge ACK, TCP needs to send RST to establish the socket at next SYN try. But the RST was not encrypted & peer TCP socket still remains on ESTABLISHED state. So we verified this with test step as below. [Test step] 1. Making a TCP state mismatch between client(IDLE) & server(ESTABLISHED). 2. Client tries a new connection on the same TCP ports(src & dst). 3. Server will return challenge ACK instead of SYN,ACK. 4. Client will send RST to server to clear the SOCKET. 5. Client will retransmit SYN to server on the same TCP ports. [Expected result] The TCP connection should be established. Cc: Maciej Żenczykowski <maze@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Sehee Lee <seheele@google.com> Signed-off-by: Sewook Seo <sewookseo@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski2022-07-09125-6701/+5141
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Daniel Borkmann says: ==================== pull-request: bpf-next 2022-07-09 We've added 94 non-merge commits during the last 19 day(s) which contain a total of 125 files changed, 5141 insertions(+), 6701 deletions(-). The main changes are: 1) Add new way for performing BTF type queries to BPF, from Daniel Müller. 2) Add inlining of calls to bpf_loop() helper when its function callback is statically known, from Eduard Zingerman. 3) Implement BPF TCP CC framework usability improvements, from Jörn-Thorben Hinz. 4) Add LSM flavor for attaching per-cgroup BPF programs to existing LSM hooks, from Stanislav Fomichev. 5) Remove all deprecated libbpf APIs in prep for 1.0 release, from Andrii Nakryiko. 6) Add benchmarks around local_storage to BPF selftests, from Dave Marchevsky. 7) AF_XDP sample removal (given move to libxdp) and various improvements around AF_XDP selftests, from Magnus Karlsson & Maciej Fijalkowski. 8) Add bpftool improvements for memcg probing and bash completion, from Quentin Monnet. 9) Add arm64 JIT support for BPF-2-BPF coupled with tail calls, from Jakub Sitnicki. 10) Sockmap optimizations around throughput of UDP transmissions which have been improved by 61%, from Cong Wang. 11) Rework perf's BPF prologue code to remove deprecated functions, from Jiri Olsa. 12) Fix sockmap teardown path to avoid sleepable sk_psock_stop, from John Fastabend. 13) Fix libbpf's cleanup around legacy kprobe/uprobe on error case, from Chuang Wang. 14) Fix libbpf's bpf_helpers.h to work with gcc for the case of its sec/pragma macro, from James Hilliard. 15) Fix libbpf's pt_regs macros for riscv to use a0 for RC register, from Yixun Lan. 16) Fix bpftool to show the name of type BPF_OBJ_LINK, from Yafang Shao. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (94 commits) selftests/bpf: Fix xdp_synproxy build failure if CONFIG_NF_CONNTRACK=m/n bpf: Correctly propagate errors up from bpf_core_composites_match libbpf: Disable SEC pragma macro on GCC bpf: Check attach_func_proto more carefully in check_return_code selftests/bpf: Add test involving restrict type qualifier bpftool: Add support for KIND_RESTRICT to gen min_core_btf command MAINTAINERS: Add entry for AF_XDP selftests files selftests, xsk: Rename AF_XDP testing app bpf, docs: Remove deprecated xsk libbpf APIs description selftests/bpf: Add benchmark for local_storage RCU Tasks Trace usage libbpf, riscv: Use a0 for RC register libbpf: Remove unnecessary usdt_rel_ip assignments selftests/bpf: Fix few more compiler warnings selftests/bpf: Fix bogus uninitialized variable warning bpftool: Remove zlib feature test from Makefile libbpf: Cleanup the legacy uprobe_event on failed add/attach_event() libbpf: Fix wrong variable used in perf_event_uprobe_open_legacy() libbpf: Cleanup the legacy kprobe_event on failed add/attach_event() selftests/bpf: Add type match test against kernel's task_struct selftests/bpf: Add nested type to type based tests ... ==================== Link: https://lore.kernel.org/r/20220708233145.32365-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * selftests/bpf: Fix xdp_synproxy build failure if CONFIG_NF_CONNTRACK=m/nMaxim Mikityanskiy2022-07-091-7/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When CONFIG_NF_CONNTRACK=m, struct bpf_ct_opts and enum member BPF_F_CURRENT_NETNS are not exposed. This commit allows building the xdp_synproxy selftest in such cases. Note that nf_conntrack must be loaded before running the test if it's compiled as a module. This commit also allows this selftest to be successfully compiled when CONFIG_NF_CONNTRACK is disabled. One unused local variable of type struct bpf_ct_opts is also removed. Fixes: fb5cd0ce70d4 ("selftests/bpf: Add selftests for raw syncookie helpers") Reported-by: Yauheni Kaliuta <ykaliuta@redhat.com> Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220708130319.1016294-1-maximmi@nvidia.com
| * bpf: Correctly propagate errors up from bpf_core_composites_matchDaniel Müller2022-07-091-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change addresses a comment made earlier [0] about a missing return of an error when __bpf_core_types_match is invoked from bpf_core_composites_match, which could have let to us erroneously ignoring errors. Regarding the typedef name check pointed out in the same context, it is not actually an issue, because callers of the function perform a name check for the root type anyway. To make that more obvious, let's add comments to the function (similar to what we have for bpf_core_types_are_compat, which is called in pretty much the same context). [0]: https://lore.kernel.org/bpf/165708121449.4919.13204634393477172905.git-patchwork-notify@kernel.org/T/#m55141e8f8cfd2e8d97e65328fa04852870d01af6 Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Müller <deso@posteo.net> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220707211931.3415440-1-deso@posteo.net
| * libbpf: Disable SEC pragma macro on GCCJames Hilliard2022-07-091-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It seems the gcc preprocessor breaks with pragmas when surrounding __attribute__. Disable these pragmas on GCC due to upstream bugs see: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55578 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90400 Fixes errors like: error: expected identifier or '(' before '#pragma' 106 | SEC("cgroup/bind6") | ^~~ error: expected '=', ',', ';', 'asm' or '__attribute__' before '#pragma' 114 | char _license[] SEC("license") = "GPL"; | ^~~ Signed-off-by: James Hilliard <james.hilliard1@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220706111839.1247911-1-james.hilliard1@gmail.com
| * bpf: Check attach_func_proto more carefully in check_return_codeStanislav Fomichev2022-07-084-11/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Syzkaller reports the following crash: RIP: 0010:check_return_code kernel/bpf/verifier.c:10575 [inline] RIP: 0010:do_check kernel/bpf/verifier.c:12346 [inline] RIP: 0010:do_check_common+0xb3d2/0xd250 kernel/bpf/verifier.c:14610 With the following reproducer: bpf$PROG_LOAD_XDP(0x5, &(0x7f00000004c0)={0xd, 0x3, &(0x7f0000000000)=ANY=[@ANYBLOB="1800000000000019000000000000000095"], &(0x7f0000000300)='GPL\x00', 0x0, 0x0, 0x0, 0x0, 0x0, '\x00', 0x0, 0x2b, 0xffffffffffffffff, 0x8, 0x0, 0x0, 0x10, 0x0}, 0x80) Because we don't enforce expected_attach_type for XDP programs, we end up in hitting 'if (prog->expected_attach_type == BPF_LSM_CGROUP' part in check_return_code and follow up with testing `prog->aux->attach_func_proto->type`, but `prog->aux->attach_func_proto` is NULL. Add explicit prog_type check for the "Note, BPF_LSM_CGROUP that attach ..." condition. Also, don't skip return code check for LSM/STRUCT_OPS. The above actually brings an issue with existing selftest which tries to return EPERM from void inet_csk_clone. Fix the test (and move called_socket_clone to make sure it's not incremented in case of an error) and add a new one to explicitly verify this condition. Fixes: 69fd337a975c ("bpf: per-cgroup lsm flavor") Reported-by: syzbot+5cc0730bd4b4d2c5f152@syzkaller.appspotmail.com Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20220708175000.2603078-1-sdf@google.com
| * selftests/bpf: Add test involving restrict type qualifierDaniel Müller2022-07-083-2/+13
| | | | | | | | | | | | | | | | | | | | | | This change adds a type based test involving the restrict type qualifier to the BPF selftests. On the btfgen path, this will verify that bpftool correctly handles the corresponding RESTRICT BTF kind. Signed-off-by: Daniel Müller <deso@posteo.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20220706212855.1700615-3-deso@posteo.net
| * bpftool: Add support for KIND_RESTRICT to gen min_core_btf commandDaniel Müller2022-07-081-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | This change adjusts bpftool's type marking logic, as used in conjunction with TYPE_EXISTS relocations, to correctly recognize and handle the RESTRICT BTF kind. Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Müller <deso@posteo.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20220623212205.2805002-1-deso@posteo.net/T/#m4c75205145701762a4b398e0cdb911d5b5305ffc Link: https://lore.kernel.org/bpf/20220706212855.1700615-2-deso@posteo.net
| * MAINTAINERS: Add entry for AF_XDP selftests filesMaciej Fijalkowski2022-07-081-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Lukas reported that after commit f36600634282 ("libbpf: move xsk.{c,h} into selftests/bpf") MAINTAINERS file needed an update. In the meantime, Magnus removed AF_XDP samples in commit cfb5a2dbf141 ("bpf, samples: Remove AF_XDP samples"), but selftests part still misses its entry in MAINTAINERS. Now that xdpxceiver became xskxceiver, tools/testing/selftests/bpf/*xsk* will match all of the files related to AF_XDP testing (test_xsk.sh, xskxceiver, xsk_prereqs.sh, xsk.{c,h}). Reported-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220707111613.49031-3-maciej.fijalkowski@intel.com
| * selftests, xsk: Rename AF_XDP testing appMaciej Fijalkowski2022-07-086-13/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recently, xsk part of libbpf was moved to selftests/bpf directory and lives on its own because there is an AF_XDP testing application that needs it called xdpxceiver. That name makes it a bit hard to indicate who maintains it as there are other XDP samples in there, whereas this one is strictly about AF_XDP. Do s/xdpxceiver/xskxceiver so that it will be easier to figure out who maintains it. A follow-up patch will correct MAINTAINERS file. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220707111613.49031-2-maciej.fijalkowski@intel.com
| * bpf, docs: Remove deprecated xsk libbpf APIs descriptionPu Lehui2022-07-081-11/+2
| | | | | | | | | | | | | | | | | | | | Since xsk APIs has been removed from libbpf, let's clean up the BPF docs simutaneously. Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/bpf/20220708042736.669132-1-pulehui@huawei.com
| * selftests/bpf: Add benchmark for local_storage RCU Tasks Trace usageDave Marchevsky2022-07-076-1/+416
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This benchmark measures grace period latency and kthread cpu usage of RCU Tasks Trace when many processes are creating/deleting BPF local_storage. Intent here is to quantify improvement on these metrics after Paul's recent RCU Tasks patches [0]. Specifically, fork 15k tasks which call a bpf prog that creates/destroys task local_storage and sleep in a loop, resulting in many call_rcu_tasks_trace calls. To determine grace period latency, trace time elapsed between rcu_tasks_trace_pregp_step and rcu_tasks_trace_postgp; for cpu usage look at rcu_task_trace_kthread's stime in /proc/PID/stat. On my virtualized test environment (Skylake, 8 cpus) benchmark results demonstrate significant improvement: BEFORE Paul's patches: SUMMARY tasks_trace grace period latency avg 22298.551 us stddev 1302.165 us SUMMARY ticks per tasks_trace grace period avg 2.291 stddev 0.324 AFTER Paul's patches: SUMMARY tasks_trace grace period latency avg 16969.197 us stddev 2525.053 us SUMMARY ticks per tasks_trace grace period avg 1.146 stddev 0.178 Note that since these patches are not in bpf-next benchmarking was done by cherry-picking this patch onto rcu tree. [0] https://lore.kernel.org/rcu/20220620225402.GA3842369@paulmck-ThinkPad-P17-Gen-1/ Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20220705190018.3239050-1-davemarchevsky@fb.com
| * libbpf, riscv: Use a0 for RC registerYixun Lan2022-07-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | According to the RISC-V calling convention register usage here [0], a0 is used as return value register, so rename it to make it consistent with the spec. [0] section 18.2, table 18.2 https://riscv.org/wp-content/uploads/2015/01/riscv-calling.pdf Fixes: 589fed479ba1 ("riscv, libbpf: Add RISC-V (RV64) support to bpf_tracing.h") Signed-off-by: Yixun Lan <dlan@gentoo.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn@kernel.org> Acked-by: Amjad OULED-AMEUR <ouledameur.amjad@gmail.com> Link: https://lore.kernel.org/bpf/20220706140204.47926-1-dlan@gentoo.org
| * libbpf: Remove unnecessary usdt_rel_ip assignmentsAndrii Nakryiko2022-07-061-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | Coverity detected that usdt_rel_ip is unconditionally overwritten anyways, so there is no need to unnecessarily initialize it with unused value. Clean this up. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20220705224818.4026623-4-andrii@kernel.org
| * selftests/bpf: Fix few more compiler warningsAndrii Nakryiko2022-07-063-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When compiling with -O2, GCC detects few problems with selftests/bpf, so fix all of them. Two are real issues (uninitialized err and nums out-of-bounds access), but two other uninitialized variables warnings are due to GCC not being able to prove that variables are indeed initialized under conditions under which they are used. Fix all 4 cases, though. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20220705224818.4026623-3-andrii@kernel.org
| * selftests/bpf: Fix bogus uninitialized variable warningAndrii Nakryiko2022-07-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When compiling selftests/bpf in optimized mode (-O2), GCC erroneously complains about uninitialized token variable: In file included from network_helpers.c:22: network_helpers.c: In function ‘open_netns’: test_progs.h:355:22: error: ‘token’ may be used uninitialized [-Werror=maybe-uninitialized] 355 | int ___err = libbpf_get_error(___res); \ | ^~~~~~~~~~~~~~~~~~~~~~~~ network_helpers.c:440:14: note: in expansion of macro ‘ASSERT_OK_PTR’ 440 | if (!ASSERT_OK_PTR(token, "malloc token")) | ^~~~~~~~~~~~~ In file included from /data/users/andriin/linux/tools/testing/selftests/bpf/tools/include/bpf/libbpf.h:21, from bpf_util.h:9, from network_helpers.c:20: /data/users/andriin/linux/tools/testing/selftests/bpf/tools/include/bpf/libbpf_legacy.h:113:17: note: by argument 1 of type ‘const void *’ to ‘libbpf_get_error’ declared here 113 | LIBBPF_API long libbpf_get_error(const void *ptr); | ^~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors make: *** [Makefile:522: /data/users/andriin/linux/tools/testing/selftests/bpf/network_helpers.o] Error 1 This is completely bogus becuase libbpf_get_error() doesn't dereference pointer, but the only easy way to silence this is to allocate initialized memory with calloc(). Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20220705224818.4026623-2-andrii@kernel.org
| * bpftool: Remove zlib feature test from MakefileQuentin Monnet2022-07-061-9/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The feature test to detect the availability of zlib in bpftool's Makefile does not bring much. The library is not optional: it may or may not be required along libbfd for disassembling instructions, but in any case it is necessary to build feature.o or even libbpf, on which bpftool depends. If we remove the feature test, we lose the nicely formatted error message, but we get a compiler error about "zlib.h: No such file or directory", which is equally informative. Let's get rid of the test. Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220705200456.285943-1-quentin@isovalent.com
| * Merge branch 'cleanup the legacy probe_event on failed scenario'Andrii Nakryiko2022-07-061-8/+27
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Chuang Wang says: ==================== A potential scenario, when an error is returned after add_uprobe_event_legacy() in perf_event_uprobe_open_legacy(), or bpf_program__attach_perf_event_opts() in bpf_program__attach_uprobe_opts() returns an error, the uprobe_event that was previously created is not cleaned. At the same time, the legacy kprobe_event also have similar problems. With these patches, whenever an error is returned, it ensures that the created kprobe_event/uprobe_event is cleaned. V1 -> v3: - add detail commits - call remove_kprobe_event_legacy() on failed bpf_program__attach_perf_event_opts() v3 -> v4: - cleanup the legacy kprobe_event on failed add/attach_event ==================== Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
| | * libbpf: Cleanup the legacy uprobe_event on failed add/attach_event()Chuang Wang2022-07-061-5/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A potential scenario, when an error is returned after add_uprobe_event_legacy() in perf_event_uprobe_open_legacy(), or bpf_program__attach_perf_event_opts() in bpf_program__attach_uprobe_opts() returns an error, the uprobe_event that was previously created is not cleaned. So, with this patch, when an error is returned, fix this by adding remove_uprobe_event_legacy() Signed-off-by: Chuang Wang <nashuiliang@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220629151848.65587-4-nashuiliang@gmail.com
| | * libbpf: Fix wrong variable used in perf_event_uprobe_open_legacy()Chuang Wang2022-07-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Use "type" as opposed to "err" in pr_warn() after determine_uprobe_perf_type_legacy() returns an error. Signed-off-by: Chuang Wang <nashuiliang@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220629151848.65587-3-nashuiliang@gmail.com
| | * libbpf: Cleanup the legacy kprobe_event on failed add/attach_event()Chuang Wang2022-07-061-4/+14
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before the 0bc11ed5ab60 commit ("kprobes: Allow kprobes coexist with livepatch"), in a scenario where livepatch and kprobe coexist on the same function entry, the creation of kprobe_event using add_kprobe_event_legacy() will be successful, at the same time as a trace event (e.g. /debugfs/tracing/events/kprobe/XXX) will exist, but perf_event_open() will return an error because both livepatch and kprobe use FTRACE_OPS_FL_IPMODIFY. As follows: 1) add a livepatch $ insmod livepatch-XXX.ko 2) add a kprobe using tracefs API (i.e. add_kprobe_event_legacy) $ echo 'p:mykprobe XXX' > /sys/kernel/debug/tracing/kprobe_events 3) enable this kprobe (i.e. sys_perf_event_open) This will return an error, -EBUSY. On Andrii Nakryiko's comment, few error paths in bpf_program__attach_kprobe_opts() that should need to call remove_kprobe_event_legacy(). With this patch, whenever an error is returned after add_kprobe_event_legacy() or bpf_program__attach_perf_event_opts(), this ensures that the created kprobe_event is cleaned. Signed-off-by: Chuang Wang <nashuiliang@gmail.com> Signed-off-by: Jingren Zhou <zhoujingren@didiglobal.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220629151848.65587-2-nashuiliang@gmail.com
| * Merge branch 'Introduce type match support'Andrii Nakryiko2022-07-0613-18/+648
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Daniel Müller says: ==================== This patch set proposes the addition of a new way for performing type queries to BPF. It introduces the "type matches" relation, similar to what is already present with "type exists" (in the form of bpf_core_type_exists). "type exists" performs fairly superficial checking, mostly concerned with whether a type exists in the kernel and is of the same kind (enum/struct/...). Notably, compatibility checks for members of composite types is lacking. The newly introduced "type matches" (bpf_core_type_matches) fills this gap in that it performs stricter checks: compatibility of members and existence of similarly named enum variants is checked as well. E.g., given these definitions: struct task_struct___og { int pid; int tgid; }; struct task_struct___foo { int foo; } 'task_struct___og' would "match" the kernel type 'task_struct', because the members match up, while 'task_struct___foo' would not match, because the kernel's 'task_struct' has no member named 'foo'. More precisely, the "type match" relation is defined as follows (copied from source): - modifiers and typedefs are stripped (and, hence, effectively ignored) - generally speaking types need to be of same kind (struct vs. struct, union vs. union, etc.) - exceptions are struct/union behind a pointer which could also match a forward declaration of a struct or union, respectively, and enum vs. enum64 (see below) Then, depending on type: - integers: - match if size and signedness match - arrays & pointers: - target types are recursively matched - structs & unions: - local members need to exist in target with the same name - for each member we recursively check match unless it is already behind a pointer, in which case we only check matching names and compatible kind - enums: - local variants have to have a match in target by symbolic name (but not numeric value) - size has to match (but enum may match enum64 and vice versa) - function pointers: - number and position of arguments in local type has to match target - for each argument and the return value we recursively check match Enabling this feature requires a new relocation to be made known to the compiler. This is being taken care of for LLVM as part of https://reviews.llvm.org/D126838. If applied, among other things, usage of this functionality could have helped flag issues such as the one discussed here https://lore.kernel.org/all/93a20759600c05b6d9e4359a1517c88e06b44834.camel@fb.com/ earlier. Suggested-by: Andrii Nakryiko <andrii@kernel.org> --- Changelog: v2 -> v3: - renamed btfgen_mark_types_match - covered BTF_KIND_RESTRICT in type match marking logic - used bpf_core_names_match in more places - reworked "behind pointer" logic - added test using live task_struct v1 -> v2: - deduplicated and moved core algorithm into relo_core.c - adjusted bpf_core_names_match to get btf_type passed in - removed some length equality checks before strncmp usage - correctly use kflag from targ_t instead of local_t - added comment for meaning of kflag w/ FWD kind - __u32 -> u32 - handle BTF_KIND_FWD properly in bpftool marking logic - rebased ==================== Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
| | * selftests/bpf: Add type match test against kernel's task_structDaniel Müller2022-07-063-0/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change extends the existing core_reloc/kernel test to include a type match check of a local task_struct against the kernel's definition -- which we assume to succeed. Signed-off-by: Daniel Müller <deso@posteo.net> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220628160127.607834-11-deso@posteo.net
| | * selftests/bpf: Add nested type to type based testsDaniel Müller2022-07-063-20/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change extends the type based tests with another struct type (in addition to a_struct) to check relocations against: a_complex_struct. This type is nested more deeply to provide additional coverage of certain paths in the type match logic. Signed-off-by: Daniel Müller <deso@posteo.net> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220628160127.607834-10-deso@posteo.net
| | * selftests/bpf: Add test checking more characteristicsDaniel Müller2022-07-063-0/+91
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change adds another type-based self-test that specifically aims to test some more characteristics of the TYPE_MATCH logic. Specifically, it covers a few more potential differences between types, such as different orders, enum variant values, and integer signedness. Signed-off-by: Daniel Müller <deso@posteo.net> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220628160127.607834-9-deso@posteo.net
| | * selftests/bpf: Add type-match checks to type-based testsDaniel Müller2022-07-063-4/+73
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we have type-match logic in both libbpf and the kernel, this change adjusts the existing BPF self tests to check this functionality. Specifically, we extend the existing type-based tests to check the previously introduced bpf_core_type_matches macro. Signed-off-by: Daniel Müller <deso@posteo.net> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220628160127.607834-8-deso@posteo.net
| | * libbpf: add bpf_core_type_matches() helper macroAndrii Nakryiko2022-07-061-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch finalizes support for the proposed type match relation in libbpf by adding bpf_core_type_matches() macro which emits TYPE_MATCH relocation. Clang support for this relocation was added in [0]. [0] https://reviews.llvm.org/D126838 Signed-off-by: Daniel Müller <deso@posteo.net>¬ Signed-off-by: Andrii Nakryiko <andrii@kernel.org>¬ Link: https://lore.kernel.org/bpf/20220628160127.607834-7-deso@posteo.net¬
| | * bpf, libbpf: Add type match supportDaniel Müller2022-07-064-4/+294
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for the proposed type match relation to relo_core where it is shared between userspace and kernel. It plumbs through both kernel-side and libbpf-side support. The matching relation is defined as follows (copy from source): - modifiers and typedefs are stripped (and, hence, effectively ignored) - generally speaking types need to be of same kind (struct vs. struct, union vs. union, etc.) - exceptions are struct/union behind a pointer which could also match a forward declaration of a struct or union, respectively, and enum vs. enum64 (see below) Then, depending on type: - integers: - match if size and signedness match - arrays & pointers: - target types are recursively matched - structs & unions: - local members need to exist in target with the same name - for each member we recursively check match unless it is already behind a pointer, in which case we only check matching names and compatible kind - enums: - local variants have to have a match in target by symbolic name (but not numeric value) - size has to match (but enum may match enum64 and vice versa) - function pointers: - number and position of arguments in local type has to match target - for each argument and the return value we recursively check match Signed-off-by: Daniel Müller <deso@posteo.net> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220628160127.607834-5-deso@posteo.net
| | * bpftool: Honor BPF_CORE_TYPE_MATCHES relocationDaniel Müller2022-07-061-0/+108
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bpftool needs to know about the newly introduced BPF_CORE_TYPE_MATCHES relocation for its 'gen min_core_btf' command to work properly in the present of this relocation. Specifically, we need to make sure to mark types and fields so that they are present in the minimized BTF for "type match" checks to work out. However, contrary to the existing btfgen_record_field_relo, we need to rely on the BTF -- and not the spec -- to find fields. With this change we handle this new variant correctly. The functionality will be tested with follow on changes to BPF selftests, which already run against a minimized BTF created with bpftool. Signed-off-by: Daniel Müller <deso@posteo.net> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20220628160127.607834-3-deso@posteo.net
| | * bpf: Introduce TYPE_MATCH related constants/macrosDaniel Müller2022-07-063-0/+3
| |/ | | | | | | | | | | | | | | | | | | | | | | | | In order to provide type match support we require a new type of relocation which, in turn, requires toolchain support. Recent LLVM/Clang versions support a new value for the last argument to the __builtin_preserve_type_info builtin, for example. With this change we introduce the necessary constants into relevant header files, mirroring what the compiler may support. Signed-off-by: Daniel Müller <deso@posteo.net> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220628160127.607834-2-deso@posteo.net
| * bpf, samples: Remove AF_XDP samplesMagnus Karlsson2022-07-057-3348/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove the AF_XDP samples from samples/bpf/ as they are dependent on the AF_XDP support in libbpf. This support has now been removed in the 1.0 release, so these samples cannot be compiled anymore. Please start to use libxdp instead. It is backwards compatible with the AF_XDP support that was offered in libbpf. New samples can be found in the various xdp-project repositories connected to libxdp and by googling. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/20220630093717.8664-1-magnus.karlsson@gmail.com
| * bpftool: Rename "bpftool feature list" into "... feature list_builtins"Quentin Monnet2022-07-053-11/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To make it more explicit that the features listed with "bpftool feature list" are known to bpftool, but not necessary available on the system (as opposed to the probed features), rename the "feature list" command into "feature list_builtins". Note that "bpftool feature list" still works as before given that we recognise arguments from their prefixes; but the real name of the subcommand, in particular as displayed in the man page or the interactive help, will now include "_builtins". Since we update the bash completion accordingly, let's also take this chance to redirect error output to /dev/null in the completion script, to avoid displaying unexpected error messages when users attempt to tab-complete. Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20220701093805.16920-1-quentin@isovalent.com
| * bpf: Omit superfluous address family check in __bpf_skc_lookupTobias Klauser2022-07-051-3/+2
| | | | | | | | | | | | | | | | | | | | family is only set to either AF_INET or AF_INET6 based on len. In all other cases we return early. Thus the check against AF_UNSPEC can be omitted. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220630082618.15649-1-tklauser@distanz.ch
| * selftests/bpf: Skip lsm_cgroup when we don't have trampolinesStanislav Fomichev2022-07-011-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With arch_prepare_bpf_trampoline removed on x86: [...] #98/1 lsm_cgroup/functional:SKIP #98 lsm_cgroup:SKIP Summary: 1/0 PASSED, 1 SKIPPED, 0 FAILED Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test") Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Hao Luo <haoluo@google.com> Link: https://lore.kernel.org/bpf/20220630224203.512815-1-sdf@google.com
| * bpftool: Show also the name of type BPF_OBJ_LINKYafang Shao2022-06-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For example, /sys/fs/bpf/maps.debug is a BPF link. When you run `bpftool map show` to show it: Before: $ bpftool map show pinned /sys/fs/bpf/maps.debug Error: incorrect object type: unknown After: $ bpftool map show pinned /sys/fs/bpf/maps.debug Error: incorrect object type: link Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20220629154832.56986-5-laoar.shao@gmail.com
| * selftests/xsk: Destroy BPF resources only when ctx refcount drops to 0Maciej Fijalkowski2022-06-301-5/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, xsk_socket__delete frees BPF resources regardless of ctx refcount. Xdpxceiver has a test to verify whether underlying BPF resources would not be wiped out after closing XSK socket that was bound to interface with other active sockets. From library's xsk part perspective it also means that the internal xsk context is shared and its refcount is bumped accordingly. After a switch to loading XDP prog based on previously opened XSK socket, mentioned xdpxceiver test fails with: not ok 16 [xdpxceiver.c:swap_xsk_resources:1334]: ERROR: 9/"Bad file descriptor which means that in swap_xsk_resources(), xsk_socket__delete() released xskmap which in turn caused a failure of xsk_socket__update_xskmap(). To fix this, when deleting socket, decrement ctx refcount before releasing BPF resources and do so only when refcount dropped to 0 which means there are no more active sockets for this ctx so BPF resources can be freed safely. Fixes: 2f6324a3937f ("libbpf: Support shared umems between queues and devices") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20220629143458.934337-5-maciej.fijalkowski@intel.com
| * selftests/xsk: Verify correctness of XDP prog attach pointMaciej Fijalkowski2022-06-301-0/+17
| | | | | | | | | | | | | | | | | | | | | | To prevent the case we had previously where for TEST_MODE_SKB, XDP prog was attached in native mode, call bpf_xdp_query() after loading prog and make sure that attach_mode is as expected. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20220629143458.934337-4-maciej.fijalkowski@intel.com
| * selftests/xsk: Introduce XDP prog load based on existing AF_XDP socketMaciej Fijalkowski2022-06-303-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, xsk_setup_xdp_prog() uses anonymous xsk_socket struct which means that during xsk_create_bpf_link() call, xsk->config.xdp_flags is always 0. This in turn means that from xdpxceiver it is impossible to use xdpgeneric attachment, so since commit 3b22523bca02 ("selftests, xsk: Fix bpf_res cleanup test") we were not testing SKB mode at all. To fix this, introduce a function, called xsk_setup_xdp_prog_xsk(), that will load XDP prog based on the existing xsk_socket, so that xsk context's refcount is correctly bumped and flags from application side are respected. Use this from xdpxceiver side so we get coverage of generic and native XDP program attach points. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20220629143458.934337-3-maciej.fijalkowski@intel.com
| * selftests/xsk: Avoid bpf_link probe for existing xskMaciej Fijalkowski2022-06-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently bpf_link probe is done for each call of xsk_socket__create(). For cases where xsk context was previously created and current socket creation uses it, has_bpf_link will be overwritten, where it has already been initialized. Optimize this by moving the query to the xsk_create_ctx() so that when xsk_get_ctx() finds a ctx then no further bpf_link probes are needed. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20220629143458.934337-2-maciej.fijalkowski@intel.com
| * bpftool: Use feature list in bash completionQuentin Monnet2022-06-302-34/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that bpftool is able to produce a list of known program, map, attach types, let's use as much of this as we can in the bash completion file, so that we don't have to expand the list each time a new type is added to the kernel. Also update the relevant test script to remove some checks that are no longer needed. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Müller <deso@posteo.net> Link: https://lore.kernel.org/bpf/20220629203637.138944-3-quentin@isovalent.com
| * bpftool: Add feature list (prog/map/link/attach types, helpers)Quentin Monnet2022-06-303-1/+73
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a "bpftool feature list" subcommand to list BPF "features". Contrarily to "bpftool feature probe", this is not about the features available on the system. Instead, it lists all features known to bpftool from compilation time; in other words, all program, map, attach, link types known to the libbpf version in use, and all helpers found in the UAPI BPF header. The first use case for this feature is bash completion: running the command provides a list of types that can be used to produce the list of candidate map types, for example. Now that bpftool uses "standard" names provided by libbpf for the program, map, link, and attach types, having the ability to list these types and helpers could also be useful in scripts to loop over existing items. Sample output: # bpftool feature list prog_types | grep -vw unspec | head -n 6 socket_filter kprobe sched_cls sched_act tracepoint xdp # bpftool -p feature list map_types | jq '.[1]' "hash" # bpftool feature list attach_types | grep '^cgroup_' cgroup_inet_ingress cgroup_inet_egress [...] cgroup_inet_sock_release # bpftool feature list helpers | grep -vw bpf_unspec | wc -l 207 The "unspec" types and helpers are not filtered out by bpftool, so as to remain closer to the enums, and to preserve the indices in the JSON arrays (e.g. "hash" at index 1 == BPF_MAP_TYPE_HASH in map types list). Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Müller <deso@posteo.net> Link: https://lore.kernel.org/bpf/20220629203637.138944-2-quentin@isovalent.com
| * bpftool: Remove attach_type_name forward declarationTobias Klauser2022-06-301-2/+0
| | | | | | | | | | | | | | | | | | | | | | The attach_type_name definition was removed in commit 1ba5ad36e00f ("bpftool: Use libbpf_bpf_attach_type_str"). Remove its forward declaration in main.h as well. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20220630093638.25916-1-tklauser@distanz.ch
| * bpftool: Probe for memcg-based accounting before bumping rlimitQuentin Monnet2022-06-291-3/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Bpftool used to bump the memlock rlimit to make sure to be able to load BPF objects. After the kernel has switched to memcg-based memory accounting [0] in 5.11, bpftool has relied on libbpf to probe the system for memcg-based accounting support and for raising the rlimit if necessary [1]. But this was later reverted, because the probe would sometimes fail, resulting in bpftool not being able to load all required objects [2]. Here we add a more efficient probe, in bpftool itself. We first lower the rlimit to 0, then we attempt to load a BPF object (and finally reset the rlimit): if the load succeeds, then memcg-based memory accounting is supported. This approach was earlier proposed for the probe in libbpf itself [3], but given that the library may be used in multithreaded applications, the probe could have undesirable consequences if one thread attempts to lock kernel memory while memlock rlimit is at 0. Since bpftool is single-threaded and the rlimit is process-based, this is fine to do in bpftool itself. This probe was inspired by the similar one from the cilium/ebpf Go library [4]. [0] commit 97306be45fbe ("Merge branch 'switch to memcg-based memory accounting'") [1] commit a777e18f1bcd ("bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK") [2] commit 6b4384ff1088 ("Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK"") [3] https://lore.kernel.org/bpf/20220609143614.97837-1-quentin@isovalent.com/t/#u [4] https://github.com/cilium/ebpf/blob/v0.9.0/rlimit/rlimit.go#L39 Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Stanislav Fomichev <sdf@google.com> Acked-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/bpf/20220629111351.47699-1-quentin@isovalent.com