143 files changed, 7123 insertions, 1092 deletions
diff --git a/Documentation/bpf/bpf_prog_run.rst b/Documentation/bpf/bpf_prog_run.rst new file mode 100644 index 000000000000..4868c909df5c --- /dev/null +++ b/Documentation/bpf/bpf_prog_run.rst @@ -0,0 +1,117 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=================================== +Running BPF programs from userspace +=================================== + +This document describes the ``BPF_PROG_RUN`` facility for running BPF programs +from userspace. + +.. contents:: + :local: + :depth: 2 + + +Overview +-------- + +The ``BPF_PROG_RUN`` command can be used through the ``bpf()`` syscall to +execute a BPF program in the kernel and return the results to userspace. This +can be used to unit test BPF programs against user-supplied context objects, and +as way to explicitly execute programs in the kernel for their side effects. The +command was previously named ``BPF_PROG_TEST_RUN``, and both constants continue +to be defined in the UAPI header, aliased to the same value. + +The ``BPF_PROG_RUN`` command can be used to execute BPF programs of the +following types: + +- ``BPF_PROG_TYPE_SOCKET_FILTER`` +- ``BPF_PROG_TYPE_SCHED_CLS`` +- ``BPF_PROG_TYPE_SCHED_ACT`` +- ``BPF_PROG_TYPE_XDP`` +- ``BPF_PROG_TYPE_SK_LOOKUP`` +- ``BPF_PROG_TYPE_CGROUP_SKB`` +- ``BPF_PROG_TYPE_LWT_IN`` +- ``BPF_PROG_TYPE_LWT_OUT`` +- ``BPF_PROG_TYPE_LWT_XMIT`` +- ``BPF_PROG_TYPE_LWT_SEG6LOCAL`` +- ``BPF_PROG_TYPE_FLOW_DISSECTOR`` +- ``BPF_PROG_TYPE_STRUCT_OPS`` +- ``BPF_PROG_TYPE_RAW_TRACEPOINT`` +- ``BPF_PROG_TYPE_SYSCALL`` + +When using the ``BPF_PROG_RUN`` command, userspace supplies an input context +object and (for program types operating on network packets) a buffer containing +the packet data that the BPF program will operate on. The kernel will then +execute the program and return the results to userspace. 
Note that programs will +not have any side effects while being run in this mode; in particular, packets +will not actually be redirected or dropped, the program return code will just be +returned to userspace. A separate mode for live execution of XDP programs is +provided, documented separately below. + +Running XDP programs in "live frame mode" +----------------------------------------- + +The ``BPF_PROG_RUN`` command has a separate mode for running live XDP programs, +which can be used to execute XDP programs in a way where packets will actually +be processed by the kernel after the execution of the XDP program as if they +arrived on a physical interface. This mode is activated by setting the +``BPF_F_TEST_XDP_LIVE_FRAMES`` flag when supplying an XDP program to +``BPF_PROG_RUN``. + +The live packet mode is optimised for high performance execution of the supplied +XDP program many times (suitable for, e.g., running as a traffic generator), +which means the semantics are not quite as straight-forward as the regular test +run mode. Specifically: + +- When executing an XDP program in live frame mode, the result of the execution + will not be returned to userspace; instead, the kernel will perform the + operation indicated by the program's return code (drop the packet, redirect + it, etc). For this reason, setting the ``data_out`` or ``ctx_out`` attributes + in the syscall parameters when running in this mode will be rejected. In + addition, not all failures will be reported back to userspace directly; + specifically, only fatal errors in setup or during execution (like memory + allocation errors) will halt execution and return an error. If an error occurs + in packet processing, like a failure to redirect to a given interface, + execution will continue with the next repetition; these errors can be detected + via the same trace points as for regular XDP programs. 
+ +- Userspace can supply an ifindex as part of the context object, just like in + the regular (non-live) mode. The XDP program will be executed as though the + packet arrived on this interface; i.e., the ``ingress_ifindex`` of the context + object will point to that interface. Furthermore, if the XDP program returns + ``XDP_PASS``, the packet will be injected into the kernel networking stack as + though it arrived on that ifindex, and if it returns ``XDP_TX``, the packet + will be transmitted *out* of that same interface. Do note, though, that + because the program execution is not happening in driver context, an + ``XDP_TX`` is actually turned into the same action as an ``XDP_REDIRECT`` to + that same interface (i.e., it will only work if the driver has support for the + ``ndo_xdp_xmit`` driver op). + +- When running the program with multiple repetitions, the execution will happen + in batches. The batch size defaults to 64 packets (which is same as the + maximum NAPI receive batch size), but can be specified by userspace through + the ``batch_size`` parameter, up to a maximum of 256 packets. For each batch, + the kernel executes the XDP program repeatedly, each invocation getting a + separate copy of the packet data. For each repetition, if the program drops + the packet, the data page is immediately recycled (see below). Otherwise, the + packet is buffered until the end of the batch, at which point all packets + buffered this way during the batch are transmitted at once. + +- When setting up the test run, the kernel will initialise a pool of memory + pages of the same size as the batch size. Each memory page will be initialised + with the initial packet data supplied by userspace at ``BPF_PROG_RUN`` + invocation. When possible, the pages will be recycled on future program + invocations, to improve performance. 
Pages will generally be recycled a full + batch at a time, except when a packet is dropped (by return code or because + of, say, a redirection error), in which case that page will be recycled + immediately. If a packet ends up being passed to the regular networking stack + (because the XDP program returns ``XDP_PASS``, or because it ends up being + redirected to an interface that injects it into the stack), the page will be + released and a new one will be allocated when the pool is empty. + + When recycling, the page content is not rewritten; only the packet boundary + pointers (``data``, ``data_end`` and ``data_meta``) in the context object will + be reset to the original values. This means that if a program rewrites the + packet contents, it has to be prepared to see either the original content or + the modified version on subsequent invocations. diff --git a/Documentation/bpf/index.rst b/Documentation/bpf/index.rst index ef5c996547ec..96056a7447c7 100644 --- a/Documentation/bpf/index.rst +++ b/Documentation/bpf/index.rst @@ -21,6 +21,7 @@ that goes into great technical depth about the BPF Architecture. helpers programs maps + bpf_prog_run classic_vs_extended.rst bpf_licensing test_debug diff --git a/Documentation/trace/fprobe.rst b/Documentation/trace/fprobe.rst new file mode 100644 index 000000000000..b64bec1ce144 --- /dev/null +++ b/Documentation/trace/fprobe.rst @@ -0,0 +1,174 @@ +.. SPDX-License-Identifier: GPL-2.0 + +================================== +Fprobe - Function entry/exit probe +================================== + +.. Author: Masami Hiramatsu <mhiramat@kernel.org> + +Introduction +============ + +Fprobe is a function entry/exit probe mechanism based on ftrace. +Instead of using ftrace full feature, if you only want to attach callbacks +on function entry and exit, similar to the kprobes and kretprobes, you can +use fprobe. Compared with kprobes and kretprobes, fprobe gives faster +instrumentation for multiple functions with single handler. 
This document +describes how to use fprobe. + +The usage of fprobe +=================== + +The fprobe is a wrapper of ftrace (+ kretprobe-like return callback) to +attach callbacks to multiple function entry and exit. User needs to set up +the `struct fprobe` and pass it to `register_fprobe()`. + +Typically, `fprobe` data structure is initialized with the `entry_handler` +and/or `exit_handler` as below. + +.. code-block:: c + + struct fprobe fp = { + .entry_handler = my_entry_callback, + .exit_handler = my_exit_callback, + }; + +To enable the fprobe, call one of register_fprobe(), register_fprobe_ips(), and +register_fprobe_syms(). These functions register the fprobe with different types +of parameters. + +The register_fprobe() enables a fprobe by function-name filters. +E.g. this enables @fp on "func*()" function except "func2()".:: + + register_fprobe(&fp, "func*", "func2"); + +The register_fprobe_ips() enables a fprobe by ftrace-location addresses. +E.g. + +.. code-block:: c + + unsigned long ips[] = { 0x.... }; + + register_fprobe_ips(&fp, ips, ARRAY_SIZE(ips)); + +And the register_fprobe_syms() enables a fprobe by symbol names. +E.g. + +.. code-block:: c + + char syms[] = {"func1", "func2", "func3"}; + + register_fprobe_syms(&fp, syms, ARRAY_SIZE(syms)); + +To disable (remove from functions) this fprobe, call:: + + unregister_fprobe(&fp); + +You can temporally (soft) disable the fprobe by:: + + disable_fprobe(&fp); + +and resume by:: + + enable_fprobe(&fp); + +The above is defined by including the header:: + + #include <linux/fprobe.h> + +Same as ftrace, the registered callbacks will start being called some time +after the register_fprobe() is called and before it returns. See +:file:`Documentation/trace/ftrace.rst`. + +Also, the unregister_fprobe() will guarantee that the both enter and exit +handlers are no longer being called by functions after unregister_fprobe() +returns as same as unregister_ftrace_function(). 
+ +The fprobe entry/exit handler +============================= + +The prototype of the entry/exit callback function is as follows: + +.. code-block:: c + + void callback_func(struct fprobe *fp, unsigned long entry_ip, struct pt_regs *regs); + +Note that both entry and exit callbacks have same ptototype. The @entry_ip is +saved at function entry and passed to exit handler. + +@fp + This is the address of `fprobe` data structure related to this handler. + You can embed the `fprobe` to your data structure and get it by + container_of() macro from @fp. The @fp must not be NULL. + +@entry_ip + This is the ftrace address of the traced function (both entry and exit). + Note that this may not be the actual entry address of the function but + the address where the ftrace is instrumented. + +@regs + This is the `pt_regs` data structure at the entry and exit. Note that + the instruction pointer of @regs may be different from the @entry_ip + in the entry_handler. If you need traced instruction pointer, you need + to use @entry_ip. On the other hand, in the exit_handler, the instruction + pointer of @regs is set to the currect return address. + +Share the callbacks with kprobes +================================ + +Since the recursion safeness of the fprobe (and ftrace) is a bit different +from the kprobes, this may cause an issue if user wants to run the same +code from the fprobe and the kprobes. + +Kprobes has per-cpu 'current_kprobe' variable which protects the kprobe +handler from recursion in all cases. On the other hand, fprobe uses +only ftrace_test_recursion_trylock(). This allows interrupt context to +call another (or same) fprobe while the fprobe user handler is running. + +This is not a matter if the common callback code has its own recursion +detection, or it can handle the recursion in the different contexts +(normal/interrupt/NMI.) +But if it relies on the 'current_kprobe' recursion lock, it has to check +kprobe_running() and use kprobe_busy_*() APIs. 
+ +Fprobe has FPROBE_FL_KPROBE_SHARED flag to do this. If your common callback +code will be shared with kprobes, please set FPROBE_FL_KPROBE_SHARED +*before* registering the fprobe, like: + +.. code-block:: c + + fprobe.flags = FPROBE_FL_KPROBE_SHARED; + + register_fprobe(&fprobe, "func*", NULL); + +This will protect your common callback from the nested call. + +The missed counter +================== + +The `fprobe` data structure has `fprobe::nmissed` counter field as same as +kprobes. +This counter counts up when; + + - fprobe fails to take ftrace_recursion lock. This usually means that a function + which is traced by other ftrace users is called from the entry_handler. + + - fprobe fails to setup the function exit because of the shortage of rethook + (the shadow stack for hooking the function return.) + +The `fprobe::nmissed` field counts up in both cases. Therefore, the former +skips both of entry and exit callback and the latter skips the exit +callback, but in both case the counter will increase by 1. + +Note that if you set the FTRACE_OPS_FL_RECURSION and/or FTRACE_OPS_FL_RCU to +`fprobe::ops::flags` (ftrace_ops::flags) when registering the fprobe, this +counter may not work correctly, because ftrace skips the fprobe function which +increase the counter. + + +Functions and structures +======================== + +.. kernel-doc:: include/linux/fprobe.h +.. 
kernel-doc:: kernel/trace/fprobe.c + diff --git a/Documentation/trace/index.rst b/Documentation/trace/index.rst index 3769b9b7aed8..b9f3757f8269 100644 --- a/Documentation/trace/index.rst +++ b/Documentation/trace/index.rst @@ -9,6 +9,7 @@ Linux Tracing Technologies tracepoint-analysis ftrace ftrace-uses + fprobe kprobes kprobetrace uprobetracer diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c index 10ceebb7530b..9e457156ad4d 100644 --- a/arch/arm/net/bpf_jit_32.c +++ b/arch/arm/net/bpf_jit_32.c @@ -1864,7 +1864,7 @@ static int build_body(struct jit_ctx *ctx) if (ctx->target == NULL) ctx->offsets[i] = ctx->idx; - /* If unsuccesfull, return with error code */ + /* If unsuccesful, return with error code */ if (ret) return ret; } @@ -1973,7 +1973,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog) * for jit, although it can decrease the size of the image. * * As each arm instruction is of length 32bit, we are translating - * number of JITed intructions into the size required to store these + * number of JITed instructions into the size required to store these * JITed code. */ image_size = sizeof(u32) * ctx.idx; diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index 6b8de13faf83..6efbb87f65ed 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -2335,7 +2335,13 @@ out_image: sizeof(rw_header->size)); bpf_jit_binary_pack_free(header, rw_header); } + /* Fall back to interpreter mode */ prog = orig_prog; + if (extra_pass) { + prog->bpf_func = NULL; + prog->jited = 0; + prog->jited_len = 0; + } goto out_addrs; } if (image) { @@ -2384,8 +2390,9 @@ out_image: * Both cases are serious bugs and justify WARN_ON. 
*/ if (WARN_ON(bpf_jit_binary_pack_finalize(prog, header, rw_header))) { - prog = orig_prog; - goto out_addrs; + /* header has been freed */ + header = NULL; + goto out_image; } bpf_tail_call_direct_fixup(prog); diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 58b20ea171dd..1b5714926d81 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -433,21 +433,6 @@ static void veth_set_multicast_list(struct net_device *dev) { } -static struct sk_buff *veth_build_skb(void *head, int headroom, int len, - int buflen) -{ - struct sk_buff *skb; - - skb = build_skb(head, buflen); - if (!skb) - return NULL; - - skb_reserve(skb, headroom); - skb_put(skb, len); - - return skb; -} - static int veth_select_rxq(struct net_device *dev) { return smp_processor_id() % dev->real_num_rx_queues; @@ -494,7 +479,7 @@ static int veth_xdp_xmit(struct net_device *dev, int n, struct xdp_frame *frame = frames[i]; void *ptr = veth_xdp_to_ptr(frame); - if (unlikely(frame->len > max_len || + if (unlikely(xdp_get_frame_len(frame) > max_len || __ptr_ring_produce(&rq->xdp_ring, ptr))) break; nxmit++; @@ -695,72 +680,143 @@ static void veth_xdp_rcv_bulk_skb(struct veth_rq *rq, void **frames, } } -static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq, - struct sk_buff *skb, - struct veth_xdp_tx_bq *bq, - struct veth_stats *stats) +static void veth_xdp_get(struct xdp_buff *xdp) { - u32 pktlen, headroom, act, metalen, frame_sz; - void *orig_data, *orig_data_end; - struct bpf_prog *xdp_prog; - int mac_len, delta, off; - struct xdp_buff xdp; + struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp); + int i; - skb_prepare_for_gro(skb); + get_page(virt_to_page(xdp->data)); + if (likely(!xdp_buff_has_frags(xdp))) + return; - rcu_read_lock(); - xdp_prog = rcu_dereference(rq->xdp_prog); - if (unlikely(!xdp_prog)) { - rcu_read_unlock(); - goto out; - } + for (i = 0; i < sinfo->nr_frags; i++) + __skb_frag_ref(&sinfo->frags[i]); +} - mac_len = skb->data - skb_mac_header(skb); - pktlen 
= skb->len + mac_len; - headroom = skb_headroom(skb) - mac_len; +static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq, + struct xdp_buff *xdp, + struct sk_buff **pskb) +{ + struct sk_buff *skb = *pskb; + u32 frame_sz; if (skb_shared(skb) || skb_head_is_locked(skb) || - skb_is_nonlinear(skb) || headroom < XDP_PACKET_HEADROOM) { + skb_shinfo(skb)->nr_frags) { + u32 size, len, max_head_size, off; struct sk_buff *nskb; - int size, head_off; - void *head, *start; struct page *page; + int i, head_off; - size = SKB_DATA_ALIGN(VETH_XDP_HEADROOM + pktlen) + - SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); - if (size > PAGE_SIZE) + /* We need a private copy of the skb and data buffers since + * the ebpf program can modify it. We segment the original skb + * into order-0 pages without linearize it. + * + * Make sure we have enough space for linear and paged area + */ + max_head_size = SKB_WITH_OVERHEAD(PAGE_SIZE - + VETH_XDP_HEADROOM); + if (skb->len > PAGE_SIZE * MAX_SKB_FRAGS + max_head_size) goto drop; + /* Allocate skb head */ page = alloc_page(GFP_ATOMIC | __GFP_NOWARN); if (!page) goto drop; - head = page_address(page); - start = head + VETH_XDP_HEADROOM; - if (skb_copy_bits(skb, -mac_len, start, pktlen)) { - page_frag_free(head); + nskb = build_skb(page_address(page), PAGE_SIZE); + if (!nskb) { + put_page(page); goto drop; } - nskb = veth_build_skb(head, VETH_XDP_HEADROOM + mac_len, - skb->len, PAGE_SIZE); - if (!nskb) { - page_frag_free(head); + skb_reserve(nskb, VETH_XDP_HEADROOM); + size = min_t(u32, skb->len, max_head_size); + if (skb_copy_bits(skb, 0, nskb->data, size)) { + consume_skb(nskb); goto drop; } + skb_put(nskb, size); skb_copy_header(nskb, skb); head_off = skb_headroom(nskb) - skb_headroom(skb); skb_headers_offset_update(nskb, head_off); + + /* Allocate paged area of new skb */ + off = size; + len = skb->len - off; + + for (i = 0; i < MAX_SKB_FRAGS && off < skb->len; i++) { + page = alloc_page(GFP_ATOMIC | __GFP_NOWARN); + if (!page) { + 
consume_skb(nskb); + goto drop; + } + + size = min_t(u32, len, PAGE_SIZE); + skb_add_rx_frag(nskb, i, page, 0, size, PAGE_SIZE); + if (skb_copy_bits(skb, off, page_address(page), + size)) { + consume_skb(nskb); + goto drop; + } + + len -= size; + off += size; + } + consume_skb(skb); skb = nskb; + } else if (skb_headroom(skb) < XDP_PACKET_HEADROOM && + pskb_expand_head(skb, VETH_XDP_HEADROOM, 0, GFP_ATOMIC)) { + goto drop; } /* SKB "head" area always have tailroom for skb_shared_info */ frame_sz = skb_end_pointer(skb) - skb->head; frame_sz += SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); - xdp_init_buff(&xdp, frame_sz, &rq->xdp_rxq); - xdp_prepare_buff(&xdp, skb->head, skb->mac_header, pktlen, true); + xdp_init_buff(xdp, frame_sz, &rq->xdp_rxq); + xdp_prepare_buff(xdp, skb->head, skb_headroom(skb), + skb_headlen(skb), true); + + if (skb_is_nonlinear(skb)) { + skb_shinfo(skb)->xdp_frags_size = skb->data_len; + xdp_buff_set_frags_flag(xdp); + } else { + xdp_buff_clear_frags_flag(xdp); + } + *pskb = skb; + + return 0; +drop: + consume_skb(skb); + *pskb = NULL; + + return -ENOMEM; +} + +static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq, + struct sk_buff *skb, + struct veth_xdp_tx_bq *bq, + struct veth_stats *stats) +{ + void *orig_data, *orig_data_end; + struct bpf_prog *xdp_prog; + struct xdp_buff xdp; + u32 act, metalen; + int off; + + skb_prepare_for_gro(skb); + + rcu_read_lock(); + xdp_prog = rcu_dereference(rq->xdp_prog); + if (unlikely(!xdp_prog)) { + rcu_read_unlock(); + goto out; + } + + __skb_push(skb, skb->data - skb_mac_header(skb)); + if (veth_convert_skb_to_xdp_buff(rq, &xdp, &skb)) + goto drop; orig_data = xdp.data; orig_data_end = xdp.data_end; @@ -771,7 +827,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq, case XDP_PASS: break; case XDP_TX: - get_page(virt_to_page(xdp.data)); + veth_xdp_get(&xdp); consume_skb(skb); xdp.rxq->mem = rq->xdp_mem; if (unlikely(veth_xdp_tx(rq, &xdp, bq) < 0)) { @@ -783,7 +839,7 @@ static struct 
sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq, rcu_read_unlock(); goto xdp_xmit; case XDP_REDIRECT: - get_page(virt_to_page(xdp.data)); + veth_xdp_get(&xdp); consume_skb(skb); xdp.rxq->mem = rq->xdp_mem; if (xdp_do_redirect(rq->dev, &xdp, xdp_prog)) { @@ -806,18 +862,27 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq, rcu_read_unlock(); /* check if bpf_xdp_adjust_head was used */ - delta = orig_data - xdp.data; - off = mac_len + delta; + off = orig_data - xdp.data; if (off > 0) __skb_push(skb, off); else if (off < 0) __skb_pull(skb, -off); - skb->mac_header -= delta; + + skb_reset_mac_header(skb); /* check if bpf_xdp_adjust_tail was used */ off = xdp.data_end - orig_data_end; if (off != 0) __skb_put(skb, off); /* positive on grow, negative on shrink */ + + /* XDP frag metadata (e.g. nr_frags) are updated in eBPF helpers + * (e.g. bpf_xdp_adjust_tail), we need to update data_len here. + */ + if (xdp_buff_has_frags(&xdp)) + skb->data_len = skb_shinfo(skb)->xdp_frags_size; + else + skb->data_len = 0; + skb->protocol = eth_type_trans(skb, rq->dev); metalen = xdp.data - xdp.data_meta; @@ -833,7 +898,7 @@ xdp_drop: return NULL; err_xdp: rcu_read_unlock(); - page_frag_free(xdp.data); + xdp_return_buff(&xdp); xdp_xmit: return NULL; } @@ -855,7 +920,7 @@ static int veth_xdp_rcv(struct veth_rq *rq, int budget, /* ndo_xdp_xmit */ struct xdp_frame *frame = veth_ptr_to_xdp(ptr); - stats->xdp_bytes += frame->len; + stats->xdp_bytes += xdp_get_frame_len(frame); frame = veth_xdp_rcv_one(rq, frame, bq, stats); if (frame) { /* XDP_PASS */ @@ -1463,9 +1528,14 @@ static int veth_xdp_set(struct net_device *dev, struct bpf_prog *prog, goto err; } - max_mtu = PAGE_SIZE - VETH_XDP_HEADROOM - - peer->hard_header_len - - SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); + max_mtu = SKB_WITH_OVERHEAD(PAGE_SIZE - VETH_XDP_HEADROOM) - + peer->hard_header_len; + /* Allow increasing the max_mtu if the program supports + * XDP fragments. 
+ */ + if (prog->aux->xdp_has_frags) + max_mtu += PAGE_SIZE * MAX_SKB_FRAGS; + if (peer->mtu > max_mtu) { NL_SET_ERR_MSG_MOD(extack, "Peer MTU is too large to set XDP"); err = -ERANGE; diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 57cb6af3177a..bdb5298735ce 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -334,7 +334,15 @@ enum bpf_type_flag { /* MEM is in user address space. */ MEM_USER = BIT(3 + BPF_BASE_TYPE_BITS), - __BPF_TYPE_LAST_FLAG = MEM_USER, + /* MEM is a percpu memory. MEM_PERCPU tags PTR_TO_BTF_ID. When tagged + * with MEM_PERCPU, PTR_TO_BTF_ID _cannot_ be directly accessed. In + * order to drop this tag, it must be passed into bpf_per_cpu_ptr() + * or bpf_this_cpu_ptr(), which will return the pointer corresponding + * to the specified cpu. + */ + MEM_PERCPU = BIT(4 + BPF_BASE_TYPE_BITS), + + __BPF_TYPE_LAST_FLAG = MEM_PERCPU, }; /* Max number of base types. */ @@ -516,7 +524,6 @@ enum bpf_reg_type { */ PTR_TO_MEM, /* reg points to valid memory region */ PTR_TO_BUF, /* reg points to a read/write buffer */ - PTR_TO_PERCPU_BTF_ID, /* reg points to a percpu kernel variable */ PTR_TO_FUNC, /* reg points to a bpf program function */ __BPF_REG_TYPE_MAX, diff --git a/include/linux/bpf_local_storage.h b/include/linux/bpf_local_storage.h index 37b3906af8b1..493e63258497 100644 --- a/include/linux/bpf_local_storage.h +++ b/include/linux/bpf_local_storage.h @@ -154,16 +154,17 @@ void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem); struct bpf_local_storage_elem * bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner, void *value, - bool charge_mem); + bool charge_mem, gfp_t gfp_flags); int bpf_local_storage_alloc(void *owner, struct bpf_local_storage_map *smap, - struct bpf_local_storage_elem *first_selem); + struct bpf_local_storage_elem *first_selem, + gfp_t gfp_flags); struct bpf_local_storage_data * bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap, - void *value, u64 map_flags); + void 
*value, u64 map_flags, gfp_t gfp_flags); void bpf_local_storage_free_rcu(struct rcu_head *rcu); diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h index 48a91c51c015..3e24ad0c4b3c 100644 --- a/include/linux/bpf_types.h +++ b/include/linux/bpf_types.h @@ -140,3 +140,4 @@ BPF_LINK_TYPE(BPF_LINK_TYPE_XDP, xdp) #ifdef CONFIG_PERF_EVENTS BPF_LINK_TYPE(BPF_LINK_TYPE_PERF_EVENT, perf) #endif +BPF_LINK_TYPE(BPF_LINK_TYPE_KPROBE_MULTI, kprobe_multi) diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 7a7be8c057f2..c1fc4af47f69 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -521,6 +521,10 @@ bpf_prog_offload_remove_insns(struct bpf_verifier_env *env, u32 off, u32 cnt); int check_ptr_off_reg(struct bpf_verifier_env *env, const struct bpf_reg_state *reg, int regno); +int check_func_arg_reg_off(struct bpf_verifier_env *env, + const struct bpf_reg_state *reg, int regno, + enum bpf_arg_type arg_type, + bool is_release_func); int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg, u32 regno); int check_mem_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg, diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h index 3c4de9b6c6e3..babb1347148c 100644 --- a/include/linux/compiler-clang.h +++ b/include/linux/compiler-clang.h @@ -68,3 +68,28 @@ #define __nocfi __attribute__((__no_sanitize__("cfi"))) #define __cficanonical __attribute__((__cfi_canonical_jump_table__)) + +/* + * Turn individual warnings and errors on and off locally, depending + * on version. 
+ */ +#define __diag_clang(version, severity, s) \ + __diag_clang_ ## version(__diag_clang_ ## severity s) + +/* Severity used in pragma directives */ +#define __diag_clang_ignore ignored +#define __diag_clang_warn warning +#define __diag_clang_error error + +#define __diag_str1(s) #s +#define __diag_str(s) __diag_str1(s) +#define __diag(s) _Pragma(__diag_str(clang diagnostic s)) + +#if CONFIG_CLANG_VERSION >= 110000 +#define __diag_clang_11(s) __diag(s) +#else +#define __diag_clang_11(s) +#endif + +#define __diag_ignore_all(option, comment) \ + __diag_clang(11, ignore, option) diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h index ccbbd31b3aae..d364c98a4a80 100644 --- a/include/linux/compiler-gcc.h +++ b/include/linux/compiler-gcc.h @@ -151,6 +151,9 @@ #define __diag_GCC_8(s) #endif +#define __diag_ignore_all(option, comment) \ + __diag_GCC(8, ignore, option) + /* * Prior to 9.1, -Wno-alloc-size-larger-than (and therefore the "alloc_size" * attribute) do not work, and must be disabled. 
diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h index 3f31ff400432..1bc760ba400c 100644 --- a/include/linux/compiler_types.h +++ b/include/linux/compiler_types.h @@ -4,6 +4,13 @@ #ifndef __ASSEMBLY__ +#if defined(CONFIG_DEBUG_INFO_BTF) && defined(CONFIG_PAHOLE_HAS_BTF_TAG) && \ + __has_attribute(btf_type_tag) +# define BTF_TYPE_TAG(value) __attribute__((btf_type_tag(#value))) +#else +# define BTF_TYPE_TAG(value) /* nothing */ +#endif + #ifdef __CHECKER__ /* address spaces */ # define __kernel __attribute__((address_space(0))) @@ -31,14 +38,11 @@ static inline void __chk_io_ptr(const volatile void __iomem *ptr) { } # define __kernel # ifdef STRUCTLEAK_PLUGIN # define __user __attribute__((user)) -# elif defined(CONFIG_DEBUG_INFO_BTF) && defined(CONFIG_PAHOLE_HAS_BTF_TAG) && \ - __has_attribute(btf_type_tag) -# define __user __attribute__((btf_type_tag("user"))) # else -# define __user +# define __user BTF_TYPE_TAG(user) # endif # define __iomem -# define __percpu +# define __percpu BTF_TYPE_TAG(percpu) # define __rcu # define __chk_user_ptr(x) (void)0 # define __chk_io_ptr(x) (void)0 @@ -371,4 +375,8 @@ struct ftrace_likely_data { #define __diag_error(compiler, version, option, comment) \ __diag_ ## compiler(version, error, option) +#ifndef __diag_ignore_all +#define __diag_ignore_all(option, comment) +#endif + #endif /* __LINUX_COMPILER_TYPES_H */ diff --git a/include/linux/filter.h b/include/linux/filter.h index 9bf26307247f..ed0c0ff42ad5 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -566,6 +566,7 @@ struct bpf_prog { gpl_compatible:1, /* Is filter GPL compatible? */ cb_access:1, /* Is control block accessed? */ dst_needed:1, /* Do we need dst entry? */ + blinding_requested:1, /* needs constant blinding */ blinded:1, /* Was blinded */ is_func:1, /* program is a bpf function */ kprobe_override:1, /* Do we override a kprobe? 
*/ @@ -573,7 +574,7 @@ struct bpf_prog { enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */ call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */ call_get_func_ip:1, /* Do we call get_func_ip() */ - delivery_time_access:1; /* Accessed __sk_buff->delivery_time_type */ + tstamp_type_access:1; /* Accessed __sk_buff->tstamp_type */ enum bpf_prog_type type; /* Type of BPF program */ enum bpf_attach_type expected_attach_type; /* For some prog types */ u32 len; /* Number of filter blocks */ diff --git a/include/linux/fprobe.h b/include/linux/fprobe.h new file mode 100644 index 000000000000..1c2bde0ead73 --- /dev/null +++ b/include/linux/fprobe.h @@ -0,0 +1,105 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Simple ftrace probe wrapper */ +#ifndef _LINUX_FPROBE_H +#define _LINUX_FPROBE_H + +#include <linux/compiler.h> +#include <linux/ftrace.h> +#include <linux/rethook.h> + +/** + * struct fprobe - ftrace based probe. + * @ops: The ftrace_ops. + * @nmissed: The counter for missing events. + * @flags: The status flag. + * @rethook: The rethook data structure. (internal data) + * @entry_handler: The callback function for function entry. + * @exit_handler: The callback function for function exit. + */ +struct fprobe { +#ifdef CONFIG_FUNCTION_TRACER + /* + * If CONFIG_FUNCTION_TRACER is not set, CONFIG_FPROBE is disabled too. + * But user of fprobe may keep embedding the struct fprobe on their own + * code. To avoid build error, this will keep the fprobe data structure + * defined here, but remove ftrace_ops data structure. + */ + struct ftrace_ops ops; +#endif + unsigned long nmissed; + unsigned int flags; + struct rethook *rethook; + + void (*entry_handler)(struct fprobe *fp, unsigned long entry_ip, struct pt_regs *regs); + void (*exit_handler)(struct fprobe *fp, unsigned long entry_ip, struct pt_regs *regs); +}; + +/* This fprobe is soft-disabled. 
*/ +#define FPROBE_FL_DISABLED 1 + +/* + * This fprobe handler will be shared with kprobes. + * This flag must be set before registering. + */ +#define FPROBE_FL_KPROBE_SHARED 2 + +static inline bool fprobe_disabled(struct fprobe *fp) +{ + return (fp) ? fp->flags & FPROBE_FL_DISABLED : false; +} + +static inline bool fprobe_shared_with_kprobes(struct fprobe *fp) +{ + return (fp) ? fp->flags & FPROBE_FL_KPROBE_SHARED : false; +} + +#ifdef CONFIG_FPROBE +int register_fprobe(struct fprobe *fp, const char *filter, const char *notfilter); +int register_fprobe_ips(struct fprobe *fp, unsigned long *addrs, int num); +int register_fprobe_syms(struct fprobe *fp, const char **syms, int num); +int unregister_fprobe(struct fprobe *fp); +#else +static inline int register_fprobe(struct fprobe *fp, const char *filter, const char *notfilter) +{ + return -EOPNOTSUPP; +} +static inline int register_fprobe_ips(struct fprobe *fp, unsigned long *addrs, int num) +{ + return -EOPNOTSUPP; +} +static inline int register_fprobe_syms(struct fprobe *fp, const char **syms, int num) +{ + return -EOPNOTSUPP; +} +static inline int unregister_fprobe(struct fprobe *fp) +{ + return -EOPNOTSUPP; +} +#endif + +/** + * disable_fprobe() - Disable fprobe + * @fp: The fprobe to be disabled. + * + * This will soft-disable @fp. Note that this doesn't remove the ftrace + * hooks from the function entry. + */ +static inline void disable_fprobe(struct fprobe *fp) +{ + if (fp) + fp->flags |= FPROBE_FL_DISABLED; +} + +/** + * enable_fprobe() - Enable fprobe + * @fp: The fprobe to be enabled. + * + * This will soft-enable @fp. 
+ */
+static inline void enable_fprobe(struct fprobe *fp)
+{
+	if (fp)
+		fp->flags &= ~FPROBE_FL_DISABLED;
+}
+
+#endif
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 9999e29187de..60847cbce0da 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -512,6 +512,8 @@ struct dyn_ftrace {
 int ftrace_set_filter_ip(struct ftrace_ops *ops, unsigned long ip,
 			 int remove, int reset);
+int ftrace_set_filter_ips(struct ftrace_ops *ops, unsigned long *ips,
+			  unsigned int cnt, int remove, int reset);
 int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
 		       int len, int reset);
 int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
@@ -802,6 +804,7 @@ static inline unsigned long ftrace_location(unsigned long ip)
 #define ftrace_regex_open(ops, flag, inod, file) ({ -ENODEV; })
 #define ftrace_set_early_filter(ops, buf, enable) do { } while (0)
 #define ftrace_set_filter_ip(ops, ip, remove, reset) ({ -ENODEV; })
+#define ftrace_set_filter_ips(ops, ips, cnt, remove, reset) ({ -ENODEV; })
 #define ftrace_set_filter(ops, buf, len, reset) ({ -ENODEV; })
 #define ftrace_set_notrace(ops, buf, len, reset) ({ -ENODEV; })
 #define ftrace_free_filter(ops) do { } while (0)
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 19b884353b15..5f1859836deb 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -427,6 +427,9 @@ static inline struct kprobe *kprobe_running(void)
 {
 	return NULL;
 }
+#define kprobe_busy_begin() do {} while (0)
+#define kprobe_busy_end() do {} while (0)
+
 static inline int register_kprobe(struct kprobe *p)
 {
 	return -EOPNOTSUPP;
diff --git a/include/linux/rethook.h b/include/linux/rethook.h
new file mode 100644
index 000000000000..c8ac1e5afcd1
--- /dev/null
+++ b/include/linux/rethook.h
@@ -0,0 +1,100 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Return hooking with list-based shadow stack.
+ */
+#ifndef _LINUX_RETHOOK_H
+#define _LINUX_RETHOOK_H
+
+#include <linux/compiler.h>
+#include <linux/freelist.h>
+#include <linux/kallsyms.h>
+#include <linux/llist.h>
+#include <linux/rcupdate.h>
+#include <linux/refcount.h>
+
+struct rethook_node;
+
+typedef void (*rethook_handler_t) (struct rethook_node *, void *, struct pt_regs *);
+
+/**
+ * struct rethook - The rethook management data structure.
+ * @data: The user-defined data storage.
+ * @handler: The user-defined return hook handler.
+ * @pool: The pool of struct rethook_node.
+ * @ref: The reference counter.
+ * @rcu: The rcu_head for deferred freeing.
+ *
+ * Don't embed to another data structure, because this is a self-destructive
+ * data structure when all rethook_node are freed.
+ */
+struct rethook {
+	void *data;
+	rethook_handler_t handler;
+	struct freelist_head pool;
+	refcount_t ref;
+	struct rcu_head rcu;
+};
+
+/**
+ * struct rethook_node - The rethook shadow-stack entry node.
+ * @freelist: The freelist, linked to struct rethook::pool.
+ * @rcu: The rcu_head for deferred freeing.
+ * @llist: The llist, linked to a struct task_struct::rethooks.
+ * @rethook: The pointer to the struct rethook.
+ * @ret_addr: The storage for the real return address.
+ * @frame: The storage for the frame pointer.
+ *
+ * You can embed this to your extended data structure to store any data
+ * on each entry of the shadow stack.
+ */
+struct rethook_node {
+	union {
+		struct freelist_node freelist;
+		struct rcu_head rcu;
+	};
+	struct llist_node llist;
+	struct rethook *rethook;
+	unsigned long ret_addr;
+	unsigned long frame;
+};
+
+struct rethook *rethook_alloc(void *data, rethook_handler_t handler);
+void rethook_free(struct rethook *rh);
+void rethook_add_node(struct rethook *rh, struct rethook_node *node);
+struct rethook_node *rethook_try_get(struct rethook *rh);
+void rethook_recycle(struct rethook_node *node);
+void rethook_hook(struct rethook_node *node, struct pt_regs *regs, bool mcount);
+unsigned long rethook_find_ret_addr(struct task_struct *tsk, unsigned long frame,
+				    struct llist_node **cur);
+
+/* Arch dependent code must implement arch_* and trampoline code */
+void arch_rethook_prepare(struct rethook_node *node, struct pt_regs *regs, bool mcount);
+void arch_rethook_trampoline(void);
+
+/**
+ * is_rethook_trampoline() - Check whether the address is rethook trampoline
+ * @addr: The address to be checked
+ *
+ * Return true if the @addr is the rethook trampoline address.
+ */
+static inline bool is_rethook_trampoline(unsigned long addr)
+{
+	return addr == (unsigned long)dereference_symbol_descriptor(arch_rethook_trampoline);
+}
+
+/* If the architecture needs to fixup the return address, implement it. */
+void arch_rethook_fixup_return(struct pt_regs *regs,
+			       unsigned long correct_ret_addr);
+
+/* Generic trampoline handler, arch code must prepare asm stub */
+unsigned long rethook_trampoline_handler(struct pt_regs *regs,
+					 unsigned long frame);
+
+#ifdef CONFIG_RETHOOK
+void rethook_flush_task(struct task_struct *tk);
+#else
+#define rethook_flush_task(tsk) do { } while (0)
+#endif
+
+#endif
+
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 75ba8aa60248..7034f53404e3 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1481,6 +1481,9 @@ struct task_struct {
 #ifdef CONFIG_KRETPROBES
 	struct llist_head kretprobe_instances;
 #endif
+#ifdef CONFIG_RETHOOK
+	struct llist_head rethooks;
+#endif
 #ifdef CONFIG_ARCH_HAS_PARANOID_L1D_FLUSH
 	/*
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 26538ceb4b01..3a30cae8b0a5 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -992,10 +992,10 @@ struct sk_buff {
 	__u8 csum_complete_sw:1;
 	__u8 csum_level:2;
 	__u8 dst_pending_confirm:1;
-	__u8 mono_delivery_time:1;
+	__u8 mono_delivery_time:1; /* See SKB_MONO_DELIVERY_TIME_MASK */
 #ifdef CONFIG_NET_CLS_ACT
 	__u8 tc_skip_classify:1;
-	__u8 tc_at_ingress:1;
+	__u8 tc_at_ingress:1; /* See TC_AT_INGRESS_MASK */
 #endif
 #ifdef CONFIG_IPV6_NDISC_NODETYPE
 	__u8 ndisc_nodetype:2;
@@ -1094,7 +1094,9 @@ struct sk_buff {
 #endif
 #define PKT_TYPE_OFFSET offsetof(struct sk_buff, __pkt_type_offset)
-/* if you move pkt_vlan_present around you also must adapt these constants */
+/* if you move pkt_vlan_present, tc_at_ingress, or mono_delivery_time
+ * around, you also must adapt these constants.
+ */
 #ifdef __BIG_ENDIAN_BITFIELD
 #define PKT_VLAN_PRESENT_BIT 7
 #define TC_AT_INGRESS_MASK (1 << 0)
@@ -1105,8 +1107,6 @@ struct sk_buff {
 #define SKB_MONO_DELIVERY_TIME_MASK (1 << 5)
 #endif
 #define PKT_VLAN_PRESENT_OFFSET offsetof(struct sk_buff, __pkt_vlan_present_offset)
-#define TC_AT_INGRESS_OFFSET offsetof(struct sk_buff, __pkt_vlan_present_offset)
-#define SKB_MONO_DELIVERY_TIME_OFFSET offsetof(struct sk_buff, __pkt_vlan_present_offset)
 #ifdef __KERNEL__
 /*
diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index fdb5375f0562..c5a2d6f50f25 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -304,21 +304,16 @@ static inline void sock_drop(struct sock *sk, struct sk_buff *skb)
 	kfree_skb(skb);
 }
-static inline void drop_sk_msg(struct sk_psock *psock, struct sk_msg *msg)
-{
-	if (msg->skb)
-		sock_drop(psock->sk, msg->skb);
-	kfree(msg);
-}
-
 static inline void sk_psock_queue_msg(struct sk_psock *psock,
 				      struct sk_msg *msg)
 {
 	spin_lock_bh(&psock->ingress_lock);
 	if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED))
 		list_add_tail(&msg->list, &psock->ingress_msg);
-	else
-		drop_sk_msg(psock, msg);
+	else {
+		sk_msg_free(psock->sk, msg);
+		kfree(msg);
+	}
 	spin_unlock_bh(&psock->ingress_lock);
 }
diff --git a/include/linux/sort.h b/include/linux/sort.h
index b5898725fe9d..e163287ac6c1 100644
--- a/include/linux/sort.h
+++ b/include/linux/sort.h
@@ -6,7 +6,7 @@
 void sort_r(void *base, size_t num, size_t size,
 	    cmp_r_func_t cmp_func,
-	    swap_func_t swap_func,
+	    swap_r_func_t swap_func,
 	    const void *priv);
 void sort(void *base, size_t num, size_t size,
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index dcea51fb60e2..8f0e9e7cb493 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -15,6 +15,7 @@ struct array_buffer;
 struct tracer;
 struct dentry;
 struct bpf_prog;
+union bpf_attr;
 const char *trace_print_flags_seq(struct trace_seq *p, const char *delim,
 				  unsigned long flags,
@@ -738,6 +739,7 @@ void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp);
 int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id,
 			    u32 *fd_type, const char **buf,
 			    u64 *probe_offset, u64 *probe_addr);
+int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog);
 #else
 static inline unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx)
 {
@@ -779,6 +781,11 @@ static inline int bpf_get_perf_event_info(const struct perf_event *event,
 {
 	return -EOPNOTSUPP;
 }
+static inline int
+bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
+{
+	return -EOPNOTSUPP;
+}
 #endif
 enum {
diff --git a/include/linux/types.h b/include/linux/types.h
index ac825ad90e44..ea8cf60a8a79 100644
--- a/include/linux/types.h
+++ b/include/linux/types.h
@@ -226,6 +226,7 @@ struct callback_head {
 typedef void (*rcu_callback_t)(struct rcu_head *head);
 typedef void (*call_rcu_func_t)(struct rcu_head *head, rcu_callback_t func);
+typedef void (*swap_r_func_t)(void *a, void *b, int size, const void *priv);
 typedef void (*swap_func_t)(void *a, void *b, int size);
 typedef int (*cmp_r_func_t)(const void *a, const void *b, const void *priv);
diff --git a/include/net/xdp.h b/include/net/xdp.h
index b7721c3e4d1f..04c852c7a77f 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -343,6 +343,20 @@ out:
 	__xdp_release_frame(xdpf->data, mem);
 }
+static __always_inline unsigned int xdp_get_frame_len(struct xdp_frame *xdpf)
+{
+	struct skb_shared_info *sinfo;
+	unsigned int len = xdpf->len;
+
+	if (likely(!xdp_frame_has_frags(xdpf)))
+		goto out;
+
+	sinfo = xdp_get_shared_info_from_frame(xdpf);
+	len += sinfo->xdp_frags_size;
+out:
+	return len;
+}
+
 int __xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
 		       struct net_device *dev, u32 queue_index,
 		       unsigned int napi_id, u32 frag_size);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 4eebea830613..d14b10b85e51 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -997,6 +997,7 @@ enum bpf_attach_type {
 	BPF_SK_REUSEPORT_SELECT,
 	BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
 	BPF_PERF_EVENT,
+	BPF_TRACE_KPROBE_MULTI,
 	__MAX_BPF_ATTACH_TYPE
 };
@@ -1011,6 +1012,7 @@ enum bpf_link_type {
 	BPF_LINK_TYPE_NETNS = 5,
 	BPF_LINK_TYPE_XDP = 6,
 	BPF_LINK_TYPE_PERF_EVENT = 7,
+	BPF_LINK_TYPE_KPROBE_MULTI = 8,
 	MAX_BPF_LINK_TYPE,
 };
@@ -1118,6 +1120,11 @@ enum bpf_link_type {
  */
 #define BPF_F_XDP_HAS_FRAGS (1U << 5)
+/* link_create.kprobe_multi.flags used in LINK_CREATE command for
+ * BPF_TRACE_KPROBE_MULTI attach type to create return probe.
+ */
+#define BPF_F_KPROBE_MULTI_RETURN (1U << 0)
+
 /* When BPF ldimm64's insn[0].src_reg != 0 then this can have
  * the following extensions:
  *
@@ -1232,6 +1239,8 @@ enum {
 /* If set, run the test on the cpu specified by bpf_attr.test.cpu */
 #define BPF_F_TEST_RUN_ON_CPU (1U << 0)
+/* If set, XDP frames will be transmitted after processing */
+#define BPF_F_TEST_XDP_LIVE_FRAMES (1U << 1)
 /* type for BPF_ENABLE_STATS */
 enum bpf_stats_type {
@@ -1393,6 +1402,7 @@ union bpf_attr {
 		__aligned_u64 ctx_out;
 		__u32 flags;
 		__u32 cpu;
+		__u32 batch_size;
 	} test;
 	struct { /* anonymous struct used by BPF_*_GET_*_ID */
@@ -1472,6 +1482,13 @@ union bpf_attr {
 				 */
 				__u64 bpf_cookie;
 			} perf_event;
+			struct {
+				__u32 flags;
+				__u32 cnt;
+				__aligned_u64 syms;
+				__aligned_u64 addrs;
+				__aligned_u64 cookies;
+			} kprobe_multi;
 		};
 	} link_create;
@@ -2299,8 +2316,8 @@ union bpf_attr {
  * 	Return
  * 		The return value depends on the result of the test, and can be:
  *
- * 		* 0, if current task belongs to the cgroup2.
- * 		* 1, if current task does not belong to the cgroup2.
+ * 		* 1, if current task belongs to the cgroup2.
+ * 		* 0, if current task does not belong to the cgroup2.
  * 		* A negative error code, if an error occurred.
  *
  * long bpf_skb_change_tail(struct sk_buff *skb, u32 len, u64 flags)
@@ -2992,8 +3009,8 @@ union bpf_attr {
  *
  * 			# sysctl kernel.perf_event_max_stack=<new value>
  * 	Return
- * 		A non-negative value equal to or less than *size* on success,
- * 		or a negative error in case of failure.
+ * 		The non-negative copied *buf* length equal to or less than
+ * 		*size* on success, or a negative error in case of failure.
  *
  * long bpf_skb_load_bytes_relative(const void *skb, u32 offset, void *to, u32 len, u32 start_header)
  * 	Description
@@ -4299,8 +4316,8 @@ union bpf_attr {
  *
  * 			# sysctl kernel.perf_event_max_stack=<new value>
  * 	Return
- * 		A non-negative value equal to or less than *size* on success,
- * 		or a negative error in case of failure.
+ * 		The non-negative copied *buf* length equal to or less than
+ * 		*size* on success, or a negative error in case of failure.
  *
  * long bpf_load_hdr_opt(struct bpf_sock_ops *skops, void *searchby_res, u32 len, u64 flags)
  * 	Description
@@ -5087,23 +5104,22 @@ union bpf_attr {
  * 		0 on success, or a negative error in case of failure. On error
  * 		*dst* buffer is zeroed out.
  *
- * long bpf_skb_set_delivery_time(struct sk_buff *skb, u64 dtime, u32 dtime_type)
+ * long bpf_skb_set_tstamp(struct sk_buff *skb, u64 tstamp, u32 tstamp_type)
  * 	Description
- * 		Set a *dtime* (delivery time) to the __sk_buff->tstamp and also
- * 		change the __sk_buff->delivery_time_type to *dtime_type*.
- *
- * 		When setting a delivery time (non zero *dtime*) to
- * 		__sk_buff->tstamp, only BPF_SKB_DELIVERY_TIME_MONO *dtime_type*
- * 		is supported. It is the only delivery_time_type that will be
- * 		kept after bpf_redirect_*().
+ * 		Change the __sk_buff->tstamp_type to *tstamp_type*
+ * 		and set *tstamp* to the __sk_buff->tstamp together.
  *
- * 		If there is no need to change the __sk_buff->delivery_time_type,
- * 		the delivery time can be directly written to __sk_buff->tstamp
+ * 		If there is no need to change the __sk_buff->tstamp_type,
+ * 		the tstamp value can be directly written to __sk_buff->tstamp
  * 		instead.
  *
- * 		*dtime* 0 and *dtime_type* BPF_SKB_DELIVERY_TIME_NONE
- * 		can be used to clear any delivery time stored in
- * 		__sk_buff->tstamp.
+ * 		BPF_SKB_TSTAMP_DELIVERY_MONO is the only tstamp that
+ * 		will be kept during bpf_redirect_*(). A non zero
+ * 		*tstamp* must be used with the BPF_SKB_TSTAMP_DELIVERY_MONO
+ * 		*tstamp_type*.
+ *
+ * 		A BPF_SKB_TSTAMP_UNSPEC *tstamp_type* can only be used
+ * 		with a zero *tstamp*.
  *
  * 		Only IPv4 and IPv6 skb->protocol are supported.
  *
@@ -5116,7 +5132,17 @@ union bpf_attr {
  * 	Return
  * 		0 on success.
  * 		**-EINVAL** for invalid input
- * 		**-EOPNOTSUPP** for unsupported delivery_time_type and protocol
+ * 		**-EOPNOTSUPP** for unsupported protocol
+ *
+ * long bpf_ima_file_hash(struct file *file, void *dst, u32 size)
+ * 	Description
+ * 		Returns a calculated IMA hash of the *file*.
+ * 		If the hash is larger than *size*, then only *size*
+ * 		bytes will be copied to *dst*
+ * 	Return
+ * 		The **hash_algo** is returned on success,
+ * 		**-EOPNOTSUP** if the hash calculation failed or **-EINVAL** if
+ * 		invalid arguments are passed.
  */
 #define __BPF_FUNC_MAPPER(FN) \
 	FN(unspec), \
@@ -5311,7 +5337,8 @@ union bpf_attr {
 	FN(xdp_load_bytes), \
 	FN(xdp_store_bytes), \
 	FN(copy_from_user_task), \
-	FN(skb_set_delivery_time), \
+	FN(skb_set_tstamp), \
+	FN(ima_file_hash), \
 	/* */
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
@@ -5502,9 +5529,12 @@ union { \
 } __attribute__((aligned(8)))
 enum {
-	BPF_SKB_DELIVERY_TIME_NONE,
-	BPF_SKB_DELIVERY_TIME_UNSPEC,
-	BPF_SKB_DELIVERY_TIME_MONO,
+	BPF_SKB_TSTAMP_UNSPEC,
+	BPF_SKB_TSTAMP_DELIVERY_MONO, /* tstamp has mono delivery time */
+	/* For any BPF_SKB_TSTAMP_* that the bpf prog cannot handle,
+	 * the bpf prog should handle it like BPF_SKB_TSTAMP_UNSPEC
+	 * and try to deduce it by ingress, egress or skb->sk->sk_clockid.
+	 */
 };
 /* user accessible mirror of in-kernel sk_buff.
@@ -5547,7 +5577,7 @@ struct __sk_buff {
 	__u32 gso_segs;
 	__bpf_md_ptr(struct bpf_sock *, sk);
 	__u32 gso_size;
-	__u8 delivery_time_type;
+	__u8 tstamp_type;
 	__u32 :24; /* Padding, future use. */
 	__u64 hwtstamp;
 };
diff --git a/kernel/bpf/Kconfig b/kernel/bpf/Kconfig
index c3cf0b86eeb2..d56ee177d5f8 100644
--- a/kernel/bpf/Kconfig
+++ b/kernel/bpf/Kconfig
@@ -30,6 +30,7 @@ config BPF_SYSCALL
 	select TASKS_TRACE_RCU
 	select BINARY_PRINTF
 	select NET_SOCK_MSG if NET
+	select PAGE_POOL if NET
 	default n
 	help
 	  Enable the bpf() system call that allows to manipulate BPF programs
diff --git a/kernel/bpf/bpf_inode_storage.c b/kernel/bpf/bpf_inode_storage.c
index e29d9e3d853e..96be8d518885 100644
--- a/kernel/bpf/bpf_inode_storage.c
+++ b/kernel/bpf/bpf_inode_storage.c
@@ -136,7 +136,7 @@ static int bpf_fd_inode_storage_update_elem(struct bpf_map *map, void *key,
 	sdata = bpf_local_storage_update(f->f_inode,
 					 (struct bpf_local_storage_map *)map,
-					 value, map_flags);
+					 value, map_flags, GFP_ATOMIC);
 	fput(f);
 	return PTR_ERR_OR_ZERO(sdata);
 }
@@ -169,8 +169,9 @@ static int bpf_fd_inode_storage_delete_elem(struct bpf_map *map, void *key)
 	return err;
 }
-BPF_CALL_4(bpf_inode_storage_get, struct bpf_map *, map, struct inode *, inode,
-	   void *, value, u64, flags)
+/* *gfp_flags* is a hidden argument provided by the verifier */
+BPF_CALL_5(bpf_inode_storage_get, struct bpf_map *, map, struct inode *, inode,
+	   void *, value, u64, flags, gfp_t, gfp_flags)
 {
 	struct bpf_local_storage_data *sdata;
@@ -196,7 +197,7 @@ BPF_CALL_4(bpf_inode_storage_get, struct bpf_map *, map, struct inode *, inode,
 	if (flags & BPF_LOCAL_STORAGE_GET_F_CREATE) {
 		sdata = bpf_local_storage_update(
 			inode, (struct bpf_local_storage_map *)map, value,
-			BPF_NOEXIST);
+			BPF_NOEXIST, gfp_flags);
 		return IS_ERR(sdata) ? (unsigned long)NULL :
 					     (unsigned long)sdata->data;
 	}
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index 092a1ac772d7..01aa2b51ec4d 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -63,7 +63,7 @@ static bool selem_linked_to_map(const struct bpf_local_storage_elem *selem)
 struct bpf_local_storage_elem *
 bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
-		void *value, bool charge_mem)
+		void *value, bool charge_mem, gfp_t gfp_flags)
 {
 	struct bpf_local_storage_elem *selem;
@@ -71,7 +71,7 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
 		return NULL;
 	selem = bpf_map_kzalloc(&smap->map, smap->elem_size,
-				GFP_ATOMIC | __GFP_NOWARN);
+				gfp_flags | __GFP_NOWARN);
 	if (selem) {
 		if (value)
 			memcpy(SDATA(selem)->data, value, smap->map.value_size);
@@ -282,7 +282,8 @@ static int check_flags(const struct bpf_local_storage_data *old_sdata,
 int bpf_local_storage_alloc(void *owner,
 			    struct bpf_local_storage_map *smap,
-			    struct bpf_local_storage_elem *first_selem)
+			    struct bpf_local_storage_elem *first_selem,
+			    gfp_t gfp_flags)
 {
 	struct bpf_local_storage *prev_storage, *storage;
 	struct bpf_local_storage **owner_storage_ptr;
@@ -293,7 +294,7 @@ int bpf_local_storage_alloc(void *owner,
 		return err;
 	storage = bpf_map_kzalloc(&smap->map, sizeof(*storage),
-				  GFP_ATOMIC | __GFP_NOWARN);
+				  gfp_flags | __GFP_NOWARN);
 	if (!storage) {
 		err = -ENOMEM;
 		goto uncharge;
@@ -350,10 +351,10 @@ uncharge:
  */
 struct bpf_local_storage_data *
 bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
-			 void *value, u64 map_flags)
+			 void *value, u64 map_flags, gfp_t gfp_flags)
 {
 	struct bpf_local_storage_data *old_sdata = NULL;
-	struct bpf_local_storage_elem *selem;
+	struct bpf_local_storage_elem *selem = NULL;
 	struct bpf_local_storage *local_storage;
 	unsigned long flags;
 	int err;
@@ -365,6 +366,9 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
 		     !map_value_has_spin_lock(&smap->map)))
 		return ERR_PTR(-EINVAL);
+	if (gfp_flags == GFP_KERNEL && (map_flags & ~BPF_F_LOCK) != BPF_NOEXIST)
+		return ERR_PTR(-EINVAL);
+
 	local_storage = rcu_dereference_check(*owner_storage(smap, owner),
 					      bpf_rcu_lock_held());
 	if (!local_storage || hlist_empty(&local_storage->list)) {
@@ -373,11 +377,11 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
 		if (err)
 			return ERR_PTR(err);
-		selem = bpf_selem_alloc(smap, owner, value, true);
+		selem = bpf_selem_alloc(smap, owner, value, true, gfp_flags);
 		if (!selem)
 			return ERR_PTR(-ENOMEM);
-		err = bpf_local_storage_alloc(owner, smap, selem);
+		err = bpf_local_storage_alloc(owner, smap, selem, gfp_flags);
 		if (err) {
 			kfree(selem);
 			mem_uncharge(smap, owner, smap->elem_size);
@@ -404,6 +408,12 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
 		}
 	}
+	if (gfp_flags == GFP_KERNEL) {
+		selem = bpf_selem_alloc(smap, owner, value, true, gfp_flags);
+		if (!selem)
+			return ERR_PTR(-ENOMEM);
+	}
+
 	raw_spin_lock_irqsave(&local_storage->lock, flags);
 	/* Recheck local_storage->list under local_storage->lock */
@@ -429,19 +439,21 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
 		goto unlock;
 	}
-	/* local_storage->lock is held. Hence, we are sure
-	 * we can unlink and uncharge the old_sdata successfully
-	 * later. Hence, instead of charging the new selem now
-	 * and then uncharge the old selem later (which may cause
-	 * a potential but unnecessary charge failure), avoid taking
-	 * a charge at all here (the "!old_sdata" check) and the
-	 * old_sdata will not be uncharged later during
-	 * bpf_selem_unlink_storage_nolock().
-	 */
-	selem = bpf_selem_alloc(smap, owner, value, !old_sdata);
-	if (!selem) {
-		err = -ENOMEM;
-		goto unlock_err;
+	if (gfp_flags != GFP_KERNEL) {
+		/* local_storage->lock is held. Hence, we are sure
+		 * we can unlink and uncharge the old_sdata successfully
+		 * later. Hence, instead of charging the new selem now
+		 * and then uncharge the old selem later (which may cause
+		 * a potential but unnecessary charge failure), avoid taking
+		 * a charge at all here (the "!old_sdata" check) and the
+		 * old_sdata will not be uncharged later during
+		 * bpf_selem_unlink_storage_nolock().
+		 */
+		selem = bpf_selem_alloc(smap, owner, value, !old_sdata, gfp_flags);
+		if (!selem) {
+			err = -ENOMEM;
+			goto unlock_err;
+		}
 	}
 	/* First, link the new selem to the map */
@@ -463,6 +475,10 @@ unlock:
 unlock_err:
 	raw_spin_unlock_irqrestore(&local_storage->lock, flags);
+	if (selem) {
+		mem_uncharge(smap, owner, smap->elem_size);
+		kfree(selem);
+	}
 	return ERR_PTR(err);
 }
diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
index 9e4ecc990647..064eccba641d 100644
--- a/kernel/bpf/bpf_lsm.c
+++ b/kernel/bpf/bpf_lsm.c
@@ -99,6 +99,24 @@ static const struct bpf_func_proto bpf_ima_inode_hash_proto = {
 	.allowed = bpf_ima_inode_hash_allowed,
 };
+BPF_CALL_3(bpf_ima_file_hash, struct file *, file, void *, dst, u32, size)
+{
+	return ima_file_hash(file, dst, size);
+}
+
+BTF_ID_LIST_SINGLE(bpf_ima_file_hash_btf_ids, struct, file)
+
+static const struct bpf_func_proto bpf_ima_file_hash_proto = {
+	.func = bpf_ima_file_hash,
+	.gpl_only = false,
+	.ret_type = RET_INTEGER,
+	.arg1_type = ARG_PTR_TO_BTF_ID,
+	.arg1_btf_id = &bpf_ima_file_hash_btf_ids[0],
+	.arg2_type = ARG_PTR_TO_UNINIT_MEM,
+	.arg3_type = ARG_CONST_SIZE,
+	.allowed = bpf_ima_inode_hash_allowed,
+};
+
 static const struct bpf_func_proto *
 bpf_lsm_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
@@ -121,6 +139,8 @@ bpf_lsm_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_bprm_opts_set_proto;
 	case BPF_FUNC_ima_inode_hash:
 		return prog->aux->sleepable ? &bpf_ima_inode_hash_proto : NULL;
+	case BPF_FUNC_ima_file_hash:
+		return prog->aux->sleepable ? &bpf_ima_file_hash_proto : NULL;
 	default:
 		return tracing_prog_func_proto(func_id, prog);
 	}
@@ -167,6 +187,7 @@ BTF_ID(func, bpf_lsm_inode_setxattr)
 BTF_ID(func, bpf_lsm_inode_symlink)
 BTF_ID(func, bpf_lsm_inode_unlink)
 BTF_ID(func, bpf_lsm_kernel_module_request)
+BTF_ID(func, bpf_lsm_kernel_read_file)
 BTF_ID(func, bpf_lsm_kernfs_init_security)
 #ifdef CONFIG_KEYS
diff --git a/kernel/bpf/bpf_task_storage.c b/kernel/bpf/bpf_task_storage.c
index 5da7bed0f5f6..6638a0ecc3d2 100644
--- a/kernel/bpf/bpf_task_storage.c
+++ b/kernel/bpf/bpf_task_storage.c
@@ -174,7 +174,8 @@ static int bpf_pid_task_storage_update_elem(struct bpf_map *map, void *key,
 	bpf_task_storage_lock();
 	sdata = bpf_local_storage_update(
-		task, (struct bpf_local_storage_map *)map, value, map_flags);
+		task, (struct bpf_local_storage_map *)map, value, map_flags,
+		GFP_ATOMIC);
 	bpf_task_storage_unlock();
 	err = PTR_ERR_OR_ZERO(sdata);
@@ -226,8 +227,9 @@ out:
 	return err;
 }
-BPF_CALL_4(bpf_task_storage_get, struct bpf_map *, map, struct task_struct *,
-	   task, void *, value, u64, flags)
+/* *gfp_flags* is a hidden argument provided by the verifier */
+BPF_CALL_5(bpf_task_storage_get, struct bpf_map *, map, struct task_struct *,
+	   task, void *, value, u64, flags, gfp_t, gfp_flags)
 {
 	struct bpf_local_storage_data *sdata;
@@ -250,7 +252,7 @@ BPF_CALL_4(bpf_task_storage_get, struct bpf_map *, map, struct task_struct *,
 	    (flags & BPF_LOCAL_STORAGE_GET_F_CREATE))
 		sdata = bpf_local_storage_update(
 			task, (struct bpf_local_storage_map *)map, value,
-			BPF_NOEXIST);
+			BPF_NOEXIST, gfp_flags);
 unlock:
 	bpf_task_storage_unlock();
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index b472cf0c8fdb..24788ce564a0 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -525,6 +525,50 @@ s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind)
 	return -ENOENT;
 }
+static s32 bpf_find_btf_id(const char *name, u32 kind, struct btf **btf_p)
+{
+	struct btf *btf;
+	s32 ret;
+	int id;
+
+	btf = bpf_get_btf_vmlinux();
+	if (IS_ERR(btf))
+		return PTR_ERR(btf);
+	if (!btf)
+		return -EINVAL;
+
+	ret = btf_find_by_name_kind(btf, name, kind);
+	/* ret is never zero, since btf_find_by_name_kind returns
+	 * positive btf_id or negative error.
+	 */
+	if (ret > 0) {
+		btf_get(btf);
+		*btf_p = btf;
+		return ret;
+	}
+
+	/* If name is not found in vmlinux's BTF then search in module's BTFs */
+	spin_lock_bh(&btf_idr_lock);
+	idr_for_each_entry(&btf_idr, btf, id) {
+		if (!btf_is_module(btf))
+			continue;
+		/* linear search could be slow hence unlock/lock
+		 * the IDR to avoiding holding it for too long
+		 */
+		btf_get(btf);
+		spin_unlock_bh(&btf_idr_lock);
+		ret = btf_find_by_name_kind(btf, name, kind);
+		if (ret > 0) {
+			*btf_p = btf;
+			return ret;
+		}
+		spin_lock_bh(&btf_idr_lock);
+		btf_put(btf);
+	}
+	spin_unlock_bh(&btf_idr_lock);
+	return ret;
+}
+
 const struct btf_type *btf_type_skip_modifiers(const struct btf *btf,
 					       u32 id, u32 *res_id)
 {
@@ -4438,8 +4482,7 @@ static int btf_parse_hdr(struct btf_verifier_env *env)
 	btf = env->btf;
 	btf_data_size = btf->data_size;
-	if (btf_data_size <
-	    offsetof(struct btf_header, hdr_len) + sizeof(hdr->hdr_len)) {
+	if (btf_data_size < offsetofend(struct btf_header, hdr_len)) {
 		btf_verifier_log(env, "hdr_len not found");
 		return -EINVAL;
 	}
@@ -5057,6 +5100,8 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
 		tag_value = __btf_name_by_offset(btf, t->name_off);
 		if (strcmp(tag_value, "user") == 0)
 			info->reg_type |= MEM_USER;
+		if (strcmp(tag_value, "percpu") == 0)
+			info->reg_type |= MEM_PERCPU;
 	}
 	/* skip modifiers */
@@ -5285,12 +5330,16 @@ error:
 				return -EACCES;
 			}
-			/* check __user tag */
+			/* check type tag */
 			t = btf_type_by_id(btf, mtype->type);
 			if (btf_type_is_type_tag(t)) {
 				tag_value = __btf_name_by_offset(btf, t->name_off);
+				/* check __user tag */
 				if (strcmp(tag_value, "user") == 0)
 					tmp_flag = MEM_USER;
+				/* check __percpu tag */
+				if (strcmp(tag_value, "percpu") == 0)
+					tmp_flag = MEM_PERCPU;
 			}
 			stype = btf_type_skip_modifiers(btf, mtype->type, &id);
@@ -5726,7 +5775,7 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 	const char *func_name, *ref_tname;
 	const struct btf_type *t, *ref_t;
 	const struct btf_param *args;
-	int ref_regno = 0;
+	int ref_regno = 0, ret;
 	bool rel = false;
 	t = btf_type_by_id(btf, func_id);
@@ -5753,6 +5802,10 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 		return -EINVAL;
 	}
+	/* Only kfunc can be release func */
+	if (is_kfunc)
+		rel = btf_kfunc_id_set_contains(btf, resolve_prog_type(env->prog),
+						BTF_KFUNC_TYPE_RELEASE, func_id);
 	/* check that BTF function arguments match actual types that the
 	 * verifier sees.
 	 */
@@ -5776,6 +5829,11 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 		ref_t = btf_type_skip_modifiers(btf, t->type, &ref_id);
 		ref_tname = btf_name_by_offset(btf, ref_t->name_off);
+
+		ret = check_func_arg_reg_off(env, reg, regno, ARG_DONTCARE, rel);
+		if (ret < 0)
+			return ret;
+
 		if (btf_get_prog_ctx_type(log, btf, t, env->prog->type, i)) {
 			/* If function expects ctx type in BTF check that caller
@@ -5787,8 +5845,6 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 					i, btf_type_str(t));
 				return -EINVAL;
 			}
-			if (check_ptr_off_reg(env, reg, regno))
-				return -EINVAL;
 		} else if (is_kfunc && (reg->type == PTR_TO_BTF_ID ||
 			   (reg2btf_ids[base_type(reg->type)] && !type_flag(reg->type)))) {
 			const struct btf_type *reg_ref_t;
@@ -5806,7 +5862,11 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 			if (reg->type == PTR_TO_BTF_ID) {
 				reg_btf = reg->btf;
 				reg_ref_id = reg->btf_id;
-				/* Ensure only one argument is referenced PTR_TO_BTF_ID */
+				/* Ensure only one argument is referenced
+				 * PTR_TO_BTF_ID, check_func_arg_reg_off relies
+				 * on only one referenced register being allowed
+				 * for kfuncs.
+				 */
 				if (reg->ref_obj_id) {
 					if (ref_obj_id) {
 						bpf_log(log, "verifier internal error: more than one arg with ref_obj_id R%d %u %u\n",
@@ -5888,18 +5948,15 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 	/* Either both are set, or neither */
 	WARN_ON_ONCE((ref_obj_id && !ref_regno) || (!ref_obj_id && ref_regno));
-	if (is_kfunc) {
-		rel = btf_kfunc_id_set_contains(btf, resolve_prog_type(env->prog),
-						BTF_KFUNC_TYPE_RELEASE, func_id);
-		/* We already made sure ref_obj_id is set only for one argument */
-		if (rel && !ref_obj_id) {
-			bpf_log(log, "release kernel function %s expects refcounted PTR_TO_BTF_ID\n",
-				func_name);
-			return -EINVAL;
-		}
-		/* Allow (!rel && ref_obj_id), so that passing such referenced PTR_TO_BTF_ID to
-		 * other kfuncs works
-		 */
+	/* We already made sure ref_obj_id is set only for one argument. We do
+	 * allow (!rel && ref_obj_id), so that passing such referenced
+	 * PTR_TO_BTF_ID to other kfuncs works. Note that rel is only true when
+	 * is_kfunc is true.
+	 */
+	if (rel && !ref_obj_id) {
+		bpf_log(log, "release kernel function %s expects refcounted PTR_TO_BTF_ID\n",
+			func_name);
+		return -EINVAL;
 	}
 	/* returns argument register number > 0 in case of reference release kfunc */
 	return rel ? ref_regno : 0;
@@ -6516,20 +6573,23 @@ struct module *btf_try_get_module(const struct btf *btf)
 	return res;
 }
-/* Returns struct btf corresponding to the struct module
- *
- * This function can return NULL or ERR_PTR. Note that caller must
- * release reference for struct btf iff btf_is_module is true.
+/* Returns struct btf corresponding to the struct module.
+ * This function can return NULL or ERR_PTR.
  */
 static struct btf *btf_get_module_btf(const struct module *module)
 {
-	struct btf *btf = NULL;
 #ifdef CONFIG_DEBUG_INFO_BTF_MODULES
 	struct btf_module *btf_mod, *tmp;
 #endif
+	struct btf *btf = NULL;
+
+	if (!module) {
+		btf = bpf_get_btf_vmlinux();
+		if (!IS_ERR_OR_NULL(btf))
+			btf_get(btf);
+		return btf;
+	}
-	if (!module)
-		return bpf_get_btf_vmlinux();
 #ifdef CONFIG_DEBUG_INFO_BTF_MODULES
 	mutex_lock(&btf_module_mutex);
 	list_for_each_entry_safe(btf_mod, tmp, &btf_modules, list) {
@@ -6548,7 +6608,8 @@ static struct btf *btf_get_module_btf(const struct module *module)
 BPF_CALL_4(bpf_btf_find_by_name_kind, char *, name, int, name_sz, u32, kind, int, flags)
 {
-	struct btf *btf;
+	struct btf *btf = NULL;
+	int btf_obj_fd = 0;
 	long ret;
 	if (flags)
@@ -6557,44 +6618,17 @@ BPF_CALL_4(bpf_btf_find_by_name_kind, char *, name, int, name_sz, u32, kind, int
 	if (name_sz <= 1 || name[name_sz - 1])
 		return -EINVAL;
-	btf = bpf_get_btf_vmlinux();
-	if (IS_ERR(btf))
-		return PTR_ERR(btf);
-
-	ret = btf_find_by_name_kind(btf, name, kind);
-	/* ret is never zero, since btf_find_by_name_kind returns
-	 * positive btf_id or negative error.
- */ - if (ret < 0) { - struct btf *mod_btf; - int id; - - /* If name is not found in vmlinux's BTF then search in module's BTFs */ - spin_lock_bh(&btf_idr_lock); - idr_for_each_entry(&btf_idr, mod_btf, id) { - if (!btf_is_module(mod_btf)) - continue; - /* linear search could be slow hence unlock/lock - * the IDR to avoiding holding it for too long - */ - btf_get(mod_btf); - spin_unlock_bh(&btf_idr_lock); - ret = btf_find_by_name_kind(mod_btf, name, kind); - if (ret > 0) { - int btf_obj_fd; - - btf_obj_fd = __btf_new_fd(mod_btf); - if (btf_obj_fd < 0) { - btf_put(mod_btf); - return btf_obj_fd; - } - return ret | (((u64)btf_obj_fd) << 32); - } - spin_lock_bh(&btf_idr_lock); - btf_put(mod_btf); + ret = bpf_find_btf_id(name, kind, &btf); + if (ret > 0 && btf_is_module(btf)) { + btf_obj_fd = __btf_new_fd(btf); + if (btf_obj_fd < 0) { + btf_put(btf); + return btf_obj_fd; } - spin_unlock_bh(&btf_idr_lock); + return ret | (((u64)btf_obj_fd) << 32); } + if (ret > 0) + btf_put(btf); return ret; } @@ -6793,9 +6827,7 @@ int register_btf_kfunc_id_set(enum bpf_prog_type prog_type, hook = bpf_prog_type_to_kfunc_hook(prog_type); ret = btf_populate_kfunc_set(btf, hook, kset); - /* reference is only taken for module BTF */ - if (btf_is_module(btf)) - btf_put(btf); + btf_put(btf); return ret; } EXPORT_SYMBOL_GPL(register_btf_kfunc_id_set); @@ -7149,6 +7181,8 @@ bpf_core_find_cands(struct bpf_core_ctx *ctx, u32 local_type_id) main_btf = bpf_get_btf_vmlinux(); if (IS_ERR(main_btf)) return ERR_CAST(main_btf); + if (!main_btf) + return ERR_PTR(-EINVAL); local_type = btf_type_by_id(local_btf, local_type_id); if (!local_type) diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index ab630f773ec1..13e9dbeeedf3 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -33,6 +33,7 @@ #include <linux/extable.h> #include <linux/log2.h> #include <linux/bpf_verifier.h> +#include <linux/nodemask.h> #include <asm/barrier.h> #include <asm/unaligned.h> @@ -105,6 +106,7 @@ struct bpf_prog 
*bpf_prog_alloc_no_stats(unsigned int size, gfp_t gfp_extra_flag fp->aux = aux; fp->aux->prog = fp; fp->jit_requested = ebpf_jit_enabled(); + fp->blinding_requested = bpf_jit_blinding_enabled(fp); INIT_LIST_HEAD_RCU(&fp->aux->ksym.lnode); mutex_init(&fp->aux->used_maps_mutex); @@ -814,15 +816,9 @@ int bpf_jit_add_poke_descriptor(struct bpf_prog *prog, * allocator. The prog_pack allocator uses HPAGE_PMD_SIZE page (2MB on x86) * to host BPF programs. */ -#ifdef CONFIG_TRANSPARENT_HUGEPAGE -#define BPF_PROG_PACK_SIZE HPAGE_PMD_SIZE -#else -#define BPF_PROG_PACK_SIZE PAGE_SIZE -#endif #define BPF_PROG_CHUNK_SHIFT 6 #define BPF_PROG_CHUNK_SIZE (1 << BPF_PROG_CHUNK_SHIFT) #define BPF_PROG_CHUNK_MASK (~(BPF_PROG_CHUNK_SIZE - 1)) -#define BPF_PROG_CHUNK_COUNT (BPF_PROG_PACK_SIZE / BPF_PROG_CHUNK_SIZE) struct bpf_prog_pack { struct list_head list; @@ -830,30 +826,72 @@ struct bpf_prog_pack { unsigned long bitmap[]; }; -#define BPF_PROG_MAX_PACK_PROG_SIZE BPF_PROG_PACK_SIZE #define BPF_PROG_SIZE_TO_NBITS(size) (round_up(size, BPF_PROG_CHUNK_SIZE) / BPF_PROG_CHUNK_SIZE) +static size_t bpf_prog_pack_size = -1; +static size_t bpf_prog_pack_mask = -1; + +static int bpf_prog_chunk_count(void) +{ + WARN_ON_ONCE(bpf_prog_pack_size == -1); + return bpf_prog_pack_size / BPF_PROG_CHUNK_SIZE; +} + static DEFINE_MUTEX(pack_mutex); static LIST_HEAD(pack_list); +/* PMD_SIZE is not available in some special config, e.g. ARCH=arm with + * CONFIG_MMU=n. Use PAGE_SIZE in these cases. + */ +#ifdef PMD_SIZE +#define BPF_HPAGE_SIZE PMD_SIZE +#define BPF_HPAGE_MASK PMD_MASK +#else +#define BPF_HPAGE_SIZE PAGE_SIZE +#define BPF_HPAGE_MASK PAGE_MASK +#endif + +static size_t select_bpf_prog_pack_size(void) +{ + size_t size; + void *ptr; + + size = BPF_HPAGE_SIZE * num_online_nodes(); + ptr = module_alloc(size); + + /* Test whether we can get huge pages. If not just use PAGE_SIZE + * packs. 
+ */ + if (!ptr || !is_vm_area_hugepages(ptr)) { + size = PAGE_SIZE; + bpf_prog_pack_mask = PAGE_MASK; + } else { + bpf_prog_pack_mask = BPF_HPAGE_MASK; + } + + vfree(ptr); + return size; +} + static struct bpf_prog_pack *alloc_new_pack(void) { struct bpf_prog_pack *pack; - pack = kzalloc(sizeof(*pack) + BITS_TO_BYTES(BPF_PROG_CHUNK_COUNT), GFP_KERNEL); + pack = kzalloc(struct_size(pack, bitmap, BITS_TO_LONGS(bpf_prog_chunk_count())), + GFP_KERNEL); if (!pack) return NULL; - pack->ptr = module_alloc(BPF_PROG_PACK_SIZE); + pack->ptr = module_alloc(bpf_prog_pack_size); if (!pack->ptr) { kfree(pack); return NULL; } - bitmap_zero(pack->bitmap, BPF_PROG_PACK_SIZE / BPF_PROG_CHUNK_SIZE); + bitmap_zero(pack->bitmap, bpf_prog_pack_size / BPF_PROG_CHUNK_SIZE); list_add_tail(&pack->list, &pack_list); set_vm_flush_reset_perms(pack->ptr); - set_memory_ro((unsigned long)pack->ptr, BPF_PROG_PACK_SIZE / PAGE_SIZE); - set_memory_x((unsigned long)pack->ptr, BPF_PROG_PACK_SIZE / PAGE_SIZE); + set_memory_ro((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE); + set_memory_x((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE); return pack; } @@ -864,7 +902,11 @@ static void *bpf_prog_pack_alloc(u32 size) unsigned long pos; void *ptr = NULL; - if (size > BPF_PROG_MAX_PACK_PROG_SIZE) { + mutex_lock(&pack_mutex); + if (bpf_prog_pack_size == -1) + bpf_prog_pack_size = select_bpf_prog_pack_size(); + + if (size > bpf_prog_pack_size) { size = round_up(size, PAGE_SIZE); ptr = module_alloc(size); if (ptr) { @@ -872,13 +914,12 @@ static void *bpf_prog_pack_alloc(u32 size) set_memory_ro((unsigned long)ptr, size / PAGE_SIZE); set_memory_x((unsigned long)ptr, size / PAGE_SIZE); } - return ptr; + goto out; } - mutex_lock(&pack_mutex); list_for_each_entry(pack, &pack_list, list) { - pos = bitmap_find_next_zero_area(pack->bitmap, BPF_PROG_CHUNK_COUNT, 0, + pos = bitmap_find_next_zero_area(pack->bitmap, bpf_prog_chunk_count(), 0, nbits, 0); - if (pos < BPF_PROG_CHUNK_COUNT) + if (pos < 
bpf_prog_chunk_count()) goto found_free_area; } @@ -904,13 +945,13 @@ static void bpf_prog_pack_free(struct bpf_binary_header *hdr) unsigned long pos; void *pack_ptr; - if (hdr->size > BPF_PROG_MAX_PACK_PROG_SIZE) { + mutex_lock(&pack_mutex); + if (hdr->size > bpf_prog_pack_size) { module_memfree(hdr); - return; + goto out; } - pack_ptr = (void *)((unsigned long)hdr & ~(BPF_PROG_PACK_SIZE - 1)); - mutex_lock(&pack_mutex); + pack_ptr = (void *)((unsigned long)hdr & bpf_prog_pack_mask); list_for_each_entry(tmp, &pack_list, list) { if (tmp->ptr == pack_ptr) { @@ -926,8 +967,8 @@ static void bpf_prog_pack_free(struct bpf_binary_header *hdr) pos = ((unsigned long)hdr - (unsigned long)pack_ptr) >> BPF_PROG_CHUNK_SHIFT; bitmap_clear(pack->bitmap, pos, nbits); - if (bitmap_find_next_zero_area(pack->bitmap, BPF_PROG_CHUNK_COUNT, 0, - BPF_PROG_CHUNK_COUNT, 0) == 0) { + if (bitmap_find_next_zero_area(pack->bitmap, bpf_prog_chunk_count(), 0, + bpf_prog_chunk_count(), 0) == 0) { list_del(&pack->list); module_memfree(pack->ptr); kfree(pack); @@ -1382,7 +1423,7 @@ struct bpf_prog *bpf_jit_blind_constants(struct bpf_prog *prog) struct bpf_insn *insn; int i, rewritten; - if (!bpf_jit_blinding_enabled(prog) || prog->blinded) + if (!prog->blinding_requested || prog->blinded) return prog; clone = bpf_prog_clone_create(prog, GFP_USER); diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index ae64110a98b5..315053ef6a75 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -225,13 +225,8 @@ BPF_CALL_2(bpf_get_current_comm, char *, buf, u32, size) if (unlikely(!task)) goto err_clear; - strncpy(buf, task->comm, size); - - /* Verifier guarantees that size > 0. For task->comm exceeding - * size, guarantee that buf is %NUL-terminated. Unconditionally - * done here to save the size test. 
- */ - buf[size - 1] = 0; + /* Verifier guarantees that size > 0 */ + strscpy(buf, task->comm, size); return 0; err_clear: memset(buf, 0, size); diff --git a/kernel/bpf/preload/Makefile b/kernel/bpf/preload/Makefile index 167534e3b0b4..20f89cc0a0a6 100644 --- a/kernel/bpf/preload/Makefile +++ b/kernel/bpf/preload/Makefile @@ -1,8 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 -LIBBPF_SRCS = $(srctree)/tools/lib/bpf/ -LIBBPF_INCLUDE = $(LIBBPF_SRCS)/.. +LIBBPF_INCLUDE = $(srctree)/tools/lib obj-$(CONFIG_BPF_PRELOAD_UMD) += bpf_preload.o -CFLAGS_bpf_preload_kern.o += -I $(LIBBPF_INCLUDE) +CFLAGS_bpf_preload_kern.o += -I$(LIBBPF_INCLUDE) bpf_preload-objs += bpf_preload_kern.o diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c index 38bdfcd06f55..34725bfa1e97 100644 --- a/kernel/bpf/stackmap.c +++ b/kernel/bpf/stackmap.c @@ -176,7 +176,7 @@ build_id_valid: } static struct perf_callchain_entry * -get_callchain_entry_for_task(struct task_struct *task, u32 init_nr) +get_callchain_entry_for_task(struct task_struct *task, u32 max_depth) { #ifdef CONFIG_STACKTRACE struct perf_callchain_entry *entry; @@ -187,9 +187,8 @@ get_callchain_entry_for_task(struct task_struct *task, u32 init_nr) if (!entry) return NULL; - entry->nr = init_nr + - stack_trace_save_tsk(task, (unsigned long *)(entry->ip + init_nr), - sysctl_perf_event_max_stack - init_nr, 0); + entry->nr = stack_trace_save_tsk(task, (unsigned long *)entry->ip, + max_depth, 0); /* stack_trace_save_tsk() works on unsigned long array, while * perf_callchain_entry uses u64 array. 
For 32-bit systems, it is @@ -201,7 +200,7 @@ get_callchain_entry_for_task(struct task_struct *task, u32 init_nr) int i; /* copy data from the end to avoid using extra buffer */ - for (i = entry->nr - 1; i >= (int)init_nr; i--) + for (i = entry->nr - 1; i >= 0; i--) to[i] = (u64)(from[i]); } @@ -218,27 +217,19 @@ static long __bpf_get_stackid(struct bpf_map *map, { struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map); struct stack_map_bucket *bucket, *new_bucket, *old_bucket; - u32 max_depth = map->value_size / stack_map_data_size(map); - /* stack_map_alloc() checks that max_depth <= sysctl_perf_event_max_stack */ - u32 init_nr = sysctl_perf_event_max_stack - max_depth; u32 skip = flags & BPF_F_SKIP_FIELD_MASK; u32 hash, id, trace_nr, trace_len; bool user = flags & BPF_F_USER_STACK; u64 *ips; bool hash_matches; - /* get_perf_callchain() guarantees that trace->nr >= init_nr - * and trace-nr <= sysctl_perf_event_max_stack, so trace_nr <= max_depth - */ - trace_nr = trace->nr - init_nr; - - if (trace_nr <= skip) + if (trace->nr <= skip) /* skipping more than usable stack trace */ return -EFAULT; - trace_nr -= skip; + trace_nr = trace->nr - skip; trace_len = trace_nr * sizeof(u64); - ips = trace->ip + skip + init_nr; + ips = trace->ip + skip; hash = jhash2((u32 *)ips, trace_len / sizeof(u32), 0); id = hash & (smap->n_buckets - 1); bucket = READ_ONCE(smap->buckets[id]); @@ -295,8 +286,7 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map, u64, flags) { u32 max_depth = map->value_size / stack_map_data_size(map); - /* stack_map_alloc() checks that max_depth <= sysctl_perf_event_max_stack */ - u32 init_nr = sysctl_perf_event_max_stack - max_depth; + u32 skip = flags & BPF_F_SKIP_FIELD_MASK; bool user = flags & BPF_F_USER_STACK; struct perf_callchain_entry *trace; bool kernel = !user; @@ -305,8 +295,12 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map, BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID))) 
return -EINVAL; - trace = get_perf_callchain(regs, init_nr, kernel, user, - sysctl_perf_event_max_stack, false, false); + max_depth += skip; + if (max_depth > sysctl_perf_event_max_stack) + max_depth = sysctl_perf_event_max_stack; + + trace = get_perf_callchain(regs, 0, kernel, user, max_depth, + false, false); if (unlikely(!trace)) /* couldn't fetch the stack trace */ @@ -397,7 +391,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task, struct perf_callchain_entry *trace_in, void *buf, u32 size, u64 flags) { - u32 init_nr, trace_nr, copy_len, elem_size, num_elem; + u32 trace_nr, copy_len, elem_size, num_elem, max_depth; bool user_build_id = flags & BPF_F_USER_BUILD_ID; u32 skip = flags & BPF_F_SKIP_FIELD_MASK; bool user = flags & BPF_F_USER_STACK; @@ -422,30 +416,28 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task, goto err_fault; num_elem = size / elem_size; - if (sysctl_perf_event_max_stack < num_elem) - init_nr = 0; - else - init_nr = sysctl_perf_event_max_stack - num_elem; + max_depth = num_elem + skip; + if (sysctl_perf_event_max_stack < max_depth) + max_depth = sysctl_perf_event_max_stack; if (trace_in) trace = trace_in; else if (kernel && task) - trace = get_callchain_entry_for_task(task, init_nr); + trace = get_callchain_entry_for_task(task, max_depth); else - trace = get_perf_callchain(regs, init_nr, kernel, user, - sysctl_perf_event_max_stack, + trace = get_perf_callchain(regs, 0, kernel, user, max_depth, false, false); if (unlikely(!trace)) goto err_fault; - trace_nr = trace->nr - init_nr; - if (trace_nr < skip) + if (trace->nr < skip) goto err_fault; - trace_nr -= skip; + trace_nr = trace->nr - skip; trace_nr = (trace_nr <= num_elem) ? 
trace_nr : num_elem; copy_len = trace_nr * elem_size; - ips = trace->ip + skip + init_nr; + + ips = trace->ip + skip; if (user && user_build_id) stack_map_get_build_id_offset(buf, ips, trace_nr, user); else diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index db402ebc5570..cdaa1152436a 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -32,6 +32,7 @@ #include <linux/bpf-netns.h> #include <linux/rcupdate_trace.h> #include <linux/memcontrol.h> +#include <linux/trace_events.h> #define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \ (map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \ @@ -3022,6 +3023,11 @@ out_put_file: fput(perf_file); return err; } +#else +static int bpf_perf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) +{ + return -EOPNOTSUPP; +} #endif /* CONFIG_PERF_EVENTS */ #define BPF_RAW_TRACEPOINT_OPEN_LAST_FIELD raw_tracepoint.prog_fd @@ -3336,7 +3342,7 @@ static int bpf_prog_query(const union bpf_attr *attr, } } -#define BPF_PROG_TEST_RUN_LAST_FIELD test.cpu +#define BPF_PROG_TEST_RUN_LAST_FIELD test.batch_size static int bpf_prog_test_run(const union bpf_attr *attr, union bpf_attr __user *uattr) @@ -4255,7 +4261,7 @@ static int tracing_bpf_link_attach(const union bpf_attr *attr, bpfptr_t uattr, return -EINVAL; } -#define BPF_LINK_CREATE_LAST_FIELD link_create.iter_info_len +#define BPF_LINK_CREATE_LAST_FIELD link_create.kprobe_multi.cookies static int link_create(union bpf_attr *attr, bpfptr_t uattr) { enum bpf_prog_type ptype; @@ -4279,7 +4285,6 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr) ret = tracing_bpf_link_attach(attr, uattr, prog); goto out; case BPF_PROG_TYPE_PERF_EVENT: - case BPF_PROG_TYPE_KPROBE: case BPF_PROG_TYPE_TRACEPOINT: if (attr->link_create.attach_type != BPF_PERF_EVENT) { ret = -EINVAL; @@ -4287,6 +4292,14 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr) } ptype = prog->type; break; + case BPF_PROG_TYPE_KPROBE: + if 
(attr->link_create.attach_type != BPF_PERF_EVENT && + attr->link_create.attach_type != BPF_TRACE_KPROBE_MULTI) { + ret = -EINVAL; + goto out; + } + ptype = prog->type; + break; default: ptype = attach_type_to_prog_type(attr->link_create.attach_type); if (ptype == BPF_PROG_TYPE_UNSPEC || ptype != prog->type) { @@ -4318,13 +4331,16 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr) ret = bpf_xdp_link_attach(attr, prog); break; #endif -#ifdef CONFIG_PERF_EVENTS case BPF_PROG_TYPE_PERF_EVENT: case BPF_PROG_TYPE_TRACEPOINT: - case BPF_PROG_TYPE_KPROBE: ret = bpf_perf_link_attach(attr, prog); break; -#endif + case BPF_PROG_TYPE_KPROBE: + if (attr->link_create.attach_type == BPF_PERF_EVENT) + ret = bpf_perf_link_attach(attr, prog); + else + ret = bpf_kprobe_multi_link_attach(attr, prog); + break; default: ret = -EINVAL; } diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index a57db4b2803c..d175b70067b3 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -554,7 +554,6 @@ static const char *reg_type_str(struct bpf_verifier_env *env, [PTR_TO_TP_BUFFER] = "tp_buffer", [PTR_TO_XDP_SOCK] = "xdp_sock", [PTR_TO_BTF_ID] = "ptr_", - [PTR_TO_PERCPU_BTF_ID] = "percpu_ptr_", [PTR_TO_MEM] = "mem", [PTR_TO_BUF] = "buf", [PTR_TO_FUNC] = "func", @@ -562,8 +561,7 @@ static const char *reg_type_str(struct bpf_verifier_env *env, }; if (type & PTR_MAYBE_NULL) { - if (base_type(type) == PTR_TO_BTF_ID || - base_type(type) == PTR_TO_PERCPU_BTF_ID) + if (base_type(type) == PTR_TO_BTF_ID) strncpy(postfix, "or_null_", 16); else strncpy(postfix, "_or_null", 16); @@ -575,6 +573,8 @@ static const char *reg_type_str(struct bpf_verifier_env *env, strncpy(prefix, "alloc_", 32); if (type & MEM_USER) strncpy(prefix, "user_", 32); + if (type & MEM_PERCPU) + strncpy(prefix, "percpu_", 32); snprintf(env->type_str_buf, TYPE_STR_BUF_LEN, "%s%s%s", prefix, str[base_type(type)], postfix); @@ -697,8 +697,7 @@ static void print_verifier_state(struct bpf_verifier_env *env, const 
char *sep = ""; verbose(env, "%s", reg_type_str(env, t)); - if (base_type(t) == PTR_TO_BTF_ID || - base_type(t) == PTR_TO_PERCPU_BTF_ID) + if (base_type(t) == PTR_TO_BTF_ID) verbose(env, "%s", kernel_type_name(reg->btf, reg->btf_id)); verbose(env, "("); /* @@ -2783,7 +2782,6 @@ static bool is_spillable_regtype(enum bpf_reg_type type) case PTR_TO_XDP_SOCK: case PTR_TO_BTF_ID: case PTR_TO_BUF: - case PTR_TO_PERCPU_BTF_ID: case PTR_TO_MEM: case PTR_TO_FUNC: case PTR_TO_MAP_KEY: @@ -3990,6 +3988,12 @@ static int __check_ptr_off_reg(struct bpf_verifier_env *env, * is only allowed in its original, unmodified form. */ + if (reg->off < 0) { + verbose(env, "negative offset %s ptr R%d off=%d disallowed\n", + reg_type_str(env, reg->type), regno, reg->off); + return -EACCES; + } + if (!fixed_off_ok && reg->off) { verbose(env, "dereference of modified %s ptr R%d off=%d disallowed\n", reg_type_str(env, reg->type), regno, reg->off); @@ -4058,9 +4062,9 @@ static int check_buffer_access(struct bpf_verifier_env *env, const struct bpf_reg_state *reg, int regno, int off, int size, bool zero_size_allowed, - const char *buf_info, u32 *max_access) { + const char *buf_info = type_is_rdonly_mem(reg->type) ? 
"rdonly" : "rdwr"; int err; err = __check_buffer_access(env, buf_info, reg, regno, off, size); @@ -4197,6 +4201,13 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env, return -EACCES; } + if (reg->type & MEM_PERCPU) { + verbose(env, + "R%d is ptr_%s access percpu memory: off=%d\n", + regno, tname, off); + return -EACCES; + } + if (env->ops->btf_struct_access) { ret = env->ops->btf_struct_access(&env->log, reg->btf, t, off, size, atype, &btf_id, &flag); @@ -4556,7 +4567,8 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn err = check_tp_buffer_access(env, reg, regno, off, size); if (!err && t == BPF_READ && value_regno >= 0) mark_reg_unknown(env, regs, value_regno); - } else if (reg->type == PTR_TO_BTF_ID) { + } else if (base_type(reg->type) == PTR_TO_BTF_ID && + !type_may_be_null(reg->type)) { err = check_ptr_to_btf_access(env, regs, regno, off, size, t, value_regno); } else if (reg->type == CONST_PTR_TO_MAP) { @@ -4564,7 +4576,6 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn value_regno); } else if (base_type(reg->type) == PTR_TO_BUF) { bool rdonly_mem = type_is_rdonly_mem(reg->type); - const char *buf_info; u32 *max_access; if (rdonly_mem) { @@ -4573,15 +4584,13 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn regno, reg_type_str(env, reg->type)); return -EACCES; } - buf_info = "rdonly"; max_access = &env->prog->aux->max_rdonly_access; } else { - buf_info = "rdwr"; max_access = &env->prog->aux->max_rdwr_access; } err = check_buffer_access(env, reg, regno, off, size, false, - buf_info, max_access); + max_access); if (!err && value_regno >= 0 && (rdonly_mem || t == BPF_READ)) mark_reg_unknown(env, regs, value_regno); @@ -4802,7 +4811,7 @@ static int check_stack_range_initialized( } if (is_spilled_reg(&state->stack[spi]) && - state->stack[spi].spilled_ptr.type == PTR_TO_BTF_ID) + base_type(state->stack[spi].spilled_ptr.type) == PTR_TO_BTF_ID) goto 
mark; if (is_spilled_reg(&state->stack[spi]) && @@ -4844,7 +4853,6 @@ static int check_helper_mem_access(struct bpf_verifier_env *env, int regno, struct bpf_call_arg_meta *meta) { struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno]; - const char *buf_info; u32 *max_access; switch (base_type(reg->type)) { @@ -4871,15 +4879,13 @@ static int check_helper_mem_access(struct bpf_verifier_env *env, int regno, if (meta && meta->raw_mode) return -EACCES; - buf_info = "rdonly"; max_access = &env->prog->aux->max_rdonly_access; } else { - buf_info = "rdwr"; max_access = &env->prog->aux->max_rdwr_access; } return check_buffer_access(env, reg, regno, reg->off, access_size, zero_size_allowed, - buf_info, max_access); + max_access); case PTR_TO_STACK: return check_stack_range_initialized( env, @@ -5258,7 +5264,7 @@ static const struct bpf_reg_types alloc_mem_types = { .types = { PTR_TO_MEM | ME static const struct bpf_reg_types const_map_ptr_types = { .types = { CONST_PTR_TO_MAP } }; static const struct bpf_reg_types btf_ptr_types = { .types = { PTR_TO_BTF_ID } }; static const struct bpf_reg_types spin_lock_types = { .types = { PTR_TO_MAP_VALUE } }; -static const struct bpf_reg_types percpu_btf_ptr_types = { .types = { PTR_TO_PERCPU_BTF_ID } }; +static const struct bpf_reg_types percpu_btf_ptr_types = { .types = { PTR_TO_BTF_ID | MEM_PERCPU } }; static const struct bpf_reg_types func_ptr_types = { .types = { PTR_TO_FUNC } }; static const struct bpf_reg_types stack_ptr_types = { .types = { PTR_TO_STACK } }; static const struct bpf_reg_types const_str_ptr_types = { .types = { PTR_TO_MAP_VALUE } }; @@ -5359,6 +5365,60 @@ found: return 0; } +int check_func_arg_reg_off(struct bpf_verifier_env *env, + const struct bpf_reg_state *reg, int regno, + enum bpf_arg_type arg_type, + bool is_release_func) +{ + bool fixed_off_ok = false, release_reg; + enum bpf_reg_type type = reg->type; + + switch ((u32)type) { + case SCALAR_VALUE: + /* Pointer types where reg offset is explicitly
allowed: */ + case PTR_TO_PACKET: + case PTR_TO_PACKET_META: + case PTR_TO_MAP_KEY: + case PTR_TO_MAP_VALUE: + case PTR_TO_MEM: + case PTR_TO_MEM | MEM_RDONLY: + case PTR_TO_MEM | MEM_ALLOC: + case PTR_TO_BUF: + case PTR_TO_BUF | MEM_RDONLY: + case PTR_TO_STACK: + /* Some of the argument types nevertheless require a + * zero register offset. + */ + if (arg_type != ARG_PTR_TO_ALLOC_MEM) + return 0; + break; + /* All the rest must be rejected, except PTR_TO_BTF_ID which allows + * fixed offset. + */ + case PTR_TO_BTF_ID: + /* When referenced PTR_TO_BTF_ID is passed to release function, + * it's fixed offset must be 0. We rely on the property that + * only one referenced register can be passed to BPF helpers and + * kfuncs. In the other cases, fixed offset can be non-zero. + */ + release_reg = is_release_func && reg->ref_obj_id; + if (release_reg && reg->off) { + verbose(env, "R%d must have zero offset when passed to release func\n", + regno); + return -EINVAL; + } + /* For release_reg == true, fixed_off_ok must be false, but we + * already checked and rejected reg->off != 0 above, so set to + * true to allow fixed offset for all other cases. + */ + fixed_off_ok = true; + break; + default: + break; + } + return __check_ptr_off_reg(env, reg, regno, fixed_off_ok); +} + static int check_func_arg(struct bpf_verifier_env *env, u32 arg, struct bpf_call_arg_meta *meta, const struct bpf_func_proto *fn) @@ -5408,36 +5468,14 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg, if (err) return err; - switch ((u32)type) { - case SCALAR_VALUE: - /* Pointer types where reg offset is explicitly allowed: */ - case PTR_TO_PACKET: - case PTR_TO_PACKET_META: - case PTR_TO_MAP_KEY: - case PTR_TO_MAP_VALUE: - case PTR_TO_MEM: - case PTR_TO_MEM | MEM_RDONLY: - case PTR_TO_MEM | MEM_ALLOC: - case PTR_TO_BUF: - case PTR_TO_BUF | MEM_RDONLY: - case PTR_TO_STACK: - /* Some of the argument types nevertheless require a - * zero register offset. 
- */ - if (arg_type == ARG_PTR_TO_ALLOC_MEM) - goto force_off_check; - break; - /* All the rest must be rejected: */ - default: -force_off_check: - err = __check_ptr_off_reg(env, reg, regno, - type == PTR_TO_BTF_ID); - if (err < 0) - return err; - break; - } + err = check_func_arg_reg_off(env, reg, regno, arg_type, is_release_function(meta->func_id)); + if (err) + return err; skip_type_check: + /* check_func_arg_reg_off relies on only one referenced register being + * allowed for BPF helpers. + */ if (reg->ref_obj_id) { if (meta->ref_obj_id) { verbose(env, "verifier internal error: more than one arg with ref_obj_id R%d %u %u\n", @@ -9638,7 +9676,6 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn) dst_reg->mem_size = aux->btf_var.mem_size; break; case PTR_TO_BTF_ID: - case PTR_TO_PERCPU_BTF_ID: dst_reg->btf = aux->btf_var.btf; dst_reg->btf_id = aux->btf_var.btf_id; break; @@ -10363,8 +10400,7 @@ static void adjust_btf_func(struct bpf_verifier_env *env) aux->func_info[i].insn_off = env->subprog_info[i].start; } -#define MIN_BPF_LINEINFO_SIZE (offsetof(struct bpf_line_info, line_col) + \ - sizeof(((struct bpf_line_info *)(0))->line_col)) +#define MIN_BPF_LINEINFO_SIZE offsetofend(struct bpf_line_info, line_col) #define MAX_LINEINFO_REC_SIZE MAX_FUNCINFO_REC_SIZE static int check_btf_line(struct bpf_verifier_env *env, @@ -11838,7 +11874,7 @@ static int check_pseudo_btf_id(struct bpf_verifier_env *env, type = t->type; t = btf_type_skip_modifiers(btf, type, NULL); if (percpu) { - aux->btf_var.reg_type = PTR_TO_PERCPU_BTF_ID; + aux->btf_var.reg_type = PTR_TO_BTF_ID | MEM_PERCPU; aux->btf_var.btf = btf; aux->btf_var.btf_id = type; } else if (!btf_type_is_struct(t)) { @@ -12987,6 +13023,7 @@ static int jit_subprogs(struct bpf_verifier_env *env) func[i]->aux->name[0] = 'F'; func[i]->aux->stack_depth = env->subprog_info[i].stack_depth; func[i]->jit_requested = 1; + func[i]->blinding_requested = prog->blinding_requested; func[i]->aux->kfunc_tab = 
prog->aux->kfunc_tab; func[i]->aux->kfunc_btf_tab = prog->aux->kfunc_btf_tab; func[i]->aux->linfo = prog->aux->linfo; @@ -13110,6 +13147,7 @@ out_free: out_undo_insn: /* cleanup main prog to be interpreted */ prog->jit_requested = 0; + prog->blinding_requested = 0; for (i = 0, insn = prog->insnsi; i < prog->len; i++, insn++) { if (!bpf_pseudo_call(insn)) continue; @@ -13203,7 +13241,6 @@ static int do_misc_fixups(struct bpf_verifier_env *env) { struct bpf_prog *prog = env->prog; enum bpf_attach_type eatype = prog->expected_attach_type; - bool expect_blinding = bpf_jit_blinding_enabled(prog); enum bpf_prog_type prog_type = resolve_prog_type(prog); struct bpf_insn *insn = prog->insnsi; const struct bpf_func_proto *fn; @@ -13367,7 +13404,7 @@ static int do_misc_fixups(struct bpf_verifier_env *env) insn->code = BPF_JMP | BPF_TAIL_CALL; aux = &env->insn_aux_data[i + delta]; - if (env->bpf_capable && !expect_blinding && + if (env->bpf_capable && !prog->blinding_requested && prog->jit_requested && !bpf_map_key_poisoned(aux) && !bpf_map_ptr_poisoned(aux) && @@ -13455,6 +13492,26 @@ static int do_misc_fixups(struct bpf_verifier_env *env) goto patch_call_imm; } + if (insn->imm == BPF_FUNC_task_storage_get || + insn->imm == BPF_FUNC_sk_storage_get || + insn->imm == BPF_FUNC_inode_storage_get) { + if (env->prog->aux->sleepable) + insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_KERNEL); + else + insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_ATOMIC); + insn_buf[1] = *insn; + cnt = 2; + + new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt); + if (!new_prog) + return -ENOMEM; + + delta += cnt - 1; + env->prog = prog = new_prog; + insn = new_prog->insnsi + i + delta; + goto patch_call_imm; + } + /* BPF_EMIT_CALL() assumptions in some of the map_gen_lookup * and other inlining handlers are currently limited to 64 bit * only. 
diff --git a/kernel/exit.c b/kernel/exit.c index b00a25bb4ab9..2d1803fa8fe6 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -64,6 +64,7 @@ #include <linux/compat.h> #include <linux/io_uring.h> #include <linux/kprobes.h> +#include <linux/rethook.h> #include <linux/uaccess.h> #include <asm/unistd.h> @@ -169,6 +170,7 @@ static void delayed_put_task_struct(struct rcu_head *rhp) struct task_struct *tsk = container_of(rhp, struct task_struct, rcu); kprobe_flush_task(tsk); + rethook_flush_task(tsk); perf_event_delayed_put(tsk); trace_sched_process_free(tsk); put_task_struct(tsk); diff --git a/kernel/fork.c b/kernel/fork.c index f1e89007f228..d94f002d00cc 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2255,6 +2255,9 @@ static __latent_entropy struct task_struct *copy_process( #ifdef CONFIG_KRETPROBES p->kretprobe_instances.first = NULL; #endif +#ifdef CONFIG_RETHOOK + p->rethooks.first = NULL; +#endif /* * Ensure that the cgroup subsystem policies allow the new process to be diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c index 951c93216fc4..79f2eb617a62 100644 --- a/kernel/kallsyms.c +++ b/kernel/kallsyms.c @@ -212,6 +212,10 @@ unsigned long kallsyms_lookup_name(const char *name) unsigned long i; unsigned int off; + /* Skip the search for empty string. */ + if (!*name) + return 0; + for (i = 0, off = 0; i < kallsyms_num_syms; i++) { off = kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf)); diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index a5eb5e7fd624..99dd4ca63d68 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -10,6 +10,17 @@ config USER_STACKTRACE_SUPPORT config NOP_TRACER bool +config HAVE_RETHOOK + bool + +config RETHOOK + bool + depends on HAVE_RETHOOK + help + Enable generic return hooking feature. This is an internal + API, which will be used by other function-entry hooking + features like fprobe and kprobes. 
+ config HAVE_FUNCTION_TRACER bool help @@ -236,6 +247,21 @@ config DYNAMIC_FTRACE_WITH_ARGS depends on DYNAMIC_FTRACE depends on HAVE_DYNAMIC_FTRACE_WITH_ARGS +config FPROBE + bool "Kernel Function Probe (fprobe)" + depends on FUNCTION_TRACER + depends on DYNAMIC_FTRACE_WITH_REGS + depends on HAVE_RETHOOK + select RETHOOK + default n + help + This option enables kernel function probe (fprobe) based on ftrace. + The fprobe is similar to kprobes, but probes only for kernel function + entries and exits. This also can probe multiple functions by one + fprobe. + + If unsure, say N. + config FUNCTION_PROFILER bool "Kernel function profiler" depends on FUNCTION_TRACER diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile index bedc5caceec7..c6f11a139eac 100644 --- a/kernel/trace/Makefile +++ b/kernel/trace/Makefile @@ -97,6 +97,8 @@ obj-$(CONFIG_PROBE_EVENTS) += trace_probe.o obj-$(CONFIG_UPROBE_EVENTS) += trace_uprobe.o obj-$(CONFIG_BOOTTIME_TRACING) += trace_boot.o obj-$(CONFIG_FTRACE_RECORD_RECURSION) += trace_recursion_record.o +obj-$(CONFIG_FPROBE) += fprobe.o +obj-$(CONFIG_RETHOOK) += rethook.o obj-$(CONFIG_TRACEPOINT_BENCHMARK) += trace_benchmark.o diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index a2024ba32a20..172ef545730d 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -17,6 +17,9 @@ #include <linux/error-injection.h> #include <linux/btf_ids.h> #include <linux/bpf_lsm.h> +#include <linux/fprobe.h> +#include <linux/bsearch.h> +#include <linux/sort.h> #include <net/bpf_sk_storage.h> @@ -77,6 +80,8 @@ u64 bpf_get_stack(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5); static int bpf_btf_printf_prepare(struct btf_ptr *ptr, u32 btf_ptr_size, u64 flags, const struct btf **btf, s32 *btf_id); +static u64 bpf_kprobe_multi_cookie(struct bpf_run_ctx *ctx); +static u64 bpf_kprobe_multi_entry_ip(struct bpf_run_ctx *ctx); /** * trace_call_bpf - invoke BPF program @@ -1036,6 +1041,30 @@ static const struct bpf_func_proto 
bpf_get_func_ip_proto_kprobe = { .arg1_type = ARG_PTR_TO_CTX, }; +BPF_CALL_1(bpf_get_func_ip_kprobe_multi, struct pt_regs *, regs) +{ + return bpf_kprobe_multi_entry_ip(current->bpf_ctx); +} + +static const struct bpf_func_proto bpf_get_func_ip_proto_kprobe_multi = { + .func = bpf_get_func_ip_kprobe_multi, + .gpl_only = false, + .ret_type = RET_INTEGER, + .arg1_type = ARG_PTR_TO_CTX, +}; + +BPF_CALL_1(bpf_get_attach_cookie_kprobe_multi, struct pt_regs *, regs) +{ + return bpf_kprobe_multi_cookie(current->bpf_ctx); +} + +static const struct bpf_func_proto bpf_get_attach_cookie_proto_kmulti = { + .func = bpf_get_attach_cookie_kprobe_multi, + .gpl_only = false, + .ret_type = RET_INTEGER, + .arg1_type = ARG_PTR_TO_CTX, +}; + BPF_CALL_1(bpf_get_attach_cookie_trace, void *, ctx) { struct bpf_trace_run_ctx *run_ctx; @@ -1279,9 +1308,13 @@ kprobe_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) return &bpf_override_return_proto; #endif case BPF_FUNC_get_func_ip: - return &bpf_get_func_ip_proto_kprobe; + return prog->expected_attach_type == BPF_TRACE_KPROBE_MULTI ? + &bpf_get_func_ip_proto_kprobe_multi : + &bpf_get_func_ip_proto_kprobe; case BPF_FUNC_get_attach_cookie: - return &bpf_get_attach_cookie_proto_trace; + return prog->expected_attach_type == BPF_TRACE_KPROBE_MULTI ? 
+ &bpf_get_attach_cookie_proto_kmulti : + &bpf_get_attach_cookie_proto_trace; default: return bpf_tracing_func_proto(func_id, prog); } @@ -2181,3 +2214,314 @@ static int __init bpf_event_init(void) fs_initcall(bpf_event_init); #endif /* CONFIG_MODULES */ + +#ifdef CONFIG_FPROBE +struct bpf_kprobe_multi_link { + struct bpf_link link; + struct fprobe fp; + unsigned long *addrs; + u64 *cookies; + u32 cnt; +}; + +struct bpf_kprobe_multi_run_ctx { + struct bpf_run_ctx run_ctx; + struct bpf_kprobe_multi_link *link; + unsigned long entry_ip; +}; + +static void bpf_kprobe_multi_link_release(struct bpf_link *link) +{ + struct bpf_kprobe_multi_link *kmulti_link; + + kmulti_link = container_of(link, struct bpf_kprobe_multi_link, link); + unregister_fprobe(&kmulti_link->fp); +} + +static void bpf_kprobe_multi_link_dealloc(struct bpf_link *link) +{ + struct bpf_kprobe_multi_link *kmulti_link; + + kmulti_link = container_of(link, struct bpf_kprobe_multi_link, link); + kvfree(kmulti_link->addrs); + kvfree(kmulti_link->cookies); + kfree(kmulti_link); +} + +static const struct bpf_link_ops bpf_kprobe_multi_link_lops = { + .release = bpf_kprobe_multi_link_release, + .dealloc = bpf_kprobe_multi_link_dealloc, +}; + +static void bpf_kprobe_multi_cookie_swap(void *a, void *b, int size, const void *priv) +{ + const struct bpf_kprobe_multi_link *link = priv; + unsigned long *addr_a = a, *addr_b = b; + u64 *cookie_a, *cookie_b; + unsigned long tmp1; + u64 tmp2; + + cookie_a = link->cookies + (addr_a - link->addrs); + cookie_b = link->cookies + (addr_b - link->addrs); + + /* swap addr_a/addr_b and cookie_a/cookie_b values */ + tmp1 = *addr_a; *addr_a = *addr_b; *addr_b = tmp1; + tmp2 = *cookie_a; *cookie_a = *cookie_b; *cookie_b = tmp2; +} + +static int __bpf_kprobe_multi_cookie_cmp(const void *a, const void *b) +{ + const unsigned long *addr_a = a, *addr_b = b; + + if (*addr_a == *addr_b) + return 0; + return *addr_a < *addr_b ? 
-1 : 1; +} + +static int bpf_kprobe_multi_cookie_cmp(const void *a, const void *b, const void *priv) +{ + return __bpf_kprobe_multi_cookie_cmp(a, b); +} + +static u64 bpf_kprobe_multi_cookie(struct bpf_run_ctx *ctx) +{ + struct bpf_kprobe_multi_run_ctx *run_ctx; + struct bpf_kprobe_multi_link *link; + u64 *cookie, entry_ip; + unsigned long *addr; + + if (WARN_ON_ONCE(!ctx)) + return 0; + run_ctx = container_of(current->bpf_ctx, struct bpf_kprobe_multi_run_ctx, run_ctx); + link = run_ctx->link; + if (!link->cookies) + return 0; + entry_ip = run_ctx->entry_ip; + addr = bsearch(&entry_ip, link->addrs, link->cnt, sizeof(entry_ip), + __bpf_kprobe_multi_cookie_cmp); + if (!addr) + return 0; + cookie = link->cookies + (addr - link->addrs); + return *cookie; +} + +static u64 bpf_kprobe_multi_entry_ip(struct bpf_run_ctx *ctx) +{ + struct bpf_kprobe_multi_run_ctx *run_ctx; + + run_ctx = container_of(current->bpf_ctx, struct bpf_kprobe_multi_run_ctx, run_ctx); + return run_ctx->entry_ip; +} + +static int +kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link, + unsigned long entry_ip, struct pt_regs *regs) +{ + struct bpf_kprobe_multi_run_ctx run_ctx = { + .link = link, + .entry_ip = entry_ip, + }; + struct bpf_run_ctx *old_run_ctx; + int err; + + if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) { + err = 0; + goto out; + } + + migrate_disable(); + rcu_read_lock(); + old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx); + err = bpf_prog_run(link->link.prog, regs); + bpf_reset_run_ctx(old_run_ctx); + rcu_read_unlock(); + migrate_enable(); + + out: + __this_cpu_dec(bpf_prog_active); + return err; +} + +static void +kprobe_multi_link_handler(struct fprobe *fp, unsigned long entry_ip, + struct pt_regs *regs) +{ + struct bpf_kprobe_multi_link *link; + + link = container_of(fp, struct bpf_kprobe_multi_link, fp); + kprobe_multi_link_prog_run(link, entry_ip, regs); +} + +static int +kprobe_multi_resolve_syms(const void *usyms, u32 cnt, + unsigned long *addrs) +{ + 
unsigned long addr, size; + const char **syms; + int err = -ENOMEM; + unsigned int i; + char *func; + + size = cnt * sizeof(*syms); + syms = kvzalloc(size, GFP_KERNEL); + if (!syms) + return -ENOMEM; + + func = kmalloc(KSYM_NAME_LEN, GFP_KERNEL); + if (!func) + goto error; + + if (copy_from_user(syms, usyms, size)) { + err = -EFAULT; + goto error; + } + + for (i = 0; i < cnt; i++) { + err = strncpy_from_user(func, syms[i], KSYM_NAME_LEN); + if (err == KSYM_NAME_LEN) + err = -E2BIG; + if (err < 0) + goto error; + err = -EINVAL; + addr = kallsyms_lookup_name(func); + if (!addr) + goto error; + if (!kallsyms_lookup_size_offset(addr, &size, NULL)) + goto error; + addr = ftrace_location_range(addr, addr + size - 1); + if (!addr) + goto error; + addrs[i] = addr; + } + + err = 0; +error: + kvfree(syms); + kfree(func); + return err; +} + +int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) +{ + struct bpf_kprobe_multi_link *link = NULL; + struct bpf_link_primer link_primer; + void __user *ucookies; + unsigned long *addrs; + u32 flags, cnt, size; + void __user *uaddrs; + u64 *cookies = NULL; + void __user *usyms; + int err; + + /* no support for 32bit archs yet */ + if (sizeof(u64) != sizeof(void *)) + return -EOPNOTSUPP; + + if (prog->expected_attach_type != BPF_TRACE_KPROBE_MULTI) + return -EINVAL; + + flags = attr->link_create.kprobe_multi.flags; + if (flags & ~BPF_F_KPROBE_MULTI_RETURN) + return -EINVAL; + + uaddrs = u64_to_user_ptr(attr->link_create.kprobe_multi.addrs); + usyms = u64_to_user_ptr(attr->link_create.kprobe_multi.syms); + if (!!uaddrs == !!usyms) + return -EINVAL; + + cnt = attr->link_create.kprobe_multi.cnt; + if (!cnt) + return -EINVAL; + + size = cnt * sizeof(*addrs); + addrs = kvmalloc(size, GFP_KERNEL); + if (!addrs) + return -ENOMEM; + + if (uaddrs) { + if (copy_from_user(addrs, uaddrs, size)) { + err = -EFAULT; + goto error; + } + } else { + err = kprobe_multi_resolve_syms(usyms, cnt, addrs); + if (err) + goto error; 
+ } + + ucookies = u64_to_user_ptr(attr->link_create.kprobe_multi.cookies); + if (ucookies) { + cookies = kvmalloc(size, GFP_KERNEL); + if (!cookies) { + err = -ENOMEM; + goto error; + } + if (copy_from_user(cookies, ucookies, size)) { + err = -EFAULT; + goto error; + } + } + + link = kzalloc(sizeof(*link), GFP_KERNEL); + if (!link) { + err = -ENOMEM; + goto error; + } + + bpf_link_init(&link->link, BPF_LINK_TYPE_KPROBE_MULTI, + &bpf_kprobe_multi_link_lops, prog); + + err = bpf_link_prime(&link->link, &link_primer); + if (err) + goto error; + + if (flags & BPF_F_KPROBE_MULTI_RETURN) + link->fp.exit_handler = kprobe_multi_link_handler; + else + link->fp.entry_handler = kprobe_multi_link_handler; + + link->addrs = addrs; + link->cookies = cookies; + link->cnt = cnt; + + if (cookies) { + /* + * Sorting addresses will trigger sorting cookies as well + * (check bpf_kprobe_multi_cookie_swap). This way we can + * find cookie based on the address in bpf_get_attach_cookie + * helper. + */ + sort_r(addrs, cnt, sizeof(*addrs), + bpf_kprobe_multi_cookie_cmp, + bpf_kprobe_multi_cookie_swap, + link); + } + + err = register_fprobe_ips(&link->fp, addrs, cnt); + if (err) { + bpf_link_cleanup(&link_primer); + return err; + } + + return bpf_link_settle(&link_primer); + +error: + kfree(link); + kvfree(addrs); + kvfree(cookies); + return err; +} +#else /* !CONFIG_FPROBE */ +int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) +{ + return -EOPNOTSUPP; +} +static u64 bpf_kprobe_multi_cookie(struct bpf_run_ctx *ctx) +{ + return 0; +} +static u64 bpf_kprobe_multi_entry_ip(struct bpf_run_ctx *ctx) +{ + return 0; +} +#endif diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c new file mode 100644 index 000000000000..8b2dd5b9dcd1 --- /dev/null +++ b/kernel/trace/fprobe.c @@ -0,0 +1,332 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * fprobe - Simple ftrace probe wrapper for function entry. 
+ */ +#define pr_fmt(fmt) "fprobe: " fmt + +#include <linux/err.h> +#include <linux/fprobe.h> +#include <linux/kallsyms.h> +#include <linux/kprobes.h> +#include <linux/rethook.h> +#include <linux/slab.h> +#include <linux/sort.h> + +#include "trace.h" + +struct fprobe_rethook_node { + struct rethook_node node; + unsigned long entry_ip; +}; + +static void fprobe_handler(unsigned long ip, unsigned long parent_ip, + struct ftrace_ops *ops, struct ftrace_regs *fregs) +{ + struct fprobe_rethook_node *fpr; + struct rethook_node *rh; + struct fprobe *fp; + int bit; + + fp = container_of(ops, struct fprobe, ops); + if (fprobe_disabled(fp)) + return; + + bit = ftrace_test_recursion_trylock(ip, parent_ip); + if (bit < 0) { + fp->nmissed++; + return; + } + + if (fp->entry_handler) + fp->entry_handler(fp, ip, ftrace_get_regs(fregs)); + + if (fp->exit_handler) { + rh = rethook_try_get(fp->rethook); + if (!rh) { + fp->nmissed++; + goto out; + } + fpr = container_of(rh, struct fprobe_rethook_node, node); + fpr->entry_ip = ip; + rethook_hook(rh, ftrace_get_regs(fregs), true); + } + +out: + ftrace_test_recursion_unlock(bit); +} +NOKPROBE_SYMBOL(fprobe_handler); + +static void fprobe_kprobe_handler(unsigned long ip, unsigned long parent_ip, + struct ftrace_ops *ops, struct ftrace_regs *fregs) +{ + struct fprobe *fp = container_of(ops, struct fprobe, ops); + + if (unlikely(kprobe_running())) { + fp->nmissed++; + return; + } + kprobe_busy_begin(); + fprobe_handler(ip, parent_ip, ops, fregs); + kprobe_busy_end(); +} + +static void fprobe_exit_handler(struct rethook_node *rh, void *data, + struct pt_regs *regs) +{ + struct fprobe *fp = (struct fprobe *)data; + struct fprobe_rethook_node *fpr; + + if (!fp || fprobe_disabled(fp)) + return; + + fpr = container_of(rh, struct fprobe_rethook_node, node); + + fp->exit_handler(fp, fpr->entry_ip, regs); +} +NOKPROBE_SYMBOL(fprobe_exit_handler); + +/* Convert ftrace location address from symbols */ +static unsigned long *get_ftrace_locations(const 
char **syms, int num) +{ + unsigned long addr, size; + unsigned long *addrs; + int i; + + /* Convert symbols to symbol address */ + addrs = kcalloc(num, sizeof(*addrs), GFP_KERNEL); + if (!addrs) + return ERR_PTR(-ENOMEM); + + for (i = 0; i < num; i++) { + addr = kallsyms_lookup_name(syms[i]); + if (!addr) /* Maybe wrong symbol */ + goto error; + + /* Convert symbol address to ftrace location. */ + if (!kallsyms_lookup_size_offset(addr, &size, NULL) || !size) + goto error; + + addr = ftrace_location_range(addr, addr + size - 1); + if (!addr) /* No dynamic ftrace there. */ + goto error; + + addrs[i] = addr; + } + + return addrs; + +error: + kfree(addrs); + + return ERR_PTR(-ENOENT); +} + +static void fprobe_init(struct fprobe *fp) +{ + fp->nmissed = 0; + if (fprobe_shared_with_kprobes(fp)) + fp->ops.func = fprobe_kprobe_handler; + else + fp->ops.func = fprobe_handler; + fp->ops.flags |= FTRACE_OPS_FL_SAVE_REGS; +} + +static int fprobe_init_rethook(struct fprobe *fp, int num) +{ + int i, size; + + if (num < 0) + return -EINVAL; + + if (!fp->exit_handler) { + fp->rethook = NULL; + return 0; + } + + /* Initialize rethook if needed */ + size = num * num_possible_cpus() * 2; + if (size < 0) + return -E2BIG; + + fp->rethook = rethook_alloc((void *)fp, fprobe_exit_handler); + for (i = 0; i < size; i++) { + struct rethook_node *node; + + node = kzalloc(sizeof(struct fprobe_rethook_node), GFP_KERNEL); + if (!node) { + rethook_free(fp->rethook); + fp->rethook = NULL; + return -ENOMEM; + } + rethook_add_node(fp->rethook, node); + } + return 0; +} + +static void fprobe_fail_cleanup(struct fprobe *fp) +{ + if (fp->rethook) { + /* Don't need to cleanup rethook->handler because this is not used. */ + rethook_free(fp->rethook); + fp->rethook = NULL; + } + ftrace_free_filter(&fp->ops); +} + +/** + * register_fprobe() - Register fprobe to ftrace by pattern. + * @fp: A fprobe data structure to be registered. + * @filter: A wildcard pattern of probed symbols. 
+ * @notfilter: A wildcard pattern of NOT probed symbols. + * + * Register @fp to ftrace for enabling the probe on the symbols matched to @filter. + * If @notfilter is not NULL, the symbols matched the @notfilter are not probed. + * + * Return 0 if @fp is registered successfully, -errno if not. + */ +int register_fprobe(struct fprobe *fp, const char *filter, const char *notfilter) +{ + struct ftrace_hash *hash; + unsigned char *str; + int ret, len; + + if (!fp || !filter) + return -EINVAL; + + fprobe_init(fp); + + len = strlen(filter); + str = kstrdup(filter, GFP_KERNEL); + ret = ftrace_set_filter(&fp->ops, str, len, 0); + kfree(str); + if (ret) + return ret; + + if (notfilter) { + len = strlen(notfilter); + str = kstrdup(notfilter, GFP_KERNEL); + ret = ftrace_set_notrace(&fp->ops, str, len, 0); + kfree(str); + if (ret) + goto out; + } + + /* TODO: + * correctly calculate the total number of filtered symbols + * from both filter and notfilter. + */ + hash = fp->ops.local_hash.filter_hash; + if (WARN_ON_ONCE(!hash)) + goto out; + + ret = fprobe_init_rethook(fp, (int)hash->count); + if (!ret) + ret = register_ftrace_function(&fp->ops); + +out: + if (ret) + fprobe_fail_cleanup(fp); + return ret; +} +EXPORT_SYMBOL_GPL(register_fprobe); + +/** + * register_fprobe_ips() - Register fprobe to ftrace by address. + * @fp: A fprobe data structure to be registered. + * @addrs: An array of target ftrace location addresses. + * @num: The number of entries of @addrs. + * + * Register @fp to ftrace for enabling the probe on the address given by @addrs. + * The @addrs must be the addresses of ftrace location address, which may be + * the symbol address + arch-dependent offset. + * If you unsure what this mean, please use other registration functions. + * + * Return 0 if @fp is registered successfully, -errno if not. 
+ */ +int register_fprobe_ips(struct fprobe *fp, unsigned long *addrs, int num) +{ + int ret; + + if (!fp || !addrs || num <= 0) + return -EINVAL; + + fprobe_init(fp); + + ret = ftrace_set_filter_ips(&fp->ops, addrs, num, 0, 0); + if (ret) + return ret; + + ret = fprobe_init_rethook(fp, num); + if (!ret) + ret = register_ftrace_function(&fp->ops); + + if (ret) + fprobe_fail_cleanup(fp); + return ret; +} +EXPORT_SYMBOL_GPL(register_fprobe_ips); + +/** + * register_fprobe_syms() - Register fprobe to ftrace by symbols. + * @fp: A fprobe data structure to be registered. + * @syms: An array of target symbols. + * @num: The number of entries of @syms. + * + * Register @fp to the symbols given by @syms array. This will be useful if + * you are sure the symbols exist in the kernel. + * + * Return 0 if @fp is registered successfully, -errno if not. + */ +int register_fprobe_syms(struct fprobe *fp, const char **syms, int num) +{ + unsigned long *addrs; + int ret; + + if (!fp || !syms || num <= 0) + return -EINVAL; + + addrs = get_ftrace_locations(syms, num); + if (IS_ERR(addrs)) + return PTR_ERR(addrs); + + ret = register_fprobe_ips(fp, addrs, num); + + kfree(addrs); + + return ret; +} +EXPORT_SYMBOL_GPL(register_fprobe_syms); + +/** + * unregister_fprobe() - Unregister fprobe from ftrace + * @fp: A fprobe data structure to be unregistered. + * + * Unregister fprobe (and remove ftrace hooks from the function entries). + * + * Return 0 if @fp is unregistered successfully, -errno if not. + */ +int unregister_fprobe(struct fprobe *fp) +{ + int ret; + + if (!fp || fp->ops.func != fprobe_handler) + return -EINVAL; + + /* + * rethook_free() starts disabling the rethook, but the rethook handlers + * may be running on other processors at this point. To make sure that all + * current running handlers are finished, call unregister_ftrace_function() + * after this. 
+ */ + if (fp->rethook) + rethook_free(fp->rethook); + + ret = unregister_ftrace_function(&fp->ops); + if (ret < 0) + return ret; + + ftrace_free_filter(&fp->ops); + + return ret; +} +EXPORT_SYMBOL_GPL(unregister_fprobe); diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c index 6105b7036482..4214d6a69e60 100644 --- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -4958,7 +4958,7 @@ ftrace_notrace_write(struct file *file, const char __user *ubuf, } static int -ftrace_match_addr(struct ftrace_hash *hash, unsigned long ip, int remove) +__ftrace_match_addr(struct ftrace_hash *hash, unsigned long ip, int remove) { struct ftrace_func_entry *entry; @@ -4977,8 +4977,29 @@ ftrace_match_addr(struct ftrace_hash *hash, unsigned long ip, int remove) } static int +ftrace_match_addr(struct ftrace_hash *hash, unsigned long *ips, + unsigned int cnt, int remove) +{ + unsigned int i; + int err; + + for (i = 0; i < cnt; i++) { + err = __ftrace_match_addr(hash, ips[i], remove); + if (err) { + /* + * This expects the @hash is a temporary hash and if this + * fails the caller must free the @hash. 
+ */ + return err; + } + } + return 0; +} + +static int ftrace_set_hash(struct ftrace_ops *ops, unsigned char *buf, int len, - unsigned long ip, int remove, int reset, int enable) + unsigned long *ips, unsigned int cnt, + int remove, int reset, int enable) { struct ftrace_hash **orig_hash; struct ftrace_hash *hash; @@ -5008,8 +5029,8 @@ ftrace_set_hash(struct ftrace_ops *ops, unsigned char *buf, int len, ret = -EINVAL; goto out_regex_unlock; } - if (ip) { - ret = ftrace_match_addr(hash, ip, remove); + if (ips) { + ret = ftrace_match_addr(hash, ips, cnt, remove); if (ret < 0) goto out_regex_unlock; } @@ -5026,10 +5047,10 @@ ftrace_set_hash(struct ftrace_ops *ops, unsigned char *buf, int len, } static int -ftrace_set_addr(struct ftrace_ops *ops, unsigned long ip, int remove, - int reset, int enable) +ftrace_set_addr(struct ftrace_ops *ops, unsigned long *ips, unsigned int cnt, + int remove, int reset, int enable) { - return ftrace_set_hash(ops, NULL, 0, ip, remove, reset, enable); + return ftrace_set_hash(ops, NULL, 0, ips, cnt, remove, reset, enable); } #ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS @@ -5634,11 +5655,30 @@ int ftrace_set_filter_ip(struct ftrace_ops *ops, unsigned long ip, int remove, int reset) { ftrace_ops_init(ops); - return ftrace_set_addr(ops, ip, remove, reset, 1); + return ftrace_set_addr(ops, &ip, 1, remove, reset, 1); } EXPORT_SYMBOL_GPL(ftrace_set_filter_ip); /** + * ftrace_set_filter_ips - set functions to filter on in ftrace by addresses + * @ops - the ops to set the filter with + * @ips - the array of addresses to add to or remove from the filter. + * @cnt - the number of addresses in @ips + * @remove - non zero to remove ips from the filter + * @reset - non zero to reset all filters before applying this filter. + * + * Filters denote which functions should be enabled when tracing is enabled + * If @ips array or any ip specified within is NULL , it fails to update filter. 
+ */ +int ftrace_set_filter_ips(struct ftrace_ops *ops, unsigned long *ips, + unsigned int cnt, int remove, int reset) +{ + ftrace_ops_init(ops); + return ftrace_set_addr(ops, ips, cnt, remove, reset, 1); +} +EXPORT_SYMBOL_GPL(ftrace_set_filter_ips); + +/** * ftrace_ops_set_global_filter - setup ops to use global filters * @ops - the ops which will use the global filters * @@ -5659,7 +5699,7 @@ static int ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len, int reset, int enable) { - return ftrace_set_hash(ops, buf, len, 0, 0, reset, enable); + return ftrace_set_hash(ops, buf, len, NULL, 0, 0, reset, enable); } /** diff --git a/kernel/trace/rethook.c b/kernel/trace/rethook.c new file mode 100644 index 000000000000..ab463a4d2b23 --- /dev/null +++ b/kernel/trace/rethook.c @@ -0,0 +1,317 @@ +// SPDX-License-Identifier: GPL-2.0 + +#define pr_fmt(fmt) "rethook: " fmt + +#include <linux/bug.h> +#include <linux/kallsyms.h> +#include <linux/kprobes.h> +#include <linux/preempt.h> +#include <linux/rethook.h> +#include <linux/slab.h> +#include <linux/sort.h> + +/* Return hook list (shadow stack by list) */ + +/* + * This function is called from delayed_put_task_struct() when a task is + * dead and cleaned up to recycle any kretprobe instances associated with + * this task. These left over instances represent probed functions that + * have been called but will never return. 
+ */ +void rethook_flush_task(struct task_struct *tk) +{ + struct rethook_node *rhn; + struct llist_node *node; + + node = __llist_del_all(&tk->rethooks); + while (node) { + rhn = container_of(node, struct rethook_node, llist); + node = node->next; + preempt_disable(); + rethook_recycle(rhn); + preempt_enable(); + } +} + +static void rethook_free_rcu(struct rcu_head *head) +{ + struct rethook *rh = container_of(head, struct rethook, rcu); + struct rethook_node *rhn; + struct freelist_node *node; + int count = 1; + + node = rh->pool.head; + while (node) { + rhn = container_of(node, struct rethook_node, freelist); + node = node->next; + kfree(rhn); + count++; + } + + /* The rh->ref is the number of pooled node + 1 */ + if (refcount_sub_and_test(count, &rh->ref)) + kfree(rh); +} + +/** + * rethook_free() - Free struct rethook. + * @rh: the struct rethook to be freed. + * + * Free the rethook. Before calling this function, user must ensure the + * @rh::data is cleaned if needed (or, the handler can access it after + * calling this function.) This function will set the @rh to be freed + * after all rethook_node are freed (not soon). And the caller must + * not touch @rh after calling this. + */ +void rethook_free(struct rethook *rh) +{ + rcu_assign_pointer(rh->handler, NULL); + + call_rcu(&rh->rcu, rethook_free_rcu); +} + +/** + * rethook_alloc() - Allocate struct rethook. + * @data: a data to pass the @handler when hooking the return. + * @handler: the return hook callback function. + * + * Allocate and initialize a new rethook with @data and @handler. + * Return NULL if memory allocation fails or @handler is NULL. + * Note that @handler == NULL means this rethook is going to be freed. 
+ */ +struct rethook *rethook_alloc(void *data, rethook_handler_t handler) +{ + struct rethook *rh = kzalloc(sizeof(struct rethook), GFP_KERNEL); + + if (!rh || !handler) + return NULL; + + rh->data = data; + rh->handler = handler; + rh->pool.head = NULL; + refcount_set(&rh->ref, 1); + + return rh; +} + +/** + * rethook_add_node() - Add a new node to the rethook. + * @rh: the struct rethook. + * @node: the struct rethook_node to be added. + * + * Add @node to @rh. User must allocate @node (as a part of user's + * data structure.) The @node fields are initialized in this function. + */ +void rethook_add_node(struct rethook *rh, struct rethook_node *node) +{ + node->rethook = rh; + freelist_add(&node->freelist, &rh->pool); + refcount_inc(&rh->ref); +} + +static void free_rethook_node_rcu(struct rcu_head *head) +{ + struct rethook_node *node = container_of(head, struct rethook_node, rcu); + + if (refcount_dec_and_test(&node->rethook->ref)) + kfree(node->rethook); + kfree(node); +} + +/** + * rethook_recycle() - return the node to rethook. + * @node: The struct rethook_node to be returned. + * + * Return back the @node to @node::rethook. If the @node::rethook is already + * marked as freed, this will free the @node. + */ +void rethook_recycle(struct rethook_node *node) +{ + lockdep_assert_preemption_disabled(); + + if (likely(READ_ONCE(node->rethook->handler))) + freelist_add(&node->freelist, &node->rethook->pool); + else + call_rcu(&node->rcu, free_rethook_node_rcu); +} +NOKPROBE_SYMBOL(rethook_recycle); + +/** + * rethook_try_get() - get an unused rethook node. + * @rh: The struct rethook which pools the nodes. + * + * Get an unused rethook node from @rh. If the node pool is empty, this + * will return NULL. Caller must disable preemption. 
+ */ +struct rethook_node *rethook_try_get(struct rethook *rh) +{ + rethook_handler_t handler = READ_ONCE(rh->handler); + struct freelist_node *fn; + + lockdep_assert_preemption_disabled(); + + /* Check whether @rh is going to be freed. */ + if (unlikely(!handler)) + return NULL; + + fn = freelist_try_get(&rh->pool); + if (!fn) + return NULL; + + return container_of(fn, struct rethook_node, freelist); +} +NOKPROBE_SYMBOL(rethook_try_get); + +/** + * rethook_hook() - Hook the current function return. + * @node: The struct rethook node to hook the function return. + * @regs: The struct pt_regs for the function entry. + * @mcount: True if this is called from mcount(ftrace) context. + * + * Hook the current running function return. This must be called when the + * function entry (or at least @regs must be the registers of the function + * entry.) @mcount is used for identifying the context. If this is called + * from ftrace (mcount) callback, @mcount must be set true. If this is called + * from the real function entry (e.g. kprobes) @mcount must be set false. + * This is because the way to hook the function return depends on the context. + */ +void rethook_hook(struct rethook_node *node, struct pt_regs *regs, bool mcount) +{ + arch_rethook_prepare(node, regs, mcount); + __llist_add(&node->llist, &current->rethooks); +} +NOKPROBE_SYMBOL(rethook_hook); + +/* This assumes the 'tsk' is the current task or is not running. 
*/ +static unsigned long __rethook_find_ret_addr(struct task_struct *tsk, + struct llist_node **cur) +{ + struct rethook_node *rh = NULL; + struct llist_node *node = *cur; + + if (!node) + node = tsk->rethooks.first; + else + node = node->next; + + while (node) { + rh = container_of(node, struct rethook_node, llist); + if (rh->ret_addr != (unsigned long)arch_rethook_trampoline) { + *cur = node; + return rh->ret_addr; + } + node = node->next; + } + return 0; +} +NOKPROBE_SYMBOL(__rethook_find_ret_addr); + +/** + * rethook_find_ret_addr -- Find correct return address modified by rethook + * @tsk: Target task + * @frame: A frame pointer + * @cur: a storage of the loop cursor llist_node pointer for next call + * + * Find the correct return address modified by a rethook on @tsk in unsigned + * long type. + * The @tsk must be 'current' or a task which is not running. @frame is a hint + * to get the currect return address - which is compared with the + * rethook::frame field. The @cur is a loop cursor for searching the + * kretprobe return addresses on the @tsk. The '*@cur' should be NULL at the + * first call, but '@cur' itself must NOT NULL. + * + * Returns found address value or zero if not found. + */ +unsigned long rethook_find_ret_addr(struct task_struct *tsk, unsigned long frame, + struct llist_node **cur) +{ + struct rethook_node *rhn = NULL; + unsigned long ret; + + if (WARN_ON_ONCE(!cur)) + return 0; + + if (WARN_ON_ONCE(tsk != current && task_is_running(tsk))) + return 0; + + do { + ret = __rethook_find_ret_addr(tsk, cur); + if (!ret) + break; + rhn = container_of(*cur, struct rethook_node, llist); + } while (rhn->frame != frame); + + return ret; +} +NOKPROBE_SYMBOL(rethook_find_ret_addr); + +void __weak arch_rethook_fixup_return(struct pt_regs *regs, + unsigned long correct_ret_addr) +{ + /* + * Do nothing by default. 
If the architecture which uses a + * frame pointer to record real return address on the stack, + * it should fill this function to fixup the return address + * so that stacktrace works from the rethook handler. + */ +} + +/* This function will be called from each arch-defined trampoline. */ +unsigned long rethook_trampoline_handler(struct pt_regs *regs, + unsigned long frame) +{ + struct llist_node *first, *node = NULL; + unsigned long correct_ret_addr; + rethook_handler_t handler; + struct rethook_node *rhn; + + correct_ret_addr = __rethook_find_ret_addr(current, &node); + if (!correct_ret_addr) { + pr_err("rethook: Return address not found! Maybe there is a bug in the kernel\n"); + BUG_ON(1); + } + + instruction_pointer_set(regs, correct_ret_addr); + + /* + * These loops must be protected from rethook_free_rcu() because those + * are accessing 'rhn->rethook'. + */ + preempt_disable(); + + /* + * Run the handler on the shadow stack. Do not unlink the list here because + * stackdump inside the handlers needs to decode it. + */ + first = current->rethooks.first; + while (first) { + rhn = container_of(first, struct rethook_node, llist); + if (WARN_ON_ONCE(rhn->frame != frame)) + break; + handler = READ_ONCE(rhn->rethook->handler); + if (handler) + handler(rhn, rhn->rethook->data, regs); + + if (first == node) + break; + first = first->next; + } + + /* Fixup registers for returning to correct address. 
*/ + arch_rethook_fixup_return(regs, correct_ret_addr); + + /* Unlink used shadow stack */ + first = current->rethooks.first; + current->rethooks.first = node->next; + node->next = NULL; + + while (first) { + rhn = container_of(first, struct rethook_node, llist); + first = first->next; + rethook_recycle(rhn); + } + preempt_enable(); + + return correct_ret_addr; +} +NOKPROBE_SYMBOL(rethook_trampoline_handler); diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 72ca4684beda..b0bf0d224b2c 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -2118,6 +2118,18 @@ config KPROBES_SANITY_TEST Say N if you are unsure. +config FPROBE_SANITY_TEST + bool "Self test for fprobe" + depends on DEBUG_KERNEL + depends on FPROBE + depends on KUNIT=y + help + This option will enable testing the fprobe when the system boot. + A series of tests are made to verify that the fprobe is functioning + properly. + + Say N if you are unsure. + config BACKTRACE_SELF_TEST tristate "Self test for the backtrace code" depends on DEBUG_KERNEL diff --git a/lib/Makefile b/lib/Makefile index 300f569c626b..154008764b16 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -103,6 +103,8 @@ obj-$(CONFIG_TEST_HMM) += test_hmm.o obj-$(CONFIG_TEST_FREE_PAGES) += test_free_pages.o obj-$(CONFIG_KPROBES_SANITY_TEST) += test_kprobes.o obj-$(CONFIG_TEST_REF_TRACKER) += test_ref_tracker.o +CFLAGS_test_fprobe.o += $(CC_FLAGS_FTRACE) +obj-$(CONFIG_FPROBE_SANITY_TEST) += test_fprobe.o # # CFLAGS for compiling floating point code inside the kernel. x86/Makefile turns # off the generation of FPU/SSE* instructions for kernel proper but FPU_FLAGS diff --git a/lib/sort.c b/lib/sort.c index aa18153864d2..b399bf10d675 100644 --- a/lib/sort.c +++ b/lib/sort.c @@ -122,16 +122,27 @@ static void swap_bytes(void *a, void *b, size_t n) * a pointer, but small integers make for the smallest compare * instructions. 
*/ -#define SWAP_WORDS_64 (swap_func_t)0 -#define SWAP_WORDS_32 (swap_func_t)1 -#define SWAP_BYTES (swap_func_t)2 +#define SWAP_WORDS_64 (swap_r_func_t)0 +#define SWAP_WORDS_32 (swap_r_func_t)1 +#define SWAP_BYTES (swap_r_func_t)2 +#define SWAP_WRAPPER (swap_r_func_t)3 + +struct wrapper { + cmp_func_t cmp; + swap_func_t swap; +}; /* * The function pointer is last to make tail calls most efficient if the * compiler decides not to inline this function. */ -static void do_swap(void *a, void *b, size_t size, swap_func_t swap_func) +static void do_swap(void *a, void *b, size_t size, swap_r_func_t swap_func, const void *priv) { + if (swap_func == SWAP_WRAPPER) { + ((const struct wrapper *)priv)->swap(a, b, (int)size); + return; + } + if (swap_func == SWAP_WORDS_64) swap_words_64(a, b, size); else if (swap_func == SWAP_WORDS_32) @@ -139,7 +150,7 @@ static void do_swap(void *a, void *b, size_t size, swap_func_t swap_func) else if (swap_func == SWAP_BYTES) swap_bytes(a, b, size); else - swap_func(a, b, (int)size); + swap_func(a, b, (int)size, priv); } #define _CMP_WRAPPER ((cmp_r_func_t)0L) @@ -147,7 +158,7 @@ static void do_swap(void *a, void *b, size_t size, swap_func_t swap_func) static int do_cmp(const void *a, const void *b, cmp_r_func_t cmp, const void *priv) { if (cmp == _CMP_WRAPPER) - return ((cmp_func_t)(priv))(a, b); + return ((const struct wrapper *)priv)->cmp(a, b); return cmp(a, b, priv); } @@ -198,7 +209,7 @@ static size_t parent(size_t i, unsigned int lsbit, size_t size) */ void sort_r(void *base, size_t num, size_t size, cmp_r_func_t cmp_func, - swap_func_t swap_func, + swap_r_func_t swap_func, const void *priv) { /* pre-scale counters for performance */ @@ -208,6 +219,10 @@ void sort_r(void *base, size_t num, size_t size, if (!a) /* num < 2 || size == 0 */ return; + /* called from 'sort' without swap function, let's pick the default */ + if (swap_func == SWAP_WRAPPER && !((struct wrapper *)priv)->swap) + swap_func = NULL; + if (!swap_func) { if 
(is_aligned(base, size, 8)) swap_func = SWAP_WORDS_64; @@ -230,7 +245,7 @@ void sort_r(void *base, size_t num, size_t size, if (a) /* Building heap: sift down --a */ a -= size; else if (n -= size) /* Sorting: Extract root to --n */ - do_swap(base, base + n, size, swap_func); + do_swap(base, base + n, size, swap_func, priv); else /* Sort complete */ break; @@ -257,7 +272,7 @@ void sort_r(void *base, size_t num, size_t size, c = b; /* Where "a" belongs */ while (b != a) { /* Shift it into place */ b = parent(b, lsbit, size); - do_swap(base + b, base + c, size, swap_func); + do_swap(base + b, base + c, size, swap_func, priv); } } } @@ -267,6 +282,11 @@ void sort(void *base, size_t num, size_t size, cmp_func_t cmp_func, swap_func_t swap_func) { - return sort_r(base, num, size, _CMP_WRAPPER, swap_func, cmp_func); + struct wrapper w = { + .cmp = cmp_func, + .swap = swap_func, + }; + + return sort_r(base, num, size, _CMP_WRAPPER, SWAP_WRAPPER, &w); } EXPORT_SYMBOL(sort); diff --git a/lib/test_fprobe.c b/lib/test_fprobe.c new file mode 100644 index 000000000000..ed70637a2ffa --- /dev/null +++ b/lib/test_fprobe.c @@ -0,0 +1,174 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * test_fprobe.c - simple sanity test for fprobe + */ + +#include <linux/kernel.h> +#include <linux/fprobe.h> +#include <linux/random.h> +#include <kunit/test.h> + +#define div_factor 3 + +static struct kunit *current_test; + +static u32 rand1, entry_val, exit_val; + +/* Use indirect calls to avoid inlining the target functions */ +static u32 (*target)(u32 value); +static u32 (*target2)(u32 value); +static unsigned long target_ip; +static unsigned long target2_ip; + +static noinline u32 fprobe_selftest_target(u32 value) +{ + return (value / div_factor); +} + +static noinline u32 fprobe_selftest_target2(u32 value) +{ + return (value / div_factor) + 1; +} + +static notrace void fp_entry_handler(struct fprobe *fp, unsigned long ip, struct pt_regs *regs) +{ + KUNIT_EXPECT_FALSE(current_test, 
preemptible()); + /* This handler can fire for both fprobe_selftest_target and fprobe_selftest_target2 */ + if (ip != target_ip) + KUNIT_EXPECT_EQ(current_test, ip, target2_ip); + entry_val = (rand1 / div_factor); +} + +static notrace void fp_exit_handler(struct fprobe *fp, unsigned long ip, struct pt_regs *regs) +{ + unsigned long ret = regs_return_value(regs); + + KUNIT_EXPECT_FALSE(current_test, preemptible()); + if (ip != target_ip) { + KUNIT_EXPECT_EQ(current_test, ip, target2_ip); + KUNIT_EXPECT_EQ(current_test, ret, (rand1 / div_factor) + 1); + } else + KUNIT_EXPECT_EQ(current_test, ret, (rand1 / div_factor)); + KUNIT_EXPECT_EQ(current_test, entry_val, (rand1 / div_factor)); + exit_val = entry_val + div_factor; +} + +/* Test entry only (no rethook) */ +static void test_fprobe_entry(struct kunit *test) +{ + struct fprobe fp_entry = { + .entry_handler = fp_entry_handler, + }; + + current_test = test; + + /* Unregistering before registering should fail. */ + KUNIT_EXPECT_NE(test, 0, unregister_fprobe(&fp_entry)); + KUNIT_EXPECT_EQ(test, 0, register_fprobe(&fp_entry, "fprobe_selftest_target*", NULL)); + + entry_val = 0; + exit_val = 0; + target(rand1); + KUNIT_EXPECT_NE(test, 0, entry_val); + KUNIT_EXPECT_EQ(test, 0, exit_val); + + entry_val = 0; + exit_val = 0; + target2(rand1); + KUNIT_EXPECT_NE(test, 0, entry_val); + KUNIT_EXPECT_EQ(test, 0, exit_val); + + KUNIT_EXPECT_EQ(test, 0, unregister_fprobe(&fp_entry)); +} + +static void test_fprobe(struct kunit *test) +{ + struct fprobe fp = { + .entry_handler = fp_entry_handler, + .exit_handler = fp_exit_handler, + }; + + current_test = test; + KUNIT_EXPECT_EQ(test, 0, register_fprobe(&fp, "fprobe_selftest_target*", NULL)); + + entry_val = 0; + exit_val = 0; + target(rand1); + KUNIT_EXPECT_NE(test, 0, entry_val); + KUNIT_EXPECT_EQ(test, entry_val + div_factor, exit_val); + + entry_val = 0; + exit_val = 0; + target2(rand1); + KUNIT_EXPECT_NE(test, 0, entry_val); + KUNIT_EXPECT_EQ(test, entry_val + div_factor,
exit_val); + + KUNIT_EXPECT_EQ(test, 0, unregister_fprobe(&fp)); +} + +static void test_fprobe_syms(struct kunit *test) +{ + static const char *syms[] = {"fprobe_selftest_target", "fprobe_selftest_target2"}; + struct fprobe fp = { + .entry_handler = fp_entry_handler, + .exit_handler = fp_exit_handler, + }; + + current_test = test; + KUNIT_EXPECT_EQ(test, 0, register_fprobe_syms(&fp, syms, 2)); + + entry_val = 0; + exit_val = 0; + target(rand1); + KUNIT_EXPECT_NE(test, 0, entry_val); + KUNIT_EXPECT_EQ(test, entry_val + div_factor, exit_val); + + entry_val = 0; + exit_val = 0; + target2(rand1); + KUNIT_EXPECT_NE(test, 0, entry_val); + KUNIT_EXPECT_EQ(test, entry_val + div_factor, exit_val); + + KUNIT_EXPECT_EQ(test, 0, unregister_fprobe(&fp)); +} + +static unsigned long get_ftrace_location(void *func) +{ + unsigned long size, addr = (unsigned long)func; + + if (!kallsyms_lookup_size_offset(addr, &size, NULL) || !size) + return 0; + + return ftrace_location_range(addr, addr + size - 1); +} + +static int fprobe_test_init(struct kunit *test) +{ + do { + rand1 = prandom_u32(); + } while (rand1 <= div_factor); + + target = fprobe_selftest_target; + target2 = fprobe_selftest_target2; + target_ip = get_ftrace_location(target); + target2_ip = get_ftrace_location(target2); + + return 0; +} + +static struct kunit_case fprobe_testcases[] = { + KUNIT_CASE(test_fprobe_entry), + KUNIT_CASE(test_fprobe), + KUNIT_CASE(test_fprobe_syms), + {} +}; + +static struct kunit_suite fprobe_test_suite = { + .name = "fprobe_test", + .init = fprobe_test_init, + .test_cases = fprobe_testcases, +}; + +kunit_test_suites(&fprobe_test_suite); + +MODULE_LICENSE("GPL"); diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index eb129e48f90b..e7b9c2636d10 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -15,6 +15,7 @@ #include <net/sock.h> #include <net/tcp.h> #include <net/net_namespace.h> +#include <net/page_pool.h> #include <linux/error-injection.h> #include <linux/smp.h> #include 
<linux/sock_diag.h> @@ -53,10 +54,11 @@ static void bpf_test_timer_leave(struct bpf_test_timer *t) rcu_read_unlock(); } -static bool bpf_test_timer_continue(struct bpf_test_timer *t, u32 repeat, int *err, u32 *duration) +static bool bpf_test_timer_continue(struct bpf_test_timer *t, int iterations, + u32 repeat, int *err, u32 *duration) __must_hold(rcu) { - t->i++; + t->i += iterations; if (t->i >= repeat) { /* We're done. */ t->time_spent += ktime_get_ns() - t->time_start; @@ -88,6 +90,284 @@ reset: return false; } +/* We put this struct at the head of each page with a context and frame + * initialised when the page is allocated, so we don't have to do this on each + * repetition of the test run. + */ +struct xdp_page_head { + struct xdp_buff orig_ctx; + struct xdp_buff ctx; + struct xdp_frame frm; + u8 data[]; +}; + +struct xdp_test_data { + struct xdp_buff *orig_ctx; + struct xdp_rxq_info rxq; + struct net_device *dev; + struct page_pool *pp; + struct xdp_frame **frames; + struct sk_buff **skbs; + u32 batch_size; + u32 frame_cnt; +}; + +#define TEST_XDP_FRAME_SIZE (PAGE_SIZE - sizeof(struct xdp_page_head)) +#define TEST_XDP_MAX_BATCH 256 + +static void xdp_test_run_init_page(struct page *page, void *arg) +{ + struct xdp_page_head *head = phys_to_virt(page_to_phys(page)); + struct xdp_buff *new_ctx, *orig_ctx; + u32 headroom = XDP_PACKET_HEADROOM; + struct xdp_test_data *xdp = arg; + size_t frm_len, meta_len; + struct xdp_frame *frm; + void *data; + + orig_ctx = xdp->orig_ctx; + frm_len = orig_ctx->data_end - orig_ctx->data_meta; + meta_len = orig_ctx->data - orig_ctx->data_meta; + headroom -= meta_len; + + new_ctx = &head->ctx; + frm = &head->frm; + data = &head->data; + memcpy(data + headroom, orig_ctx->data_meta, frm_len); + + xdp_init_buff(new_ctx, TEST_XDP_FRAME_SIZE, &xdp->rxq); + xdp_prepare_buff(new_ctx, data, headroom, frm_len, true); + new_ctx->data = new_ctx->data_meta + meta_len; + + xdp_update_frame_from_buff(new_ctx, frm); + frm->mem = 
new_ctx->rxq->mem; + + memcpy(&head->orig_ctx, new_ctx, sizeof(head->orig_ctx)); +} + +static int xdp_test_run_setup(struct xdp_test_data *xdp, struct xdp_buff *orig_ctx) +{ + struct xdp_mem_info mem = {}; + struct page_pool *pp; + int err = -ENOMEM; + struct page_pool_params pp_params = { + .order = 0, + .flags = 0, + .pool_size = xdp->batch_size, + .nid = NUMA_NO_NODE, + .init_callback = xdp_test_run_init_page, + .init_arg = xdp, + }; + + xdp->frames = kvmalloc_array(xdp->batch_size, sizeof(void *), GFP_KERNEL); + if (!xdp->frames) + return -ENOMEM; + + xdp->skbs = kvmalloc_array(xdp->batch_size, sizeof(void *), GFP_KERNEL); + if (!xdp->skbs) + goto err_skbs; + + pp = page_pool_create(&pp_params); + if (IS_ERR(pp)) { + err = PTR_ERR(pp); + goto err_pp; + } + + /* will copy 'mem.id' into pp->xdp_mem_id */ + err = xdp_reg_mem_model(&mem, MEM_TYPE_PAGE_POOL, pp); + if (err) + goto err_mmodel; + + xdp->pp = pp; + + /* We create a 'fake' RXQ referencing the original dev, but with an + * xdp_mem_info pointing to our page_pool + */ + xdp_rxq_info_reg(&xdp->rxq, orig_ctx->rxq->dev, 0, 0); + xdp->rxq.mem.type = MEM_TYPE_PAGE_POOL; + xdp->rxq.mem.id = pp->xdp_mem_id; + xdp->dev = orig_ctx->rxq->dev; + xdp->orig_ctx = orig_ctx; + + return 0; + +err_mmodel: + page_pool_destroy(pp); +err_pp: + kvfree(xdp->skbs); +err_skbs: + kvfree(xdp->frames); + return err; +} + +static void xdp_test_run_teardown(struct xdp_test_data *xdp) +{ + page_pool_destroy(xdp->pp); + kvfree(xdp->frames); + kvfree(xdp->skbs); +} + +static bool ctx_was_changed(struct xdp_page_head *head) +{ + return head->orig_ctx.data != head->ctx.data || + head->orig_ctx.data_meta != head->ctx.data_meta || + head->orig_ctx.data_end != head->ctx.data_end; +} + +static void reset_ctx(struct xdp_page_head *head) +{ + if (likely(!ctx_was_changed(head))) + return; + + head->ctx.data = head->orig_ctx.data; + head->ctx.data_meta = head->orig_ctx.data_meta; + head->ctx.data_end = head->orig_ctx.data_end; +
xdp_update_frame_from_buff(&head->ctx, &head->frm); +} + +static int xdp_recv_frames(struct xdp_frame **frames, int nframes, + struct sk_buff **skbs, + struct net_device *dev) +{ + gfp_t gfp = __GFP_ZERO | GFP_ATOMIC; + int i, n; + LIST_HEAD(list); + + n = kmem_cache_alloc_bulk(skbuff_head_cache, gfp, nframes, (void **)skbs); + if (unlikely(n == 0)) { + for (i = 0; i < nframes; i++) + xdp_return_frame(frames[i]); + return -ENOMEM; + } + + for (i = 0; i < nframes; i++) { + struct xdp_frame *xdpf = frames[i]; + struct sk_buff *skb = skbs[i]; + + skb = __xdp_build_skb_from_frame(xdpf, skb, dev); + if (!skb) { + xdp_return_frame(xdpf); + continue; + } + + list_add_tail(&skb->list, &list); + } + netif_receive_skb_list(&list); + + return 0; +} + +static int xdp_test_run_batch(struct xdp_test_data *xdp, struct bpf_prog *prog, + u32 repeat) +{ + struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + int err = 0, act, ret, i, nframes = 0, batch_sz; + struct xdp_frame **frames = xdp->frames; + struct xdp_page_head *head; + struct xdp_frame *frm; + bool redirect = false; + struct xdp_buff *ctx; + struct page *page; + + batch_sz = min_t(u32, repeat, xdp->batch_size); + + local_bh_disable(); + xdp_set_return_frame_no_direct(); + + for (i = 0; i < batch_sz; i++) { + page = page_pool_dev_alloc_pages(xdp->pp); + if (!page) { + err = -ENOMEM; + goto out; + } + + head = phys_to_virt(page_to_phys(page)); + reset_ctx(head); + ctx = &head->ctx; + frm = &head->frm; + xdp->frame_cnt++; + + act = bpf_prog_run_xdp(prog, ctx); + + /* if program changed pkt bounds we need to update the xdp_frame */ + if (unlikely(ctx_was_changed(head))) { + ret = xdp_update_frame_from_buff(ctx, frm); + if (ret) { + xdp_return_buff(ctx); + continue; + } + } + + switch (act) { + case XDP_TX: + /* we can't do a real XDP_TX since we're not in the + * driver, so turn it into a REDIRECT back to the same + * index + */ + ri->tgt_index = xdp->dev->ifindex; + ri->map_id = INT_MAX; + ri->map_type = 
BPF_MAP_TYPE_UNSPEC; + fallthrough; + case XDP_REDIRECT: + redirect = true; + ret = xdp_do_redirect_frame(xdp->dev, ctx, frm, prog); + if (ret) + xdp_return_buff(ctx); + break; + case XDP_PASS: + frames[nframes++] = frm; + break; + default: + bpf_warn_invalid_xdp_action(NULL, prog, act); + fallthrough; + case XDP_DROP: + xdp_return_buff(ctx); + break; + } + } + +out: + if (redirect) + xdp_do_flush(); + if (nframes) { + ret = xdp_recv_frames(frames, nframes, xdp->skbs, xdp->dev); + if (ret) + err = ret; + } + + xdp_clear_return_frame_no_direct(); + local_bh_enable(); + return err; +} + +static int bpf_test_run_xdp_live(struct bpf_prog *prog, struct xdp_buff *ctx, + u32 repeat, u32 batch_size, u32 *time) + +{ + struct xdp_test_data xdp = { .batch_size = batch_size }; + struct bpf_test_timer t = { .mode = NO_MIGRATE }; + int ret; + + if (!repeat) + repeat = 1; + + ret = xdp_test_run_setup(&xdp, ctx); + if (ret) + return ret; + + bpf_test_timer_enter(&t); + do { + xdp.frame_cnt = 0; + ret = xdp_test_run_batch(&xdp, prog, repeat - t.i); + if (unlikely(ret < 0)) + break; + } while (bpf_test_timer_continue(&t, xdp.frame_cnt, repeat, &ret, time)); + bpf_test_timer_leave(&t); + + xdp_test_run_teardown(&xdp); + return ret; +} + static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, u32 *retval, u32 *time, bool xdp) { @@ -119,7 +399,7 @@ static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, *retval = bpf_prog_run_xdp(prog, ctx); else *retval = bpf_prog_run(prog, ctx); - } while (bpf_test_timer_continue(&t, repeat, &ret, time)); + } while (bpf_test_timer_continue(&t, 1, repeat, &ret, time)); bpf_reset_run_ctx(old_ctx); bpf_test_timer_leave(&t); @@ -201,8 +481,8 @@ out: * future. 
*/ __diag_push(); -__diag_ignore(GCC, 8, "-Wmissing-prototypes", - "Global functions as their definitions will be in vmlinux BTF"); +__diag_ignore_all("-Wmissing-prototypes", + "Global functions as their definitions will be in vmlinux BTF"); int noinline bpf_fentry_test1(int a) { return a + 1; @@ -270,9 +550,14 @@ struct sock * noinline bpf_kfunc_call_test3(struct sock *sk) return sk; } +struct prog_test_member { + u64 c; +}; + struct prog_test_ref_kfunc { int a; int b; + struct prog_test_member memb; struct prog_test_ref_kfunc *next; }; @@ -295,6 +580,10 @@ noinline void bpf_kfunc_call_test_release(struct prog_test_ref_kfunc *p) { } +noinline void bpf_kfunc_call_memb_release(struct prog_test_member *p) +{ +} + struct prog_test_pass1 { int x0; struct { @@ -379,6 +668,7 @@ BTF_ID(func, bpf_kfunc_call_test2) BTF_ID(func, bpf_kfunc_call_test3) BTF_ID(func, bpf_kfunc_call_test_acquire) BTF_ID(func, bpf_kfunc_call_test_release) +BTF_ID(func, bpf_kfunc_call_memb_release) BTF_ID(func, bpf_kfunc_call_test_pass_ctx) BTF_ID(func, bpf_kfunc_call_test_pass1) BTF_ID(func, bpf_kfunc_call_test_pass2) @@ -396,6 +686,7 @@ BTF_SET_END(test_sk_acquire_kfunc_ids) BTF_SET_START(test_sk_release_kfunc_ids) BTF_ID(func, bpf_kfunc_call_test_release) +BTF_ID(func, bpf_kfunc_call_memb_release) BTF_SET_END(test_sk_release_kfunc_ids) BTF_SET_START(test_sk_ret_null_kfunc_ids) @@ -435,7 +726,7 @@ int bpf_prog_test_run_tracing(struct bpf_prog *prog, int b = 2, err = -EFAULT; u32 retval = 0; - if (kattr->test.flags || kattr->test.cpu) + if (kattr->test.flags || kattr->test.cpu || kattr->test.batch_size) return -EINVAL; switch (prog->expected_attach_type) { @@ -499,7 +790,7 @@ int bpf_prog_test_run_raw_tp(struct bpf_prog *prog, /* doesn't support data_in/out, ctx_out, duration, or repeat */ if (kattr->test.data_in || kattr->test.data_out || kattr->test.ctx_out || kattr->test.duration || - kattr->test.repeat) + kattr->test.repeat || kattr->test.batch_size) return -EINVAL; if (ctx_size_in < 
prog->aux->max_ctx_offset || @@ -730,7 +1021,7 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr, void *data; int ret; - if (kattr->test.flags || kattr->test.cpu) + if (kattr->test.flags || kattr->test.cpu || kattr->test.batch_size) return -EINVAL; data = bpf_test_init(kattr, kattr->test.data_size_in, @@ -911,10 +1202,12 @@ static void xdp_convert_buff_to_md(struct xdp_buff *xdp, struct xdp_md *xdp_md) int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr, union bpf_attr __user *uattr) { + bool do_live = (kattr->test.flags & BPF_F_TEST_XDP_LIVE_FRAMES); u32 tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); + u32 batch_size = kattr->test.batch_size; + u32 retval = 0, duration, max_data_sz; u32 size = kattr->test.data_size_in; u32 headroom = XDP_PACKET_HEADROOM; - u32 retval, duration, max_data_sz; u32 repeat = kattr->test.repeat; struct netdev_rx_queue *rxqueue; struct skb_shared_info *sinfo; @@ -927,6 +1220,20 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr, prog->expected_attach_type == BPF_XDP_CPUMAP) return -EINVAL; + if (kattr->test.flags & ~BPF_F_TEST_XDP_LIVE_FRAMES) + return -EINVAL; + + if (do_live) { + if (!batch_size) + batch_size = NAPI_POLL_WEIGHT; + else if (batch_size > TEST_XDP_MAX_BATCH) + return -E2BIG; + + headroom += sizeof(struct xdp_page_head); + } else if (batch_size) { + return -EINVAL; + } + ctx = bpf_ctx_init(kattr, sizeof(struct xdp_md)); if (IS_ERR(ctx)) return PTR_ERR(ctx); @@ -935,14 +1242,20 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr, /* There can't be user provided data before the meta data */ if (ctx->data_meta || ctx->data_end != size || ctx->data > ctx->data_end || - unlikely(xdp_metalen_invalid(ctx->data))) + unlikely(xdp_metalen_invalid(ctx->data)) || + (do_live && (kattr->test.data_out || kattr->test.ctx_out))) goto free_ctx; /* Meta data is allocated from the headroom */ headroom -= ctx->data; } 
max_data_sz = 4096 - headroom - tailroom; - size = min_t(u32, size, max_data_sz); + if (size > max_data_sz) { + /* disallow live data mode for jumbo frames */ + if (do_live) + goto free_ctx; + size = max_data_sz; + } data = bpf_test_init(kattr, size, max_data_sz, headroom, tailroom); if (IS_ERR(data)) { @@ -1000,7 +1313,10 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr, if (repeat > 1) bpf_prog_change_xdp(NULL, prog); - ret = bpf_test_run(prog, &xdp, repeat, &retval, &duration, true); + if (do_live) + ret = bpf_test_run_xdp_live(prog, &xdp, repeat, batch_size, &duration); + else + ret = bpf_test_run(prog, &xdp, repeat, &retval, &duration, true); /* We convert the xdp_buff back to an xdp_md before checking the return * code so the reference count of any held netdevice will be decremented * even if the test run failed. @@ -1062,7 +1378,7 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog, if (prog->type != BPF_PROG_TYPE_FLOW_DISSECTOR) return -EINVAL; - if (kattr->test.flags || kattr->test.cpu) + if (kattr->test.flags || kattr->test.cpu || kattr->test.batch_size) return -EINVAL; if (size < ETH_HLEN) @@ -1097,7 +1413,7 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog, do { retval = bpf_flow_dissect(prog, &ctx, eth->h_proto, ETH_HLEN, size, flags); - } while (bpf_test_timer_continue(&t, repeat, &ret, &duration)); + } while (bpf_test_timer_continue(&t, 1, repeat, &ret, &duration)); bpf_test_timer_leave(&t); if (ret < 0) @@ -1129,7 +1445,7 @@ int bpf_prog_test_run_sk_lookup(struct bpf_prog *prog, const union bpf_attr *kat if (prog->type != BPF_PROG_TYPE_SK_LOOKUP) return -EINVAL; - if (kattr->test.flags || kattr->test.cpu) + if (kattr->test.flags || kattr->test.cpu || kattr->test.batch_size) return -EINVAL; if (kattr->test.data_in || kattr->test.data_size_in || kattr->test.data_out || @@ -1192,7 +1508,7 @@ int bpf_prog_test_run_sk_lookup(struct bpf_prog *prog, const union bpf_attr *kat do { ctx.selected_sk = NULL; 
retval = BPF_PROG_SK_LOOKUP_RUN_ARRAY(progs, ctx, bpf_prog_run); - } while (bpf_test_timer_continue(&t, repeat, &ret, &duration)); + } while (bpf_test_timer_continue(&t, 1, repeat, &ret, &duration)); bpf_test_timer_leave(&t); if (ret < 0) @@ -1231,7 +1547,8 @@ int bpf_prog_test_run_syscall(struct bpf_prog *prog, /* doesn't support data_in/out, ctx_out, duration, or repeat or flags */ if (kattr->test.data_in || kattr->test.data_out || kattr->test.ctx_out || kattr->test.duration || - kattr->test.repeat || kattr->test.flags) + kattr->test.repeat || kattr->test.flags || + kattr->test.batch_size) return -EINVAL; if (ctx_size_in < prog->aux->max_ctx_offset || diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c index d9c37fd10809..e3ac36380520 100644 --- a/net/core/bpf_sk_storage.c +++ b/net/core/bpf_sk_storage.c @@ -141,7 +141,7 @@ static int bpf_fd_sk_storage_update_elem(struct bpf_map *map, void *key, if (sock) { sdata = bpf_local_storage_update( sock->sk, (struct bpf_local_storage_map *)map, value, - map_flags); + map_flags, GFP_ATOMIC); sockfd_put(sock); return PTR_ERR_OR_ZERO(sdata); } @@ -172,7 +172,7 @@ bpf_sk_storage_clone_elem(struct sock *newsk, { struct bpf_local_storage_elem *copy_selem; - copy_selem = bpf_selem_alloc(smap, newsk, NULL, true); + copy_selem = bpf_selem_alloc(smap, newsk, NULL, true, GFP_ATOMIC); if (!copy_selem) return NULL; @@ -230,7 +230,7 @@ int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk) bpf_selem_link_map(smap, copy_selem); bpf_selem_link_storage_nolock(new_sk_storage, copy_selem); } else { - ret = bpf_local_storage_alloc(newsk, smap, copy_selem); + ret = bpf_local_storage_alloc(newsk, smap, copy_selem, GFP_ATOMIC); if (ret) { kfree(copy_selem); atomic_sub(smap->elem_size, @@ -255,8 +255,9 @@ out: return ret; } -BPF_CALL_4(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk, - void *, value, u64, flags) +/* *gfp_flags* is a hidden argument provided by the verifier */ 
+BPF_CALL_5(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk, + void *, value, u64, flags, gfp_t, gfp_flags) { struct bpf_local_storage_data *sdata; @@ -277,7 +278,7 @@ BPF_CALL_4(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk, refcount_inc_not_zero(&sk->sk_refcnt)) { sdata = bpf_local_storage_update( sk, (struct bpf_local_storage_map *)map, value, - BPF_NOEXIST); + BPF_NOEXIST, gfp_flags); /* sk must be a fullsock (guaranteed by verifier), * so sock_gen_put() is unnecessary. */ @@ -405,6 +406,8 @@ static bool bpf_sk_storage_tracing_allowed(const struct bpf_prog *prog) case BPF_TRACE_FENTRY: case BPF_TRACE_FEXIT: btf_vmlinux = bpf_get_btf_vmlinux(); + if (IS_ERR_OR_NULL(btf_vmlinux)) + return false; btf_id = prog->aux->attach_btf_id; t = btf_type_by_id(btf_vmlinux, btf_id); tname = btf_name_by_offset(btf_vmlinux, t->name_off); @@ -417,14 +420,16 @@ static bool bpf_sk_storage_tracing_allowed(const struct bpf_prog *prog) return false; } -BPF_CALL_4(bpf_sk_storage_get_tracing, struct bpf_map *, map, struct sock *, sk, - void *, value, u64, flags) +/* *gfp_flags* is a hidden argument provided by the verifier */ +BPF_CALL_5(bpf_sk_storage_get_tracing, struct bpf_map *, map, struct sock *, sk, + void *, value, u64, flags, gfp_t, gfp_flags) { WARN_ON_ONCE(!bpf_rcu_lock_held()); if (in_hardirq() || in_nmi()) return (unsigned long)NULL; - return (unsigned long)____bpf_sk_storage_get(map, sk, value, flags); + return (unsigned long)____bpf_sk_storage_get(map, sk, value, flags, + gfp_flags); } BPF_CALL_2(bpf_sk_storage_delete_tracing, struct bpf_map *, map, diff --git a/net/core/filter.c b/net/core/filter.c index 88767f7da150..a7044e98765e 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -7388,36 +7388,36 @@ static const struct bpf_func_proto bpf_sock_ops_reserve_hdr_opt_proto = { .arg3_type = ARG_ANYTHING, }; -BPF_CALL_3(bpf_skb_set_delivery_time, struct sk_buff *, skb, - u64, dtime, u32, dtime_type) +BPF_CALL_3(bpf_skb_set_tstamp, struct 
sk_buff *, skb, + u64, tstamp, u32, tstamp_type) { /* skb_clear_delivery_time() is done for inet protocol */ if (skb->protocol != htons(ETH_P_IP) && skb->protocol != htons(ETH_P_IPV6)) return -EOPNOTSUPP; - switch (dtime_type) { - case BPF_SKB_DELIVERY_TIME_MONO: - if (!dtime) + switch (tstamp_type) { + case BPF_SKB_TSTAMP_DELIVERY_MONO: + if (!tstamp) return -EINVAL; - skb->tstamp = dtime; + skb->tstamp = tstamp; skb->mono_delivery_time = 1; break; - case BPF_SKB_DELIVERY_TIME_NONE: - if (dtime) + case BPF_SKB_TSTAMP_UNSPEC: + if (tstamp) return -EINVAL; skb->tstamp = 0; skb->mono_delivery_time = 0; break; default: - return -EOPNOTSUPP; + return -EINVAL; } return 0; } -static const struct bpf_func_proto bpf_skb_set_delivery_time_proto = { - .func = bpf_skb_set_delivery_time, +static const struct bpf_func_proto bpf_skb_set_tstamp_proto = { + .func = bpf_skb_set_tstamp, .gpl_only = false, .ret_type = RET_INTEGER, .arg1_type = ARG_PTR_TO_CTX, @@ -7786,8 +7786,8 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) return &bpf_tcp_gen_syncookie_proto; case BPF_FUNC_sk_assign: return &bpf_sk_assign_proto; - case BPF_FUNC_skb_set_delivery_time: - return &bpf_skb_set_delivery_time_proto; + case BPF_FUNC_skb_set_tstamp: + return &bpf_skb_set_tstamp_proto; #endif default: return bpf_sk_base_func_proto(func_id); @@ -8127,9 +8127,9 @@ static bool bpf_skb_is_valid_access(int off, int size, enum bpf_access_type type return false; info->reg_type = PTR_TO_SOCK_COMMON_OR_NULL; break; - case offsetof(struct __sk_buff, delivery_time_type): + case offsetof(struct __sk_buff, tstamp_type): return false; - case offsetofend(struct __sk_buff, delivery_time_type) ... offsetof(struct __sk_buff, hwtstamp) - 1: + case offsetofend(struct __sk_buff, tstamp_type) ... offsetof(struct __sk_buff, hwtstamp) - 1: /* Explicitly prohibit access to padding in __sk_buff. 
*/ return false; default: @@ -8484,14 +8484,14 @@ static bool tc_cls_act_is_valid_access(int off, int size, break; case bpf_ctx_range_till(struct __sk_buff, family, local_port): return false; - case offsetof(struct __sk_buff, delivery_time_type): + case offsetof(struct __sk_buff, tstamp_type): /* The convert_ctx_access() on reading and writing * __sk_buff->tstamp depends on whether the bpf prog - * has used __sk_buff->delivery_time_type or not. - * Thus, we need to set prog->delivery_time_access + * has used __sk_buff->tstamp_type or not. + * Thus, we need to set prog->tstamp_type_access * earlier during is_valid_access() here. */ - ((struct bpf_prog *)prog)->delivery_time_access = 1; + ((struct bpf_prog *)prog)->tstamp_type_access = 1; return size == sizeof(__u8); } @@ -8888,42 +8888,22 @@ static u32 flow_dissector_convert_ctx_access(enum bpf_access_type type, return insn - insn_buf; } -static struct bpf_insn *bpf_convert_dtime_type_read(const struct bpf_insn *si, - struct bpf_insn *insn) +static struct bpf_insn *bpf_convert_tstamp_type_read(const struct bpf_insn *si, + struct bpf_insn *insn) { __u8 value_reg = si->dst_reg; __u8 skb_reg = si->src_reg; + /* AX is needed because src_reg and dst_reg could be the same */ __u8 tmp_reg = BPF_REG_AX; *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, - SKB_MONO_DELIVERY_TIME_OFFSET); - *insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, - SKB_MONO_DELIVERY_TIME_MASK); - *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, 0, 2); - /* value_reg = BPF_SKB_DELIVERY_TIME_MONO */ - *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_DELIVERY_TIME_MONO); - *insn++ = BPF_JMP_A(IS_ENABLED(CONFIG_NET_CLS_ACT) ? 10 : 5); - - *insn++ = BPF_LDX_MEM(BPF_DW, tmp_reg, skb_reg, - offsetof(struct sk_buff, tstamp)); - *insn++ = BPF_JMP_IMM(BPF_JNE, tmp_reg, 0, 2); - /* value_reg = BPF_SKB_DELIVERY_TIME_NONE */ - *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_DELIVERY_TIME_NONE); - *insn++ = BPF_JMP_A(IS_ENABLED(CONFIG_NET_CLS_ACT) ? 
6 : 1); - -#ifdef CONFIG_NET_CLS_ACT - *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, TC_AT_INGRESS_OFFSET); - *insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, TC_AT_INGRESS_MASK); - *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, 0, 2); - /* At ingress, value_reg = 0 */ - *insn++ = BPF_MOV32_IMM(value_reg, 0); + PKT_VLAN_PRESENT_OFFSET); + *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, + SKB_MONO_DELIVERY_TIME_MASK, 2); + *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_UNSPEC); *insn++ = BPF_JMP_A(1); -#endif - - /* value_reg = BPF_SKB_DELIVERYT_TIME_UNSPEC */ - *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_DELIVERY_TIME_UNSPEC); + *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_DELIVERY_MONO); - /* 15 insns with CONFIG_NET_CLS_ACT */ return insn; } @@ -8956,21 +8936,22 @@ static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog, __u8 skb_reg = si->src_reg; #ifdef CONFIG_NET_CLS_ACT - if (!prog->delivery_time_access) { + /* If the tstamp_type is read, + * the bpf prog is aware the tstamp could have delivery time. + * Thus, read skb->tstamp as is if tstamp_type_access is true. + */ + if (!prog->tstamp_type_access) { + /* AX is needed because src_reg and dst_reg could be the same */ __u8 tmp_reg = BPF_REG_AX; - *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, TC_AT_INGRESS_OFFSET); - *insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, TC_AT_INGRESS_MASK); - *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, 0, 5); - /* @ingress, read __sk_buff->tstamp as the (rcv) timestamp, - * so check the skb->mono_delivery_time. - */ - *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, - SKB_MONO_DELIVERY_TIME_OFFSET); + *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, PKT_VLAN_PRESENT_OFFSET); *insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, - SKB_MONO_DELIVERY_TIME_MASK); - *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, 0, 2); - /* skb->mono_delivery_time is set, read 0 as the (rcv) timestamp. 
*/ + TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK); + *insn++ = BPF_JMP32_IMM(BPF_JNE, tmp_reg, + TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK, 2); + /* skb->tc_at_ingress && skb->mono_delivery_time, + * read 0 as the (rcv) timestamp. + */ *insn++ = BPF_MOV64_IMM(value_reg, 0); *insn++ = BPF_JMP_A(1); } @@ -8989,25 +8970,27 @@ static struct bpf_insn *bpf_convert_tstamp_write(const struct bpf_prog *prog, __u8 skb_reg = si->dst_reg; #ifdef CONFIG_NET_CLS_ACT - if (!prog->delivery_time_access) { + /* If the tstamp_type is read, + * the bpf prog is aware the tstamp could have delivery time. + * Thus, write skb->tstamp as is if tstamp_type_access is true. + * Otherwise, writing at ingress will have to clear the + * mono_delivery_time bit also. + */ + if (!prog->tstamp_type_access) { __u8 tmp_reg = BPF_REG_AX; - *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, TC_AT_INGRESS_OFFSET); - *insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, TC_AT_INGRESS_MASK); - *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, 0, 3); - /* Writing __sk_buff->tstamp at ingress as the (rcv) timestamp. - * Clear the skb->mono_delivery_time. 
- */ - *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, - SKB_MONO_DELIVERY_TIME_OFFSET); - *insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, - ~SKB_MONO_DELIVERY_TIME_MASK); - *insn++ = BPF_STX_MEM(BPF_B, skb_reg, tmp_reg, - SKB_MONO_DELIVERY_TIME_OFFSET); + *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, PKT_VLAN_PRESENT_OFFSET); + /* Writing __sk_buff->tstamp as ingress, goto <clear> */ + *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, TC_AT_INGRESS_MASK, 1); + /* goto <store> */ + *insn++ = BPF_JMP_A(2); + /* <clear>: mono_delivery_time */ + *insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, ~SKB_MONO_DELIVERY_TIME_MASK); + *insn++ = BPF_STX_MEM(BPF_B, skb_reg, tmp_reg, PKT_VLAN_PRESENT_OFFSET); } #endif - /* skb->tstamp = tstamp */ + /* <store>: skb->tstamp = tstamp */ *insn++ = BPF_STX_MEM(BPF_DW, skb_reg, value_reg, offsetof(struct sk_buff, tstamp)); return insn; @@ -9326,8 +9309,8 @@ static u32 bpf_convert_ctx_access(enum bpf_access_type type, insn = bpf_convert_tstamp_read(prog, si, insn); break; - case offsetof(struct __sk_buff, delivery_time_type): - insn = bpf_convert_dtime_type_read(si, insn); + case offsetof(struct __sk_buff, tstamp_type): + insn = bpf_convert_tstamp_type_read(si, insn); break; case offsetof(struct __sk_buff, gso_segs): @@ -11006,13 +10989,24 @@ static bool sk_lookup_is_valid_access(int off, int size, case bpf_ctx_range(struct bpf_sk_lookup, local_ip4): case bpf_ctx_range_till(struct bpf_sk_lookup, remote_ip6[0], remote_ip6[3]): case bpf_ctx_range_till(struct bpf_sk_lookup, local_ip6[0], local_ip6[3]): - case offsetof(struct bpf_sk_lookup, remote_port) ... 
- offsetof(struct bpf_sk_lookup, local_ip4) - 1: case bpf_ctx_range(struct bpf_sk_lookup, local_port): case bpf_ctx_range(struct bpf_sk_lookup, ingress_ifindex): bpf_ctx_record_field_size(info, sizeof(__u32)); return bpf_ctx_narrow_access_ok(off, size, sizeof(__u32)); + case bpf_ctx_range(struct bpf_sk_lookup, remote_port): + /* Allow 4-byte access to 2-byte field for backward compatibility */ + if (size == sizeof(__u32)) + return true; + bpf_ctx_record_field_size(info, sizeof(__be16)); + return bpf_ctx_narrow_access_ok(off, size, sizeof(__be16)); + + case offsetofend(struct bpf_sk_lookup, remote_port) ... + offsetof(struct bpf_sk_lookup, local_ip4) - 1: + /* Allow access to zero padding for backward compatibility */ + bpf_ctx_record_field_size(info, sizeof(__u16)); + return bpf_ctx_narrow_access_ok(off, size, sizeof(__u16)); + default: return false; } @@ -11094,6 +11088,11 @@ static u32 sk_lookup_convert_ctx_access(enum bpf_access_type type, sport, 2, target_size)); break; + case offsetofend(struct bpf_sk_lookup, remote_port): + *target_size = 2; + *insn++ = BPF_MOV32_IMM(si->dst_reg, 0); + break; + case offsetof(struct bpf_sk_lookup, local_port): *insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->src_reg, bpf_target_off(struct bpf_sk_lookup_kern, diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 929a2b096b04..cc381165ea08 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -27,6 +27,7 @@ int sk_msg_alloc(struct sock *sk, struct sk_msg *msg, int len, int elem_first_coalesce) { struct page_frag *pfrag = sk_page_frag(sk); + u32 osize = msg->sg.size; int ret = 0; len -= msg->sg.size; @@ -35,13 +36,17 @@ int sk_msg_alloc(struct sock *sk, struct sk_msg *msg, int len, u32 orig_offset; int use, i; - if (!sk_page_frag_refill(sk, pfrag)) - return -ENOMEM; + if (!sk_page_frag_refill(sk, pfrag)) { + ret = -ENOMEM; + goto msg_trim; + } orig_offset = pfrag->offset; use = min_t(int, len, pfrag->size - orig_offset); - if (!sk_wmem_schedule(sk, use)) - return -ENOMEM; + if 
(!sk_wmem_schedule(sk, use)) { + ret = -ENOMEM; + goto msg_trim; + } i = msg->sg.end; sk_msg_iter_var_prev(i); @@ -71,6 +76,10 @@ int sk_msg_alloc(struct sock *sk, struct sk_msg *msg, int len, } return ret; + +msg_trim: + sk_msg_trim(sk, msg, osize); + return ret; } EXPORT_SYMBOL_GPL(sk_msg_alloc); diff --git a/net/core/xdp.c b/net/core/xdp.c index 7577adf19ef4..24420209bf0e 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -529,6 +529,7 @@ void xdp_return_buff(struct xdp_buff *xdp) out: __xdp_return(xdp->data, &xdp->rxq->mem, true, xdp); } +EXPORT_SYMBOL_GPL(xdp_return_buff); /* Only called for MEM_TYPE_PAGE_POOL see xdp.h */ void __xdp_release_frame(void *data, struct xdp_mem_info *mem) diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c index 9b9b02052fd3..1cdcb4df0eb7 100644 --- a/net/ipv4/tcp_bpf.c +++ b/net/ipv4/tcp_bpf.c @@ -138,10 +138,9 @@ int tcp_bpf_sendmsg_redir(struct sock *sk, struct sk_msg *msg, struct sk_psock *psock = sk_psock_get(sk); int ret; - if (unlikely(!psock)) { - sk_msg_free(sk, msg); - return 0; - } + if (unlikely(!psock)) + return -EPIPE; + ret = ingress ? 
bpf_tcp_ingress(sk, psock, msg, bytes, flags) : tcp_bpf_push_locked(sk, msg, bytes, flags, false); sk_psock_put(sk, psock); @@ -335,7 +334,7 @@ more_data: cork = true; psock->cork = NULL; } - sk_msg_return(sk, msg, tosend); + sk_msg_return(sk, msg, msg->sg.size); release_sock(sk); ret = tcp_bpf_sendmsg_redir(sk_redir, msg, tosend, flags); @@ -375,8 +374,11 @@ more_data: } if (msg && msg->sg.data[msg->sg.start].page_link && - msg->sg.data[msg->sg.start].length) + msg->sg.data[msg->sg.start].length) { + if (eval == __SK_REDIRECT) + sk_mem_charge(sk, msg->sg.size); goto more_data; + } } return ret; } diff --git a/net/netfilter/nf_conntrack_bpf.c b/net/netfilter/nf_conntrack_bpf.c index 8ad3f52579f3..fe98673dd5ac 100644 --- a/net/netfilter/nf_conntrack_bpf.c +++ b/net/netfilter/nf_conntrack_bpf.c @@ -12,6 +12,7 @@ #include <linux/btf_ids.h> #include <linux/net_namespace.h> #include <net/netfilter/nf_conntrack.h> +#include <net/netfilter/nf_conntrack_bpf.h> #include <net/netfilter/nf_conntrack_core.h> /* bpf_ct_opts - Options for CT lookup helpers @@ -102,8 +103,8 @@ static struct nf_conn *__bpf_nf_ct_lookup(struct net *net, } __diag_push(); -__diag_ignore(GCC, 8, "-Wmissing-prototypes", - "Global functions as their definitions will be in nf_conntrack BTF"); +__diag_ignore_all("-Wmissing-prototypes", + "Global functions as their definitions will be in nf_conntrack BTF"); /* bpf_xdp_ct_lookup - Lookup CT entry for the given tuple, and acquire a * reference to it diff --git a/samples/Kconfig b/samples/Kconfig index 22cc921ae291..8415d60ea5f4 100644 --- a/samples/Kconfig +++ b/samples/Kconfig @@ -73,6 +73,13 @@ config SAMPLE_HW_BREAKPOINT help This builds kernel hardware breakpoint example modules. +config SAMPLE_FPROBE + tristate "Build fprobe examples -- loadable modules only" + depends on FPROBE && m + help + This builds a fprobe example module. This module has an option 'symbol'. + You can specify a probed symbol or symbols separated with ','. 
+ config SAMPLE_KFIFO tristate "Build kfifo examples -- loadable modules only" depends on m diff --git a/samples/Makefile b/samples/Makefile index 1ae4de99c983..6d662965be5b 100644 --- a/samples/Makefile +++ b/samples/Makefile @@ -33,3 +33,4 @@ subdir-$(CONFIG_SAMPLE_WATCHDOG) += watchdog subdir-$(CONFIG_SAMPLE_WATCH_QUEUE) += watch_queue obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak/ obj-$(CONFIG_SAMPLE_CORESIGHT_SYSCFG) += coresight/ +obj-$(CONFIG_SAMPLE_FPROBE) += fprobe/ diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c index 19288a2bbc75..6f3fe30ad283 100644 --- a/samples/bpf/xdpsock_user.c +++ b/samples/bpf/xdpsock_user.c @@ -1984,15 +1984,15 @@ int main(int argc, char **argv) setlocale(LC_ALL, ""); + prev_time = get_nsecs(); + start_time = prev_time; + if (!opt_quiet) { ret = pthread_create(&pt, NULL, poller, NULL); if (ret) exit_with_error(ret); } - prev_time = get_nsecs(); - start_time = prev_time; - /* Configure sched priority for better wake-up accuracy */ memset(&schparam, 0, sizeof(schparam)); schparam.sched_priority = opt_schprio; diff --git a/samples/fprobe/Makefile b/samples/fprobe/Makefile new file mode 100644 index 000000000000..ecccbfa6e99b --- /dev/null +++ b/samples/fprobe/Makefile @@ -0,0 +1,3 @@ +# SPDX-License-Identifier: GPL-2.0-only + +obj-$(CONFIG_SAMPLE_FPROBE) += fprobe_example.o diff --git a/samples/fprobe/fprobe_example.c b/samples/fprobe/fprobe_example.c new file mode 100644 index 000000000000..24d3cf109140 --- /dev/null +++ b/samples/fprobe/fprobe_example.c @@ -0,0 +1,120 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Here's a sample kernel module showing the use of fprobe to dump a + * stack trace and selected registers when kernel_clone() is called. + * + * For more information on theory of operation of kprobes, see + * Documentation/trace/kprobes.rst + * + * You will see the trace data in /var/log/messages and on the console + * whenever kernel_clone() is invoked to create a new process. 
+ */ + +#define pr_fmt(fmt) "%s: " fmt, __func__ + +#include <linux/kernel.h> +#include <linux/module.h> +#include <linux/fprobe.h> +#include <linux/sched/debug.h> +#include <linux/slab.h> + +#define BACKTRACE_DEPTH 16 +#define MAX_SYMBOL_LEN 4096 +struct fprobe sample_probe; + +static char symbol[MAX_SYMBOL_LEN] = "kernel_clone"; +module_param_string(symbol, symbol, sizeof(symbol), 0644); +static char nosymbol[MAX_SYMBOL_LEN] = ""; +module_param_string(nosymbol, nosymbol, sizeof(nosymbol), 0644); +static bool stackdump = true; +module_param(stackdump, bool, 0644); + +static void show_backtrace(void) +{ + unsigned long stacks[BACKTRACE_DEPTH]; + unsigned int len; + + len = stack_trace_save(stacks, BACKTRACE_DEPTH, 2); + stack_trace_print(stacks, len, 24); +} + +static void sample_entry_handler(struct fprobe *fp, unsigned long ip, struct pt_regs *regs) +{ + pr_info("Enter <%pS> ip = 0x%p\n", (void *)ip, (void *)ip); + if (stackdump) + show_backtrace(); +} + +static void sample_exit_handler(struct fprobe *fp, unsigned long ip, struct pt_regs *regs) +{ + unsigned long rip = instruction_pointer(regs); + + pr_info("Return from <%pS> ip = 0x%p to rip = 0x%p (%pS)\n", + (void *)ip, (void *)ip, (void *)rip, (void *)rip); + if (stackdump) + show_backtrace(); +} + +static int __init fprobe_init(void) +{ + char *p, *symbuf = NULL; + const char **syms; + int ret, count, i; + + sample_probe.entry_handler = sample_entry_handler; + sample_probe.exit_handler = sample_exit_handler; + + if (strchr(symbol, '*')) { + /* filter based fprobe */ + ret = register_fprobe(&sample_probe, symbol, + nosymbol[0] == '\0' ? 
NULL : nosymbol); + goto out; + } else if (!strchr(symbol, ',')) { + symbuf = symbol; + ret = register_fprobe_syms(&sample_probe, (const char **)&symbuf, 1); + goto out; + } + + /* Comma separated symbols */ + symbuf = kstrdup(symbol, GFP_KERNEL); + if (!symbuf) + return -ENOMEM; + p = symbuf; + count = 1; + while ((p = strchr(++p, ',')) != NULL) + count++; + + pr_info("%d symbols found\n", count); + + syms = kcalloc(count, sizeof(char *), GFP_KERNEL); + if (!syms) { + kfree(symbuf); + return -ENOMEM; + } + + p = symbuf; + for (i = 0; i < count; i++) + syms[i] = strsep(&p, ","); + + ret = register_fprobe_syms(&sample_probe, syms, count); + kfree(syms); + kfree(symbuf); +out: + if (ret < 0) + pr_err("register_fprobe failed, returned %d\n", ret); + else + pr_info("Planted fprobe at %s\n", symbol); + + return ret; +} + +static void __exit fprobe_exit(void) +{ + unregister_fprobe(&sample_probe); + + pr_info("fprobe at %s unregistered\n", symbol); +} + +module_init(fprobe_init) +module_exit(fprobe_exit) +MODULE_LICENSE("GPL"); diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c index 8c6e4514d494..ed1a82f1def3 100644 --- a/security/integrity/ima/ima_main.c +++ b/security/integrity/ima/ima_main.c @@ -418,6 +418,7 @@ int ima_file_mmap(struct file *file, unsigned long prot) /** * ima_file_mprotect - based on policy, limit mprotect change + * @vma: vm_area_struct protection is set to * @prot: contains the protection that will be applied by the kernel. 
* * Files can be mmap'ed read/write and later changed to execute to circumvent @@ -519,20 +520,38 @@ int ima_file_check(struct file *file, int mask) } EXPORT_SYMBOL_GPL(ima_file_check); -static int __ima_inode_hash(struct inode *inode, char *buf, size_t buf_size) +static int __ima_inode_hash(struct inode *inode, struct file *file, char *buf, + size_t buf_size) { - struct integrity_iint_cache *iint; - int hash_algo; + struct integrity_iint_cache *iint = NULL, tmp_iint; + int rc, hash_algo; - if (!ima_policy_flag) - return -EOPNOTSUPP; + if (ima_policy_flag) { + iint = integrity_iint_find(inode); + if (iint) + mutex_lock(&iint->mutex); + } + + if ((!iint || !(iint->flags & IMA_COLLECTED)) && file) { + if (iint) + mutex_unlock(&iint->mutex); + + memset(&tmp_iint, 0, sizeof(tmp_iint)); + tmp_iint.inode = inode; + mutex_init(&tmp_iint.mutex); + + rc = ima_collect_measurement(&tmp_iint, file, NULL, 0, + ima_hash_algo, NULL); + if (rc < 0) + return -EOPNOTSUPP; + + iint = &tmp_iint; + mutex_lock(&iint->mutex); + } - iint = integrity_iint_find(inode); if (!iint) return -EOPNOTSUPP; - mutex_lock(&iint->mutex); - /* * ima_file_hash can be called when ima_collect_measurement has still * not been called, we might not always have a hash. @@ -551,12 +570,14 @@ static int __ima_inode_hash(struct inode *inode, char *buf, size_t buf_size) hash_algo = iint->ima_hash->algo; mutex_unlock(&iint->mutex); + if (iint == &tmp_iint) + kfree(iint->ima_hash); + return hash_algo; } /** - * ima_file_hash - return the stored measurement if a file has been hashed and - * is in the iint cache. + * ima_file_hash - return a measurement of the file * @file: pointer to the file * @buf: buffer in which to store the hash * @buf_size: length of the buffer @@ -569,7 +590,7 @@ static int __ima_inode_hash(struct inode *inode, char *buf, size_t buf_size) * The file hash returned is based on the entire file, including the appended * signature. 
* - * If IMA is disabled or if no measurement is available, return -EOPNOTSUPP. + * If the measurement cannot be performed, return -EOPNOTSUPP. * If the parameters are incorrect, return -EINVAL. */ int ima_file_hash(struct file *file, char *buf, size_t buf_size) @@ -577,7 +598,7 @@ int ima_file_hash(struct file *file, char *buf, size_t buf_size) if (!file) return -EINVAL; - return __ima_inode_hash(file_inode(file), buf, buf_size); + return __ima_inode_hash(file_inode(file), file, buf, buf_size); } EXPORT_SYMBOL_GPL(ima_file_hash); @@ -604,14 +625,14 @@ int ima_inode_hash(struct inode *inode, char *buf, size_t buf_size) if (!inode) return -EINVAL; - return __ima_inode_hash(inode, buf, buf_size); + return __ima_inode_hash(inode, NULL, buf, buf_size); } EXPORT_SYMBOL_GPL(ima_inode_hash); /** * ima_post_create_tmpfile - mark newly created tmpfile as new - * @mnt_userns: user namespace of the mount the inode was found from - * @file : newly created tmpfile + * @mnt_userns: user namespace of the mount the inode was found from + * @inode: inode of the newly created tmpfile * * No measuring, appraising or auditing of newly created tmpfiles is needed. 
* Skip calling process_measurement(), but indicate which newly, created @@ -643,7 +664,7 @@ void ima_post_create_tmpfile(struct user_namespace *mnt_userns, /** * ima_post_path_mknod - mark as a new inode - * @mnt_userns: user namespace of the mount the inode was found from + * @mnt_userns: user namespace of the mount the inode was found from * @dentry: newly created dentry * * Mark files created via the mknodat syscall as new, so that the @@ -814,8 +835,8 @@ int ima_load_data(enum kernel_load_data_id id, bool contents) * ima_post_load_data - appraise decision based on policy * @buf: pointer to in memory file contents * @size: size of in memory file contents - * @id: kernel load data caller identifier - * @description: @id-specific description of contents + * @load_id: kernel load data caller identifier + * @description: @load_id-specific description of contents * * Measure/appraise/audit in memory buffer based on policy. Policy rules * are written in terms of a policy identifier. diff --git a/tools/bpf/bpftool/Documentation/bpftool-gen.rst b/tools/bpf/bpftool/Documentation/bpftool-gen.rst index 18d646b571ec..68454ef28f58 100644 --- a/tools/bpf/bpftool/Documentation/bpftool-gen.rst +++ b/tools/bpf/bpftool/Documentation/bpftool-gen.rst @@ -25,6 +25,7 @@ GEN COMMANDS | **bpftool** **gen object** *OUTPUT_FILE* *INPUT_FILE* [*INPUT_FILE*...] | **bpftool** **gen skeleton** *FILE* [**name** *OBJECT_NAME*] +| **bpftool** **gen subskeleton** *FILE* [**name** *OBJECT_NAME*] | **bpftool** **gen min_core_btf** *INPUT* *OUTPUT* *OBJECT* [*OBJECT*...] | **bpftool** **gen help** @@ -150,6 +151,30 @@ DESCRIPTION (non-read-only) data from userspace, with same simplicity as for BPF side. + **bpftool gen subskeleton** *FILE* + Generate BPF subskeleton C header file for a given *FILE*. + + Subskeletons are similar to skeletons, except they do not own + the corresponding maps, programs, or global variables. 
They + require that the object file used to generate them is already + loaded into a *bpf_object* by some other means. + + This functionality is useful when a library is included into a + larger BPF program. A subskeleton for the library would have + access to all objects and globals defined in it, without + having to know about the larger program. + + Consequently, there are only two functions defined + for subskeletons: + + - **example__open(bpf_object\*)** + Instantiates a subskeleton from an already opened (but not + necessarily loaded) **bpf_object**. + + - **example__destroy()** + Frees the storage for the subskeleton but *does not* unload + any BPF programs or maps. + **bpftool** **gen min_core_btf** *INPUT* *OUTPUT* *OBJECT* [*OBJECT*...] Generate a minimum BTF file as *OUTPUT*, derived from a given *INPUT* BTF file, containing all needed BTF types so one, or diff --git a/tools/bpf/bpftool/Documentation/bpftool.rst b/tools/bpf/bpftool/Documentation/bpftool.rst index 7084dd9fa2f8..6965c94dfdaf 100644 --- a/tools/bpf/bpftool/Documentation/bpftool.rst +++ b/tools/bpf/bpftool/Documentation/bpftool.rst @@ -20,7 +20,8 @@ SYNOPSIS **bpftool** **version** - *OBJECT* := { **map** | **program** | **cgroup** | **perf** | **net** | **feature** } + *OBJECT* := { **map** | **program** | **link** | **cgroup** | **perf** | **net** | **feature** | + **btf** | **gen** | **struct_ops** | **iter** } *OPTIONS* := { { **-V** | **--version** } | |COMMON_OPTIONS| } @@ -31,6 +32,8 @@ SYNOPSIS *PROG-COMMANDS* := { **show** | **list** | **dump jited** | **dump xlated** | **pin** | **load** | **attach** | **detach** | **help** } + *LINK-COMMANDS* := { **show** | **list** | **pin** | **detach** | **help** } + *CGROUP-COMMANDS* := { **show** | **list** | **attach** | **detach** | **help** } *PERF-COMMANDS* := { **show** | **list** | **help** } @@ -39,6 +42,14 @@ SYNOPSIS *FEATURE-COMMANDS* := { **probe** | **help** } + *BTF-COMMANDS* := { **show** | **list** | **dump** | **help** } + + 
*GEN-COMMANDS* := { **object** | **skeleton** | **min_core_btf** | **help** } + + *STRUCT-OPS-COMMANDS* := { **show** | **list** | **dump** | **register** | **unregister** | **help** } + + *ITER-COMMANDS* := { **pin** | **help** } + DESCRIPTION =========== *bpftool* allows for inspection and simple modification of BPF objects diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool index 958e1fd71b5c..5df8d72c5179 100644 --- a/tools/bpf/bpftool/bash-completion/bpftool +++ b/tools/bpf/bpftool/bash-completion/bpftool @@ -1003,13 +1003,25 @@ _bpftool() ;; esac ;; + subskeleton) + case $prev in + $command) + _filedir + return 0 + ;; + *) + _bpftool_once_attr 'name' + return 0 + ;; + esac + ;; min_core_btf) _filedir return 0 ;; *) [[ $prev == $object ]] && \ - COMPREPLY=( $( compgen -W 'object skeleton help min_core_btf' -- "$cur" ) ) + COMPREPLY=( $( compgen -W 'object skeleton subskeleton help min_core_btf' -- "$cur" ) ) ;; esac ;; diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c index 606743c6db41..0c1e06cf50b9 100644 --- a/tools/bpf/bpftool/common.c +++ b/tools/bpf/bpftool/common.c @@ -56,7 +56,6 @@ const char * const attach_type_name[__MAX_BPF_ATTACH_TYPE] = { [BPF_CGROUP_UDP6_RECVMSG] = "recvmsg6", [BPF_CGROUP_GETSOCKOPT] = "getsockopt", [BPF_CGROUP_SETSOCKOPT] = "setsockopt", - [BPF_SK_SKB_STREAM_PARSER] = "sk_skb_stream_parser", [BPF_SK_SKB_STREAM_VERDICT] = "sk_skb_stream_verdict", [BPF_SK_SKB_VERDICT] = "sk_skb_verdict", @@ -76,6 +75,7 @@ const char * const attach_type_name[__MAX_BPF_ATTACH_TYPE] = { [BPF_SK_REUSEPORT_SELECT] = "sk_skb_reuseport_select", [BPF_SK_REUSEPORT_SELECT_OR_MIGRATE] = "sk_skb_reuseport_select_or_migrate", [BPF_PERF_EVENT] = "perf_event", + [BPF_TRACE_KPROBE_MULTI] = "trace_kprobe_multi", }; void p_err(const char *fmt, ...) 
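Note on the `attach_type_name` hunk above (illustration only, not part of the patch): the table uses C designated initializers, so any attach type without an entry is left as a NULL pointer. That is presumably why a new type such as `BPF_TRACE_KPROBE_MULTI` needs its own line. The sketch below shows the pattern with a hypothetical `demo_attach_type` enum and hypothetical helpers `demo_attach_type_str()` and `demo_attach_type_parse()`; none of these names exist in bpftool, and the real lookup code there may differ.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for enum bpf_attach_type; values are illustrative. */
enum demo_attach_type {
	DEMO_PERF_EVENT,
	DEMO_TRACE_KPROBE_MULTI,
	DEMO_UNNAMED,		/* deliberately has no entry in the table */
	DEMO_ATTACH_TYPE_MAX,
};

/* Designated initializers: slots that are not listed are left as NULL. */
const char * const demo_attach_type_name[DEMO_ATTACH_TYPE_MAX] = {
	[DEMO_PERF_EVENT]		= "perf_event",
	[DEMO_TRACE_KPROBE_MULTI]	= "trace_kprobe_multi",
};

/* Map a type to its name; NULL if the type is out of range or unnamed. */
const char *demo_attach_type_str(unsigned int type)
{
	if (type >= DEMO_ATTACH_TYPE_MAX)
		return NULL;
	return demo_attach_type_name[type];
}

/* Map a name back to its type; -1 if no entry matches. */
int demo_attach_type_parse(const char *name)
{
	unsigned int t;

	for (t = 0; t < DEMO_ATTACH_TYPE_MAX; t++) {
		/* Skip the NULL gaps left by the designated initializers. */
		if (demo_attach_type_name[t] &&
		    strcmp(demo_attach_type_name[t], name) == 0)
			return (int)t;
	}
	return -1;
}
```

A caller that prints or parses attach types has to handle the NULL gaps, as both helpers do here. Adding a type to the enum without adding a table entry yields an unnamed type rather than a compile error.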
diff --git a/tools/bpf/bpftool/feature.c b/tools/bpf/bpftool/feature.c index 9c894b1447de..c2f43a5d38e0 100644 --- a/tools/bpf/bpftool/feature.c +++ b/tools/bpf/bpftool/feature.c @@ -3,6 +3,7 @@ #include <ctype.h> #include <errno.h> +#include <fcntl.h> #include <string.h> #include <unistd.h> #include <net/if.h> @@ -45,6 +46,11 @@ static bool run_as_unprivileged; /* Miscellaneous utility functions */ +static bool grep(const char *buffer, const char *pattern) +{ + return !!strstr(buffer, pattern); +} + static bool check_procfs(void) { struct statfs st_fs; @@ -135,6 +141,32 @@ static void print_end_section(void) /* Probing functions */ +static int get_vendor_id(int ifindex) +{ + char ifname[IF_NAMESIZE], path[64], buf[8]; + ssize_t len; + int fd; + + if (!if_indextoname(ifindex, ifname)) + return -1; + + snprintf(path, sizeof(path), "/sys/class/net/%s/device/vendor", ifname); + + fd = open(path, O_RDONLY | O_CLOEXEC); + if (fd < 0) + return -1; + + len = read(fd, buf, sizeof(buf)); + close(fd); + if (len < 0) + return -1; + if (len >= (ssize_t)sizeof(buf)) + return -1; + buf[len] = '\0'; + + return strtol(buf, NULL, 0); +} + static int read_procfs(const char *path) { char *endptr, *line = NULL; @@ -478,6 +510,40 @@ static bool probe_bpf_syscall(const char *define_prefix) return res; } +static bool +probe_prog_load_ifindex(enum bpf_prog_type prog_type, + const struct bpf_insn *insns, size_t insns_cnt, + char *log_buf, size_t log_buf_sz, + __u32 ifindex) +{ + LIBBPF_OPTS(bpf_prog_load_opts, opts, + .log_buf = log_buf, + .log_size = log_buf_sz, + .log_level = log_buf ? 
1 : 0, + .prog_ifindex = ifindex, + ); + int fd; + + errno = 0; + fd = bpf_prog_load(prog_type, NULL, "GPL", insns, insns_cnt, &opts); + if (fd >= 0) + close(fd); + + return fd >= 0 && errno != EINVAL && errno != EOPNOTSUPP; +} + +static bool probe_prog_type_ifindex(enum bpf_prog_type prog_type, __u32 ifindex) +{ + /* nfp returns -EINVAL on exit(0) with TC offload */ + struct bpf_insn insns[2] = { + BPF_MOV64_IMM(BPF_REG_0, 2), + BPF_EXIT_INSN() + }; + + return probe_prog_load_ifindex(prog_type, insns, ARRAY_SIZE(insns), + NULL, 0, ifindex); +} + static void probe_prog_type(enum bpf_prog_type prog_type, bool *supported_types, const char *define_prefix, __u32 ifindex) @@ -488,11 +554,19 @@ probe_prog_type(enum bpf_prog_type prog_type, bool *supported_types, bool res; if (ifindex) { - p_info("BPF offload feature probing is not supported"); - return; + switch (prog_type) { + case BPF_PROG_TYPE_SCHED_CLS: + case BPF_PROG_TYPE_XDP: + break; + default: + return; + } + + res = probe_prog_type_ifindex(prog_type, ifindex); + } else { + res = libbpf_probe_bpf_prog_type(prog_type, NULL); } - res = libbpf_probe_bpf_prog_type(prog_type, NULL); #ifdef USE_LIBCAP /* Probe may succeed even if program load fails, for unprivileged users * check that we did not fail because of insufficient permissions @@ -521,6 +595,26 @@ probe_prog_type(enum bpf_prog_type prog_type, bool *supported_types, define_prefix); } +static bool probe_map_type_ifindex(enum bpf_map_type map_type, __u32 ifindex) +{ + LIBBPF_OPTS(bpf_map_create_opts, opts); + int key_size, value_size, max_entries; + int fd; + + opts.map_ifindex = ifindex; + + key_size = sizeof(__u32); + value_size = sizeof(__u32); + max_entries = 1; + + fd = bpf_map_create(map_type, NULL, key_size, value_size, max_entries, + &opts); + if (fd >= 0) + close(fd); + + return fd >= 0; +} + static void probe_map_type(enum bpf_map_type map_type, const char *define_prefix, __u32 ifindex) @@ -531,11 +625,18 @@ probe_map_type(enum bpf_map_type map_type, 
const char *define_prefix, bool res; if (ifindex) { - p_info("BPF offload feature probing is not supported"); - return; - } + switch (map_type) { + case BPF_MAP_TYPE_HASH: + case BPF_MAP_TYPE_ARRAY: + break; + default: + return; + } - res = libbpf_probe_bpf_map_type(map_type, NULL); + res = probe_map_type_ifindex(map_type, ifindex); + } else { + res = libbpf_probe_bpf_map_type(map_type, NULL); + } /* Probe result depends on the success of map creation, no additional * check required for unprivileged users @@ -559,6 +660,33 @@ probe_map_type(enum bpf_map_type map_type, const char *define_prefix, define_prefix); } +static bool +probe_helper_ifindex(enum bpf_func_id id, enum bpf_prog_type prog_type, + __u32 ifindex) +{ + struct bpf_insn insns[2] = { + BPF_EMIT_CALL(id), + BPF_EXIT_INSN() + }; + char buf[4096] = {}; + bool res; + + probe_prog_load_ifindex(prog_type, insns, ARRAY_SIZE(insns), buf, + sizeof(buf), ifindex); + res = !grep(buf, "invalid func ") && !grep(buf, "unknown func "); + + switch (get_vendor_id(ifindex)) { + case 0x19ee: /* Netronome specific */ + res = res && !grep(buf, "not supported by FW") && + !grep(buf, "unsupported function id"); + break; + default: + break; + } + + return res; +} + static void probe_helper_for_progtype(enum bpf_prog_type prog_type, bool supported_type, const char *define_prefix, unsigned int id, @@ -567,12 +695,10 @@ probe_helper_for_progtype(enum bpf_prog_type prog_type, bool supported_type, bool res = false; if (supported_type) { - if (ifindex) { - p_info("BPF offload feature probing is not supported"); - return; - } - - res = libbpf_probe_bpf_helper(prog_type, id, NULL); + if (ifindex) + res = probe_helper_ifindex(id, prog_type, ifindex); + else + res = libbpf_probe_bpf_helper(prog_type, id, NULL); #ifdef USE_LIBCAP /* Probe may succeed even if program load fails, for * unprivileged users check that we did not fail because of diff --git a/tools/bpf/bpftool/gen.c b/tools/bpf/bpftool/gen.c index 145734b4fe41..7ba7ff55d2ea 
100644 --- a/tools/bpf/bpftool/gen.c +++ b/tools/bpf/bpftool/gen.c @@ -64,11 +64,11 @@ static void get_obj_name(char *name, const char *file) sanitize_identifier(name); } -static void get_header_guard(char *guard, const char *obj_name) +static void get_header_guard(char *guard, const char *obj_name, const char *suffix) { int i; - sprintf(guard, "__%s_SKEL_H__", obj_name); + sprintf(guard, "__%s_%s__", obj_name, suffix); for (i = 0; guard[i]; i++) guard[i] = toupper(guard[i]); } @@ -231,6 +231,17 @@ static const struct btf_type *find_type_for_map(struct btf *btf, const char *map return NULL; } +static bool is_internal_mmapable_map(const struct bpf_map *map, char *buf, size_t sz) +{ + if (!bpf_map__is_internal(map) || !(bpf_map__map_flags(map) & BPF_F_MMAPABLE)) + return false; + + if (!get_map_ident(map, buf, sz)) + return false; + + return true; +} + static int codegen_datasecs(struct bpf_object *obj, const char *obj_name) { struct btf *btf = bpf_object__btf(obj); @@ -247,12 +258,7 @@ static int codegen_datasecs(struct bpf_object *obj, const char *obj_name) bpf_object__for_each_map(map, obj) { /* only generate definitions for memory-mapped internal maps */ - if (!bpf_map__is_internal(map)) - continue; - if (!(bpf_map__map_flags(map) & BPF_F_MMAPABLE)) - continue; - - if (!get_map_ident(map, map_ident, sizeof(map_ident))) + if (!is_internal_mmapable_map(map, map_ident, sizeof(map_ident))) continue; sec = find_type_for_map(btf, map_ident); @@ -280,6 +286,96 @@ out: return err; } +static bool btf_is_ptr_to_func_proto(const struct btf *btf, + const struct btf_type *v) +{ + return btf_is_ptr(v) && btf_is_func_proto(btf__type_by_id(btf, v->type)); +} + +static int codegen_subskel_datasecs(struct bpf_object *obj, const char *obj_name) +{ + struct btf *btf = bpf_object__btf(obj); + struct btf_dump *d; + struct bpf_map *map; + const struct btf_type *sec, *var; + const struct btf_var_secinfo *sec_var; + int i, err = 0, vlen; + char map_ident[256], sec_ident[256]; + bool 
strip_mods = false, needs_typeof = false; + const char *sec_name, *var_name; + __u32 var_type_id; + + d = btf_dump__new(btf, codegen_btf_dump_printf, NULL, NULL); + if (!d) + return -errno; + + bpf_object__for_each_map(map, obj) { + /* only generate definitions for memory-mapped internal maps */ + if (!is_internal_mmapable_map(map, map_ident, sizeof(map_ident))) + continue; + + sec = find_type_for_map(btf, map_ident); + if (!sec) + continue; + + sec_name = btf__name_by_offset(btf, sec->name_off); + if (!get_datasec_ident(sec_name, sec_ident, sizeof(sec_ident))) + continue; + + strip_mods = strcmp(sec_name, ".kconfig") != 0; + printf(" struct %s__%s {\n", obj_name, sec_ident); + + sec_var = btf_var_secinfos(sec); + vlen = btf_vlen(sec); + for (i = 0; i < vlen; i++, sec_var++) { + DECLARE_LIBBPF_OPTS(btf_dump_emit_type_decl_opts, opts, + .indent_level = 2, + .strip_mods = strip_mods, + /* we'll print the name separately */ + .field_name = "", + ); + + var = btf__type_by_id(btf, sec_var->type); + var_name = btf__name_by_offset(btf, var->name_off); + var_type_id = var->type; + + /* static variables are not exposed through BPF skeleton */ + if (btf_var(var)->linkage == BTF_VAR_STATIC) + continue; + + /* The datasec member has KIND_VAR but we want the + * underlying type of the variable (e.g. KIND_INT). + */ + var = skip_mods_and_typedefs(btf, var->type, NULL); + + printf("\t\t"); + /* Func and array members require special handling. + * Instead of producing `typename *var`, they produce + * `typeof(typename) *var`. This allows us to keep a + * similar syntax where the identifier is just prefixed + * by *, allowing us to ignore C declaration minutiae. 
+ */ + needs_typeof = btf_is_array(var) || btf_is_ptr_to_func_proto(btf, var); + if (needs_typeof) + printf("typeof("); + + err = btf_dump__emit_type_decl(d, var_type_id, &opts); + if (err) + goto out; + + if (needs_typeof) + printf(")"); + + printf(" *%s;\n", var_name); + } + printf(" } %s;\n", sec_ident); + } + +out: + btf_dump__free(d); + return err; +} + static void codegen(const char *template, ...) { const char *src, *end; @@ -389,11 +485,7 @@ static void codegen_asserts(struct bpf_object *obj, const char *obj_name) ", obj_name); bpf_object__for_each_map(map, obj) { - if (!bpf_map__is_internal(map)) - continue; - if (!(bpf_map__map_flags(map) & BPF_F_MMAPABLE)) - continue; - if (!get_map_ident(map, map_ident, sizeof(map_ident))) + if (!is_internal_mmapable_map(map, map_ident, sizeof(map_ident))) continue; sec = find_type_for_map(btf, map_ident); @@ -608,11 +700,7 @@ static int gen_trace(struct bpf_object *obj, const char *obj_name, const char *h const void *mmap_data = NULL; size_t mmap_size = 0; - if (!get_map_ident(map, ident, sizeof(ident))) - continue; - - if (!bpf_map__is_internal(map) || - !(bpf_map__map_flags(map) & BPF_F_MMAPABLE)) + if (!is_internal_mmapable_map(map, ident, sizeof(ident))) continue; codegen("\ @@ -671,11 +759,7 @@ static int gen_trace(struct bpf_object *obj, const char *obj_name, const char *h bpf_object__for_each_map(map, obj) { const char *mmap_flags; - if (!get_map_ident(map, ident, sizeof(ident))) - continue; - - if (!bpf_map__is_internal(map) || - !(bpf_map__map_flags(map) & BPF_F_MMAPABLE)) + if (!is_internal_mmapable_map(map, ident, sizeof(ident))) continue; if (bpf_map__map_flags(map) & BPF_F_RDONLY_PROG) @@ -727,10 +811,95 @@ out: return err; } +static void +codegen_maps_skeleton(struct bpf_object *obj, size_t map_cnt, bool mmaped) +{ + struct bpf_map *map; + char ident[256]; + size_t i; + + if (!map_cnt) + return; + + codegen("\ + \n\ + \n\ + /* maps */ \n\ + s->map_cnt = %zu; \n\ + s->map_skel_sz = sizeof(*s->maps); \n\ + 
s->maps = (struct bpf_map_skeleton *)calloc(s->map_cnt, s->map_skel_sz);\n\ + if (!s->maps) \n\ + goto err; \n\ + ", + map_cnt + ); + i = 0; + bpf_object__for_each_map(map, obj) { + if (!get_map_ident(map, ident, sizeof(ident))) + continue; + + codegen("\ + \n\ + \n\ + s->maps[%zu].name = \"%s\"; \n\ + s->maps[%zu].map = &obj->maps.%s; \n\ + ", + i, bpf_map__name(map), i, ident); + /* memory-mapped internal maps */ + if (mmaped && is_internal_mmapable_map(map, ident, sizeof(ident))) { + printf("\ts->maps[%zu].mmaped = (void **)&obj->%s;\n", + i, ident); + } + i++; + } +} + +static void +codegen_progs_skeleton(struct bpf_object *obj, size_t prog_cnt, bool populate_links) +{ + struct bpf_program *prog; + int i; + + if (!prog_cnt) + return; + + codegen("\ + \n\ + \n\ + /* programs */ \n\ + s->prog_cnt = %zu; \n\ + s->prog_skel_sz = sizeof(*s->progs); \n\ + s->progs = (struct bpf_prog_skeleton *)calloc(s->prog_cnt, s->prog_skel_sz);\n\ + if (!s->progs) \n\ + goto err; \n\ + ", + prog_cnt + ); + i = 0; + bpf_object__for_each_program(prog, obj) { + codegen("\ + \n\ + \n\ + s->progs[%1$zu].name = \"%2$s\"; \n\ + s->progs[%1$zu].prog = &obj->progs.%2$s;\n\ + ", + i, bpf_program__name(prog)); + + if (populate_links) { + codegen("\ + \n\ + s->progs[%1$zu].link = &obj->links.%2$s;\n\ + ", + i, bpf_program__name(prog)); + } + i++; + } +} + static int do_skeleton(int argc, char **argv) { char header_guard[MAX_OBJ_NAME_LEN + sizeof("__SKEL_H__")]; - size_t i, map_cnt = 0, prog_cnt = 0, file_sz, mmap_sz; + size_t map_cnt = 0, prog_cnt = 0, file_sz, mmap_sz; DECLARE_LIBBPF_OPTS(bpf_object_open_opts, opts); char obj_name[MAX_OBJ_NAME_LEN] = "", *obj_data; struct bpf_object *obj = NULL; @@ -821,7 +990,7 @@ static int do_skeleton(int argc, char **argv) prog_cnt++; } - get_header_guard(header_guard, obj_name); + get_header_guard(header_guard, obj_name, "SKEL_H"); if (use_loader) { codegen("\ \n\ @@ -1024,66 +1193,10 @@ static int do_skeleton(int argc, char **argv) ", obj_name ); - if 
(map_cnt) { - codegen("\ - \n\ - \n\ - /* maps */ \n\ - s->map_cnt = %zu; \n\ - s->map_skel_sz = sizeof(*s->maps); \n\ - s->maps = (struct bpf_map_skeleton *)calloc(s->map_cnt, s->map_skel_sz);\n\ - if (!s->maps) \n\ - goto err; \n\ - ", - map_cnt - ); - i = 0; - bpf_object__for_each_map(map, obj) { - if (!get_map_ident(map, ident, sizeof(ident))) - continue; - codegen("\ - \n\ - \n\ - s->maps[%zu].name = \"%s\"; \n\ - s->maps[%zu].map = &obj->maps.%s; \n\ - ", - i, bpf_map__name(map), i, ident); - /* memory-mapped internal maps */ - if (bpf_map__is_internal(map) && - (bpf_map__map_flags(map) & BPF_F_MMAPABLE)) { - printf("\ts->maps[%zu].mmaped = (void **)&obj->%s;\n", - i, ident); - } - i++; - } - } - if (prog_cnt) { - codegen("\ - \n\ - \n\ - /* programs */ \n\ - s->prog_cnt = %zu; \n\ - s->prog_skel_sz = sizeof(*s->progs); \n\ - s->progs = (struct bpf_prog_skeleton *)calloc(s->prog_cnt, s->prog_skel_sz);\n\ - if (!s->progs) \n\ - goto err; \n\ - ", - prog_cnt - ); - i = 0; - bpf_object__for_each_program(prog, obj) { - codegen("\ - \n\ - \n\ - s->progs[%1$zu].name = \"%2$s\"; \n\ - s->progs[%1$zu].prog = &obj->progs.%2$s;\n\ - s->progs[%1$zu].link = &obj->links.%2$s;\n\ - ", - i, bpf_program__name(prog)); - i++; - } - } + codegen_maps_skeleton(obj, map_cnt, true /*mmaped*/); + codegen_progs_skeleton(obj, prog_cnt, true /*populate_links*/); + codegen("\ \n\ \n\ @@ -1141,6 +1254,310 @@ out: return err; } +/* Subskeletons are like skeletons, except they don't own the bpf_object, + * associated maps, links, etc. Instead, they know about the existence of + * variables, maps, programs and are able to find their locations + * _at runtime_ from an already loaded bpf_object. + * + * This allows for library-like BPF objects to have userspace counterparts + * with access to their own items without having to know anything about the + * final BPF object that the library was linked into. 
+ */ +static int do_subskeleton(int argc, char **argv) +{ + char header_guard[MAX_OBJ_NAME_LEN + sizeof("__SUBSKEL_H__")]; + size_t i, len, file_sz, map_cnt = 0, prog_cnt = 0, mmap_sz, var_cnt = 0, var_idx = 0; + DECLARE_LIBBPF_OPTS(bpf_object_open_opts, opts); + char obj_name[MAX_OBJ_NAME_LEN] = "", *obj_data; + struct bpf_object *obj = NULL; + const char *file, *var_name; + char ident[256]; + int fd, err = -1, map_type_id; + const struct bpf_map *map; + struct bpf_program *prog; + struct btf *btf; + const struct btf_type *map_type, *var_type; + const struct btf_var_secinfo *var; + struct stat st; + + if (!REQ_ARGS(1)) { + usage(); + return -1; + } + file = GET_ARG(); + + while (argc) { + if (!REQ_ARGS(2)) + return -1; + + if (is_prefix(*argv, "name")) { + NEXT_ARG(); + + if (obj_name[0] != '\0') { + p_err("object name already specified"); + return -1; + } + + strncpy(obj_name, *argv, MAX_OBJ_NAME_LEN - 1); + obj_name[MAX_OBJ_NAME_LEN - 1] = '\0'; + } else { + p_err("unknown arg %s", *argv); + return -1; + } + + NEXT_ARG(); + } + + if (argc) { + p_err("extra unknown arguments"); + return -1; + } + + if (use_loader) { + p_err("cannot use loader for subskeletons"); + return -1; + } + + if (stat(file, &st)) { + p_err("failed to stat() %s: %s", file, strerror(errno)); + return -1; + } + file_sz = st.st_size; + mmap_sz = roundup(file_sz, sysconf(_SC_PAGE_SIZE)); + fd = open(file, O_RDONLY); + if (fd < 0) { + p_err("failed to open() %s: %s", file, strerror(errno)); + return -1; + } + obj_data = mmap(NULL, mmap_sz, PROT_READ, MAP_PRIVATE, fd, 0); + if (obj_data == MAP_FAILED) { + obj_data = NULL; + p_err("failed to mmap() %s: %s", file, strerror(errno)); + goto out; + } + if (obj_name[0] == '\0') + get_obj_name(obj_name, file); + + /* The empty object name allows us to use bpf_map__name and produce + * ELF section names out of it. 
(".data" instead of "obj.data") + */ + opts.object_name = ""; + obj = bpf_object__open_mem(obj_data, file_sz, &opts); + if (!obj) { + char err_buf[256]; + + libbpf_strerror(errno, err_buf, sizeof(err_buf)); + p_err("failed to open BPF object file: %s", err_buf); + obj = NULL; + goto out; + } + + btf = bpf_object__btf(obj); + if (!btf) { + err = -1; + p_err("need btf type information for %s", obj_name); + goto out; + } + + bpf_object__for_each_program(prog, obj) { + prog_cnt++; + } + + /* First, count how many variables we have to find. + * We need this in advance so the subskel can allocate the right + * amount of storage. + */ + bpf_object__for_each_map(map, obj) { + if (!get_map_ident(map, ident, sizeof(ident))) + continue; + + /* Also count all maps that have a name */ + map_cnt++; + + if (!is_internal_mmapable_map(map, ident, sizeof(ident))) + continue; + + map_type_id = bpf_map__btf_value_type_id(map); + if (map_type_id <= 0) { + err = map_type_id; + goto out; + } + map_type = btf__type_by_id(btf, map_type_id); + + var = btf_var_secinfos(map_type); + len = btf_vlen(map_type); + for (i = 0; i < len; i++, var++) { + var_type = btf__type_by_id(btf, var->type); + + if (btf_var(var_type)->linkage == BTF_VAR_STATIC) + continue; + + var_cnt++; + } + } + + get_header_guard(header_guard, obj_name, "SUBSKEL_H"); + codegen("\ + \n\ + /* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */ \n\ + \n\ + /* THIS FILE IS AUTOGENERATED! 
*/ \n\ + #ifndef %2$s \n\ + #define %2$s \n\ + \n\ + #include <errno.h> \n\ + #include <stdlib.h> \n\ + #include <bpf/libbpf.h> \n\ + \n\ + struct %1$s { \n\ + struct bpf_object *obj; \n\ + struct bpf_object_subskeleton *subskel; \n\ + ", obj_name, header_guard); + + if (map_cnt) { + printf("\tstruct {\n"); + bpf_object__for_each_map(map, obj) { + if (!get_map_ident(map, ident, sizeof(ident))) + continue; + printf("\t\tstruct bpf_map *%s;\n", ident); + } + printf("\t} maps;\n"); + } + + if (prog_cnt) { + printf("\tstruct {\n"); + bpf_object__for_each_program(prog, obj) { + printf("\t\tstruct bpf_program *%s;\n", + bpf_program__name(prog)); + } + printf("\t} progs;\n"); + } + + err = codegen_subskel_datasecs(obj, obj_name); + if (err) + goto out; + + /* emit code that will allocate enough storage for all symbols */ + codegen("\ + \n\ + \n\ + #ifdef __cplusplus \n\ + static inline struct %1$s *open(const struct bpf_object *src);\n\ + static inline void destroy(struct %1$s *skel); \n\ + #endif /* __cplusplus */ \n\ + }; \n\ + \n\ + static inline void \n\ + %1$s__destroy(struct %1$s *skel) \n\ + { \n\ + if (!skel) \n\ + return; \n\ + if (skel->subskel) \n\ + bpf_object__destroy_subskeleton(skel->subskel);\n\ + free(skel); \n\ + } \n\ + \n\ + static inline struct %1$s * \n\ + %1$s__open(const struct bpf_object *src) \n\ + { \n\ + struct %1$s *obj; \n\ + struct bpf_object_subskeleton *s; \n\ + int err; \n\ + \n\ + obj = (struct %1$s *)calloc(1, sizeof(*obj)); \n\ + if (!obj) { \n\ + errno = ENOMEM; \n\ + goto err; \n\ + } \n\ + s = (struct bpf_object_subskeleton *)calloc(1, sizeof(*s));\n\ + if (!s) { \n\ + errno = ENOMEM; \n\ + goto err; \n\ + } \n\ + s->sz = sizeof(*s); \n\ + s->obj = src; \n\ + s->var_skel_sz = sizeof(*s->vars); \n\ + obj->subskel = s; \n\ + \n\ + /* vars */ \n\ + s->var_cnt = %2$d; \n\ + s->vars = (struct bpf_var_skeleton *)calloc(%2$d, sizeof(*s->vars));\n\ + if (!s->vars) { \n\ + errno = ENOMEM; \n\ + goto err; \n\ + } \n\ + ", + obj_name, var_cnt 
+ ); + + /* walk through each symbol and emit the runtime representation */ + bpf_object__for_each_map(map, obj) { + if (!is_internal_mmapable_map(map, ident, sizeof(ident))) + continue; + + map_type_id = bpf_map__btf_value_type_id(map); + if (map_type_id <= 0) + /* skip over internal maps with no type*/ + continue; + + map_type = btf__type_by_id(btf, map_type_id); + var = btf_var_secinfos(map_type); + len = btf_vlen(map_type); + for (i = 0; i < len; i++, var++) { + var_type = btf__type_by_id(btf, var->type); + var_name = btf__name_by_offset(btf, var_type->name_off); + + if (btf_var(var_type)->linkage == BTF_VAR_STATIC) + continue; + + /* Note that we use the dot prefix in .data as the + * field access operator i.e. maps%s becomes maps.data + */ + codegen("\ + \n\ + \n\ + s->vars[%3$d].name = \"%1$s\"; \n\ + s->vars[%3$d].map = &obj->maps.%2$s; \n\ + s->vars[%3$d].addr = (void **) &obj->%2$s.%1$s;\n\ + ", var_name, ident, var_idx); + + var_idx++; + } + } + + codegen_maps_skeleton(obj, map_cnt, false /*mmaped*/); + codegen_progs_skeleton(obj, prog_cnt, false /*links*/); + + codegen("\ + \n\ + \n\ + err = bpf_object__open_subskeleton(s); \n\ + if (err) \n\ + goto err; \n\ + \n\ + return obj; \n\ + err: \n\ + %1$s__destroy(obj); \n\ + return NULL; \n\ + } \n\ + \n\ + #ifdef __cplusplus \n\ + struct %1$s *%1$s::open(const struct bpf_object *src) { return %1$s__open(src); }\n\ + void %1$s::destroy(struct %1$s *skel) { %1$s__destroy(skel); }\n\ + #endif /* __cplusplus */ \n\ + \n\ + #endif /* %2$s */ \n\ + ", + obj_name, header_guard); + err = 0; +out: + bpf_object__close(obj); + if (obj_data) + munmap(obj_data, mmap_sz); + close(fd); + return err; +} + static int do_object(int argc, char **argv) { struct bpf_linker *linker; @@ -1192,6 +1609,7 @@ static int do_help(int argc, char **argv) fprintf(stderr, "Usage: %1$s %2$s object OUTPUT_FILE INPUT_FILE [INPUT_FILE...]\n" " %1$s %2$s skeleton FILE [name OBJECT_NAME]\n" + " %1$s %2$s subskeleton FILE [name OBJECT_NAME]\n" " 
%1$s %2$s min_core_btf INPUT OUTPUT OBJECT [OBJECT...]\n" " %1$s %2$s help\n" "\n" @@ -1788,6 +2206,7 @@ static int do_min_core_btf(int argc, char **argv) static const struct cmd cmds[] = { { "object", do_object }, { "skeleton", do_skeleton }, + { "subskeleton", do_subskeleton }, { "min_core_btf", do_min_core_btf}, { "help", do_help }, { 0 } diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h index 0468e5b24bd4..6e9277ffc68c 100644 --- a/tools/bpf/bpftool/main.h +++ b/tools/bpf/bpftool/main.h @@ -113,7 +113,9 @@ struct obj_ref { struct obj_refs { int ref_cnt; + bool has_bpf_cookie; struct obj_ref *refs; + __u64 bpf_cookie; }; struct btf; diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c index e746642de292..c26378f20831 100644 --- a/tools/bpf/bpftool/map.c +++ b/tools/bpf/bpftool/map.c @@ -504,7 +504,7 @@ static int show_map_close_json(int fd, struct bpf_map_info *info) jsonw_uint_field(json_wtr, "max_entries", info->max_entries); if (memlock) - jsonw_int_field(json_wtr, "bytes_memlock", atoi(memlock)); + jsonw_int_field(json_wtr, "bytes_memlock", atoll(memlock)); free(memlock); if (info->type == BPF_MAP_TYPE_PROG_ARRAY) { @@ -620,17 +620,14 @@ static int show_map_close_plain(int fd, struct bpf_map_info *info) u32_as_hash_field(info->id)) printf("\n\tpinned %s", (char *)entry->value); } - printf("\n"); if (frozen_str) { frozen = atoi(frozen_str); free(frozen_str); } - if (!info->btf_id && !frozen) - return 0; - - printf("\t"); + if (info->btf_id || frozen) + printf("\n\t"); if (info->btf_id) printf("btf_id %d", info->btf_id); diff --git a/tools/bpf/bpftool/pids.c b/tools/bpf/bpftool/pids.c index 7c384d10e95f..bb6c969a114a 100644 --- a/tools/bpf/bpftool/pids.c +++ b/tools/bpf/bpftool/pids.c @@ -78,6 +78,8 @@ static void add_ref(struct hashmap *map, struct pid_iter_entry *e) ref->pid = e->pid; memcpy(ref->comm, e->comm, sizeof(ref->comm)); refs->ref_cnt = 1; + refs->has_bpf_cookie = e->has_bpf_cookie; + refs->bpf_cookie = e->bpf_cookie; err = 
hashmap__append(map, u32_as_hash_field(e->id), refs); if (err) @@ -205,6 +207,9 @@ void emit_obj_refs_json(struct hashmap *map, __u32 id, if (refs->ref_cnt == 0) break; + if (refs->has_bpf_cookie) + jsonw_lluint_field(json_writer, "bpf_cookie", refs->bpf_cookie); + jsonw_name(json_writer, "pids"); jsonw_start_array(json_writer); for (i = 0; i < refs->ref_cnt; i++) { @@ -234,6 +239,9 @@ void emit_obj_refs_plain(struct hashmap *map, __u32 id, const char *prefix) if (refs->ref_cnt == 0) break; + if (refs->has_bpf_cookie) + printf("\n\tbpf_cookie %llu", (unsigned long long) refs->bpf_cookie); + printf("%s", prefix); for (i = 0; i < refs->ref_cnt; i++) { struct obj_ref *ref = &refs->refs[i]; diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c index 8a52eed19fa2..bc4e05542c2b 100644 --- a/tools/bpf/bpftool/prog.c +++ b/tools/bpf/bpftool/prog.c @@ -485,7 +485,7 @@ static void print_prog_json(struct bpf_prog_info *info, int fd) memlock = get_fdinfo(fd, "memlock"); if (memlock) - jsonw_int_field(json_wtr, "bytes_memlock", atoi(memlock)); + jsonw_int_field(json_wtr, "bytes_memlock", atoll(memlock)); free(memlock); if (info->nr_map_ids) diff --git a/tools/bpf/bpftool/skeleton/pid_iter.bpf.c b/tools/bpf/bpftool/skeleton/pid_iter.bpf.c index f70702fcb224..eb05ea53afb1 100644 --- a/tools/bpf/bpftool/skeleton/pid_iter.bpf.c +++ b/tools/bpf/bpftool/skeleton/pid_iter.bpf.c @@ -38,6 +38,17 @@ static __always_inline __u32 get_obj_id(void *ent, enum bpf_obj_type type) } } +/* could be used only with BPF_LINK_TYPE_PERF_EVENT links */ +static __u64 get_bpf_cookie(struct bpf_link *link) +{ + struct bpf_perf_link *perf_link; + struct perf_event *event; + + perf_link = container_of(link, struct bpf_perf_link, link); + event = BPF_CORE_READ(perf_link, perf_file, private_data); + return BPF_CORE_READ(event, bpf_cookie); +} + SEC("iter/task_file") int iter(struct bpf_iter__task_file *ctx) { @@ -69,8 +80,19 @@ int iter(struct bpf_iter__task_file *ctx) if (file->f_op != fops) 
return 0; + __builtin_memset(&e, 0, sizeof(e)); e.pid = task->tgid; e.id = get_obj_id(file->private_data, obj_type); + + if (obj_type == BPF_OBJ_LINK) { + struct bpf_link *link = (struct bpf_link *) file->private_data; + + if (BPF_CORE_READ(link, type) == BPF_LINK_TYPE_PERF_EVENT) { + e.has_bpf_cookie = true; + e.bpf_cookie = get_bpf_cookie(link); + } + } + bpf_probe_read_kernel_str(&e.comm, sizeof(e.comm), task->group_leader->comm); bpf_seq_write(ctx->meta->seq, &e, sizeof(e)); diff --git a/tools/bpf/bpftool/skeleton/pid_iter.h b/tools/bpf/bpftool/skeleton/pid_iter.h index 5692cf257adb..bbb570d4cca6 100644 --- a/tools/bpf/bpftool/skeleton/pid_iter.h +++ b/tools/bpf/bpftool/skeleton/pid_iter.h @@ -6,6 +6,8 @@ struct pid_iter_entry { __u32 id; int pid; + __u64 bpf_cookie; + bool has_bpf_cookie; char comm[16]; }; diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 4eebea830613..7604e7d5438f 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -997,6 +997,7 @@ enum bpf_attach_type { BPF_SK_REUSEPORT_SELECT, BPF_SK_REUSEPORT_SELECT_OR_MIGRATE, BPF_PERF_EVENT, + BPF_TRACE_KPROBE_MULTI, __MAX_BPF_ATTACH_TYPE }; @@ -1011,6 +1012,7 @@ enum bpf_link_type { BPF_LINK_TYPE_NETNS = 5, BPF_LINK_TYPE_XDP = 6, BPF_LINK_TYPE_PERF_EVENT = 7, + BPF_LINK_TYPE_KPROBE_MULTI = 8, MAX_BPF_LINK_TYPE, }; @@ -1118,6 +1120,11 @@ enum bpf_link_type { */ #define BPF_F_XDP_HAS_FRAGS (1U << 5) +/* link_create.kprobe_multi.flags used in LINK_CREATE command for + * BPF_TRACE_KPROBE_MULTI attach type to create return probe. 
+ */ +#define BPF_F_KPROBE_MULTI_RETURN (1U << 0) + /* When BPF ldimm64's insn[0].src_reg != 0 then this can have * the following extensions: * @@ -1232,6 +1239,8 @@ enum { /* If set, run the test on the cpu specified by bpf_attr.test.cpu */ #define BPF_F_TEST_RUN_ON_CPU (1U << 0) +/* If set, XDP frames will be transmitted after processing */ +#define BPF_F_TEST_XDP_LIVE_FRAMES (1U << 1) /* type for BPF_ENABLE_STATS */ enum bpf_stats_type { @@ -1393,6 +1402,7 @@ union bpf_attr { __aligned_u64 ctx_out; __u32 flags; __u32 cpu; + __u32 batch_size; } test; struct { /* anonymous struct used by BPF_*_GET_*_ID */ @@ -1472,6 +1482,13 @@ union bpf_attr { */ __u64 bpf_cookie; } perf_event; + struct { + __u32 flags; + __u32 cnt; + __aligned_u64 syms; + __aligned_u64 addrs; + __aligned_u64 cookies; + } kprobe_multi; }; } link_create; @@ -2299,8 +2316,8 @@ union bpf_attr { * Return * The return value depends on the result of the test, and can be: * - * * 0, if current task belongs to the cgroup2. - * * 1, if current task does not belong to the cgroup2. + * * 1, if current task belongs to the cgroup2. + * * 0, if current task does not belong to the cgroup2. * * A negative error code, if an error occurred. * * long bpf_skb_change_tail(struct sk_buff *skb, u32 len, u64 flags) @@ -5087,23 +5104,22 @@ union bpf_attr { * 0 on success, or a negative error in case of failure. On error * *dst* buffer is zeroed out. * - * long bpf_skb_set_delivery_time(struct sk_buff *skb, u64 dtime, u32 dtime_type) + * long bpf_skb_set_tstamp(struct sk_buff *skb, u64 tstamp, u32 tstamp_type) * Description - * Set a *dtime* (delivery time) to the __sk_buff->tstamp and also - * change the __sk_buff->delivery_time_type to *dtime_type*. - * - * When setting a delivery time (non zero *dtime*) to - * __sk_buff->tstamp, only BPF_SKB_DELIVERY_TIME_MONO *dtime_type* - * is supported. It is the only delivery_time_type that will be - * kept after bpf_redirect_*(). 
+ * Change the __sk_buff->tstamp_type to *tstamp_type* + * and set *tstamp* to the __sk_buff->tstamp together. * - * If there is no need to change the __sk_buff->delivery_time_type, - * the delivery time can be directly written to __sk_buff->tstamp + * If there is no need to change the __sk_buff->tstamp_type, + * the tstamp value can be directly written to __sk_buff->tstamp * instead. * - * *dtime* 0 and *dtime_type* BPF_SKB_DELIVERY_TIME_NONE - * can be used to clear any delivery time stored in - * __sk_buff->tstamp. + * BPF_SKB_TSTAMP_DELIVERY_MONO is the only tstamp that + * will be kept during bpf_redirect_*(). A non zero + * *tstamp* must be used with the BPF_SKB_TSTAMP_DELIVERY_MONO + * *tstamp_type*. + * + * A BPF_SKB_TSTAMP_UNSPEC *tstamp_type* can only be used + * with a zero *tstamp*. * * Only IPv4 and IPv6 skb->protocol are supported. * @@ -5116,7 +5132,17 @@ union bpf_attr { * Return * 0 on success. * **-EINVAL** for invalid input - * **-EOPNOTSUPP** for unsupported delivery_time_type and protocol + * **-EOPNOTSUPP** for unsupported protocol + * + * long bpf_ima_file_hash(struct file *file, void *dst, u32 size) + * Description + * Returns a calculated IMA hash of the *file*. + * If the hash is larger than *size*, then only *size* + * bytes will be copied to *dst* + * Return + * The **hash_algo** is returned on success, + * **-EOPNOTSUP** if the hash calculation failed or **-EINVAL** if + * invalid arguments are passed. 
*/ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -5311,7 +5337,8 @@ union bpf_attr { FN(xdp_load_bytes), \ FN(xdp_store_bytes), \ FN(copy_from_user_task), \ - FN(skb_set_delivery_time), \ + FN(skb_set_tstamp), \ + FN(ima_file_hash), \ /* */ /* integer value in 'imm' field of BPF_CALL instruction selects which helper @@ -5502,9 +5529,12 @@ union { \ } __attribute__((aligned(8))) enum { - BPF_SKB_DELIVERY_TIME_NONE, - BPF_SKB_DELIVERY_TIME_UNSPEC, - BPF_SKB_DELIVERY_TIME_MONO, + BPF_SKB_TSTAMP_UNSPEC, + BPF_SKB_TSTAMP_DELIVERY_MONO, /* tstamp has mono delivery time */ + /* For any BPF_SKB_TSTAMP_* that the bpf prog cannot handle, + * the bpf prog should handle it like BPF_SKB_TSTAMP_UNSPEC + * and try to deduce it by ingress, egress or skb->sk->sk_clockid. + */ }; /* user accessible mirror of in-kernel sk_buff. @@ -5547,7 +5577,7 @@ struct __sk_buff { __u32 gso_segs; __bpf_md_ptr(struct bpf_sock *, sk); __u32 gso_size; - __u8 delivery_time_type; + __u8 tstamp_type; __u32 :24; /* Padding, future use. 
*/ __u64 hwtstamp; }; diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c index 418b259166f8..cf27251adb92 100644 --- a/tools/lib/bpf/bpf.c +++ b/tools/lib/bpf/bpf.c @@ -29,6 +29,7 @@ #include <errno.h> #include <linux/bpf.h> #include <linux/filter.h> +#include <linux/kernel.h> #include <limits.h> #include <sys/resource.h> #include "bpf.h" @@ -111,7 +112,7 @@ int probe_memcg_account(void) BPF_EMIT_CALL(BPF_FUNC_ktime_get_coarse_ns), BPF_EXIT_INSN(), }; - size_t insn_cnt = sizeof(insns) / sizeof(insns[0]); + size_t insn_cnt = ARRAY_SIZE(insns); union bpf_attr attr; int prog_fd; @@ -853,6 +854,15 @@ int bpf_link_create(int prog_fd, int target_fd, if (!OPTS_ZEROED(opts, perf_event)) return libbpf_err(-EINVAL); break; + case BPF_TRACE_KPROBE_MULTI: + attr.link_create.kprobe_multi.flags = OPTS_GET(opts, kprobe_multi.flags, 0); + attr.link_create.kprobe_multi.cnt = OPTS_GET(opts, kprobe_multi.cnt, 0); + attr.link_create.kprobe_multi.syms = ptr_to_u64(OPTS_GET(opts, kprobe_multi.syms, 0)); + attr.link_create.kprobe_multi.addrs = ptr_to_u64(OPTS_GET(opts, kprobe_multi.addrs, 0)); + attr.link_create.kprobe_multi.cookies = ptr_to_u64(OPTS_GET(opts, kprobe_multi.cookies, 0)); + if (!OPTS_ZEROED(opts, kprobe_multi)) + return libbpf_err(-EINVAL); + break; default: if (!OPTS_ZEROED(opts, flags)) return libbpf_err(-EINVAL); @@ -994,6 +1004,7 @@ int bpf_prog_test_run_opts(int prog_fd, struct bpf_test_run_opts *opts) memset(&attr, 0, sizeof(attr)); attr.test.prog_fd = prog_fd; + attr.test.batch_size = OPTS_GET(opts, batch_size, 0); attr.test.cpu = OPTS_GET(opts, cpu, 0); attr.test.flags = OPTS_GET(opts, flags, 0); attr.test.repeat = OPTS_GET(opts, repeat, 0); diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h index 16b21757b8bf..f4b4afb6d4ba 100644 --- a/tools/lib/bpf/bpf.h +++ b/tools/lib/bpf/bpf.h @@ -413,10 +413,17 @@ struct bpf_link_create_opts { struct { __u64 bpf_cookie; } perf_event; + struct { + __u32 flags; + __u32 cnt; + const char **syms; + const unsigned long 
*addrs; + const __u64 *cookies; + } kprobe_multi; }; size_t :0; }; -#define bpf_link_create_opts__last_field perf_event +#define bpf_link_create_opts__last_field kprobe_multi.cookies LIBBPF_API int bpf_link_create(int prog_fd, int target_fd, enum bpf_attach_type attach_type, @@ -512,8 +519,9 @@ struct bpf_test_run_opts { __u32 duration; /* out: average per repetition in ns */ __u32 flags; __u32 cpu; + __u32 batch_size; }; -#define bpf_test_run_opts__last_field cpu +#define bpf_test_run_opts__last_field batch_size LIBBPF_API int bpf_prog_test_run_opts(int prog_fd, struct bpf_test_run_opts *opts); diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index 81bf01d67671..809fe209cdcc 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -201,12 +201,6 @@ struct reloc_desc { }; }; -struct bpf_sec_def; - -typedef int (*init_fn_t)(struct bpf_program *prog, long cookie); -typedef int (*preload_fn_t)(struct bpf_program *prog, struct bpf_prog_load_opts *opts, long cookie); -typedef struct bpf_link *(*attach_fn_t)(const struct bpf_program *prog, long cookie); - /* stored as sec_def->cookie for all libbpf-supported SEC()s */ enum sec_def_flags { SEC_NONE = 0, @@ -234,14 +228,15 @@ enum sec_def_flags { }; struct bpf_sec_def { - const char *sec; + char *sec; enum bpf_prog_type prog_type; enum bpf_attach_type expected_attach_type; long cookie; + int handler_id; - init_fn_t init_fn; - preload_fn_t preload_fn; - attach_fn_t attach_fn; + libbpf_prog_setup_fn_t prog_setup_fn; + libbpf_prog_prepare_load_fn_t prog_prepare_load_fn; + libbpf_prog_attach_fn_t prog_attach_fn; }; /* @@ -1523,6 +1518,9 @@ static char *internal_map_name(struct bpf_object *obj, const char *real_name) } static int +bpf_map_find_btf_info(struct bpf_object *obj, struct bpf_map *map); + +static int bpf_object__init_internal_map(struct bpf_object *obj, enum libbpf_map_type type, const char *real_name, int sec_idx, void *data, size_t data_sz) { @@ -1569,6 +1567,9 @@ 
bpf_object__init_internal_map(struct bpf_object *obj, enum libbpf_map_type type, return err; } + /* failures are fine because of maps like .rodata.str1.1 */ + (void) bpf_map_find_btf_info(obj, map); + if (data) memcpy(map->mmaped, data, data_sz); @@ -2051,6 +2052,9 @@ static int bpf_object__init_user_maps(struct bpf_object *obj, bool strict) } memcpy(&map->def, def, sizeof(struct bpf_map_def)); } + + /* btf info may not exist but fill it in if it does exist */ + (void) bpf_map_find_btf_info(obj, map); } return 0; } @@ -2539,6 +2543,10 @@ static int bpf_object__init_user_btf_map(struct bpf_object *obj, fill_map_from_def(map->inner_map, &inner_def); } + err = bpf_map_find_btf_info(obj, map); + if (err) + return err; + return 0; } @@ -3837,7 +3845,14 @@ static bool prog_is_subprog(const struct bpf_object *obj, * .text programs are subprograms (even if they are not called from * other programs), because libbpf never explicitly supported mixing * SEC()-designated BPF programs and .text entry-point BPF programs. + * + * In libbpf 1.0 strict mode, we always consider .text + * programs to be subprograms. */ + + if (libbpf_mode & LIBBPF_STRICT_SEC_NAME) + return prog->sec_idx == obj->efile.text_shndx; + return prog->sec_idx == obj->efile.text_shndx && obj->nr_programs > 1; } @@ -4182,6 +4197,9 @@ static int bpf_map_find_btf_info(struct bpf_object *obj, struct bpf_map *map) __u32 key_type_id = 0, value_type_id = 0; int ret; + if (!obj->btf) + return -ENOENT; + /* if it's BTF-defined map, we don't need to search for type IDs. * For struct_ops map, it does not need btf_key_type_id and * btf_value_type_id. 
@@ -4805,8 +4823,8 @@ bpf_object__reuse_map(struct bpf_map *map) } err = bpf_map__reuse_fd(map, pin_fd); + close(pin_fd); if (err) { - close(pin_fd); return err; } map->pinned = true; @@ -4871,7 +4889,7 @@ static int bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map, b if (bpf_map__is_struct_ops(map)) create_attr.btf_vmlinux_value_type_id = map->btf_vmlinux_value_type_id; - if (obj->btf && btf__fd(obj->btf) >= 0 && !bpf_map_find_btf_info(obj, map)) { + if (obj->btf && btf__fd(obj->btf) >= 0) { create_attr.btf_fd = btf__fd(obj->btf); create_attr.btf_key_type_id = map->btf_key_type_id; create_attr.btf_value_type_id = map->btf_value_type_id; @@ -6572,9 +6590,9 @@ static int bpf_object__sanitize_prog(struct bpf_object *obj, struct bpf_program static int libbpf_find_attach_btf_id(struct bpf_program *prog, const char *attach_name, int *btf_obj_fd, int *btf_type_id); -/* this is called as prog->sec_def->preload_fn for libbpf-supported sec_defs */ -static int libbpf_preload_prog(struct bpf_program *prog, - struct bpf_prog_load_opts *opts, long cookie) +/* this is called as prog->sec_def->prog_prepare_load_fn for libbpf-supported sec_defs */ +static int libbpf_prepare_prog_load(struct bpf_program *prog, + struct bpf_prog_load_opts *opts, long cookie) { enum sec_def_flags def = cookie; @@ -6670,8 +6688,8 @@ static int bpf_object_load_prog_instance(struct bpf_object *obj, struct bpf_prog load_attr.fd_array = obj->fd_array; /* adjust load_attr if sec_def provides custom preload callback */ - if (prog->sec_def && prog->sec_def->preload_fn) { - err = prog->sec_def->preload_fn(prog, &load_attr, prog->sec_def->cookie); + if (prog->sec_def && prog->sec_def->prog_prepare_load_fn) { + err = prog->sec_def->prog_prepare_load_fn(prog, &load_attr, prog->sec_def->cookie); if (err < 0) { pr_warn("prog '%s': failed to prepare load attributes: %d\n", prog->name, err); @@ -6971,8 +6989,8 @@ static int bpf_object_init_progs(struct bpf_object *obj, const struct bpf_object /* 
sec_def can have custom callback which should be called * after bpf_program is initialized to adjust its properties */ - if (prog->sec_def->init_fn) { - err = prog->sec_def->init_fn(prog, prog->sec_def->cookie); + if (prog->sec_def->prog_setup_fn) { + err = prog->sec_def->prog_setup_fn(prog, prog->sec_def->cookie); if (err < 0) { pr_warn("prog '%s': failed to initialize: %d\n", prog->name, err); @@ -7176,12 +7194,10 @@ static int bpf_object__sanitize_maps(struct bpf_object *obj) return 0; } -static int bpf_object__read_kallsyms_file(struct bpf_object *obj) +int libbpf_kallsyms_parse(kallsyms_cb_t cb, void *ctx) { char sym_type, sym_name[500]; unsigned long long sym_addr; - const struct btf_type *t; - struct extern_desc *ext; int ret, err = 0; FILE *f; @@ -7200,35 +7216,51 @@ static int bpf_object__read_kallsyms_file(struct bpf_object *obj) if (ret != 3) { pr_warn("failed to read kallsyms entry: %d\n", ret); err = -EINVAL; - goto out; + break; } - ext = find_extern_by_name(obj, sym_name); - if (!ext || ext->type != EXT_KSYM) - continue; - - t = btf__type_by_id(obj->btf, ext->btf_id); - if (!btf_is_var(t)) - continue; - - if (ext->is_set && ext->ksym.addr != sym_addr) { - pr_warn("extern (ksym) '%s' resolution is ambiguous: 0x%llx or 0x%llx\n", - sym_name, ext->ksym.addr, sym_addr); - err = -EINVAL; - goto out; - } - if (!ext->is_set) { - ext->is_set = true; - ext->ksym.addr = sym_addr; - pr_debug("extern (ksym) %s=0x%llx\n", sym_name, sym_addr); - } + err = cb(sym_addr, sym_type, sym_name, ctx); + if (err) + break; } -out: fclose(f); return err; } +static int kallsyms_cb(unsigned long long sym_addr, char sym_type, + const char *sym_name, void *ctx) +{ + struct bpf_object *obj = ctx; + const struct btf_type *t; + struct extern_desc *ext; + + ext = find_extern_by_name(obj, sym_name); + if (!ext || ext->type != EXT_KSYM) + return 0; + + t = btf__type_by_id(obj->btf, ext->btf_id); + if (!btf_is_var(t)) + return 0; + + if (ext->is_set && ext->ksym.addr != sym_addr) { + 
pr_warn("extern (ksym) '%s' resolution is ambiguous: 0x%llx or 0x%llx\n", + sym_name, ext->ksym.addr, sym_addr); + return -EINVAL; + } + if (!ext->is_set) { + ext->is_set = true; + ext->ksym.addr = sym_addr; + pr_debug("extern (ksym) %s=0x%llx\n", sym_name, sym_addr); + } + return 0; +} + +static int bpf_object__read_kallsyms_file(struct bpf_object *obj) +{ + return libbpf_kallsyms_parse(kallsyms_cb, obj); +} + static int find_ksym_btf_id(struct bpf_object *obj, const char *ksym_name, __u16 kind, struct btf **res_btf, struct module_btf **res_mod_btf) @@ -8589,20 +8621,21 @@ int bpf_program__set_log_buf(struct bpf_program *prog, char *log_buf, size_t log } #define SEC_DEF(sec_pfx, ptype, atype, flags, ...) { \ - .sec = sec_pfx, \ + .sec = (char *)sec_pfx, \ .prog_type = BPF_PROG_TYPE_##ptype, \ .expected_attach_type = atype, \ .cookie = (long)(flags), \ - .preload_fn = libbpf_preload_prog, \ + .prog_prepare_load_fn = libbpf_prepare_prog_load, \ __VA_ARGS__ \ } -static struct bpf_link *attach_kprobe(const struct bpf_program *prog, long cookie); -static struct bpf_link *attach_tp(const struct bpf_program *prog, long cookie); -static struct bpf_link *attach_raw_tp(const struct bpf_program *prog, long cookie); -static struct bpf_link *attach_trace(const struct bpf_program *prog, long cookie); -static struct bpf_link *attach_lsm(const struct bpf_program *prog, long cookie); -static struct bpf_link *attach_iter(const struct bpf_program *prog, long cookie); +static int attach_kprobe(const struct bpf_program *prog, long cookie, struct bpf_link **link); +static int attach_tp(const struct bpf_program *prog, long cookie, struct bpf_link **link); +static int attach_raw_tp(const struct bpf_program *prog, long cookie, struct bpf_link **link); +static int attach_trace(const struct bpf_program *prog, long cookie, struct bpf_link **link); +static int attach_kprobe_multi(const struct bpf_program *prog, long cookie, struct bpf_link **link); +static int attach_lsm(const struct 
bpf_program *prog, long cookie, struct bpf_link **link); +static int attach_iter(const struct bpf_program *prog, long cookie, struct bpf_link **link); static const struct bpf_sec_def section_defs[] = { SEC_DEF("socket", SOCKET_FILTER, 0, SEC_NONE | SEC_SLOPPY_PFX), @@ -8612,6 +8645,8 @@ static const struct bpf_sec_def section_defs[] = { SEC_DEF("uprobe/", KPROBE, 0, SEC_NONE), SEC_DEF("kretprobe/", KPROBE, 0, SEC_NONE, attach_kprobe), SEC_DEF("uretprobe/", KPROBE, 0, SEC_NONE), + SEC_DEF("kprobe.multi/", KPROBE, BPF_TRACE_KPROBE_MULTI, SEC_NONE, attach_kprobe_multi), + SEC_DEF("kretprobe.multi/", KPROBE, BPF_TRACE_KPROBE_MULTI, SEC_NONE, attach_kprobe_multi), SEC_DEF("tc", SCHED_CLS, 0, SEC_NONE), SEC_DEF("classifier", SCHED_CLS, 0, SEC_NONE | SEC_SLOPPY_PFX | SEC_DEPRECATED), SEC_DEF("action", SCHED_ACT, 0, SEC_NONE | SEC_SLOPPY_PFX), @@ -8682,61 +8717,167 @@ static const struct bpf_sec_def section_defs[] = { SEC_DEF("sk_lookup", SK_LOOKUP, BPF_SK_LOOKUP, SEC_ATTACHABLE | SEC_SLOPPY_PFX), }; -#define MAX_TYPE_NAME_SIZE 32 +static size_t custom_sec_def_cnt; +static struct bpf_sec_def *custom_sec_defs; +static struct bpf_sec_def custom_fallback_def; +static bool has_custom_fallback_def; -static const struct bpf_sec_def *find_sec_def(const char *sec_name) +static int last_custom_sec_def_handler_id; + +int libbpf_register_prog_handler(const char *sec, + enum bpf_prog_type prog_type, + enum bpf_attach_type exp_attach_type, + const struct libbpf_prog_handler_opts *opts) { - const struct bpf_sec_def *sec_def; - enum sec_def_flags sec_flags; - int i, n = ARRAY_SIZE(section_defs), len; - bool strict = libbpf_mode & LIBBPF_STRICT_SEC_NAME; + struct bpf_sec_def *sec_def; - for (i = 0; i < n; i++) { - sec_def = &section_defs[i]; - sec_flags = sec_def->cookie; - len = strlen(sec_def->sec); + if (!OPTS_VALID(opts, libbpf_prog_handler_opts)) + return libbpf_err(-EINVAL); - /* "type/" always has to have proper SEC("type/extras") form */ - if (sec_def->sec[len - 1] == '/') { - if
(str_has_pfx(sec_name, sec_def->sec)) - return sec_def; - continue; - } + if (last_custom_sec_def_handler_id == INT_MAX) /* prevent overflow */ + return libbpf_err(-E2BIG); - /* "type+" means it can be either exact SEC("type") or - * well-formed SEC("type/extras") with proper '/' separator - */ - if (sec_def->sec[len - 1] == '+') { - len--; - /* not even a prefix */ - if (strncmp(sec_name, sec_def->sec, len) != 0) - continue; - /* exact match or has '/' separator */ - if (sec_name[len] == '\0' || sec_name[len] == '/') - return sec_def; - continue; - } + if (sec) { + sec_def = libbpf_reallocarray(custom_sec_defs, custom_sec_def_cnt + 1, + sizeof(*sec_def)); + if (!sec_def) + return libbpf_err(-ENOMEM); - /* SEC_SLOPPY_PFX definitions are allowed to be just prefix - * matches, unless strict section name mode - * (LIBBPF_STRICT_SEC_NAME) is enabled, in which case the - * match has to be exact. - */ - if ((sec_flags & SEC_SLOPPY_PFX) && !strict) { - if (str_has_pfx(sec_name, sec_def->sec)) - return sec_def; - continue; - } + custom_sec_defs = sec_def; + sec_def = &custom_sec_defs[custom_sec_def_cnt]; + } else { + if (has_custom_fallback_def) + return libbpf_err(-EBUSY); - /* Definitions not marked SEC_SLOPPY_PFX (e.g., - * SEC("syscall")) are exact matches in both modes. - */ - if (strcmp(sec_name, sec_def->sec) == 0) + sec_def = &custom_fallback_def; + } + + sec_def->sec = sec ? 
strdup(sec) : NULL; + if (sec && !sec_def->sec) + return libbpf_err(-ENOMEM); + + sec_def->prog_type = prog_type; + sec_def->expected_attach_type = exp_attach_type; + sec_def->cookie = OPTS_GET(opts, cookie, 0); + + sec_def->prog_setup_fn = OPTS_GET(opts, prog_setup_fn, NULL); + sec_def->prog_prepare_load_fn = OPTS_GET(opts, prog_prepare_load_fn, NULL); + sec_def->prog_attach_fn = OPTS_GET(opts, prog_attach_fn, NULL); + + sec_def->handler_id = ++last_custom_sec_def_handler_id; + + if (sec) + custom_sec_def_cnt++; + else + has_custom_fallback_def = true; + + return sec_def->handler_id; +} + +int libbpf_unregister_prog_handler(int handler_id) +{ + struct bpf_sec_def *sec_defs; + int i; + + if (handler_id <= 0) + return libbpf_err(-EINVAL); + + if (has_custom_fallback_def && custom_fallback_def.handler_id == handler_id) { + memset(&custom_fallback_def, 0, sizeof(custom_fallback_def)); + has_custom_fallback_def = false; + return 0; + } + + for (i = 0; i < custom_sec_def_cnt; i++) { + if (custom_sec_defs[i].handler_id == handler_id) + break; + } + + if (i == custom_sec_def_cnt) + return libbpf_err(-ENOENT); + + free(custom_sec_defs[i].sec); + for (i = i + 1; i < custom_sec_def_cnt; i++) + custom_sec_defs[i - 1] = custom_sec_defs[i]; + custom_sec_def_cnt--; + + /* try to shrink the array, but it's ok if we couldn't */ + sec_defs = libbpf_reallocarray(custom_sec_defs, custom_sec_def_cnt, sizeof(*sec_defs)); + if (sec_defs) + custom_sec_defs = sec_defs; + + return 0; +} + +static bool sec_def_matches(const struct bpf_sec_def *sec_def, const char *sec_name, + bool allow_sloppy) +{ + size_t len = strlen(sec_def->sec); + + /* "type/" always has to have proper SEC("type/extras") form */ + if (sec_def->sec[len - 1] == '/') { + if (str_has_pfx(sec_name, sec_def->sec)) + return true; + return false; + } + + /* "type+" means it can be either exact SEC("type") or + * well-formed SEC("type/extras") with proper '/' separator + */ + if (sec_def->sec[len - 1] == '+') { + len--; + /* 
not even a prefix */ + if (strncmp(sec_name, sec_def->sec, len) != 0) + return false; + /* exact match or has '/' separator */ + if (sec_name[len] == '\0' || sec_name[len] == '/') + return true; + return false; + } + + /* SEC_SLOPPY_PFX definitions are allowed to be just prefix + * matches, unless strict section name mode + * (LIBBPF_STRICT_SEC_NAME) is enabled, in which case the + * match has to be exact. + */ + if (allow_sloppy && str_has_pfx(sec_name, sec_def->sec)) + return true; + + /* Definitions not marked SEC_SLOPPY_PFX (e.g., + * SEC("syscall")) are exact matches in both modes. + */ + return strcmp(sec_name, sec_def->sec) == 0; +} + +static const struct bpf_sec_def *find_sec_def(const char *sec_name) +{ + const struct bpf_sec_def *sec_def; + int i, n; + bool strict = libbpf_mode & LIBBPF_STRICT_SEC_NAME, allow_sloppy; + + n = custom_sec_def_cnt; + for (i = 0; i < n; i++) { + sec_def = &custom_sec_defs[i]; + if (sec_def_matches(sec_def, sec_name, false)) + return sec_def; + } + + n = ARRAY_SIZE(section_defs); + for (i = 0; i < n; i++) { + sec_def = &section_defs[i]; + allow_sloppy = (sec_def->cookie & SEC_SLOPPY_PFX) && !strict; + if (sec_def_matches(sec_def, sec_name, allow_sloppy)) return sec_def; } + + if (has_custom_fallback_def) + return &custom_fallback_def; + return NULL; } +#define MAX_TYPE_NAME_SIZE 32 + static char *libbpf_get_type_names(bool attach_type) { int i, len = ARRAY_SIZE(section_defs) * MAX_TYPE_NAME_SIZE; @@ -8752,7 +8893,7 @@ static char *libbpf_get_type_names(bool attach_type) const struct bpf_sec_def *sec_def = &section_defs[i]; if (attach_type) { - if (sec_def->preload_fn != libbpf_preload_prog) + if (sec_def->prog_prepare_load_fn != libbpf_prepare_prog_load) continue; if (!(sec_def->cookie & SEC_ATTACHABLE)) @@ -9135,7 +9276,7 @@ int libbpf_attach_type_by_name(const char *name, return libbpf_err(-EINVAL); } - if (sec_def->preload_fn != libbpf_preload_prog) + if (sec_def->prog_prepare_load_fn != libbpf_prepare_prog_load) return
libbpf_err(-EINVAL); if (!(sec_def->cookie & SEC_ATTACHABLE)) return libbpf_err(-EINVAL); @@ -10109,14 +10250,146 @@ struct bpf_link *bpf_program__attach_kprobe(const struct bpf_program *prog, return bpf_program__attach_kprobe_opts(prog, func_name, &opts); } -static struct bpf_link *attach_kprobe(const struct bpf_program *prog, long cookie) +/* Adapted from perf/util/string.c */ +static bool glob_match(const char *str, const char *pat) +{ + while (*str && *pat && *pat != '*') { + if (*pat == '?') { /* Matches any single character */ + str++; + pat++; + continue; + } + if (*str != *pat) + return false; + str++; + pat++; + } + /* Check wild card */ + if (*pat == '*') { + while (*pat == '*') + pat++; + if (!*pat) /* Tail wild card matches all */ + return true; + while (*str) + if (glob_match(str++, pat)) + return true; + } + return !*str && !*pat; +} + +struct kprobe_multi_resolve { + const char *pattern; + unsigned long *addrs; + size_t cap; + size_t cnt; +}; + +static int +resolve_kprobe_multi_cb(unsigned long long sym_addr, char sym_type, + const char *sym_name, void *ctx) +{ + struct kprobe_multi_resolve *res = ctx; + int err; + + if (!glob_match(sym_name, res->pattern)) + return 0; + + err = libbpf_ensure_mem((void **) &res->addrs, &res->cap, sizeof(unsigned long), + res->cnt + 1); + if (err) + return err; + + res->addrs[res->cnt++] = (unsigned long) sym_addr; + return 0; +} + +struct bpf_link * +bpf_program__attach_kprobe_multi_opts(const struct bpf_program *prog, + const char *pattern, + const struct bpf_kprobe_multi_opts *opts) +{ + LIBBPF_OPTS(bpf_link_create_opts, lopts); + struct kprobe_multi_resolve res = { + .pattern = pattern, + }; + struct bpf_link *link = NULL; + char errmsg[STRERR_BUFSIZE]; + const unsigned long *addrs; + int err, link_fd, prog_fd; + const __u64 *cookies; + const char **syms; + bool retprobe; + size_t cnt; + + if (!OPTS_VALID(opts, bpf_kprobe_multi_opts)) + return libbpf_err_ptr(-EINVAL); + + syms = OPTS_GET(opts, syms, false); + 
addrs = OPTS_GET(opts, addrs, false); + cnt = OPTS_GET(opts, cnt, false); + cookies = OPTS_GET(opts, cookies, false); + + if (!pattern && !addrs && !syms) + return libbpf_err_ptr(-EINVAL); + if (pattern && (addrs || syms || cookies || cnt)) + return libbpf_err_ptr(-EINVAL); + if (!pattern && !cnt) + return libbpf_err_ptr(-EINVAL); + if (addrs && syms) + return libbpf_err_ptr(-EINVAL); + + if (pattern) { + err = libbpf_kallsyms_parse(resolve_kprobe_multi_cb, &res); + if (err) + goto error; + if (!res.cnt) { + err = -ENOENT; + goto error; + } + addrs = res.addrs; + cnt = res.cnt; + } + + retprobe = OPTS_GET(opts, retprobe, false); + + lopts.kprobe_multi.syms = syms; + lopts.kprobe_multi.addrs = addrs; + lopts.kprobe_multi.cookies = cookies; + lopts.kprobe_multi.cnt = cnt; + lopts.kprobe_multi.flags = retprobe ? BPF_F_KPROBE_MULTI_RETURN : 0; + + link = calloc(1, sizeof(*link)); + if (!link) { + err = -ENOMEM; + goto error; + } + link->detach = &bpf_link__detach_fd; + + prog_fd = bpf_program__fd(prog); + link_fd = bpf_link_create(prog_fd, 0, BPF_TRACE_KPROBE_MULTI, &lopts); + if (link_fd < 0) { + err = -errno; + pr_warn("prog '%s': failed to attach: %s\n", + prog->name, libbpf_strerror_r(err, errmsg, sizeof(errmsg))); + goto error; + } + link->fd = link_fd; + free(res.addrs); + return link; + +error: + free(link); + free(res.addrs); + return libbpf_err_ptr(err); +} + +static int attach_kprobe(const struct bpf_program *prog, long cookie, struct bpf_link **link) { DECLARE_LIBBPF_OPTS(bpf_kprobe_opts, opts); unsigned long offset = 0; - struct bpf_link *link; const char *func_name; char *func; - int n, err; + int n; opts.retprobe = str_has_pfx(prog->sec_name, "kretprobe/"); if (opts.retprobe) @@ -10126,21 +10399,43 @@ static struct bpf_link *attach_kprobe(const struct bpf_program *prog, long cooki n = sscanf(func_name, "%m[a-zA-Z0-9_.]+%li", &func, &offset); if (n < 1) { - err = -EINVAL; pr_warn("kprobe name is invalid: %s\n", func_name); - return libbpf_err_ptr(err); + 
return -EINVAL; } if (opts.retprobe && offset != 0) { free(func); - err = -EINVAL; pr_warn("kretprobes do not support offset specification\n"); - return libbpf_err_ptr(err); + return -EINVAL; } opts.offset = offset; - link = bpf_program__attach_kprobe_opts(prog, func, &opts); + *link = bpf_program__attach_kprobe_opts(prog, func, &opts); free(func); - return link; + return libbpf_get_error(*link); +} + +static int attach_kprobe_multi(const struct bpf_program *prog, long cookie, struct bpf_link **link) +{ + LIBBPF_OPTS(bpf_kprobe_multi_opts, opts); + const char *spec; + char *pattern; + int n; + + opts.retprobe = str_has_pfx(prog->sec_name, "kretprobe.multi/"); + if (opts.retprobe) + spec = prog->sec_name + sizeof("kretprobe.multi/") - 1; + else + spec = prog->sec_name + sizeof("kprobe.multi/") - 1; + + n = sscanf(spec, "%m[a-zA-Z0-9_.*?]", &pattern); + if (n < 1) { + /* pattern is unset when sscanf() matches nothing, so report spec */ + pr_warn("kprobe multi pattern is invalid: %s\n", spec); + return -EINVAL; + } + + *link = bpf_program__attach_kprobe_multi_opts(prog, pattern, &opts); + free(pattern); + return libbpf_get_error(*link); } static void gen_uprobe_legacy_event_name(char *buf, size_t buf_sz, @@ -10395,14 +10690,13 @@ struct bpf_link *bpf_program__attach_tracepoint(const struct bpf_program *prog, return bpf_program__attach_tracepoint_opts(prog, tp_category, tp_name, NULL); } -static struct bpf_link *attach_tp(const struct bpf_program *prog, long cookie) +static int attach_tp(const struct bpf_program *prog, long cookie, struct bpf_link **link) { char *sec_name, *tp_cat, *tp_name; - struct bpf_link *link; sec_name = strdup(prog->sec_name); if (!sec_name) - return libbpf_err_ptr(-ENOMEM); + return -ENOMEM; /* extract "tp/<category>/<name>" or "tracepoint/<category>/<name>" */ if (str_has_pfx(prog->sec_name, "tp/")) @@ -10412,14 +10706,14 @@ static struct bpf_link *attach_tp(const struct bpf_program *prog, long cookie) tp_name = strchr(tp_cat, '/'); if (!tp_name) { free(sec_name); - return libbpf_err_ptr(-EINVAL); + return
-EINVAL; } *tp_name = '\0'; tp_name++; - link = bpf_program__attach_tracepoint(prog, tp_cat, tp_name); + *link = bpf_program__attach_tracepoint(prog, tp_cat, tp_name); free(sec_name); - return link; + return libbpf_get_error(*link); } struct bpf_link *bpf_program__attach_raw_tracepoint(const struct bpf_program *prog, @@ -10452,7 +10746,7 @@ struct bpf_link *bpf_program__attach_raw_tracepoint(const struct bpf_program *pr return link; } -static struct bpf_link *attach_raw_tp(const struct bpf_program *prog, long cookie) +static int attach_raw_tp(const struct bpf_program *prog, long cookie, struct bpf_link **link) { static const char *const prefixes[] = { "raw_tp/", @@ -10472,10 +10766,11 @@ static struct bpf_link *attach_raw_tp(const struct bpf_program *prog, long cooki if (!tp_name) { pr_warn("prog '%s': invalid section name '%s'\n", prog->name, prog->sec_name); - return libbpf_err_ptr(-EINVAL); + return -EINVAL; } - return bpf_program__attach_raw_tracepoint(prog, tp_name); + *link = bpf_program__attach_raw_tracepoint(prog, tp_name); + return libbpf_get_error(*link); } /* Common logic for all BPF program types that attach to a btf_id */ @@ -10518,14 +10813,16 @@ struct bpf_link *bpf_program__attach_lsm(const struct bpf_program *prog) return bpf_program__attach_btf_id(prog); } -static struct bpf_link *attach_trace(const struct bpf_program *prog, long cookie) +static int attach_trace(const struct bpf_program *prog, long cookie, struct bpf_link **link) { - return bpf_program__attach_trace(prog); + *link = bpf_program__attach_trace(prog); + return libbpf_get_error(*link); } -static struct bpf_link *attach_lsm(const struct bpf_program *prog, long cookie) +static int attach_lsm(const struct bpf_program *prog, long cookie, struct bpf_link **link) { - return bpf_program__attach_lsm(prog); + *link = bpf_program__attach_lsm(prog); + return libbpf_get_error(*link); } static struct bpf_link * @@ -10654,17 +10951,33 @@ bpf_program__attach_iter(const struct bpf_program *prog,
return link; } -static struct bpf_link *attach_iter(const struct bpf_program *prog, long cookie) +static int attach_iter(const struct bpf_program *prog, long cookie, struct bpf_link **link) { - return bpf_program__attach_iter(prog, NULL); + *link = bpf_program__attach_iter(prog, NULL); + return libbpf_get_error(*link); } struct bpf_link *bpf_program__attach(const struct bpf_program *prog) { - if (!prog->sec_def || !prog->sec_def->attach_fn) - return libbpf_err_ptr(-ESRCH); + struct bpf_link *link = NULL; + int err; + + if (!prog->sec_def || !prog->sec_def->prog_attach_fn) + return libbpf_err_ptr(-EOPNOTSUPP); + + err = prog->sec_def->prog_attach_fn(prog, prog->sec_def->cookie, &link); + if (err) + return libbpf_err_ptr(err); - return prog->sec_def->attach_fn(prog, prog->sec_def->cookie); + /* When calling bpf_program__attach() explicitly, auto-attach support + * is expected to work, so NULL returned link is considered an error. + * This is different for skeleton's attach, see comment in + * bpf_object__attach_skeleton(). 
+ */ + if (!link) + return libbpf_err_ptr(-EOPNOTSUPP); + + return link; } static int bpf_link__detach_struct_ops(struct bpf_link *link) @@ -11679,6 +11992,49 @@ int libbpf_num_possible_cpus(void) return tmp_cpus; } +static int populate_skeleton_maps(const struct bpf_object *obj, + struct bpf_map_skeleton *maps, + size_t map_cnt) +{ + int i; + + for (i = 0; i < map_cnt; i++) { + struct bpf_map **map = maps[i].map; + const char *name = maps[i].name; + void **mmaped = maps[i].mmaped; + + *map = bpf_object__find_map_by_name(obj, name); + if (!*map) { + pr_warn("failed to find skeleton map '%s'\n", name); + return -ESRCH; + } + + /* externs shouldn't be pre-setup from user code */ + if (mmaped && (*map)->libbpf_type != LIBBPF_MAP_KCONFIG) + *mmaped = (*map)->mmaped; + } + return 0; +} + +static int populate_skeleton_progs(const struct bpf_object *obj, + struct bpf_prog_skeleton *progs, + size_t prog_cnt) +{ + int i; + + for (i = 0; i < prog_cnt; i++) { + struct bpf_program **prog = progs[i].prog; + const char *name = progs[i].name; + + *prog = bpf_object__find_program_by_name(obj, name); + if (!*prog) { + pr_warn("failed to find skeleton program '%s'\n", name); + return -ESRCH; + } + } + return 0; +} + int bpf_object__open_skeleton(struct bpf_object_skeleton *s, const struct bpf_object_open_opts *opts) { @@ -11686,7 +12042,7 @@ int bpf_object__open_skeleton(struct bpf_object_skeleton *s, .object_name = s->name, ); struct bpf_object *obj; - int i, err; + int err; /* Attempt to preserve opts->object_name, unless overriden by user * explicitly. 
Overwriting object name for skeletons is discouraged, @@ -11709,37 +12065,91 @@ int bpf_object__open_skeleton(struct bpf_object_skeleton *s, } *s->obj = obj; + err = populate_skeleton_maps(obj, s->maps, s->map_cnt); + if (err) { + pr_warn("failed to populate skeleton maps for '%s': %d\n", s->name, err); + return libbpf_err(err); + } - for (i = 0; i < s->map_cnt; i++) { - struct bpf_map **map = s->maps[i].map; - const char *name = s->maps[i].name; - void **mmaped = s->maps[i].mmaped; + err = populate_skeleton_progs(obj, s->progs, s->prog_cnt); + if (err) { + pr_warn("failed to populate skeleton progs for '%s': %d\n", s->name, err); + return libbpf_err(err); + } - *map = bpf_object__find_map_by_name(obj, name); - if (!*map) { - pr_warn("failed to find skeleton map '%s'\n", name); - return libbpf_err(-ESRCH); - } + return 0; +} - /* externs shouldn't be pre-setup from user code */ - if (mmaped && (*map)->libbpf_type != LIBBPF_MAP_KCONFIG) - *mmaped = (*map)->mmaped; +int bpf_object__open_subskeleton(struct bpf_object_subskeleton *s) +{ + int err, len, var_idx, i; + const char *var_name; + const struct bpf_map *map; + struct btf *btf; + __u32 map_type_id; + const struct btf_type *map_type, *var_type; + const struct bpf_var_skeleton *var_skel; + struct btf_var_secinfo *var; + + if (!s->obj) + return libbpf_err(-EINVAL); + + btf = bpf_object__btf(s->obj); + if (!btf) { + pr_warn("subskeletons require BTF at runtime (object %s)\n", + bpf_object__name(s->obj)); + return libbpf_err(-errno); } - for (i = 0; i < s->prog_cnt; i++) { - struct bpf_program **prog = s->progs[i].prog; - const char *name = s->progs[i].name; + err = populate_skeleton_maps(s->obj, s->maps, s->map_cnt); + if (err) { + pr_warn("failed to populate subskeleton maps: %d\n", err); + return libbpf_err(err); + } - *prog = bpf_object__find_program_by_name(obj, name); - if (!*prog) { - pr_warn("failed to find skeleton program '%s'\n", name); - return libbpf_err(-ESRCH); - } + err = 
populate_skeleton_progs(s->obj, s->progs, s->prog_cnt); + if (err) { + pr_warn("failed to populate subskeleton progs: %d\n", err); + return libbpf_err(err); + } + for (var_idx = 0; var_idx < s->var_cnt; var_idx++) { + var_skel = &s->vars[var_idx]; + map = *var_skel->map; + map_type_id = bpf_map__btf_value_type_id(map); + map_type = btf__type_by_id(btf, map_type_id); + + if (!btf_is_datasec(map_type)) { + pr_warn("type for map '%1$s' is not a datasec: %2$s\n", + bpf_map__name(map), + __btf_kind_str(btf_kind(map_type))); + return libbpf_err(-EINVAL); + } + + len = btf_vlen(map_type); + var = btf_var_secinfos(map_type); + for (i = 0; i < len; i++, var++) { + var_type = btf__type_by_id(btf, var->type); + var_name = btf__name_by_offset(btf, var_type->name_off); + if (strcmp(var_name, var_skel->name) == 0) { + *var_skel->addr = map->mmaped + var->offset; + break; + } + } + } return 0; } +void bpf_object__destroy_subskeleton(struct bpf_object_subskeleton *s) +{ + if (!s) + return; + free(s->maps); + free(s->progs); + free(s->vars); + free(s); +} + int bpf_object__load_skeleton(struct bpf_object_skeleton *s) { int i, err; @@ -11805,16 +12215,30 @@ int bpf_object__attach_skeleton(struct bpf_object_skeleton *s) continue; /* auto-attaching not supported for this program */ - if (!prog->sec_def || !prog->sec_def->attach_fn) + if (!prog->sec_def || !prog->sec_def->prog_attach_fn) continue; - *link = bpf_program__attach(prog); - err = libbpf_get_error(*link); + /* if user already set the link manually, don't attempt auto-attach */ + if (*link) + continue; + + err = prog->sec_def->prog_attach_fn(prog, prog->sec_def->cookie, link); if (err) { - pr_warn("failed to auto-attach program '%s': %d\n", + pr_warn("prog '%s': failed to auto-attach: %d\n", bpf_program__name(prog), err); return libbpf_err(err); } + + /* It's possible that for some SEC() definitions auto-attach + * is supported in some cases (e.g., if definition completely + * specifies target information), but is not in other
cases. + * SEC("uprobe") is one such case. If user specified target + * binary and function name, such BPF program can be + * auto-attached. But if not, it shouldn't trigger skeleton's + * attach to fail. It should just be skipped. + * attach_fn signals such case with returning 0 (no error) and + * setting link to NULL. + */ } return 0; diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h index c8d8daad212e..05dde85e19a6 100644 --- a/tools/lib/bpf/libbpf.h +++ b/tools/lib/bpf/libbpf.h @@ -425,6 +425,29 @@ bpf_program__attach_kprobe_opts(const struct bpf_program *prog, const char *func_name, const struct bpf_kprobe_opts *opts); +struct bpf_kprobe_multi_opts { + /* size of this struct, for forward/backward compatibility */ + size_t sz; + /* array of function symbols to attach */ + const char **syms; + /* array of function addresses to attach */ + const unsigned long *addrs; + /* array of user-provided values fetchable through bpf_get_attach_cookie */ + const __u64 *cookies; + /* number of elements in syms/addrs/cookies arrays */ + size_t cnt; + /* create return kprobes */ + bool retprobe; + size_t :0; +}; + +#define bpf_kprobe_multi_opts__last_field retprobe + +LIBBPF_API struct bpf_link * +bpf_program__attach_kprobe_multi_opts(const struct bpf_program *prog, + const char *pattern, + const struct bpf_kprobe_multi_opts *opts); + struct bpf_uprobe_opts { /* size of this struct, for forward/backward compatiblity */ size_t sz; @@ -1289,6 +1312,35 @@ LIBBPF_API int bpf_object__attach_skeleton(struct bpf_object_skeleton *s); LIBBPF_API void bpf_object__detach_skeleton(struct bpf_object_skeleton *s); LIBBPF_API void bpf_object__destroy_skeleton(struct bpf_object_skeleton *s); +struct bpf_var_skeleton { + const char *name; + struct bpf_map **map; + void **addr; +}; + +struct bpf_object_subskeleton { + size_t sz; /* size of this struct, for forward/backward compatibility */ + + const struct bpf_object *obj; + + int map_cnt; + int map_skel_sz; /* sizeof(struct 
bpf_map_skeleton) */ + struct bpf_map_skeleton *maps; + + int prog_cnt; + int prog_skel_sz; /* sizeof(struct bpf_prog_skeleton) */ + struct bpf_prog_skeleton *progs; + + int var_cnt; + int var_skel_sz; /* sizeof(struct bpf_var_skeleton) */ + struct bpf_var_skeleton *vars; +}; + +LIBBPF_API int +bpf_object__open_subskeleton(struct bpf_object_subskeleton *s); +LIBBPF_API void +bpf_object__destroy_subskeleton(struct bpf_object_subskeleton *s); + struct gen_loader_opts { size_t sz; /* size of this struct, for forward/backward compatiblity */ const char *data; @@ -1328,6 +1380,115 @@ LIBBPF_API int bpf_linker__add_file(struct bpf_linker *linker, LIBBPF_API int bpf_linker__finalize(struct bpf_linker *linker); LIBBPF_API void bpf_linker__free(struct bpf_linker *linker); +/* + * Custom handling of BPF program's SEC() definitions + */ + +struct bpf_prog_load_opts; /* defined in bpf.h */ + +/* Called during bpf_object__open() for each recognized BPF program. Callback + * can use various bpf_program__set_*() setters to adjust whatever properties + * are necessary. + */ +typedef int (*libbpf_prog_setup_fn_t)(struct bpf_program *prog, long cookie); + +/* Called right before libbpf performs bpf_prog_load() to load BPF program + * into the kernel. Callback can adjust opts as necessary. + */ +typedef int (*libbpf_prog_prepare_load_fn_t)(struct bpf_program *prog, + struct bpf_prog_load_opts *opts, long cookie); + +/* Called during skeleton attach or through bpf_program__attach(). If + * auto-attach is not supported, callback should return 0 and set link to + * NULL (it's not considered an error during skeleton attach, but it will be + * an error for bpf_program__attach() calls). On error, error should be + * returned directly and link set to NULL. On success, return 0 and set link + * to a valid struct bpf_link. 
+ */ +typedef int (*libbpf_prog_attach_fn_t)(const struct bpf_program *prog, long cookie, + struct bpf_link **link); + +struct libbpf_prog_handler_opts { + /* size of this struct, for forward/backward compatiblity */ + size_t sz; + /* User-provided value that is passed to prog_setup_fn, + * prog_prepare_load_fn, and prog_attach_fn callbacks. Allows user to + * register one set of callbacks for multiple SEC() definitions and + * still be able to distinguish them, if necessary. For example, + * libbpf itself is using this to pass necessary flags (e.g., + * sleepable flag) to a common internal SEC() handler. + */ + long cookie; + /* BPF program initialization callback (see libbpf_prog_setup_fn_t). + * Callback is optional, pass NULL if it's not necessary. + */ + libbpf_prog_setup_fn_t prog_setup_fn; + /* BPF program loading callback (see libbpf_prog_prepare_load_fn_t). + * Callback is optional, pass NULL if it's not necessary. + */ + libbpf_prog_prepare_load_fn_t prog_prepare_load_fn; + /* BPF program attach callback (see libbpf_prog_attach_fn_t). + * Callback is optional, pass NULL if it's not necessary. + */ + libbpf_prog_attach_fn_t prog_attach_fn; +}; +#define libbpf_prog_handler_opts__last_field prog_attach_fn + +/** + * @brief **libbpf_register_prog_handler()** registers a custom BPF program + * SEC() handler. + * @param sec section prefix for which custom handler is registered + * @param prog_type BPF program type associated with specified section + * @param exp_attach_type Expected BPF attach type associated with specified section + * @param opts optional cookie, callbacks, and other extra options + * @return Non-negative handler ID is returned on success. This handler ID has + * to be passed to *libbpf_unregister_prog_handler()* to unregister such + * custom handler. Negative error code is returned on error. + * + * *sec* defines which SEC() definitions are handled by this custom handler + * registration. 
*sec* can have a few different forms: + * - if *sec* is just a plain string (e.g., "abc"), it will match only + * SEC("abc"). If BPF program specifies SEC("abc/whatever") it will result + * in an error; + * - if *sec* is of the form "abc/", proper SEC() form is + * SEC("abc/something"), where acceptable "something" should be checked by + * *prog_setup_fn* callback, if there are additional restrictions; + * - if *sec* is of the form "abc+", it will successfully match both + * SEC("abc") and SEC("abc/whatever") forms; + * - if *sec* is NULL, custom handler is registered for any BPF program that + * doesn't match any of the registered (custom or libbpf's own) SEC() + * handlers. There could be only one such generic custom handler registered + * at any given time. + * + * All custom handlers (except the one with *sec* == NULL) are processed + * before libbpf's own SEC() handlers. It is allowed to "override" libbpf's + * SEC() handlers by registering custom ones for the same section prefix + * (e.g., it's possible to have custom SEC("perf_event/LLC-load-misses") + * handler). + * + * Note, like many global libbpf APIs (e.g., libbpf_set_print(), + * libbpf_set_strict_mode(), etc.), these APIs are not thread-safe. User needs + * to ensure synchronization if there is a risk of running this API from + * multiple threads simultaneously. + */ +LIBBPF_API int libbpf_register_prog_handler(const char *sec, + enum bpf_prog_type prog_type, + enum bpf_attach_type exp_attach_type, + const struct libbpf_prog_handler_opts *opts); +/** + * @brief **libbpf_unregister_prog_handler()** unregisters previously registered + * custom BPF program SEC() handler. + * @param handler_id handler ID returned by *libbpf_register_prog_handler()* + * after successful registration + * @return 0 on success, negative error code if handler isn't found + * + * Note, like many global libbpf APIs (e.g., libbpf_set_print(), + * libbpf_set_strict_mode(), etc.), these APIs are not thread-safe.
User needs + * to ensure synchronization if there is a risk of running this API from + * multiple threads simultaneously. + */ +LIBBPF_API int libbpf_unregister_prog_handler(int handler_id); + #ifdef __cplusplus } /* extern "C" */ #endif diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map index 47e70c9058d9..dd35ee58bfaa 100644 --- a/tools/lib/bpf/libbpf.map +++ b/tools/lib/bpf/libbpf.map @@ -439,3 +439,12 @@ LIBBPF_0.7.0 { libbpf_probe_bpf_prog_type; libbpf_set_memlock_rlim_max; } LIBBPF_0.6.0; + +LIBBPF_0.8.0 { + global: + bpf_object__destroy_subskeleton; + bpf_object__open_subskeleton; + libbpf_register_prog_handler; + libbpf_unregister_prog_handler; + bpf_program__attach_kprobe_multi_opts; +} LIBBPF_0.7.0; diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h index 4fda8bdf0a0d..b6247dc7f8eb 100644 --- a/tools/lib/bpf/libbpf_internal.h +++ b/tools/lib/bpf/libbpf_internal.h @@ -449,6 +449,11 @@ __s32 btf__find_by_name_kind_own(const struct btf *btf, const char *type_name, extern enum libbpf_strict_mode libbpf_mode; +typedef int (*kallsyms_cb_t)(unsigned long long sym_addr, char sym_type, + const char *sym_name, void *ctx); + +int libbpf_kallsyms_parse(kallsyms_cb_t cb, void *arg); + /* handle direct returned errors */ static inline int libbpf_err(int ret) { diff --git a/tools/lib/bpf/libbpf_legacy.h b/tools/lib/bpf/libbpf_legacy.h index a283cf031665..d7bcbd01f66f 100644 --- a/tools/lib/bpf/libbpf_legacy.h +++ b/tools/lib/bpf/libbpf_legacy.h @@ -54,6 +54,10 @@ enum libbpf_strict_mode { * * Note, in this mode the program pin path will be based on the * function name instead of section name. + * + * Additionally, routines in the .text section are always considered + * sub-programs. Legacy behavior allows for a single routine in .text + * to be a program. 
*/ LIBBPF_STRICT_SEC_NAME = 0x04, /* diff --git a/tools/lib/bpf/libbpf_version.h b/tools/lib/bpf/libbpf_version.h index 0fefefc3500b..61f2039404b6 100644 --- a/tools/lib/bpf/libbpf_version.h +++ b/tools/lib/bpf/libbpf_version.h @@ -4,6 +4,6 @@ #define __LIBBPF_VERSION_H #define LIBBPF_MAJOR_VERSION 0 -#define LIBBPF_MINOR_VERSION 7 +#define LIBBPF_MINOR_VERSION 8 #endif /* __LIBBPF_VERSION_H */ diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c index edafe56664f3..af136f73b09d 100644 --- a/tools/lib/bpf/xsk.c +++ b/tools/lib/bpf/xsk.c @@ -481,8 +481,8 @@ static int xsk_load_xdp_prog(struct xsk_socket *xsk) BPF_EMIT_CALL(BPF_FUNC_redirect_map), BPF_EXIT_INSN(), }; - size_t insns_cnt[] = {sizeof(prog) / sizeof(struct bpf_insn), - sizeof(prog_redirect_flags) / sizeof(struct bpf_insn), + size_t insns_cnt[] = {ARRAY_SIZE(prog), + ARRAY_SIZE(prog_redirect_flags), }; struct bpf_insn *progs[] = {prog, prog_redirect_flags}; enum xsk_prog option = get_xsk_prog(); @@ -1193,12 +1193,23 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname, int xsk_umem__delete(struct xsk_umem *umem) { + struct xdp_mmap_offsets off; + int err; + if (!umem) return 0; if (umem->refcount) return -EBUSY; + err = xsk_get_mmap_offsets(umem->fd, &off); + if (!err && umem->fill_save && umem->comp_save) { + munmap(umem->fill_save->ring - off.fr.desc, + off.fr.desc + umem->config.fill_size * sizeof(__u64)); + munmap(umem->comp_save->ring - off.cr.desc, + off.cr.desc + umem->config.comp_size * sizeof(__u64)); + } + close(umem->fd); free(umem); diff --git a/tools/scripts/Makefile.include b/tools/scripts/Makefile.include index 79d102304470..a2335e402145 100644 --- a/tools/scripts/Makefile.include +++ b/tools/scripts/Makefile.include @@ -89,6 +89,9 @@ ifeq ($(CC_NO_CLANG), 1) EXTRA_WARNINGS += -Wstrict-aliasing=3 else ifneq ($(CROSS_COMPILE),) +# Allow userspace to override CLANG_CROSS_FLAGS to specify their own +# sysroots and flags or to avoid the GCC call in pure Clang builds. 
+ifeq ($(CLANG_CROSS_FLAGS),) CLANG_CROSS_FLAGS := --target=$(notdir $(CROSS_COMPILE:%-=%)) GCC_TOOLCHAIN_DIR := $(dir $(shell which $(CROSS_COMPILE)gcc 2>/dev/null)) ifneq ($(GCC_TOOLCHAIN_DIR),) @@ -96,6 +99,7 @@ CLANG_CROSS_FLAGS += --prefix=$(GCC_TOOLCHAIN_DIR)$(notdir $(CROSS_COMPILE)) CLANG_CROSS_FLAGS += --sysroot=$(shell $(CROSS_COMPILE)gcc -print-sysroot) CLANG_CROSS_FLAGS += --gcc-toolchain=$(realpath $(GCC_TOOLCHAIN_DIR)/..) endif # GCC_TOOLCHAIN_DIR +endif # CLANG_CROSS_FLAGS CFLAGS += $(CLANG_CROSS_FLAGS) AFLAGS += $(CLANG_CROSS_FLAGS) endif # CROSS_COMPILE diff --git a/tools/testing/selftests/bpf/.gitignore b/tools/testing/selftests/bpf/.gitignore index a7eead8820a0..595565eb68c0 100644 --- a/tools/testing/selftests/bpf/.gitignore +++ b/tools/testing/selftests/bpf/.gitignore @@ -31,6 +31,7 @@ test_tcp_check_syncookie_user test_sysctl xdping test_cpp +*.subskel.h *.skel.h *.lskel.h /no_alu32 diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile index fe12b4f5fe20..3820608faf57 100644 --- a/tools/testing/selftests/bpf/Makefile +++ b/tools/testing/selftests/bpf/Makefile @@ -25,7 +25,7 @@ CFLAGS += -g -O0 -rdynamic -Wall -Werror $(GENFLAGS) $(SAN_CFLAGS) \ -I$(CURDIR) -I$(INCLUDE_DIR) -I$(GENDIR) -I$(LIBDIR) \ -I$(TOOLSINCDIR) -I$(APIDIR) -I$(OUTPUT) LDFLAGS += $(SAN_CFLAGS) -LDLIBS += -lcap -lelf -lz -lrt -lpthread +LDLIBS += -lelf -lz -lrt -lpthread # Silence some warnings when compiled with clang ifneq ($(LLVM),) @@ -195,6 +195,7 @@ $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED): $(BPFOBJ) CGROUP_HELPERS := $(OUTPUT)/cgroup_helpers.o TESTING_HELPERS := $(OUTPUT)/testing_helpers.o TRACE_HELPERS := $(OUTPUT)/trace_helpers.o +CAP_HELPERS := $(OUTPUT)/cap_helpers.o $(OUTPUT)/test_dev_cgroup: $(CGROUP_HELPERS) $(TESTING_HELPERS) $(OUTPUT)/test_skb_cgroup_id_user: $(CGROUP_HELPERS) $(TESTING_HELPERS) @@ -211,7 +212,7 @@ $(OUTPUT)/test_lirc_mode2_user: $(TESTING_HELPERS) $(OUTPUT)/xdping: $(TESTING_HELPERS) 
$(OUTPUT)/flow_dissector_load: $(TESTING_HELPERS) $(OUTPUT)/test_maps: $(TESTING_HELPERS) -$(OUTPUT)/test_verifier: $(TESTING_HELPERS) +$(OUTPUT)/test_verifier: $(TESTING_HELPERS) $(CAP_HELPERS) BPFTOOL ?= $(DEFAULT_BPFTOOL) $(DEFAULT_BPFTOOL): $(wildcard $(BPFTOOLDIR)/*.[ch] $(BPFTOOLDIR)/Makefile) \ @@ -326,7 +327,13 @@ endef SKEL_BLACKLIST := btf__% test_pinning_invalid.c test_sk_assign.c LINKED_SKELS := test_static_linked.skel.h linked_funcs.skel.h \ - linked_vars.skel.h linked_maps.skel.h + linked_vars.skel.h linked_maps.skel.h \ + test_subskeleton.skel.h test_subskeleton_lib.skel.h + +# In the subskeleton case, we want the test_subskeleton_lib.subskel.h file +# but that's created as a side-effect of the skel.h generation. +test_subskeleton.skel.h-deps := test_subskeleton_lib2.o test_subskeleton_lib.o test_subskeleton.o +test_subskeleton_lib.skel.h-deps := test_subskeleton_lib2.o test_subskeleton_lib.o LSKELS := kfunc_call_test.c fentry_test.c fexit_test.c fexit_sleep.c \ test_ringbuf.c atomics.c trace_printk.c trace_vprintk.c \ @@ -404,6 +411,7 @@ $(TRUNNER_BPF_SKELS): %.skel.h: %.o $(BPFTOOL) | $(TRUNNER_OUTPUT) $(Q)$$(BPFTOOL) gen object $$(<:.o=.linked3.o) $$(<:.o=.linked2.o) $(Q)diff $$(<:.o=.linked2.o) $$(<:.o=.linked3.o) $(Q)$$(BPFTOOL) gen skeleton $$(<:.o=.linked3.o) name $$(notdir $$(<:.o=)) > $$@ + $(Q)$$(BPFTOOL) gen subskeleton $$(<:.o=.linked3.o) name $$(notdir $$(<:.o=)) > $$(@:.skel.h=.subskel.h) $(TRUNNER_BPF_LSKELS): %.lskel.h: %.o $(BPFTOOL) | $(TRUNNER_OUTPUT) $$(call msg,GEN-SKEL,$(TRUNNER_BINARY),$$@) @@ -421,6 +429,7 @@ $(TRUNNER_BPF_SKELS_LINKED): $(TRUNNER_BPF_OBJS) $(BPFTOOL) | $(TRUNNER_OUTPUT) $(Q)diff $$(@:.skel.h=.linked2.o) $$(@:.skel.h=.linked3.o) $$(call msg,GEN-SKEL,$(TRUNNER_BINARY),$$@) $(Q)$$(BPFTOOL) gen skeleton $$(@:.skel.h=.linked3.o) name $$(notdir $$(@:.skel.h=)) > $$@ + $(Q)$$(BPFTOOL) gen subskeleton $$(@:.skel.h=.linked3.o) name $$(notdir $$(@:.skel.h=)) > $$(@:.skel.h=.subskel.h) endif # ensure we set up tests.h 
header generation rule just once @@ -479,7 +488,8 @@ TRUNNER_TESTS_DIR := prog_tests TRUNNER_BPF_PROGS_DIR := progs TRUNNER_EXTRA_SOURCES := test_progs.c cgroup_helpers.c trace_helpers.c \ network_helpers.c testing_helpers.c \ - btf_helpers.c flow_dissector_load.h + btf_helpers.c flow_dissector_load.h \ + cap_helpers.c TRUNNER_EXTRA_FILES := $(OUTPUT)/urandom_read $(OUTPUT)/bpf_testmod.ko \ ima_setup.sh \ $(wildcard progs/btf_dump_test_case_*.c) @@ -557,6 +567,6 @@ $(OUTPUT)/bench: $(OUTPUT)/bench.o \ EXTRA_CLEAN := $(TEST_CUSTOM_PROGS) $(SCRATCH_DIR) $(HOST_SCRATCH_DIR) \ prog_tests/tests.h map_tests/tests.h verifier/tests.h \ feature bpftool \ - $(addprefix $(OUTPUT)/,*.o *.skel.h *.lskel.h no_alu32 bpf_gcc bpf_testmod.ko) + $(addprefix $(OUTPUT)/,*.o *.skel.h *.lskel.h *.subskel.h no_alu32 bpf_gcc bpf_testmod.ko) .PHONY: docs docs-clean diff --git a/tools/testing/selftests/bpf/README.rst b/tools/testing/selftests/bpf/README.rst index d099d91adc3b..eb1b7541f39d 100644 --- a/tools/testing/selftests/bpf/README.rst +++ b/tools/testing/selftests/bpf/README.rst @@ -32,11 +32,19 @@ For more information on about using the script, run: $ tools/testing/selftests/bpf/vmtest.sh -h +In case of linker errors when running selftests, try using static linking: + +.. code-block:: console + + $ LDLIBS=-static vmtest.sh + +.. note:: Some distros may not support static linking. + .. note:: The script uses pahole and clang based on host environment setting. If you want to change pahole and llvm, you can change `PATH` environment variable in the beginning of script. -.. note:: The script currently only supports x86_64. +.. note:: The script currently only supports x86_64 and s390x architectures. Additional information about selftest failures are documented here. 
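Note on the -lcap removal in the Makefile hunks above: the selftests stop linking against libcap and use the local cap_helpers.c added later in this diff. That file passes capabilities as a single __u64 bit mask and splits it into the two 32-bit words that capget()/capset() operate on. Below is a minimal sketch of that mask arithmetic only. It assumes nothing beyond <stdint.h>, and struct cap_words and the cap_mask_*() names are hypothetical stand-ins for illustration, not part of the patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two __user_cap_data_struct entries the
 * selftest helper reads and writes; only the effective sets are modelled.
 */
struct cap_words {
	uint32_t lo;	/* capabilities 0..31,  like data[0].effective */
	uint32_t hi;	/* capabilities 32..63, like data[1].effective */
};

/* Split a 64-bit capability mask into its low and high 32-bit words. */
struct cap_words cap_mask_split(uint64_t caps)
{
	struct cap_words w = {
		.lo = (uint32_t)caps,
		.hi = (uint32_t)(caps >> 32),
	};

	return w;
}

/* Recombine the two words into one mask, as done when reporting *old_caps. */
uint64_t cap_mask_join(struct cap_words w)
{
	return (uint64_t)w.hi << 32 | w.lo;
}

/* Return nonzero when every bit of want is already set in have. */
int cap_mask_has_all(struct cap_words have, struct cap_words want)
{
	return (have.lo & want.lo) == want.lo &&
	       (have.hi & want.hi) == want.hi;
}
```

With CAP_BPF defined as 39, as in cap_helpers.h, cap_mask_split(1ULL << 39) sets bit 7 of the hi word and leaves lo at zero.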
diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c index 27d63be47b95..e585e1cefc77 100644 --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c @@ -33,6 +33,10 @@ struct bpf_testmod_btf_type_tag_2 { struct bpf_testmod_btf_type_tag_1 __user *p; }; +struct bpf_testmod_btf_type_tag_3 { + struct bpf_testmod_btf_type_tag_1 __percpu *p; +}; + noinline int bpf_testmod_test_btf_type_tag_user_1(struct bpf_testmod_btf_type_tag_1 __user *arg) { BTF_TYPE_EMIT(func_proto_typedef); @@ -46,6 +50,16 @@ bpf_testmod_test_btf_type_tag_user_2(struct bpf_testmod_btf_type_tag_2 *arg) { return arg->p->a; } +noinline int +bpf_testmod_test_btf_type_tag_percpu_1(struct bpf_testmod_btf_type_tag_1 __percpu *arg) { + return arg->a; +} + +noinline int +bpf_testmod_test_btf_type_tag_percpu_2(struct bpf_testmod_btf_type_tag_3 *arg) { + return arg->p->a; +} + noinline int bpf_testmod_loop_test(int n) { int i, sum = 0; diff --git a/tools/testing/selftests/bpf/cap_helpers.c b/tools/testing/selftests/bpf/cap_helpers.c new file mode 100644 index 000000000000..d5ac507401d7 --- /dev/null +++ b/tools/testing/selftests/bpf/cap_helpers.c @@ -0,0 +1,67 @@ +// SPDX-License-Identifier: GPL-2.0 +#include "cap_helpers.h" + +/* Avoid including <sys/capability.h> from the libcap-devel package, + * so directly declare them here and use them from glibc. 
+ */
+int capget(cap_user_header_t header, cap_user_data_t data);
+int capset(cap_user_header_t header, const cap_user_data_t data);
+
+int cap_enable_effective(__u64 caps, __u64 *old_caps)
+{
+	struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];
+	struct __user_cap_header_struct hdr = {
+		.version = _LINUX_CAPABILITY_VERSION_3,
+	};
+	__u32 cap0 = caps;
+	__u32 cap1 = caps >> 32;
+	int err;
+
+	err = capget(&hdr, data);
+	if (err)
+		return err;
+
+	if (old_caps)
+		*old_caps = (__u64)(data[1].effective) << 32 | data[0].effective;
+
+	if ((data[0].effective & cap0) == cap0 &&
+	    (data[1].effective & cap1) == cap1)
+		return 0;
+
+	data[0].effective |= cap0;
+	data[1].effective |= cap1;
+	err = capset(&hdr, data);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+int cap_disable_effective(__u64 caps, __u64 *old_caps)
+{
+	struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];
+	struct __user_cap_header_struct hdr = {
+		.version = _LINUX_CAPABILITY_VERSION_3,
+	};
+	__u32 cap0 = caps;
+	__u32 cap1 = caps >> 32;
+	int err;
+
+	err = capget(&hdr, data);
+	if (err)
+		return err;
+
+	if (old_caps)
+		*old_caps = (__u64)(data[1].effective) << 32 | data[0].effective;
+
+	if (!(data[0].effective & cap0) && !(data[1].effective & cap1))
+		return 0;
+
+	data[0].effective &= ~cap0;
+	data[1].effective &= ~cap1;
+	err = capset(&hdr, data);
+	if (err)
+		return err;
+
+	return 0;
+}
diff --git a/tools/testing/selftests/bpf/cap_helpers.h b/tools/testing/selftests/bpf/cap_helpers.h
new file mode 100644
index 000000000000..6d163530cb0f
--- /dev/null
+++ b/tools/testing/selftests/bpf/cap_helpers.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __CAP_HELPERS_H
+#define __CAP_HELPERS_H
+
+#include <linux/types.h>
+#include <linux/capability.h>
+
+#ifndef CAP_PERFMON
+#define CAP_PERFMON 38
+#endif
+
+#ifndef CAP_BPF
+#define CAP_BPF 39
+#endif
+
+int cap_enable_effective(__u64 caps, __u64 *old_caps);
+int cap_disable_effective(__u64 caps, __u64 *old_caps);
+
+#endif diff --git a/tools/testing/selftests/bpf/ima_setup.sh b/tools/testing/selftests/bpf/ima_setup.sh index 8e62581113a3..8ecead4ccad0 100755 --- a/tools/testing/selftests/bpf/ima_setup.sh +++ b/tools/testing/selftests/bpf/ima_setup.sh @@ -12,7 +12,7 @@ LOG_FILE="$(mktemp /tmp/ima_setup.XXXX.log)" usage() { - echo "Usage: $0 <setup|cleanup|run> <existing_tmp_dir>" + echo "Usage: $0 <setup|cleanup|run|modify-bin|restore-bin|load-policy> <existing_tmp_dir>" exit 1 } @@ -51,6 +51,7 @@ setup() ensure_mount_securityfs echo "measure func=BPRM_CHECK fsuuid=${mount_uuid}" > ${IMA_POLICY_FILE} + echo "measure func=BPRM_CHECK fsuuid=${mount_uuid}" > ${mount_dir}/policy_test } cleanup() { @@ -77,6 +78,32 @@ run() exec "${copied_bin_path}" } +modify_bin() +{ + local tmp_dir="$1" + local mount_dir="${tmp_dir}/mnt" + local copied_bin_path="${mount_dir}/$(basename ${TEST_BINARY})" + + echo "mod" >> "${copied_bin_path}" +} + +restore_bin() +{ + local tmp_dir="$1" + local mount_dir="${tmp_dir}/mnt" + local copied_bin_path="${mount_dir}/$(basename ${TEST_BINARY})" + + truncate -s -4 "${copied_bin_path}" +} + +load_policy() +{ + local tmp_dir="$1" + local mount_dir="${tmp_dir}/mnt" + + echo ${mount_dir}/policy_test > ${IMA_POLICY_FILE} 2> /dev/null +} + catch() { local exit_code="$1" @@ -105,6 +132,12 @@ main() cleanup "${tmp_dir}" elif [[ "${action}" == "run" ]]; then run "${tmp_dir}" + elif [[ "${action}" == "modify-bin" ]]; then + modify_bin "${tmp_dir}" + elif [[ "${action}" == "restore-bin" ]]; then + restore_bin "${tmp_dir}" + elif [[ "${action}" == "load-policy" ]]; then + load_policy "${tmp_dir}" else echo "Unknown action: ${action}" exit 1 diff --git a/tools/testing/selftests/bpf/network_helpers.c b/tools/testing/selftests/bpf/network_helpers.c index 6db1af8fdee7..2bb1f9b3841d 100644 --- a/tools/testing/selftests/bpf/network_helpers.c +++ b/tools/testing/selftests/bpf/network_helpers.c @@ -1,18 +1,25 @@ // SPDX-License-Identifier: GPL-2.0-only +#define _GNU_SOURCE + 
#include <errno.h> #include <stdbool.h> #include <stdio.h> #include <string.h> #include <unistd.h> +#include <sched.h> #include <arpa/inet.h> +#include <sys/mount.h> +#include <sys/stat.h> #include <linux/err.h> #include <linux/in.h> #include <linux/in6.h> +#include <linux/limits.h> #include "bpf_util.h" #include "network_helpers.h" +#include "test_progs.h" #define clean_errno() (errno == 0 ? "None" : strerror(errno)) #define log_err(MSG, ...) ({ \ @@ -356,3 +363,82 @@ char *ping_command(int family) } return "ping"; } + +struct nstoken { + int orig_netns_fd; +}; + +static int setns_by_fd(int nsfd) +{ + int err; + + err = setns(nsfd, CLONE_NEWNET); + close(nsfd); + + if (!ASSERT_OK(err, "setns")) + return err; + + /* Switch /sys to the new namespace so that e.g. /sys/class/net + * reflects the devices in the new namespace. + */ + err = unshare(CLONE_NEWNS); + if (!ASSERT_OK(err, "unshare")) + return err; + + /* Make our /sys mount private, so the following umount won't + * trigger the global umount in case it's shared. 
+ */ + err = mount("none", "/sys", NULL, MS_PRIVATE, NULL); + if (!ASSERT_OK(err, "remount private /sys")) + return err; + + err = umount2("/sys", MNT_DETACH); + if (!ASSERT_OK(err, "umount2 /sys")) + return err; + + err = mount("sysfs", "/sys", "sysfs", 0, NULL); + if (!ASSERT_OK(err, "mount /sys")) + return err; + + err = mount("bpffs", "/sys/fs/bpf", "bpf", 0, NULL); + if (!ASSERT_OK(err, "mount /sys/fs/bpf")) + return err; + + return 0; +} + +struct nstoken *open_netns(const char *name) +{ + int nsfd; + char nspath[PATH_MAX]; + int err; + struct nstoken *token; + + token = malloc(sizeof(struct nstoken)); + if (!ASSERT_OK_PTR(token, "malloc token")) + return NULL; + + token->orig_netns_fd = open("/proc/self/ns/net", O_RDONLY); + if (!ASSERT_GE(token->orig_netns_fd, 0, "open /proc/self/ns/net")) + goto fail; + + snprintf(nspath, sizeof(nspath), "%s/%s", "/var/run/netns", name); + nsfd = open(nspath, O_RDONLY | O_CLOEXEC); + if (!ASSERT_GE(nsfd, 0, "open netns fd")) + goto fail; + + err = setns_by_fd(nsfd); + if (!ASSERT_OK(err, "setns_by_fd")) + goto fail; + + return token; +fail: + free(token); + return NULL; +} + +void close_netns(struct nstoken *token) +{ + ASSERT_OK(setns_by_fd(token->orig_netns_fd), "setns_by_fd"); + free(token); +} diff --git a/tools/testing/selftests/bpf/network_helpers.h b/tools/testing/selftests/bpf/network_helpers.h index d198181a5648..a4b3b2f9877b 100644 --- a/tools/testing/selftests/bpf/network_helpers.h +++ b/tools/testing/selftests/bpf/network_helpers.h @@ -55,4 +55,13 @@ int make_sockaddr(int family, const char *addr_str, __u16 port, struct sockaddr_storage *addr, socklen_t *len); char *ping_command(int family); +struct nstoken; +/** + * open_netns() - Switch to specified network namespace by name. + * + * Returns token with which to restore the original namespace + * using close_netns(). 
+ */ +struct nstoken *open_netns(const char *name); +void close_netns(struct nstoken *token); #endif diff --git a/tools/testing/selftests/bpf/prog_tests/bind_perm.c b/tools/testing/selftests/bpf/prog_tests/bind_perm.c index eac71fbb24ce..a1766a298bb7 100644 --- a/tools/testing/selftests/bpf/prog_tests/bind_perm.c +++ b/tools/testing/selftests/bpf/prog_tests/bind_perm.c @@ -4,9 +4,9 @@ #include <stdlib.h> #include <sys/types.h> #include <sys/socket.h> -#include <sys/capability.h> #include "test_progs.h" +#include "cap_helpers.h" #include "bind_perm.skel.h" static int duration; @@ -49,41 +49,11 @@ close_socket: close(fd); } -bool cap_net_bind_service(cap_flag_value_t flag) -{ - const cap_value_t cap_net_bind_service = CAP_NET_BIND_SERVICE; - cap_flag_value_t original_value; - bool was_effective = false; - cap_t caps; - - caps = cap_get_proc(); - if (CHECK(!caps, "cap_get_proc", "errno %d", errno)) - goto free_caps; - - if (CHECK(cap_get_flag(caps, CAP_NET_BIND_SERVICE, CAP_EFFECTIVE, - &original_value), - "cap_get_flag", "errno %d", errno)) - goto free_caps; - - was_effective = (original_value == CAP_SET); - - if (CHECK(cap_set_flag(caps, CAP_EFFECTIVE, 1, &cap_net_bind_service, - flag), - "cap_set_flag", "errno %d", errno)) - goto free_caps; - - if (CHECK(cap_set_proc(caps), "cap_set_proc", "errno %d", errno)) - goto free_caps; - -free_caps: - CHECK(cap_free(caps), "cap_free", "errno %d", errno); - return was_effective; -} - void test_bind_perm(void) { - bool cap_was_effective; + const __u64 net_bind_svc_cap = 1ULL << CAP_NET_BIND_SERVICE; struct bind_perm *skel; + __u64 old_caps = 0; int cgroup_fd; if (create_netns()) @@ -105,7 +75,8 @@ void test_bind_perm(void) if (!ASSERT_OK_PTR(skel, "bind_v6_prog")) goto close_skeleton; - cap_was_effective = cap_net_bind_service(CAP_CLEAR); + ASSERT_OK(cap_disable_effective(net_bind_svc_cap, &old_caps), + "cap_disable_effective"); try_bind(AF_INET, 110, EACCES); try_bind(AF_INET6, 110, EACCES); @@ -113,8 +84,9 @@ void 
test_bind_perm(void) try_bind(AF_INET, 111, 0); try_bind(AF_INET6, 111, 0); - if (cap_was_effective) - cap_net_bind_service(CAP_SET); + if (old_caps & net_bind_svc_cap) + ASSERT_OK(cap_enable_effective(net_bind_svc_cap, NULL), + "cap_enable_effective"); close_skeleton: bind_perm__destroy(skel); diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_cookie.c b/tools/testing/selftests/bpf/prog_tests/bpf_cookie.c index cd10df6cd0fc..923a6139b2d8 100644 --- a/tools/testing/selftests/bpf/prog_tests/bpf_cookie.c +++ b/tools/testing/selftests/bpf/prog_tests/bpf_cookie.c @@ -7,6 +7,7 @@ #include <unistd.h> #include <test_progs.h> #include "test_bpf_cookie.skel.h" +#include "kprobe_multi.skel.h" /* uprobe attach point */ static void trigger_func(void) @@ -63,6 +64,178 @@ cleanup: bpf_link__destroy(retlink2); } +static void kprobe_multi_test_run(struct kprobe_multi *skel) +{ + LIBBPF_OPTS(bpf_test_run_opts, topts); + int err, prog_fd; + + prog_fd = bpf_program__fd(skel->progs.trigger); + err = bpf_prog_test_run_opts(prog_fd, &topts); + ASSERT_OK(err, "test_run"); + ASSERT_EQ(topts.retval, 0, "test_run"); + + ASSERT_EQ(skel->bss->kprobe_test1_result, 1, "kprobe_test1_result"); + ASSERT_EQ(skel->bss->kprobe_test2_result, 1, "kprobe_test2_result"); + ASSERT_EQ(skel->bss->kprobe_test3_result, 1, "kprobe_test3_result"); + ASSERT_EQ(skel->bss->kprobe_test4_result, 1, "kprobe_test4_result"); + ASSERT_EQ(skel->bss->kprobe_test5_result, 1, "kprobe_test5_result"); + ASSERT_EQ(skel->bss->kprobe_test6_result, 1, "kprobe_test6_result"); + ASSERT_EQ(skel->bss->kprobe_test7_result, 1, "kprobe_test7_result"); + ASSERT_EQ(skel->bss->kprobe_test8_result, 1, "kprobe_test8_result"); + + ASSERT_EQ(skel->bss->kretprobe_test1_result, 1, "kretprobe_test1_result"); + ASSERT_EQ(skel->bss->kretprobe_test2_result, 1, "kretprobe_test2_result"); + ASSERT_EQ(skel->bss->kretprobe_test3_result, 1, "kretprobe_test3_result"); + ASSERT_EQ(skel->bss->kretprobe_test4_result, 1, "kretprobe_test4_result"); + 
ASSERT_EQ(skel->bss->kretprobe_test5_result, 1, "kretprobe_test5_result"); + ASSERT_EQ(skel->bss->kretprobe_test6_result, 1, "kretprobe_test6_result"); + ASSERT_EQ(skel->bss->kretprobe_test7_result, 1, "kretprobe_test7_result"); + ASSERT_EQ(skel->bss->kretprobe_test8_result, 1, "kretprobe_test8_result"); +} + +static void kprobe_multi_link_api_subtest(void) +{ + int prog_fd, link1_fd = -1, link2_fd = -1; + struct kprobe_multi *skel = NULL; + LIBBPF_OPTS(bpf_link_create_opts, opts); + unsigned long long addrs[8]; + __u64 cookies[8]; + + if (!ASSERT_OK(load_kallsyms(), "load_kallsyms")) + goto cleanup; + + skel = kprobe_multi__open_and_load(); + if (!ASSERT_OK_PTR(skel, "fentry_raw_skel_load")) + goto cleanup; + + skel->bss->pid = getpid(); + skel->bss->test_cookie = true; + +#define GET_ADDR(__sym, __addr) ({ \ + __addr = ksym_get_addr(__sym); \ + if (!ASSERT_NEQ(__addr, 0, "ksym_get_addr " #__sym)) \ + goto cleanup; \ +}) + + GET_ADDR("bpf_fentry_test1", addrs[0]); + GET_ADDR("bpf_fentry_test2", addrs[1]); + GET_ADDR("bpf_fentry_test3", addrs[2]); + GET_ADDR("bpf_fentry_test4", addrs[3]); + GET_ADDR("bpf_fentry_test5", addrs[4]); + GET_ADDR("bpf_fentry_test6", addrs[5]); + GET_ADDR("bpf_fentry_test7", addrs[6]); + GET_ADDR("bpf_fentry_test8", addrs[7]); + +#undef GET_ADDR + + cookies[0] = 1; + cookies[1] = 2; + cookies[2] = 3; + cookies[3] = 4; + cookies[4] = 5; + cookies[5] = 6; + cookies[6] = 7; + cookies[7] = 8; + + opts.kprobe_multi.addrs = (const unsigned long *) &addrs; + opts.kprobe_multi.cnt = ARRAY_SIZE(addrs); + opts.kprobe_multi.cookies = (const __u64 *) &cookies; + prog_fd = bpf_program__fd(skel->progs.test_kprobe); + + link1_fd = bpf_link_create(prog_fd, 0, BPF_TRACE_KPROBE_MULTI, &opts); + if (!ASSERT_GE(link1_fd, 0, "link1_fd")) + goto cleanup; + + cookies[0] = 8; + cookies[1] = 7; + cookies[2] = 6; + cookies[3] = 5; + cookies[4] = 4; + cookies[5] = 3; + cookies[6] = 2; + cookies[7] = 1; + + opts.kprobe_multi.flags = BPF_F_KPROBE_MULTI_RETURN; + 
prog_fd = bpf_program__fd(skel->progs.test_kretprobe); + + link2_fd = bpf_link_create(prog_fd, 0, BPF_TRACE_KPROBE_MULTI, &opts); + if (!ASSERT_GE(link2_fd, 0, "link2_fd")) + goto cleanup; + + kprobe_multi_test_run(skel); + +cleanup: + close(link1_fd); + close(link2_fd); + kprobe_multi__destroy(skel); +} + +static void kprobe_multi_attach_api_subtest(void) +{ + struct bpf_link *link1 = NULL, *link2 = NULL; + LIBBPF_OPTS(bpf_kprobe_multi_opts, opts); + LIBBPF_OPTS(bpf_test_run_opts, topts); + struct kprobe_multi *skel = NULL; + const char *syms[8] = { + "bpf_fentry_test1", + "bpf_fentry_test2", + "bpf_fentry_test3", + "bpf_fentry_test4", + "bpf_fentry_test5", + "bpf_fentry_test6", + "bpf_fentry_test7", + "bpf_fentry_test8", + }; + __u64 cookies[8]; + + skel = kprobe_multi__open_and_load(); + if (!ASSERT_OK_PTR(skel, "fentry_raw_skel_load")) + goto cleanup; + + skel->bss->pid = getpid(); + skel->bss->test_cookie = true; + + cookies[0] = 1; + cookies[1] = 2; + cookies[2] = 3; + cookies[3] = 4; + cookies[4] = 5; + cookies[5] = 6; + cookies[6] = 7; + cookies[7] = 8; + + opts.syms = syms; + opts.cnt = ARRAY_SIZE(syms); + opts.cookies = cookies; + + link1 = bpf_program__attach_kprobe_multi_opts(skel->progs.test_kprobe, + NULL, &opts); + if (!ASSERT_OK_PTR(link1, "bpf_program__attach_kprobe_multi_opts")) + goto cleanup; + + cookies[0] = 8; + cookies[1] = 7; + cookies[2] = 6; + cookies[3] = 5; + cookies[4] = 4; + cookies[5] = 3; + cookies[6] = 2; + cookies[7] = 1; + + opts.retprobe = true; + + link2 = bpf_program__attach_kprobe_multi_opts(skel->progs.test_kretprobe, + NULL, &opts); + if (!ASSERT_OK_PTR(link2, "bpf_program__attach_kprobe_multi_opts")) + goto cleanup; + + kprobe_multi_test_run(skel); + +cleanup: + bpf_link__destroy(link2); + bpf_link__destroy(link1); + kprobe_multi__destroy(skel); +} static void uprobe_subtest(struct test_bpf_cookie *skel) { DECLARE_LIBBPF_OPTS(bpf_uprobe_opts, opts); @@ -199,7 +372,7 @@ static void pe_subtest(struct test_bpf_cookie *skel) 
attr.type = PERF_TYPE_SOFTWARE; attr.config = PERF_COUNT_SW_CPU_CLOCK; attr.freq = 1; - attr.sample_freq = 4000; + attr.sample_freq = 1000; pfd = syscall(__NR_perf_event_open, &attr, -1, 0, -1, PERF_FLAG_FD_CLOEXEC); if (!ASSERT_GE(pfd, 0, "perf_fd")) goto cleanup; @@ -249,6 +422,10 @@ void test_bpf_cookie(void) if (test__start_subtest("kprobe")) kprobe_subtest(skel); + if (test__start_subtest("multi_kprobe_link_api")) + kprobe_multi_link_api_subtest(); + if (test__start_subtest("multi_kprobe_attach_api")) + kprobe_multi_attach_api_subtest(); if (test__start_subtest("uprobe")) uprobe_subtest(skel); if (test__start_subtest("tracepoint")) diff --git a/tools/testing/selftests/bpf/prog_tests/btf_tag.c b/tools/testing/selftests/bpf/prog_tests/btf_tag.c index f7560b54a6bb..071430cd54de 100644 --- a/tools/testing/selftests/bpf/prog_tests/btf_tag.c +++ b/tools/testing/selftests/bpf/prog_tests/btf_tag.c @@ -10,6 +10,7 @@ struct btf_type_tag_test { }; #include "btf_type_tag.skel.h" #include "btf_type_tag_user.skel.h" +#include "btf_type_tag_percpu.skel.h" static void test_btf_decl_tag(void) { @@ -43,38 +44,81 @@ static void test_btf_type_tag(void) btf_type_tag__destroy(skel); } -static void test_btf_type_tag_mod_user(bool load_test_user1) +/* loads vmlinux_btf as well as module_btf. If the caller passes NULL as + * module_btf, it will not load module btf. + * + * Returns 0 on success. + * Return -1 On error. In case of error, the loaded btf will be freed and the + * input parameters will be set to pointing to NULL. 
+ */ +static int load_btfs(struct btf **vmlinux_btf, struct btf **module_btf, + bool needs_vmlinux_tag) { const char *module_name = "bpf_testmod"; - struct btf *vmlinux_btf, *module_btf; - struct btf_type_tag_user *skel; __s32 type_id; - int err; if (!env.has_testmod) { test__skip(); - return; + return -1; } - /* skip the test if the module does not have __user tags */ - vmlinux_btf = btf__load_vmlinux_btf(); - if (!ASSERT_OK_PTR(vmlinux_btf, "could not load vmlinux BTF")) - return; + *vmlinux_btf = btf__load_vmlinux_btf(); + if (!ASSERT_OK_PTR(*vmlinux_btf, "could not load vmlinux BTF")) + return -1; + + if (!needs_vmlinux_tag) + goto load_module_btf; - module_btf = btf__load_module_btf(module_name, vmlinux_btf); - if (!ASSERT_OK_PTR(module_btf, "could not load module BTF")) + /* skip the test if the vmlinux does not have __user tags */ + type_id = btf__find_by_name_kind(*vmlinux_btf, "user", BTF_KIND_TYPE_TAG); + if (type_id <= 0) { + printf("%s:SKIP: btf_type_tag attribute not in vmlinux btf", __func__); + test__skip(); goto free_vmlinux_btf; + } - type_id = btf__find_by_name_kind(module_btf, "user", BTF_KIND_TYPE_TAG); +load_module_btf: + /* skip loading module_btf, if not requested by caller */ + if (!module_btf) + return 0; + + *module_btf = btf__load_module_btf(module_name, *vmlinux_btf); + if (!ASSERT_OK_PTR(*module_btf, "could not load module BTF")) + goto free_vmlinux_btf; + + /* skip the test if the module does not have __user tags */ + type_id = btf__find_by_name_kind(*module_btf, "user", BTF_KIND_TYPE_TAG); if (type_id <= 0) { printf("%s:SKIP: btf_type_tag attribute not in %s", __func__, module_name); test__skip(); goto free_module_btf; } + return 0; + +free_module_btf: + btf__free(*module_btf); +free_vmlinux_btf: + btf__free(*vmlinux_btf); + + *vmlinux_btf = NULL; + if (module_btf) + *module_btf = NULL; + return -1; +} + +static void test_btf_type_tag_mod_user(bool load_test_user1) +{ + struct btf *vmlinux_btf = NULL, *module_btf = NULL; + struct 
btf_type_tag_user *skel; + int err; + + if (load_btfs(&vmlinux_btf, &module_btf, /*needs_vmlinux_tag=*/false)) + return; + skel = btf_type_tag_user__open(); if (!ASSERT_OK_PTR(skel, "btf_type_tag_user")) - goto free_module_btf; + goto cleanup; bpf_program__set_autoload(skel->progs.test_sys_getsockname, false); if (load_test_user1) @@ -87,34 +131,23 @@ static void test_btf_type_tag_mod_user(bool load_test_user1) btf_type_tag_user__destroy(skel); -free_module_btf: +cleanup: btf__free(module_btf); -free_vmlinux_btf: btf__free(vmlinux_btf); } static void test_btf_type_tag_vmlinux_user(void) { struct btf_type_tag_user *skel; - struct btf *vmlinux_btf; - __s32 type_id; + struct btf *vmlinux_btf = NULL; int err; - /* skip the test if the vmlinux does not have __user tags */ - vmlinux_btf = btf__load_vmlinux_btf(); - if (!ASSERT_OK_PTR(vmlinux_btf, "could not load vmlinux BTF")) + if (load_btfs(&vmlinux_btf, NULL, /*needs_vmlinux_tag=*/true)) return; - type_id = btf__find_by_name_kind(vmlinux_btf, "user", BTF_KIND_TYPE_TAG); - if (type_id <= 0) { - printf("%s:SKIP: btf_type_tag attribute not in vmlinux btf", __func__); - test__skip(); - goto free_vmlinux_btf; - } - skel = btf_type_tag_user__open(); if (!ASSERT_OK_PTR(skel, "btf_type_tag_user")) - goto free_vmlinux_btf; + goto cleanup; bpf_program__set_autoload(skel->progs.test_user2, false); bpf_program__set_autoload(skel->progs.test_user1, false); @@ -124,7 +157,70 @@ static void test_btf_type_tag_vmlinux_user(void) btf_type_tag_user__destroy(skel); -free_vmlinux_btf: +cleanup: + btf__free(vmlinux_btf); +} + +static void test_btf_type_tag_mod_percpu(bool load_test_percpu1) +{ + struct btf *vmlinux_btf, *module_btf; + struct btf_type_tag_percpu *skel; + int err; + + if (load_btfs(&vmlinux_btf, &module_btf, /*needs_vmlinux_tag=*/false)) + return; + + skel = btf_type_tag_percpu__open(); + if (!ASSERT_OK_PTR(skel, "btf_type_tag_percpu")) + goto cleanup; + + bpf_program__set_autoload(skel->progs.test_percpu_load, false); + 
bpf_program__set_autoload(skel->progs.test_percpu_helper, false); + if (load_test_percpu1) + bpf_program__set_autoload(skel->progs.test_percpu2, false); + else + bpf_program__set_autoload(skel->progs.test_percpu1, false); + + err = btf_type_tag_percpu__load(skel); + ASSERT_ERR(err, "btf_type_tag_percpu"); + + btf_type_tag_percpu__destroy(skel); + +cleanup: + btf__free(module_btf); + btf__free(vmlinux_btf); +} + +static void test_btf_type_tag_vmlinux_percpu(bool load_test) +{ + struct btf_type_tag_percpu *skel; + struct btf *vmlinux_btf = NULL; + int err; + + if (load_btfs(&vmlinux_btf, NULL, /*needs_vmlinux_tag=*/true)) + return; + + skel = btf_type_tag_percpu__open(); + if (!ASSERT_OK_PTR(skel, "btf_type_tag_percpu")) + goto cleanup; + + bpf_program__set_autoload(skel->progs.test_percpu2, false); + bpf_program__set_autoload(skel->progs.test_percpu1, false); + if (load_test) { + bpf_program__set_autoload(skel->progs.test_percpu_helper, false); + + err = btf_type_tag_percpu__load(skel); + ASSERT_ERR(err, "btf_type_tag_percpu_load"); + } else { + bpf_program__set_autoload(skel->progs.test_percpu_load, false); + + err = btf_type_tag_percpu__load(skel); + ASSERT_OK(err, "btf_type_tag_percpu_helper"); + } + + btf_type_tag_percpu__destroy(skel); + +cleanup: btf__free(vmlinux_btf); } @@ -134,10 +230,20 @@ void test_btf_tag(void) test_btf_decl_tag(); if (test__start_subtest("btf_type_tag")) test_btf_type_tag(); + if (test__start_subtest("btf_type_tag_user_mod1")) test_btf_type_tag_mod_user(true); if (test__start_subtest("btf_type_tag_user_mod2")) test_btf_type_tag_mod_user(false); if (test__start_subtest("btf_type_tag_sys_user_vmlinux")) test_btf_type_tag_vmlinux_user(); + + if (test__start_subtest("btf_type_tag_percpu_mod1")) + test_btf_type_tag_mod_percpu(true); + if (test__start_subtest("btf_type_tag_percpu_mod2")) + test_btf_type_tag_mod_percpu(false); + if (test__start_subtest("btf_type_tag_percpu_vmlinux_load")) + test_btf_type_tag_vmlinux_percpu(true); + if 
(test__start_subtest("btf_type_tag_percpu_vmlinux_helper")) + test_btf_type_tag_vmlinux_percpu(false); } diff --git a/tools/testing/selftests/bpf/prog_tests/cgroup_attach_autodetach.c b/tools/testing/selftests/bpf/prog_tests/cgroup_attach_autodetach.c index 858916d11e2e..9367bd2f0ae1 100644 --- a/tools/testing/selftests/bpf/prog_tests/cgroup_attach_autodetach.c +++ b/tools/testing/selftests/bpf/prog_tests/cgroup_attach_autodetach.c @@ -14,7 +14,7 @@ static int prog_load(void) BPF_MOV64_IMM(BPF_REG_0, 1), /* r0 = 1 */ BPF_EXIT_INSN(), }; - size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn); + size_t insns_cnt = ARRAY_SIZE(prog); return bpf_test_load_program(BPF_PROG_TYPE_CGROUP_SKB, prog, insns_cnt, "GPL", 0, diff --git a/tools/testing/selftests/bpf/prog_tests/cgroup_attach_multi.c b/tools/testing/selftests/bpf/prog_tests/cgroup_attach_multi.c index 38b3c47293da..db0b7bac78d1 100644 --- a/tools/testing/selftests/bpf/prog_tests/cgroup_attach_multi.c +++ b/tools/testing/selftests/bpf/prog_tests/cgroup_attach_multi.c @@ -63,7 +63,7 @@ static int prog_load_cnt(int verdict, int val) BPF_MOV64_IMM(BPF_REG_0, verdict), /* r0 = verdict */ BPF_EXIT_INSN(), }; - size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn); + size_t insns_cnt = ARRAY_SIZE(prog); int ret; ret = bpf_test_load_program(BPF_PROG_TYPE_CGROUP_SKB, diff --git a/tools/testing/selftests/bpf/prog_tests/cgroup_attach_override.c b/tools/testing/selftests/bpf/prog_tests/cgroup_attach_override.c index 356547e849e2..9421a5b7f4e1 100644 --- a/tools/testing/selftests/bpf/prog_tests/cgroup_attach_override.c +++ b/tools/testing/selftests/bpf/prog_tests/cgroup_attach_override.c @@ -16,7 +16,7 @@ static int prog_load(int verdict) BPF_MOV64_IMM(BPF_REG_0, verdict), /* r0 = verdict */ BPF_EXIT_INSN(), }; - size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn); + size_t insns_cnt = ARRAY_SIZE(prog); return bpf_test_load_program(BPF_PROG_TYPE_CGROUP_SKB, prog, insns_cnt, "GPL", 0, diff --git 
a/tools/testing/selftests/bpf/prog_tests/custom_sec_handlers.c b/tools/testing/selftests/bpf/prog_tests/custom_sec_handlers.c new file mode 100644 index 000000000000..b2dfc5954aea --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/custom_sec_handlers.c @@ -0,0 +1,176 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2022 Facebook */ + +#include <test_progs.h> +#include "test_custom_sec_handlers.skel.h" + +#define COOKIE_ABC1 1 +#define COOKIE_ABC2 2 +#define COOKIE_CUSTOM 3 +#define COOKIE_FALLBACK 4 +#define COOKIE_KPROBE 5 + +static int custom_setup_prog(struct bpf_program *prog, long cookie) +{ + if (cookie == COOKIE_ABC1) + bpf_program__set_autoload(prog, false); + + return 0; +} + +static int custom_prepare_load_prog(struct bpf_program *prog, + struct bpf_prog_load_opts *opts, long cookie) +{ + if (cookie == COOKIE_FALLBACK) + opts->prog_flags |= BPF_F_SLEEPABLE; + else if (cookie == COOKIE_ABC1) + ASSERT_FALSE(true, "unexpected preload for abc"); + + return 0; +} + +static int custom_attach_prog(const struct bpf_program *prog, long cookie, + struct bpf_link **link) +{ + switch (cookie) { + case COOKIE_ABC2: + *link = bpf_program__attach_raw_tracepoint(prog, "sys_enter"); + return libbpf_get_error(*link); + case COOKIE_CUSTOM: + *link = bpf_program__attach_tracepoint(prog, "syscalls", "sys_enter_nanosleep"); + return libbpf_get_error(*link); + case COOKIE_KPROBE: + case COOKIE_FALLBACK: + /* no auto-attach for SEC("xyz") and SEC("kprobe") */ + *link = NULL; + return 0; + default: + ASSERT_FALSE(true, "unexpected cookie"); + return -EINVAL; + } +} + +static int abc1_id; +static int abc2_id; +static int custom_id; +static int fallback_id; +static int kprobe_id; + +__attribute__((constructor)) +static void register_sec_handlers(void) +{ + LIBBPF_OPTS(libbpf_prog_handler_opts, abc1_opts, + .cookie = COOKIE_ABC1, + .prog_setup_fn = custom_setup_prog, + .prog_prepare_load_fn = custom_prepare_load_prog, + .prog_attach_fn = NULL, + ); + 
LIBBPF_OPTS(libbpf_prog_handler_opts, abc2_opts, + .cookie = COOKIE_ABC2, + .prog_setup_fn = custom_setup_prog, + .prog_prepare_load_fn = custom_prepare_load_prog, + .prog_attach_fn = custom_attach_prog, + ); + LIBBPF_OPTS(libbpf_prog_handler_opts, custom_opts, + .cookie = COOKIE_CUSTOM, + .prog_setup_fn = NULL, + .prog_prepare_load_fn = NULL, + .prog_attach_fn = custom_attach_prog, + ); + + abc1_id = libbpf_register_prog_handler("abc", BPF_PROG_TYPE_RAW_TRACEPOINT, 0, &abc1_opts); + abc2_id = libbpf_register_prog_handler("abc/", BPF_PROG_TYPE_RAW_TRACEPOINT, 0, &abc2_opts); + custom_id = libbpf_register_prog_handler("custom+", BPF_PROG_TYPE_TRACEPOINT, 0, &custom_opts); +} + +__attribute__((destructor)) +static void unregister_sec_handlers(void) +{ + libbpf_unregister_prog_handler(abc1_id); + libbpf_unregister_prog_handler(abc2_id); + libbpf_unregister_prog_handler(custom_id); +} + +void test_custom_sec_handlers(void) +{ + LIBBPF_OPTS(libbpf_prog_handler_opts, opts, + .prog_setup_fn = custom_setup_prog, + .prog_prepare_load_fn = custom_prepare_load_prog, + .prog_attach_fn = custom_attach_prog, + ); + struct test_custom_sec_handlers* skel; + int err; + + ASSERT_GT(abc1_id, 0, "abc1_id"); + ASSERT_GT(abc2_id, 0, "abc2_id"); + ASSERT_GT(custom_id, 0, "custom_id"); + + /* override libbpf's handle of SEC("kprobe/...") but also allow pure + * SEC("kprobe") due to "kprobe+" specifier. Register it as + * TRACEPOINT, just for fun. 
+ */ + opts.cookie = COOKIE_KPROBE; + kprobe_id = libbpf_register_prog_handler("kprobe+", BPF_PROG_TYPE_TRACEPOINT, 0, &opts); + /* fallback treats everything as BPF_PROG_TYPE_SYSCALL program to test + * setting custom BPF_F_SLEEPABLE bit in preload handler + */ + opts.cookie = COOKIE_FALLBACK; + fallback_id = libbpf_register_prog_handler(NULL, BPF_PROG_TYPE_SYSCALL, 0, &opts); + + if (!ASSERT_GT(fallback_id, 0, "fallback_id") /* || !ASSERT_GT(kprobe_id, 0, "kprobe_id")*/) { + if (fallback_id > 0) + libbpf_unregister_prog_handler(fallback_id); + if (kprobe_id > 0) + libbpf_unregister_prog_handler(kprobe_id); + return; + } + + /* open skeleton and validate assumptions */ + skel = test_custom_sec_handlers__open(); + if (!ASSERT_OK_PTR(skel, "skel_open")) + goto cleanup; + + ASSERT_EQ(bpf_program__type(skel->progs.abc1), BPF_PROG_TYPE_RAW_TRACEPOINT, "abc1_type"); + ASSERT_FALSE(bpf_program__autoload(skel->progs.abc1), "abc1_autoload"); + + ASSERT_EQ(bpf_program__type(skel->progs.abc2), BPF_PROG_TYPE_RAW_TRACEPOINT, "abc2_type"); + ASSERT_EQ(bpf_program__type(skel->progs.custom1), BPF_PROG_TYPE_TRACEPOINT, "custom1_type"); + ASSERT_EQ(bpf_program__type(skel->progs.custom2), BPF_PROG_TYPE_TRACEPOINT, "custom2_type"); + ASSERT_EQ(bpf_program__type(skel->progs.kprobe1), BPF_PROG_TYPE_TRACEPOINT, "kprobe1_type"); + ASSERT_EQ(bpf_program__type(skel->progs.xyz), BPF_PROG_TYPE_SYSCALL, "xyz_type"); + + skel->rodata->my_pid = getpid(); + + /* now attempt to load everything */ + err = test_custom_sec_handlers__load(skel); + if (!ASSERT_OK(err, "skel_load")) + goto cleanup; + + /* now try to auto-attach everything */ + err = test_custom_sec_handlers__attach(skel); + if (!ASSERT_OK(err, "skel_attach")) + goto cleanup; + + skel->links.xyz = bpf_program__attach(skel->progs.kprobe1); + ASSERT_EQ(errno, EOPNOTSUPP, "xyz_attach_err"); + ASSERT_ERR_PTR(skel->links.xyz, "xyz_attach"); + + /* trigger programs */ + usleep(1); + + /* SEC("abc") is set to not be auto-loaded */ + 
ASSERT_FALSE(skel->bss->abc1_called, "abc1_called"); + ASSERT_TRUE(skel->bss->abc2_called, "abc2_called"); + ASSERT_TRUE(skel->bss->custom1_called, "custom1_called"); + ASSERT_TRUE(skel->bss->custom2_called, "custom2_called"); + /* SEC("kprobe") shouldn't be auto-attached */ + ASSERT_FALSE(skel->bss->kprobe1_called, "kprobe1_called"); + /* SEC("xyz") shouldn't be auto-attached */ + ASSERT_FALSE(skel->bss->xyz_called, "xyz_called"); + +cleanup: + test_custom_sec_handlers__destroy(skel); + + ASSERT_OK(libbpf_unregister_prog_handler(fallback_id), "unregister_fallback"); + ASSERT_OK(libbpf_unregister_prog_handler(kprobe_id), "unregister_kprobe"); +} diff --git a/tools/testing/selftests/bpf/prog_tests/find_vma.c b/tools/testing/selftests/bpf/prog_tests/find_vma.c index b74b3c0c555a..5165b38f0e59 100644 --- a/tools/testing/selftests/bpf/prog_tests/find_vma.c +++ b/tools/testing/selftests/bpf/prog_tests/find_vma.c @@ -7,12 +7,14 @@ #include "find_vma_fail1.skel.h" #include "find_vma_fail2.skel.h" -static void test_and_reset_skel(struct find_vma *skel, int expected_find_zero_ret) +static void test_and_reset_skel(struct find_vma *skel, int expected_find_zero_ret, bool need_test) { - ASSERT_EQ(skel->bss->found_vm_exec, 1, "found_vm_exec"); - ASSERT_EQ(skel->data->find_addr_ret, 0, "find_addr_ret"); - ASSERT_EQ(skel->data->find_zero_ret, expected_find_zero_ret, "find_zero_ret"); - ASSERT_OK_PTR(strstr(skel->bss->d_iname, "test_progs"), "find_test_progs"); + if (need_test) { + ASSERT_EQ(skel->bss->found_vm_exec, 1, "found_vm_exec"); + ASSERT_EQ(skel->data->find_addr_ret, 0, "find_addr_ret"); + ASSERT_EQ(skel->data->find_zero_ret, expected_find_zero_ret, "find_zero_ret"); + ASSERT_OK_PTR(strstr(skel->bss->d_iname, "test_progs"), "find_test_progs"); + } skel->bss->found_vm_exec = 0; skel->data->find_addr_ret = -1; @@ -30,17 +32,26 @@ static int open_pe(void) attr.type = PERF_TYPE_HARDWARE; attr.config = PERF_COUNT_HW_CPU_CYCLES; attr.freq = 1; - attr.sample_freq = 4000; + 
attr.sample_freq = 1000; pfd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, PERF_FLAG_FD_CLOEXEC); return pfd >= 0 ? pfd : -errno; } +static bool find_vma_pe_condition(struct find_vma *skel) +{ + return skel->bss->found_vm_exec == 0 || + skel->data->find_addr_ret != 0 || + skel->data->find_zero_ret == -1 || + strcmp(skel->bss->d_iname, "test_progs") != 0; +} + static void test_find_vma_pe(struct find_vma *skel) { struct bpf_link *link = NULL; volatile int j = 0; int pfd, i; + const int one_bn = 1000000000; pfd = open_pe(); if (pfd < 0) { @@ -57,10 +68,10 @@ static void test_find_vma_pe(struct find_vma *skel) if (!ASSERT_OK_PTR(link, "attach_perf_event")) goto cleanup; - for (i = 0; i < 1000000; ++i) + for (i = 0; i < one_bn && find_vma_pe_condition(skel); ++i) ++j; - test_and_reset_skel(skel, -EBUSY /* in nmi, irq_work is busy */); + test_and_reset_skel(skel, -EBUSY /* in nmi, irq_work is busy */, i == one_bn); cleanup: bpf_link__destroy(link); close(pfd); @@ -75,7 +86,7 @@ static void test_find_vma_kprobe(struct find_vma *skel) return; getpgid(skel->bss->target_pid); - test_and_reset_skel(skel, -ENOENT /* could not find vma for ptr 0 */); + test_and_reset_skel(skel, -ENOENT /* could not find vma for ptr 0 */, true); } static void test_illegal_write_vma(void) @@ -108,7 +119,6 @@ void serial_test_find_vma(void) skel->bss->addr = (__u64)(uintptr_t)test_find_vma_pe; test_find_vma_pe(skel); - usleep(100000); /* allow the irq_work to finish */ test_find_vma_kprobe(skel); find_vma__destroy(skel); diff --git a/tools/testing/selftests/bpf/prog_tests/global_data.c b/tools/testing/selftests/bpf/prog_tests/global_data.c index 6fb3d3155c35..027685858925 100644 --- a/tools/testing/selftests/bpf/prog_tests/global_data.c +++ b/tools/testing/selftests/bpf/prog_tests/global_data.c @@ -29,7 +29,7 @@ static void test_global_data_number(struct bpf_object *obj, __u32 duration) { "relocate .rodata reference", 10, ~0 }, }; - for (i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) { + 
for (i = 0; i < ARRAY_SIZE(tests); i++) { err = bpf_map_lookup_elem(map_fd, &tests[i].key, &num); CHECK(err || num != tests[i].num, tests[i].name, "err %d result %llx expected %llx\n", @@ -58,7 +58,7 @@ static void test_global_data_string(struct bpf_object *obj, __u32 duration) { "relocate .bss reference", 4, "\0\0hello" }, }; - for (i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) { + for (i = 0; i < ARRAY_SIZE(tests); i++) { err = bpf_map_lookup_elem(map_fd, &tests[i].key, str); CHECK(err || memcmp(str, tests[i].str, sizeof(str)), tests[i].name, "err %d result \'%s\' expected \'%s\'\n", @@ -92,7 +92,7 @@ static void test_global_data_struct(struct bpf_object *obj, __u32 duration) { "relocate .data reference", 3, { 41, 0xeeeeefef, 0x2111111111111111ULL, } }, }; - for (i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) { + for (i = 0; i < ARRAY_SIZE(tests); i++) { err = bpf_map_lookup_elem(map_fd, &tests[i].key, &val); CHECK(err || memcmp(&val, &tests[i].val, sizeof(val)), tests[i].name, "err %d result { %u, %u, %llu } expected { %u, %u, %llu }\n", diff --git a/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c b/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c new file mode 100644 index 000000000000..b9876b55fc0c --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c @@ -0,0 +1,323 @@ +// SPDX-License-Identifier: GPL-2.0 +#include <test_progs.h> +#include "kprobe_multi.skel.h" +#include "trace_helpers.h" + +static void kprobe_multi_test_run(struct kprobe_multi *skel, bool test_return) +{ + LIBBPF_OPTS(bpf_test_run_opts, topts); + int err, prog_fd; + + prog_fd = bpf_program__fd(skel->progs.trigger); + err = bpf_prog_test_run_opts(prog_fd, &topts); + ASSERT_OK(err, "test_run"); + ASSERT_EQ(topts.retval, 0, "test_run"); + + ASSERT_EQ(skel->bss->kprobe_test1_result, 1, "kprobe_test1_result"); + ASSERT_EQ(skel->bss->kprobe_test2_result, 1, "kprobe_test2_result"); + ASSERT_EQ(skel->bss->kprobe_test3_result, 1, 
"kprobe_test3_result"); + ASSERT_EQ(skel->bss->kprobe_test4_result, 1, "kprobe_test4_result"); + ASSERT_EQ(skel->bss->kprobe_test5_result, 1, "kprobe_test5_result"); + ASSERT_EQ(skel->bss->kprobe_test6_result, 1, "kprobe_test6_result"); + ASSERT_EQ(skel->bss->kprobe_test7_result, 1, "kprobe_test7_result"); + ASSERT_EQ(skel->bss->kprobe_test8_result, 1, "kprobe_test8_result"); + + if (test_return) { + ASSERT_EQ(skel->bss->kretprobe_test1_result, 1, "kretprobe_test1_result"); + ASSERT_EQ(skel->bss->kretprobe_test2_result, 1, "kretprobe_test2_result"); + ASSERT_EQ(skel->bss->kretprobe_test3_result, 1, "kretprobe_test3_result"); + ASSERT_EQ(skel->bss->kretprobe_test4_result, 1, "kretprobe_test4_result"); + ASSERT_EQ(skel->bss->kretprobe_test5_result, 1, "kretprobe_test5_result"); + ASSERT_EQ(skel->bss->kretprobe_test6_result, 1, "kretprobe_test6_result"); + ASSERT_EQ(skel->bss->kretprobe_test7_result, 1, "kretprobe_test7_result"); + ASSERT_EQ(skel->bss->kretprobe_test8_result, 1, "kretprobe_test8_result"); + } +} + +static void test_skel_api(void) +{ + struct kprobe_multi *skel = NULL; + int err; + + skel = kprobe_multi__open_and_load(); + if (!ASSERT_OK_PTR(skel, "kprobe_multi__open_and_load")) + goto cleanup; + + skel->bss->pid = getpid(); + err = kprobe_multi__attach(skel); + if (!ASSERT_OK(err, "kprobe_multi__attach")) + goto cleanup; + + kprobe_multi_test_run(skel, true); + +cleanup: + kprobe_multi__destroy(skel); +} + +static void test_link_api(struct bpf_link_create_opts *opts) +{ + int prog_fd, link1_fd = -1, link2_fd = -1; + struct kprobe_multi *skel = NULL; + + skel = kprobe_multi__open_and_load(); + if (!ASSERT_OK_PTR(skel, "fentry_raw_skel_load")) + goto cleanup; + + skel->bss->pid = getpid(); + prog_fd = bpf_program__fd(skel->progs.test_kprobe); + link1_fd = bpf_link_create(prog_fd, 0, BPF_TRACE_KPROBE_MULTI, opts); + if (!ASSERT_GE(link1_fd, 0, "link_fd")) + goto cleanup; + + opts->kprobe_multi.flags = BPF_F_KPROBE_MULTI_RETURN; + prog_fd = 
bpf_program__fd(skel->progs.test_kretprobe); + link2_fd = bpf_link_create(prog_fd, 0, BPF_TRACE_KPROBE_MULTI, opts); + if (!ASSERT_GE(link2_fd, 0, "link_fd")) + goto cleanup; + + kprobe_multi_test_run(skel, true); + +cleanup: + if (link1_fd != -1) + close(link1_fd); + if (link2_fd != -1) + close(link2_fd); + kprobe_multi__destroy(skel); +} + +#define GET_ADDR(__sym, __addr) ({ \ + __addr = ksym_get_addr(__sym); \ + if (!ASSERT_NEQ(__addr, 0, "kallsyms load failed for " #__sym)) \ + return; \ +}) + +static void test_link_api_addrs(void) +{ + LIBBPF_OPTS(bpf_link_create_opts, opts); + unsigned long long addrs[8]; + + GET_ADDR("bpf_fentry_test1", addrs[0]); + GET_ADDR("bpf_fentry_test2", addrs[1]); + GET_ADDR("bpf_fentry_test3", addrs[2]); + GET_ADDR("bpf_fentry_test4", addrs[3]); + GET_ADDR("bpf_fentry_test5", addrs[4]); + GET_ADDR("bpf_fentry_test6", addrs[5]); + GET_ADDR("bpf_fentry_test7", addrs[6]); + GET_ADDR("bpf_fentry_test8", addrs[7]); + + opts.kprobe_multi.addrs = (const unsigned long*) addrs; + opts.kprobe_multi.cnt = ARRAY_SIZE(addrs); + test_link_api(&opts); +} + +static void test_link_api_syms(void) +{ + LIBBPF_OPTS(bpf_link_create_opts, opts); + const char *syms[8] = { + "bpf_fentry_test1", + "bpf_fentry_test2", + "bpf_fentry_test3", + "bpf_fentry_test4", + "bpf_fentry_test5", + "bpf_fentry_test6", + "bpf_fentry_test7", + "bpf_fentry_test8", + }; + + opts.kprobe_multi.syms = syms; + opts.kprobe_multi.cnt = ARRAY_SIZE(syms); + test_link_api(&opts); +} + +static void +test_attach_api(const char *pattern, struct bpf_kprobe_multi_opts *opts) +{ + struct bpf_link *link1 = NULL, *link2 = NULL; + struct kprobe_multi *skel = NULL; + + skel = kprobe_multi__open_and_load(); + if (!ASSERT_OK_PTR(skel, "fentry_raw_skel_load")) + goto cleanup; + + skel->bss->pid = getpid(); + link1 = bpf_program__attach_kprobe_multi_opts(skel->progs.test_kprobe, + pattern, opts); + if (!ASSERT_OK_PTR(link1, "bpf_program__attach_kprobe_multi_opts")) + goto cleanup; + + if (opts) { + 
opts->retprobe = true; + link2 = bpf_program__attach_kprobe_multi_opts(skel->progs.test_kretprobe, + pattern, opts); + if (!ASSERT_OK_PTR(link2, "bpf_program__attach_kprobe_multi_opts")) + goto cleanup; + } + + kprobe_multi_test_run(skel, !!opts); + +cleanup: + bpf_link__destroy(link2); + bpf_link__destroy(link1); + kprobe_multi__destroy(skel); +} + +static void test_attach_api_pattern(void) +{ + LIBBPF_OPTS(bpf_kprobe_multi_opts, opts); + + test_attach_api("bpf_fentry_test*", &opts); + test_attach_api("bpf_fentry_test?", NULL); +} + +static void test_attach_api_addrs(void) +{ + LIBBPF_OPTS(bpf_kprobe_multi_opts, opts); + unsigned long long addrs[8]; + + GET_ADDR("bpf_fentry_test1", addrs[0]); + GET_ADDR("bpf_fentry_test2", addrs[1]); + GET_ADDR("bpf_fentry_test3", addrs[2]); + GET_ADDR("bpf_fentry_test4", addrs[3]); + GET_ADDR("bpf_fentry_test5", addrs[4]); + GET_ADDR("bpf_fentry_test6", addrs[5]); + GET_ADDR("bpf_fentry_test7", addrs[6]); + GET_ADDR("bpf_fentry_test8", addrs[7]); + + opts.addrs = (const unsigned long *) addrs; + opts.cnt = ARRAY_SIZE(addrs); + test_attach_api(NULL, &opts); +} + +static void test_attach_api_syms(void) +{ + LIBBPF_OPTS(bpf_kprobe_multi_opts, opts); + const char *syms[8] = { + "bpf_fentry_test1", + "bpf_fentry_test2", + "bpf_fentry_test3", + "bpf_fentry_test4", + "bpf_fentry_test5", + "bpf_fentry_test6", + "bpf_fentry_test7", + "bpf_fentry_test8", + }; + + opts.syms = syms; + opts.cnt = ARRAY_SIZE(syms); + test_attach_api(NULL, &opts); +} + +static void test_attach_api_fails(void) +{ + LIBBPF_OPTS(bpf_kprobe_multi_opts, opts); + struct kprobe_multi *skel = NULL; + struct bpf_link *link = NULL; + unsigned long long addrs[2]; + const char *syms[2] = { + "bpf_fentry_test1", + "bpf_fentry_test2", + }; + __u64 cookies[2]; + + addrs[0] = ksym_get_addr("bpf_fentry_test1"); + addrs[1] = ksym_get_addr("bpf_fentry_test2"); + + if (!ASSERT_FALSE(!addrs[0] || !addrs[1], "ksym_get_addr")) + goto cleanup; + + skel = kprobe_multi__open_and_load(); 
+ if (!ASSERT_OK_PTR(skel, "fentry_raw_skel_load")) + goto cleanup; + + skel->bss->pid = getpid(); + + /* fail_1 - pattern and opts NULL */ + link = bpf_program__attach_kprobe_multi_opts(skel->progs.test_kprobe, + NULL, NULL); + if (!ASSERT_ERR_PTR(link, "fail_1")) + goto cleanup; + + if (!ASSERT_EQ(libbpf_get_error(link), -EINVAL, "fail_1_error")) + goto cleanup; + + /* fail_2 - both addrs and syms set */ + opts.addrs = (const unsigned long *) addrs; + opts.syms = syms; + opts.cnt = ARRAY_SIZE(syms); + opts.cookies = NULL; + + link = bpf_program__attach_kprobe_multi_opts(skel->progs.test_kprobe, + NULL, &opts); + if (!ASSERT_ERR_PTR(link, "fail_2")) + goto cleanup; + + if (!ASSERT_EQ(libbpf_get_error(link), -EINVAL, "fail_2_error")) + goto cleanup; + + /* fail_3 - pattern and addrs set */ + opts.addrs = (const unsigned long *) addrs; + opts.syms = NULL; + opts.cnt = ARRAY_SIZE(syms); + opts.cookies = NULL; + + link = bpf_program__attach_kprobe_multi_opts(skel->progs.test_kprobe, + "ksys_*", &opts); + if (!ASSERT_ERR_PTR(link, "fail_3")) + goto cleanup; + + if (!ASSERT_EQ(libbpf_get_error(link), -EINVAL, "fail_3_error")) + goto cleanup; + + /* fail_4 - pattern and cnt set */ + opts.addrs = NULL; + opts.syms = NULL; + opts.cnt = ARRAY_SIZE(syms); + opts.cookies = NULL; + + link = bpf_program__attach_kprobe_multi_opts(skel->progs.test_kprobe, + "ksys_*", &opts); + if (!ASSERT_ERR_PTR(link, "fail_4")) + goto cleanup; + + if (!ASSERT_EQ(libbpf_get_error(link), -EINVAL, "fail_4_error")) + goto cleanup; + + /* fail_5 - pattern and cookies */ + opts.addrs = NULL; + opts.syms = NULL; + opts.cnt = 0; + opts.cookies = cookies; + + link = bpf_program__attach_kprobe_multi_opts(skel->progs.test_kprobe, + "ksys_*", &opts); + if (!ASSERT_ERR_PTR(link, "fail_5")) + goto cleanup; + + if (!ASSERT_EQ(libbpf_get_error(link), -EINVAL, "fail_5_error")) + goto cleanup; + +cleanup: + bpf_link__destroy(link); + kprobe_multi__destroy(skel); +} + +void test_kprobe_multi_test(void) +{ + if 
(!ASSERT_OK(load_kallsyms(), "load_kallsyms")) + return; + + if (test__start_subtest("skel_api")) + test_skel_api(); + if (test__start_subtest("link_api_addrs")) + test_link_api_addrs(); + if (test__start_subtest("link_api_syms")) + test_link_api_syms(); + if (test__start_subtest("attach_api_pattern")) + test_attach_api_pattern(); + if (test__start_subtest("attach_api_addrs")) + test_attach_api_addrs(); + if (test__start_subtest("attach_api_syms")) + test_attach_api_syms(); + if (test__start_subtest("attach_api_fails")) + test_attach_api_fails(); +} diff --git a/tools/testing/selftests/bpf/prog_tests/obj_name.c b/tools/testing/selftests/bpf/prog_tests/obj_name.c index 6194b776a28b..7093edca6e08 100644 --- a/tools/testing/selftests/bpf/prog_tests/obj_name.c +++ b/tools/testing/selftests/bpf/prog_tests/obj_name.c @@ -20,7 +20,7 @@ void test_obj_name(void) __u32 duration = 0; int i; - for (i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) { + for (i = 0; i < ARRAY_SIZE(tests); i++) { size_t name_len = strlen(tests[i].name) + 1; union bpf_attr attr; size_t ncopy; diff --git a/tools/testing/selftests/bpf/prog_tests/perf_branches.c b/tools/testing/selftests/bpf/prog_tests/perf_branches.c index 12c4f45cee1a..bc24f83339d6 100644 --- a/tools/testing/selftests/bpf/prog_tests/perf_branches.c +++ b/tools/testing/selftests/bpf/prog_tests/perf_branches.c @@ -110,7 +110,7 @@ static void test_perf_branches_hw(void) attr.type = PERF_TYPE_HARDWARE; attr.config = PERF_COUNT_HW_CPU_CYCLES; attr.freq = 1; - attr.sample_freq = 4000; + attr.sample_freq = 1000; attr.sample_type = PERF_SAMPLE_BRANCH_STACK; attr.branch_sample_type = PERF_SAMPLE_BRANCH_USER | PERF_SAMPLE_BRANCH_ANY; pfd = syscall(__NR_perf_event_open, &attr, -1, 0, -1, PERF_FLAG_FD_CLOEXEC); @@ -151,7 +151,7 @@ static void test_perf_branches_no_hw(void) attr.type = PERF_TYPE_SOFTWARE; attr.config = PERF_COUNT_SW_CPU_CLOCK; attr.freq = 1; - attr.sample_freq = 4000; + attr.sample_freq = 1000; pfd = 
syscall(__NR_perf_event_open, &attr, -1, 0, -1, PERF_FLAG_FD_CLOEXEC); if (CHECK(pfd < 0, "perf_event_open", "err %d\n", pfd)) return; diff --git a/tools/testing/selftests/bpf/prog_tests/perf_link.c b/tools/testing/selftests/bpf/prog_tests/perf_link.c index ede07344f264..224eba6fef2e 100644 --- a/tools/testing/selftests/bpf/prog_tests/perf_link.c +++ b/tools/testing/selftests/bpf/prog_tests/perf_link.c @@ -39,7 +39,7 @@ void serial_test_perf_link(void) attr.type = PERF_TYPE_SOFTWARE; attr.config = PERF_COUNT_SW_CPU_CLOCK; attr.freq = 1; - attr.sample_freq = 4000; + attr.sample_freq = 1000; pfd = syscall(__NR_perf_event_open, &attr, -1, 0, -1, PERF_FLAG_FD_CLOEXEC); if (!ASSERT_GE(pfd, 0, "perf_fd")) goto cleanup; diff --git a/tools/testing/selftests/bpf/prog_tests/send_signal.c b/tools/testing/selftests/bpf/prog_tests/send_signal.c index 776916b61c40..d71226e34c34 100644 --- a/tools/testing/selftests/bpf/prog_tests/send_signal.c +++ b/tools/testing/selftests/bpf/prog_tests/send_signal.c @@ -4,11 +4,11 @@ #include <sys/resource.h> #include "test_send_signal_kern.skel.h" -int sigusr1_received = 0; +static int sigusr1_received; static void sigusr1_handler(int signum) { - sigusr1_received++; + sigusr1_received = 1; } static void test_send_signal_common(struct perf_event_attr *attr, @@ -40,9 +40,10 @@ static void test_send_signal_common(struct perf_event_attr *attr, if (pid == 0) { int old_prio; + volatile int j = 0; /* install signal handler and notify parent */ - signal(SIGUSR1, sigusr1_handler); + ASSERT_NEQ(signal(SIGUSR1, sigusr1_handler), SIG_ERR, "signal"); close(pipe_c2p[0]); /* close read */ close(pipe_p2c[1]); /* close write */ @@ -63,9 +64,11 @@ static void test_send_signal_common(struct perf_event_attr *attr, ASSERT_EQ(read(pipe_p2c[0], buf, 1), 1, "pipe_read"); /* wait a little for signal handler */ - sleep(1); + for (int i = 0; i < 100000000 && !sigusr1_received; i++) + j /= i + j + 1; buf[0] = sigusr1_received ? 
'2' : '0'; + ASSERT_EQ(sigusr1_received, 1, "sigusr1_received"); ASSERT_EQ(write(pipe_c2p[1], buf, 1), 1, "pipe_write"); /* wait for parent notification and exit */ @@ -93,7 +96,7 @@ static void test_send_signal_common(struct perf_event_attr *attr, goto destroy_skel; } } else { - pmu_fd = syscall(__NR_perf_event_open, attr, pid, -1, + pmu_fd = syscall(__NR_perf_event_open, attr, pid, -1 /* cpu */, -1 /* group id */, 0 /* flags */); if (!ASSERT_GE(pmu_fd, 0, "perf_event_open")) { err = -1; @@ -110,9 +113,9 @@ static void test_send_signal_common(struct perf_event_attr *attr, ASSERT_EQ(read(pipe_c2p[0], buf, 1), 1, "pipe_read"); /* trigger the bpf send_signal */ - skel->bss->pid = pid; - skel->bss->sig = SIGUSR1; skel->bss->signal_thread = signal_thread; + skel->bss->sig = SIGUSR1; + skel->bss->pid = pid; /* notify child that bpf program can send_signal now */ ASSERT_EQ(write(pipe_p2c[1], buf, 1), 1, "pipe_write"); diff --git a/tools/testing/selftests/bpf/prog_tests/stacktrace_map_skip.c b/tools/testing/selftests/bpf/prog_tests/stacktrace_map_skip.c new file mode 100644 index 000000000000..1932b1e0685c --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/stacktrace_map_skip.c @@ -0,0 +1,63 @@ +// SPDX-License-Identifier: GPL-2.0 +#include <test_progs.h> +#include "stacktrace_map_skip.skel.h" + +#define TEST_STACK_DEPTH 2 + +void test_stacktrace_map_skip(void) +{ + struct stacktrace_map_skip *skel; + int stackid_hmap_fd, stackmap_fd, stack_amap_fd; + int err, stack_trace_len; + + skel = stacktrace_map_skip__open_and_load(); + if (!ASSERT_OK_PTR(skel, "skel_open_and_load")) + return; + + /* find map fds */ + stackid_hmap_fd = bpf_map__fd(skel->maps.stackid_hmap); + if (!ASSERT_GE(stackid_hmap_fd, 0, "stackid_hmap fd")) + goto out; + + stackmap_fd = bpf_map__fd(skel->maps.stackmap); + if (!ASSERT_GE(stackmap_fd, 0, "stackmap fd")) + goto out; + + stack_amap_fd = bpf_map__fd(skel->maps.stack_amap); + if (!ASSERT_GE(stack_amap_fd, 0, "stack_amap fd")) + goto out; + + 
skel->bss->pid = getpid(); + + err = stacktrace_map_skip__attach(skel); + if (!ASSERT_OK(err, "skel_attach")) + goto out; + + /* give the bpf program some time to run */ + sleep(1); + + /* disable stack trace collection */ + skel->bss->control = 1; + + /* for every element in stackid_hmap, we can find a corresponding one + * in stackmap, and vice versa. + */ + err = compare_map_keys(stackid_hmap_fd, stackmap_fd); + if (!ASSERT_OK(err, "compare_map_keys stackid_hmap vs. stackmap")) + goto out; + + err = compare_map_keys(stackmap_fd, stackid_hmap_fd); + if (!ASSERT_OK(err, "compare_map_keys stackmap vs. stackid_hmap")) + goto out; + + stack_trace_len = TEST_STACK_DEPTH * sizeof(__u64); + err = compare_stack_ips(stackmap_fd, stack_amap_fd, stack_trace_len); + if (!ASSERT_OK(err, "compare_stack_ips stackmap vs. stack_amap")) + goto out; + + if (!ASSERT_EQ(skel->bss->failed, 0, "skip_failed")) + goto out; + +out: + stacktrace_map_skip__destroy(skel); +} diff --git a/tools/testing/selftests/bpf/prog_tests/subprogs.c b/tools/testing/selftests/bpf/prog_tests/subprogs.c index 3f3d2ac4dd57..903f35a9e62e 100644 --- a/tools/testing/selftests/bpf/prog_tests/subprogs.c +++ b/tools/testing/selftests/bpf/prog_tests/subprogs.c @@ -1,32 +1,83 @@ // SPDX-License-Identifier: GPL-2.0 /* Copyright (c) 2020 Facebook */ #include <test_progs.h> -#include <time.h> #include "test_subprogs.skel.h" #include "test_subprogs_unused.skel.h" -static int duration; +struct toggler_ctx { + int fd; + bool stop; +}; -void test_subprogs(void) +static void *toggle_jit_harden(void *arg) +{ + struct toggler_ctx *ctx = arg; + char two = '2'; + char zero = '0'; + + while (!ctx->stop) { + lseek(ctx->fd, 0, SEEK_SET); + write(ctx->fd, &two, sizeof(two)); + lseek(ctx->fd, 0, SEEK_SET); + write(ctx->fd, &zero, sizeof(zero)); + } + + return NULL; +} + +static void test_subprogs_with_jit_harden_toggling(void) +{ + struct toggler_ctx ctx; + pthread_t toggler; + int err; + unsigned int i, loop = 10; + + ctx.fd = 
open("/proc/sys/net/core/bpf_jit_harden", O_RDWR); + if (!ASSERT_GE(ctx.fd, 0, "open bpf_jit_harden")) + return; + + ctx.stop = false; + err = pthread_create(&toggler, NULL, toggle_jit_harden, &ctx); + if (!ASSERT_OK(err, "new toggler")) + goto out; + + /* Make toggler thread to run */ + usleep(1); + + for (i = 0; i < loop; i++) { + struct test_subprogs *skel = test_subprogs__open_and_load(); + + if (!ASSERT_OK_PTR(skel, "skel open")) + break; + test_subprogs__destroy(skel); + } + + ctx.stop = true; + pthread_join(toggler, NULL); +out: + close(ctx.fd); +} + +static void test_subprogs_alone(void) { struct test_subprogs *skel; struct test_subprogs_unused *skel2; int err; skel = test_subprogs__open_and_load(); - if (CHECK(!skel, "skel_open", "failed to open skeleton\n")) + if (!ASSERT_OK_PTR(skel, "skel_open")) return; err = test_subprogs__attach(skel); - if (CHECK(err, "skel_attach", "failed to attach skeleton: %d\n", err)) + if (!ASSERT_OK(err, "skel attach")) goto cleanup; usleep(1); - CHECK(skel->bss->res1 != 12, "res1", "got %d, exp %d\n", skel->bss->res1, 12); - CHECK(skel->bss->res2 != 17, "res2", "got %d, exp %d\n", skel->bss->res2, 17); - CHECK(skel->bss->res3 != 19, "res3", "got %d, exp %d\n", skel->bss->res3, 19); - CHECK(skel->bss->res4 != 36, "res4", "got %d, exp %d\n", skel->bss->res4, 36); + ASSERT_EQ(skel->bss->res1, 12, "res1"); + ASSERT_EQ(skel->bss->res2, 17, "res2"); + ASSERT_EQ(skel->bss->res3, 19, "res3"); + ASSERT_EQ(skel->bss->res4, 36, "res4"); skel2 = test_subprogs_unused__open_and_load(); ASSERT_OK_PTR(skel2, "unused_progs_skel"); @@ -35,3 +86,11 @@ void test_subprogs(void) cleanup: test_subprogs__destroy(skel); } + +void test_subprogs(void) +{ + if (test__start_subtest("subprogs_alone")) + test_subprogs_alone(); + if (test__start_subtest("subprogs_and_jit_harden")) + test_subprogs_with_jit_harden_toggling(); +} diff --git a/tools/testing/selftests/bpf/prog_tests/subskeleton.c b/tools/testing/selftests/bpf/prog_tests/subskeleton.c new file 
mode 100644 index 000000000000..9c31b7004f9c --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/subskeleton.c @@ -0,0 +1,78 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) Meta Platforms, Inc. and affiliates. */ + +#include <test_progs.h> +#include "test_subskeleton.skel.h" +#include "test_subskeleton_lib.subskel.h" + +static void subskeleton_lib_setup(struct bpf_object *obj) +{ + struct test_subskeleton_lib *lib = test_subskeleton_lib__open(obj); + + if (!ASSERT_OK_PTR(lib, "open subskeleton")) + return; + + *lib->rodata.var1 = 1; + *lib->data.var2 = 2; + lib->bss.var3->var3_1 = 3; + lib->bss.var3->var3_2 = 4; + + test_subskeleton_lib__destroy(lib); +} + +static int subskeleton_lib_subresult(struct bpf_object *obj) +{ + struct test_subskeleton_lib *lib = test_subskeleton_lib__open(obj); + int result; + + if (!ASSERT_OK_PTR(lib, "open subskeleton")) + return -EINVAL; + + result = *lib->bss.libout1; + ASSERT_EQ(result, 1 + 2 + 3 + 4 + 5 + 6, "lib subresult"); + + ASSERT_OK_PTR(lib->progs.lib_perf_handler, "lib_perf_handler"); + ASSERT_STREQ(bpf_program__name(lib->progs.lib_perf_handler), + "lib_perf_handler", "program name"); + + ASSERT_OK_PTR(lib->maps.map1, "map1"); + ASSERT_STREQ(bpf_map__name(lib->maps.map1), "map1", "map name"); + + ASSERT_EQ(*lib->data.var5, 5, "__weak var5"); + ASSERT_EQ(*lib->data.var6, 6, "extern var6"); + ASSERT_TRUE(*lib->kconfig.CONFIG_BPF_SYSCALL, "CONFIG_BPF_SYSCALL"); + + test_subskeleton_lib__destroy(lib); + return result; +} + +void test_subskeleton(void) +{ + int err, result; + struct test_subskeleton *skel; + + skel = test_subskeleton__open(); + if (!ASSERT_OK_PTR(skel, "skel_open")) + return; + + skel->rodata->rovar1 = 10; + skel->rodata->var1 = 1; + subskeleton_lib_setup(skel->obj); + + err = test_subskeleton__load(skel); + if (!ASSERT_OK(err, "skel_load")) + goto cleanup; + + err = test_subskeleton__attach(skel); + if (!ASSERT_OK(err, "skel_attach")) + goto cleanup; + + /* trigger tracepoint */ + usleep(1); + 
+ result = subskeleton_lib_subresult(skel->obj) * 10; + ASSERT_EQ(skel->bss->out1, result, "unexpected calculation"); + +cleanup: + test_subskeleton__destroy(skel); +} diff --git a/tools/testing/selftests/bpf/prog_tests/tc_redirect.c b/tools/testing/selftests/bpf/prog_tests/tc_redirect.c index 2b255e28ed26..7ad66a247c02 100644 --- a/tools/testing/selftests/bpf/prog_tests/tc_redirect.c +++ b/tools/testing/selftests/bpf/prog_tests/tc_redirect.c @@ -10,8 +10,6 @@ * to drop unexpected traffic. */ -#define _GNU_SOURCE - #include <arpa/inet.h> #include <linux/if.h> #include <linux/if_tun.h> @@ -19,10 +17,8 @@ #include <linux/sysctl.h> #include <linux/time_types.h> #include <linux/net_tstamp.h> -#include <sched.h> #include <stdbool.h> #include <stdio.h> -#include <sys/mount.h> #include <sys/stat.h> #include <unistd.h> @@ -92,91 +88,6 @@ static int write_file(const char *path, const char *newval) return 0; } -struct nstoken { - int orig_netns_fd; -}; - -static int setns_by_fd(int nsfd) -{ - int err; - - err = setns(nsfd, CLONE_NEWNET); - close(nsfd); - - if (!ASSERT_OK(err, "setns")) - return err; - - /* Switch /sys to the new namespace so that e.g. /sys/class/net - * reflects the devices in the new namespace. - */ - err = unshare(CLONE_NEWNS); - if (!ASSERT_OK(err, "unshare")) - return err; - - /* Make our /sys mount private, so the following umount won't - * trigger the global umount in case it's shared. - */ - err = mount("none", "/sys", NULL, MS_PRIVATE, NULL); - if (!ASSERT_OK(err, "remount private /sys")) - return err; - - err = umount2("/sys", MNT_DETACH); - if (!ASSERT_OK(err, "umount2 /sys")) - return err; - - err = mount("sysfs", "/sys", "sysfs", 0, NULL); - if (!ASSERT_OK(err, "mount /sys")) - return err; - - err = mount("bpffs", "/sys/fs/bpf", "bpf", 0, NULL); - if (!ASSERT_OK(err, "mount /sys/fs/bpf")) - return err; - - return 0; -} - -/** - * open_netns() - Switch to specified network namespace by name. 
- *
- * Returns token with which to restore the original namespace
- * using close_netns().
- */
-static struct nstoken *open_netns(const char *name)
-{
-	int nsfd;
-	char nspath[PATH_MAX];
-	int err;
-	struct nstoken *token;
-
-	token = calloc(1, sizeof(struct nstoken));
-	if (!ASSERT_OK_PTR(token, "malloc token"))
-		return NULL;
-
-	token->orig_netns_fd = open("/proc/self/ns/net", O_RDONLY);
-	if (!ASSERT_GE(token->orig_netns_fd, 0, "open /proc/self/ns/net"))
-		goto fail;
-
-	snprintf(nspath, sizeof(nspath), "%s/%s", "/var/run/netns", name);
-	nsfd = open(nspath, O_RDONLY | O_CLOEXEC);
-	if (!ASSERT_GE(nsfd, 0, "open netns fd"))
-		goto fail;
-
-	err = setns_by_fd(nsfd);
-	if (!ASSERT_OK(err, "setns_by_fd"))
-		goto fail;
-
-	return token;
-fail:
-	free(token);
-	return NULL;
-}
-
-static void close_netns(struct nstoken *token)
-{
-	ASSERT_OK(setns_by_fd(token->orig_netns_fd), "setns_by_fd");
-	free(token);
-}
-
 static int netns_setup_namespaces(const char *verb)
 {
 	const char * const *ns = namespaces;
diff --git a/tools/testing/selftests/bpf/prog_tests/test_ima.c b/tools/testing/selftests/bpf/prog_tests/test_ima.c
index 97d8a6f84f4a..b13feceb38f1 100644
--- a/tools/testing/selftests/bpf/prog_tests/test_ima.c
+++ b/tools/testing/selftests/bpf/prog_tests/test_ima.c
@@ -13,14 +13,17 @@
 
 #include "ima.skel.h"
 
-static int run_measured_process(const char *measured_dir, u32 *monitored_pid)
+#define MAX_SAMPLES 4
+
+static int _run_measured_process(const char *measured_dir, u32 *monitored_pid,
+				 const char *cmd)
 {
 	int child_pid, child_status;
 
 	child_pid = fork();
 	if (child_pid == 0) {
 		*monitored_pid = getpid();
-		execlp("./ima_setup.sh", "./ima_setup.sh", "run", measured_dir,
+		execlp("./ima_setup.sh", "./ima_setup.sh", cmd, measured_dir,
 		       NULL);
 		exit(errno);
 
@@ -32,19 +35,39 @@ static int run_measured_process(const char *measured_dir, u32 *monitored_pid)
 	return -EINVAL;
 }
 
-static u64 ima_hash_from_bpf;
+static int run_measured_process(const char *measured_dir, u32 *monitored_pid)
+{
+	return _run_measured_process(measured_dir, monitored_pid, "run");
+}
+
+static u64 ima_hash_from_bpf[MAX_SAMPLES];
+static int ima_hash_from_bpf_idx;
 
 static int process_sample(void *ctx, void *data, size_t len)
 {
-	ima_hash_from_bpf = *((u64 *)data);
+	if (ima_hash_from_bpf_idx >= MAX_SAMPLES)
+		return -ENOSPC;
+
+	ima_hash_from_bpf[ima_hash_from_bpf_idx++] = *((u64 *)data);
 	return 0;
 }
 
+static void test_init(struct ima__bss *bss)
+{
+	ima_hash_from_bpf_idx = 0;
+
+	bss->use_ima_file_hash = false;
+	bss->enable_bprm_creds_for_exec = false;
+	bss->enable_kernel_read_file = false;
+	bss->test_deny = false;
+}
+
 void test_test_ima(void)
 {
 	char measured_dir_template[] = "/tmp/ima_measuredXXXXXX";
 	struct ring_buffer *ringbuf = NULL;
 	const char *measured_dir;
+	u64 bin_true_sample;
 	char cmd[256];
 
 	int err, duration = 0;
@@ -72,13 +95,127 @@ void test_test_ima(void)
 	if (CHECK(err, "failed to run command", "%s, errno = %d\n", cmd, errno))
 		goto close_clean;
 
+	/*
+	 * Test #1
+	 * - Goal: obtain a sample with the bpf_ima_inode_hash() helper
+	 * - Expected result: 1 sample (/bin/true)
+	 */
+	test_init(skel->bss);
 	err = run_measured_process(measured_dir, &skel->bss->monitored_pid);
-	if (CHECK(err, "run_measured_process", "err = %d\n", err))
+	if (CHECK(err, "run_measured_process #1", "err = %d\n", err))
 		goto close_clean;
 
 	err = ring_buffer__consume(ringbuf);
 	ASSERT_EQ(err, 1, "num_samples_or_err");
-	ASSERT_NEQ(ima_hash_from_bpf, 0, "ima_hash");
+	ASSERT_NEQ(ima_hash_from_bpf[0], 0, "ima_hash");
+
+	/*
+	 * Test #2
+	 * - Goal: obtain samples with the bpf_ima_file_hash() helper
+	 * - Expected result: 2 samples (./ima_setup.sh, /bin/true)
+	 */
+	test_init(skel->bss);
+	skel->bss->use_ima_file_hash = true;
+	err = run_measured_process(measured_dir, &skel->bss->monitored_pid);
+	if (CHECK(err, "run_measured_process #2", "err = %d\n", err))
+		goto close_clean;
+
+	err = ring_buffer__consume(ringbuf);
+	ASSERT_EQ(err, 2, "num_samples_or_err");
+	ASSERT_NEQ(ima_hash_from_bpf[0], 0, "ima_hash");
+	ASSERT_NEQ(ima_hash_from_bpf[1], 0, "ima_hash");
+	bin_true_sample = ima_hash_from_bpf[1];
+
+	/*
+	 * Test #3
+	 * - Goal: confirm that bpf_ima_inode_hash() returns a non-fresh digest
+	 * - Expected result: 2 samples (/bin/true: non-fresh, fresh)
+	 */
+	test_init(skel->bss);
+
+	err = _run_measured_process(measured_dir, &skel->bss->monitored_pid,
+				    "modify-bin");
+	if (CHECK(err, "modify-bin #3", "err = %d\n", err))
+		goto close_clean;
+
+	skel->bss->enable_bprm_creds_for_exec = true;
+	err = run_measured_process(measured_dir, &skel->bss->monitored_pid);
+	if (CHECK(err, "run_measured_process #3", "err = %d\n", err))
+		goto close_clean;
+
+	err = ring_buffer__consume(ringbuf);
+	ASSERT_EQ(err, 2, "num_samples_or_err");
+	ASSERT_NEQ(ima_hash_from_bpf[0], 0, "ima_hash");
+	ASSERT_NEQ(ima_hash_from_bpf[1], 0, "ima_hash");
+	ASSERT_EQ(ima_hash_from_bpf[0], bin_true_sample, "sample_equal_or_err");
+	/* IMA refreshed the digest. */
+	ASSERT_NEQ(ima_hash_from_bpf[1], bin_true_sample,
+		   "sample_different_or_err");
+
+	/*
+	 * Test #4
+	 * - Goal: verify that bpf_ima_file_hash() returns a fresh digest
+	 * - Expected result: 4 samples (./ima_setup.sh: fresh, fresh;
+	 *                               /bin/true: fresh, fresh)
+	 */
+	test_init(skel->bss);
+	skel->bss->use_ima_file_hash = true;
+	skel->bss->enable_bprm_creds_for_exec = true;
+	err = run_measured_process(measured_dir, &skel->bss->monitored_pid);
+	if (CHECK(err, "run_measured_process #4", "err = %d\n", err))
+		goto close_clean;
+
+	err = ring_buffer__consume(ringbuf);
+	ASSERT_EQ(err, 4, "num_samples_or_err");
+	ASSERT_NEQ(ima_hash_from_bpf[0], 0, "ima_hash");
+	ASSERT_NEQ(ima_hash_from_bpf[1], 0, "ima_hash");
+	ASSERT_NEQ(ima_hash_from_bpf[2], 0, "ima_hash");
+	ASSERT_NEQ(ima_hash_from_bpf[3], 0, "ima_hash");
+	ASSERT_NEQ(ima_hash_from_bpf[2], bin_true_sample,
+		   "sample_different_or_err");
+	ASSERT_EQ(ima_hash_from_bpf[3], ima_hash_from_bpf[2],
+		  "sample_equal_or_err");
+
+	skel->bss->use_ima_file_hash = false;
+	skel->bss->enable_bprm_creds_for_exec = false;
+	err = _run_measured_process(measured_dir, &skel->bss->monitored_pid,
+				    "restore-bin");
+	if (CHECK(err, "restore-bin #3", "err = %d\n", err))
+		goto close_clean;
+
+	/*
+	 * Test #5
+	 * - Goal: obtain a sample from the kernel_read_file hook
+	 * - Expected result: 2 samples (./ima_setup.sh, policy_test)
+	 */
+	test_init(skel->bss);
+	skel->bss->use_ima_file_hash = true;
+	skel->bss->enable_kernel_read_file = true;
+	err = _run_measured_process(measured_dir, &skel->bss->monitored_pid,
+				    "load-policy");
+	if (CHECK(err, "run_measured_process #5", "err = %d\n", err))
+		goto close_clean;
+
+	err = ring_buffer__consume(ringbuf);
+	ASSERT_EQ(err, 2, "num_samples_or_err");
+	ASSERT_NEQ(ima_hash_from_bpf[0], 0, "ima_hash");
+	ASSERT_NEQ(ima_hash_from_bpf[1], 0, "ima_hash");
+
+	/*
+	 * Test #6
+	 * - Goal: ensure that the kernel_read_file hook denies an operation
+	 * - Expected result: 0 samples
+	 */
+	test_init(skel->bss);
+	skel->bss->enable_kernel_read_file = true;
+	skel->bss->test_deny = true;
+	err = _run_measured_process(measured_dir, &skel->bss->monitored_pid,
+				    "load-policy");
+	if (CHECK(!err, "run_measured_process #6", "err = %d\n", err))
+		goto close_clean;
+
+	err = ring_buffer__consume(ringbuf);
+	ASSERT_EQ(err, 0, "num_samples_or_err");
 
 close_clean:
 	snprintf(cmd, sizeof(cmd), "./ima_setup.sh cleanup %s", measured_dir);
diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
new file mode 100644
index 000000000000..a50971c6cf4a
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
@@ -0,0 +1,201 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+#include <network_helpers.h>
+#include <net/if.h>
+#include <linux/if_ether.h>
+#include <linux/if_packet.h>
+#include <linux/ipv6.h>
+#include <linux/in6.h>
+#include <linux/udp.h>
+#include <bpf/bpf_endian.h>
+#include "test_xdp_do_redirect.skel.h"
+
+#define SYS(fmt, ...)						\
+	({							\
+		char cmd[1024];					\
+		snprintf(cmd, sizeof(cmd), fmt, ##__VA_ARGS__);	\
+		if (!ASSERT_OK(system(cmd), cmd))		\
+			goto out;				\
+	})
+
+struct udp_packet {
+	struct ethhdr eth;
+	struct ipv6hdr iph;
+	struct udphdr udp;
+	__u8 payload[64 - sizeof(struct udphdr)
+		     - sizeof(struct ethhdr) - sizeof(struct ipv6hdr)];
+} __packed;
+
+static struct udp_packet pkt_udp = {
+	.eth.h_proto = __bpf_constant_htons(ETH_P_IPV6),
+	.eth.h_dest = {0x00, 0x11, 0x22, 0x33, 0x44, 0x55},
+	.eth.h_source = {0x66, 0x77, 0x88, 0x99, 0xaa, 0xbb},
+	.iph.version = 6,
+	.iph.nexthdr = IPPROTO_UDP,
+	.iph.payload_len = bpf_htons(sizeof(struct udp_packet)
+				     - offsetof(struct udp_packet, udp)),
+	.iph.hop_limit = 2,
+	.iph.saddr.s6_addr16 = {bpf_htons(0xfc00), 0, 0, 0, 0, 0, 0, bpf_htons(1)},
+	.iph.daddr.s6_addr16 = {bpf_htons(0xfc00), 0, 0, 0, 0, 0, 0, bpf_htons(2)},
+	.udp.source = bpf_htons(1),
+	.udp.dest = bpf_htons(1),
+	.udp.len = bpf_htons(sizeof(struct udp_packet)
+			     - offsetof(struct udp_packet, udp)),
+	.payload = {0x42}, /* receiver XDP program matches on this */
+};
+
+static int attach_tc_prog(struct bpf_tc_hook *hook, int fd)
+{
+	DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = 1, .priority = 1, .prog_fd = fd);
+	int ret;
+
+	ret = bpf_tc_hook_create(hook);
+	if (!ASSERT_OK(ret, "create tc hook"))
+		return ret;
+
+	ret = bpf_tc_attach(hook, &opts);
+	if (!ASSERT_OK(ret, "bpf_tc_attach")) {
+		bpf_tc_hook_destroy(hook);
+		return ret;
+	}
+
+	return 0;
+}
+
+/* The maximum permissible size is: PAGE_SIZE - sizeof(struct xdp_page_head) -
+ * sizeof(struct skb_shared_info) - XDP_PACKET_HEADROOM = 3368 bytes
+ */
+#define MAX_PKT_SIZE 3368
+static void test_max_pkt_size(int fd)
+{
+	char data[MAX_PKT_SIZE + 1] = {};
+	int err;
+	DECLARE_LIBBPF_OPTS(bpf_test_run_opts, opts,
+			    .data_in = &data,
+			    .data_size_in = MAX_PKT_SIZE,
+			    .flags = BPF_F_TEST_XDP_LIVE_FRAMES,
+			    .repeat = 1,
+		);
+	err = bpf_prog_test_run_opts(fd, &opts);
+	ASSERT_OK(err, "prog_run_max_size");
+
+	opts.data_size_in += 1;
+	err = bpf_prog_test_run_opts(fd, &opts);
+	ASSERT_EQ(err, -EINVAL, "prog_run_too_big");
+}
+
+#define NUM_PKTS 10000
+void test_xdp_do_redirect(void)
+{
+	int err, xdp_prog_fd, tc_prog_fd, ifindex_src, ifindex_dst;
+	char data[sizeof(pkt_udp) + sizeof(__u32)];
+	struct test_xdp_do_redirect *skel = NULL;
+	struct nstoken *nstoken = NULL;
+	struct bpf_link *link;
+
+	struct xdp_md ctx_in = { .data = sizeof(__u32),
+				 .data_end = sizeof(data) };
+	DECLARE_LIBBPF_OPTS(bpf_test_run_opts, opts,
+			    .data_in = &data,
+			    .data_size_in = sizeof(data),
+			    .ctx_in = &ctx_in,
+			    .ctx_size_in = sizeof(ctx_in),
+			    .flags = BPF_F_TEST_XDP_LIVE_FRAMES,
+			    .repeat = NUM_PKTS,
+			    .batch_size = 64,
+		);
+	DECLARE_LIBBPF_OPTS(bpf_tc_hook, tc_hook,
+			    .attach_point = BPF_TC_INGRESS);
+
+	memcpy(&data[sizeof(__u32)], &pkt_udp, sizeof(pkt_udp));
+	*((__u32 *)data) = 0x42; /* metadata test value */
+
+	skel = test_xdp_do_redirect__open();
+	if (!ASSERT_OK_PTR(skel, "skel"))
+		return;
+
+	/* The XDP program we run with bpf_prog_run() will cycle through all
+	 * three xmit (PASS/TX/REDIRECT) return codes starting from above, and
+	 * ending up with PASS, so we should end up with two packets on the dst
+	 * iface and NUM_PKTS-2 in the TC hook. We match the packets on the UDP
+	 * payload.
+	 */
+	SYS("ip netns add testns");
+	nstoken = open_netns("testns");
+	if (!ASSERT_OK_PTR(nstoken, "setns"))
+		goto out;
+
+	SYS("ip link add veth_src type veth peer name veth_dst");
+	SYS("ip link set dev veth_src address 00:11:22:33:44:55");
+	SYS("ip link set dev veth_dst address 66:77:88:99:aa:bb");
+	SYS("ip link set dev veth_src up");
+	SYS("ip link set dev veth_dst up");
+	SYS("ip addr add dev veth_src fc00::1/64");
+	SYS("ip addr add dev veth_dst fc00::2/64");
+	SYS("ip neigh add fc00::2 dev veth_src lladdr 66:77:88:99:aa:bb");
+
+	/* We enable forwarding in the test namespace because that will cause
+	 * the packets that go through the kernel stack (with XDP_PASS) to be
+	 * forwarded back out the same interface (because of the packet dst
+	 * combined with the interface addresses). When this happens, the
+	 * regular forwarding path will end up going through the same
+	 * veth_xdp_xmit() call as the XDP_REDIRECT code, which can cause a
+	 * deadlock if it happens on the same CPU. There's a local_bh_disable()
+	 * in the test_run code to prevent this, but an earlier version of the
+	 * code didn't have this, so we keep the test behaviour to make sure the
+	 * bug doesn't resurface.
+	 */
+	SYS("sysctl -qw net.ipv6.conf.all.forwarding=1");
+
+	ifindex_src = if_nametoindex("veth_src");
+	ifindex_dst = if_nametoindex("veth_dst");
+	if (!ASSERT_NEQ(ifindex_src, 0, "ifindex_src") ||
+	    !ASSERT_NEQ(ifindex_dst, 0, "ifindex_dst"))
+		goto out;
+
+	memcpy(skel->rodata->expect_dst, &pkt_udp.eth.h_dest, ETH_ALEN);
+	skel->rodata->ifindex_out = ifindex_src; /* redirect back to the same iface */
+	skel->rodata->ifindex_in = ifindex_src;
+	ctx_in.ingress_ifindex = ifindex_src;
+	tc_hook.ifindex = ifindex_src;
+
+	if (!ASSERT_OK(test_xdp_do_redirect__load(skel), "load"))
+		goto out;
+
+	link = bpf_program__attach_xdp(skel->progs.xdp_count_pkts, ifindex_dst);
+	if (!ASSERT_OK_PTR(link, "prog_attach"))
+		goto out;
+	skel->links.xdp_count_pkts = link;
+
+	tc_prog_fd = bpf_program__fd(skel->progs.tc_count_pkts);
+	if (attach_tc_prog(&tc_hook, tc_prog_fd))
+		goto out;
+
+	xdp_prog_fd = bpf_program__fd(skel->progs.xdp_redirect);
+	err = bpf_prog_test_run_opts(xdp_prog_fd, &opts);
+	if (!ASSERT_OK(err, "prog_run"))
+		goto out_tc;
+
+	/* wait for the packets to be flushed */
+	kern_sync_rcu();
+
+	/* There will be one packet sent through XDP_REDIRECT and one through
+	 * XDP_TX; these will show up on the XDP counting program, while the
+	 * rest will be counted at the TC ingress hook (and the counting program
+	 * resets the packet payload so they don't get counted twice even though
+	 * they are re-xmited out the veth device
+	 */
+	ASSERT_EQ(skel->bss->pkts_seen_xdp, 2, "pkt_count_xdp");
+	ASSERT_EQ(skel->bss->pkts_seen_zero, 2, "pkt_count_zero");
+	ASSERT_EQ(skel->bss->pkts_seen_tc, NUM_PKTS - 2, "pkt_count_tc");
+
+	test_max_pkt_size(bpf_program__fd(skel->progs.xdp_count_pkts));
+
+out_tc:
+	bpf_tc_hook_destroy(&tc_hook);
+out:
+	if (nstoken)
+		close_netns(nstoken);
+	system("ip netns del testns");
+	test_xdp_do_redirect__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/btf_type_tag_percpu.c b/tools/testing/selftests/bpf/progs/btf_type_tag_percpu.c
new file mode 100644
index 000000000000..8feddb8289cf
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/btf_type_tag_percpu.c
@@ -0,0 +1,66 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Google */
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+struct bpf_testmod_btf_type_tag_1 {
+	int a;
+};
+
+struct bpf_testmod_btf_type_tag_2 {
+	struct bpf_testmod_btf_type_tag_1 *p;
+};
+
+__u64 g;
+
+SEC("fentry/bpf_testmod_test_btf_type_tag_percpu_1")
+int BPF_PROG(test_percpu1, struct bpf_testmod_btf_type_tag_1 *arg)
+{
+	g = arg->a;
+	return 0;
+}
+
+SEC("fentry/bpf_testmod_test_btf_type_tag_percpu_2")
+int BPF_PROG(test_percpu2, struct bpf_testmod_btf_type_tag_2 *arg)
+{
+	g = arg->p->a;
+	return 0;
+}
+
+/* trace_cgroup_mkdir(struct cgroup *cgrp, const char *path)
+ *
+ * struct cgroup_rstat_cpu {
+ *	...
+ *	struct cgroup *updated_children;
+ *	...
+ * };
+ *
+ * struct cgroup {
+ *	...
+ *	struct cgroup_rstat_cpu __percpu *rstat_cpu;
+ *	...
+ * };
+ */
+SEC("tp_btf/cgroup_mkdir")
+int BPF_PROG(test_percpu_load, struct cgroup *cgrp, const char *path)
+{
+	g = (__u64)cgrp->rstat_cpu->updated_children;
+	return 0;
+}
+
+SEC("tp_btf/cgroup_mkdir")
+int BPF_PROG(test_percpu_helper, struct cgroup *cgrp, const char *path)
+{
+	struct cgroup_rstat_cpu *rstat;
+	__u32 cpu;
+
+	cpu = bpf_get_smp_processor_id();
+	rstat = (struct cgroup_rstat_cpu *)bpf_per_cpu_ptr(cgrp->rstat_cpu, cpu);
+	if (rstat) {
+		/* READ_ONCE */
+		*(volatile int *)rstat;
+	}
+
+	return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/ima.c b/tools/testing/selftests/bpf/progs/ima.c
index 96060ff4ffc6..e16a2c208481 100644
--- a/tools/testing/selftests/bpf/progs/ima.c
+++ b/tools/testing/selftests/bpf/progs/ima.c
@@ -18,8 +18,12 @@ struct {
 
 char _license[] SEC("license") = "GPL";
 
-SEC("lsm.s/bprm_committed_creds")
-void BPF_PROG(ima, struct linux_binprm *bprm)
+bool use_ima_file_hash;
+bool enable_bprm_creds_for_exec;
+bool enable_kernel_read_file;
+bool test_deny;
+
+static void ima_test_common(struct file *file)
 {
 	u64 ima_hash = 0;
 	u64 *sample;
@@ -28,8 +32,12 @@ void BPF_PROG(ima, struct linux_binprm *bprm)
 
 	pid = bpf_get_current_pid_tgid() >> 32;
 	if (pid == monitored_pid) {
-		ret = bpf_ima_inode_hash(bprm->file->f_inode, &ima_hash,
-					 sizeof(ima_hash));
+		if (!use_ima_file_hash)
+			ret = bpf_ima_inode_hash(file->f_inode, &ima_hash,
+						 sizeof(ima_hash));
+		else
+			ret = bpf_ima_file_hash(file, &ima_hash,
+						sizeof(ima_hash));
 		if (ret < 0 || ima_hash == 0)
 			return;
 
@@ -43,3 +51,53 @@ void BPF_PROG(ima, struct linux_binprm *bprm)
 
 	return;
 }
+
+static int ima_test_deny(void)
+{
+	u32 pid;
+
+	pid = bpf_get_current_pid_tgid() >> 32;
+	if (pid == monitored_pid && test_deny)
+		return -EPERM;
+
+	return 0;
+}
+
+SEC("lsm.s/bprm_committed_creds")
+void BPF_PROG(bprm_committed_creds, struct linux_binprm *bprm)
+{
+	ima_test_common(bprm->file);
+}
+
+SEC("lsm.s/bprm_creds_for_exec")
+int BPF_PROG(bprm_creds_for_exec, struct linux_binprm *bprm)
+{
+	if (!enable_bprm_creds_for_exec)
+		return 0;
+
+	ima_test_common(bprm->file);
+	return 0;
+}
+
+SEC("lsm.s/kernel_read_file")
+int BPF_PROG(kernel_read_file, struct file *file, enum kernel_read_file_id id,
+	     bool contents)
+{
+	int ret;
+
+	if (!enable_kernel_read_file)
+		return 0;
+
+	if (!contents)
+		return 0;
+
+	if (id != READING_POLICY)
+		return 0;
+
+	ret = ima_test_deny();
+	if (ret < 0)
+		return ret;
+
+	ima_test_common(file);
+	return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/kprobe_multi.c b/tools/testing/selftests/bpf/progs/kprobe_multi.c
new file mode 100644
index 000000000000..600be50800f8
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/kprobe_multi.c
@@ -0,0 +1,100 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <stdbool.h>
+
+char _license[] SEC("license") = "GPL";
+
+extern const void bpf_fentry_test1 __ksym;
+extern const void bpf_fentry_test2 __ksym;
+extern const void bpf_fentry_test3 __ksym;
+extern const void bpf_fentry_test4 __ksym;
+extern const void bpf_fentry_test5 __ksym;
+extern const void bpf_fentry_test6 __ksym;
+extern const void bpf_fentry_test7 __ksym;
+extern const void bpf_fentry_test8 __ksym;
+
+int pid = 0;
+bool test_cookie = false;
+
+__u64 kprobe_test1_result = 0;
+__u64 kprobe_test2_result = 0;
+__u64 kprobe_test3_result = 0;
+__u64 kprobe_test4_result = 0;
+__u64 kprobe_test5_result = 0;
+__u64 kprobe_test6_result = 0;
+__u64 kprobe_test7_result = 0;
+__u64 kprobe_test8_result = 0;
+
+__u64 kretprobe_test1_result = 0;
+__u64 kretprobe_test2_result = 0;
+__u64 kretprobe_test3_result = 0;
+__u64 kretprobe_test4_result = 0;
+__u64 kretprobe_test5_result = 0;
+__u64 kretprobe_test6_result = 0;
+__u64 kretprobe_test7_result = 0;
+__u64 kretprobe_test8_result = 0;
+
+extern bool CONFIG_X86_KERNEL_IBT __kconfig __weak;
+
+static void kprobe_multi_check(void *ctx, bool is_return)
+{
+	if (bpf_get_current_pid_tgid() >> 32 != pid)
+		return;
+
+	__u64 cookie = test_cookie ? bpf_get_attach_cookie(ctx) : 0;
+	__u64 addr = bpf_get_func_ip(ctx) - (CONFIG_X86_KERNEL_IBT ? 4 : 0);
+
+#define SET(__var, __addr, __cookie) ({			\
+	if (((const void *) addr == __addr) &&		\
+	     (!test_cookie || (cookie == __cookie)))	\
+		__var = 1;				\
+})
+
+	if (is_return) {
+		SET(kretprobe_test1_result, &bpf_fentry_test1, 8);
+		SET(kretprobe_test2_result, &bpf_fentry_test2, 7);
+		SET(kretprobe_test3_result, &bpf_fentry_test3, 6);
+		SET(kretprobe_test4_result, &bpf_fentry_test4, 5);
+		SET(kretprobe_test5_result, &bpf_fentry_test5, 4);
+		SET(kretprobe_test6_result, &bpf_fentry_test6, 3);
+		SET(kretprobe_test7_result, &bpf_fentry_test7, 2);
+		SET(kretprobe_test8_result, &bpf_fentry_test8, 1);
+	} else {
+		SET(kprobe_test1_result, &bpf_fentry_test1, 1);
+		SET(kprobe_test2_result, &bpf_fentry_test2, 2);
+		SET(kprobe_test3_result, &bpf_fentry_test3, 3);
+		SET(kprobe_test4_result, &bpf_fentry_test4, 4);
+		SET(kprobe_test5_result, &bpf_fentry_test5, 5);
+		SET(kprobe_test6_result, &bpf_fentry_test6, 6);
+		SET(kprobe_test7_result, &bpf_fentry_test7, 7);
+		SET(kprobe_test8_result, &bpf_fentry_test8, 8);
+	}
+
+#undef SET
+}
+
+/*
+ * No tests in here, just to trigger 'bpf_fentry_test*'
+ * through tracing test_run
+ */
+SEC("fentry/bpf_modify_return_test")
+int BPF_PROG(trigger)
+{
+	return 0;
+}
+
+SEC("kprobe.multi/bpf_fentry_tes??")
+int test_kprobe(struct pt_regs *ctx)
+{
+	kprobe_multi_check(ctx, false);
+	return 0;
+}
+
+SEC("kretprobe.multi/bpf_fentry_test*")
+int test_kretprobe(struct pt_regs *ctx)
+{
+	kprobe_multi_check(ctx, true);
+	return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/local_storage.c b/tools/testing/selftests/bpf/progs/local_storage.c
index 9b1f9b75d5c2..19423ed862e3 100644
--- a/tools/testing/selftests/bpf/progs/local_storage.c
+++ b/tools/testing/selftests/bpf/progs/local_storage.c
@@ -37,6 +37,13 @@ struct {
 } sk_storage_map SEC(".maps");
 
 struct {
+	__uint(type, BPF_MAP_TYPE_SK_STORAGE);
+	__uint(map_flags, BPF_F_NO_PREALLOC | BPF_F_CLONE);
+	__type(key, int);
+	__type(value, struct local_storage);
+} sk_storage_map2 SEC(".maps");
+
+struct {
 	__uint(type, BPF_MAP_TYPE_TASK_STORAGE);
 	__uint(map_flags, BPF_F_NO_PREALLOC);
 	__type(key, int);
@@ -115,7 +122,19 @@ int BPF_PROG(socket_bind, struct socket *sock, struct sockaddr *address,
 	if (storage->value != DUMMY_STORAGE_VALUE)
 		sk_storage_result = -1;
 
+	/* This tests that we can associate multiple elements
+	 * with the local storage.
+	 */
+	storage = bpf_sk_storage_get(&sk_storage_map2, sock->sk, 0,
+				     BPF_LOCAL_STORAGE_GET_F_CREATE);
+	if (!storage)
+		return 0;
+
 	err = bpf_sk_storage_delete(&sk_storage_map, sock->sk);
+	if (err)
+		return 0;
+
+	err = bpf_sk_storage_delete(&sk_storage_map2, sock->sk);
 	if (!err)
 		sk_storage_result = err;
 
diff --git a/tools/testing/selftests/bpf/progs/stacktrace_map_skip.c b/tools/testing/selftests/bpf/progs/stacktrace_map_skip.c
new file mode 100644
index 000000000000..2eb297df3dd6
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/stacktrace_map_skip.c
@@ -0,0 +1,68 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+
+#define TEST_STACK_DEPTH 2
+#define TEST_MAX_ENTRIES 16384
+
+typedef __u64 stack_trace_t[TEST_STACK_DEPTH];
+
+struct {
+	__uint(type, BPF_MAP_TYPE_STACK_TRACE);
+	__uint(max_entries, TEST_MAX_ENTRIES);
+	__type(key, __u32);
+	__type(value, stack_trace_t);
+} stackmap SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(max_entries, TEST_MAX_ENTRIES);
+	__type(key, __u32);
+	__type(value, __u32);
+} stackid_hmap SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, TEST_MAX_ENTRIES);
+	__type(key, __u32);
+	__type(value, stack_trace_t);
+} stack_amap SEC(".maps");
+
+int pid = 0;
+int control = 0;
+int failed = 0;
+
+SEC("tracepoint/sched/sched_switch")
+int oncpu(struct trace_event_raw_sched_switch *ctx)
+{
+	__u32 max_len = TEST_STACK_DEPTH * sizeof(__u64);
+	__u32 key = 0, val = 0;
+	__u64 *stack_p;
+
+	if (pid != (bpf_get_current_pid_tgid() >> 32))
+		return 0;
+
+	if (control)
+		return 0;
+
+	/* it should allow skipping whole buffer size entries */
+	key = bpf_get_stackid(ctx, &stackmap, TEST_STACK_DEPTH);
+	if ((int)key >= 0) {
+		/* The size of stackmap and stack_amap should be the same */
+		bpf_map_update_elem(&stackid_hmap, &key, &val, 0);
+		stack_p = bpf_map_lookup_elem(&stack_amap, &key);
+		if (stack_p) {
+			bpf_get_stack(ctx, stack_p, max_len, TEST_STACK_DEPTH);
+			/* it wrongly skipped all the entries and filled zero */
+			if (stack_p[0] == 0)
+				failed = 1;
+		}
+	} else {
+		/* old kernel doesn't support skipping that many entries */
+		failed = 2;
+	}
+
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/test_custom_sec_handlers.c b/tools/testing/selftests/bpf/progs/test_custom_sec_handlers.c
new file mode 100644
index 000000000000..4061f701ca50
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_custom_sec_handlers.c
@@ -0,0 +1,63 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Facebook */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+const volatile int my_pid;
+
+bool abc1_called;
+bool abc2_called;
+bool custom1_called;
+bool custom2_called;
+bool kprobe1_called;
+bool xyz_called;
+
+SEC("abc")
+int abc1(void *ctx)
+{
+	abc1_called = true;
+	return 0;
+}
+
+SEC("abc/whatever")
+int abc2(void *ctx)
+{
+	abc2_called = true;
+	return 0;
+}
+
+SEC("custom")
+int custom1(void *ctx)
+{
+	custom1_called = true;
+	return 0;
+}
+
+SEC("custom/something")
+int custom2(void *ctx)
+{
+	custom2_called = true;
+	return 0;
+}
+
+SEC("kprobe")
+int kprobe1(void *ctx)
+{
+	kprobe1_called = true;
+	return 0;
+}
+
+SEC("xyz/blah")
+int xyz(void *ctx)
+{
+	int whatever;
+
+	/* use sleepable helper, custom handler should set sleepable flag */
+	bpf_copy_from_user(&whatever, sizeof(whatever), NULL);
+	xyz_called = true;
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/test_send_signal_kern.c b/tools/testing/selftests/bpf/progs/test_send_signal_kern.c
index b4233d3efac2..92354cd72044 100644
--- a/tools/testing/selftests/bpf/progs/test_send_signal_kern.c
+++ b/tools/testing/selftests/bpf/progs/test_send_signal_kern.c
@@ -10,7 +10,7 @@ static __always_inline int bpf_send_signal_test(void *ctx)
 {
 	int ret;
 
-	if (status != 0 || sig == 0 || pid == 0)
+	if (status != 0 || pid == 0)
 		return 0;
 
 	if ((bpf_get_current_pid_tgid() >> 32) == pid) {
diff --git a/tools/testing/selftests/bpf/progs/test_sk_lookup.c b/tools/testing/selftests/bpf/progs/test_sk_lookup.c
index bf5b7caefdd0..6058dcb11b36 100644
--- a/tools/testing/selftests/bpf/progs/test_sk_lookup.c
+++ b/tools/testing/selftests/bpf/progs/test_sk_lookup.c
@@ -413,15 +413,20 @@ int ctx_narrow_access(struct bpf_sk_lookup *ctx)
 
 	/* Narrow loads from remote_port field. Expect SRC_PORT. */
 	if (LSB(ctx->remote_port, 0) != ((SRC_PORT >> 0) & 0xff) ||
-	    LSB(ctx->remote_port, 1) != ((SRC_PORT >> 8) & 0xff) ||
-	    LSB(ctx->remote_port, 2) != 0 || LSB(ctx->remote_port, 3) != 0)
+	    LSB(ctx->remote_port, 1) != ((SRC_PORT >> 8) & 0xff))
 		return SK_DROP;
 	if (LSW(ctx->remote_port, 0) != SRC_PORT)
 		return SK_DROP;
 
-	/* Load from remote_port field with zero padding (backward compatibility) */
+	/*
+	 * NOTE: 4-byte load from bpf_sk_lookup at remote_port offset
+	 * is quirky. It gets rewritten by the access converter to a
+	 * 2-byte load for backward compatibility. Treating the load
+	 * result as a be16 value makes the code portable across
+	 * little- and big-endian platforms.
+	 */
 	val_u32 = *(__u32 *)&ctx->remote_port;
-	if (val_u32 != bpf_htonl(bpf_ntohs(SRC_PORT) << 16))
+	if (val_u32 != SRC_PORT)
 		return SK_DROP;
 
 	/* Narrow loads from local_port field. Expect DST_PORT. */
diff --git a/tools/testing/selftests/bpf/progs/test_sock_fields.c b/tools/testing/selftests/bpf/progs/test_sock_fields.c
index 246f1f001813..9f4b8f9f1181 100644
--- a/tools/testing/selftests/bpf/progs/test_sock_fields.c
+++ b/tools/testing/selftests/bpf/progs/test_sock_fields.c
@@ -114,7 +114,7 @@ static void tpcpy(struct bpf_tcp_sock *dst,
 
 #define RET_LOG() ({						\
 	linum = __LINE__;					\
-	bpf_map_update_elem(&linum_map, &linum_idx, &linum, BPF_NOEXIST);	\
+	bpf_map_update_elem(&linum_map, &linum_idx, &linum, BPF_ANY);	\
 	return CG_OK;						\
 })
 
@@ -134,11 +134,11 @@ int egress_read_sock_fields(struct __sk_buff *skb)
 	if (!sk)
 		RET_LOG();
 
-	/* Not the testing egress traffic or
-	 * TCP_LISTEN (10) socket will be copied at the ingress side.
+	/* Not testing the egress traffic or the listening socket,
+	 * which are covered by the cgroup_skb/ingress test program.
 	 */
 	if (sk->family != AF_INET6 || !is_loopback6(sk->src_ip6) ||
-	    sk->state == 10)
+	    sk->state == BPF_TCP_LISTEN)
 		return CG_OK;
 
 	if (sk->src_port == bpf_ntohs(srv_sa6.sin6_port)) {
@@ -232,8 +232,8 @@ int ingress_read_sock_fields(struct __sk_buff *skb)
 	    sk->src_port != bpf_ntohs(srv_sa6.sin6_port))
 		return CG_OK;
 
-	/* Only interested in TCP_LISTEN */
-	if (sk->state != 10)
+	/* Only interested in the listening socket */
+	if (sk->state != BPF_TCP_LISTEN)
 		return CG_OK;
 
 	/* It must be a fullsock for cgroup_skb/ingress prog */
@@ -251,10 +251,16 @@ int ingress_read_sock_fields(struct __sk_buff *skb)
 	return CG_OK;
 }
 
+/*
+ * NOTE: 4-byte load from bpf_sock at dst_port offset is quirky. It
+ * gets rewritten by the access converter to a 2-byte load for
+ * backward compatibility. Treating the load result as a be16 value
+ * makes the code portable across little- and big-endian platforms.
+ */
 static __noinline bool sk_dst_port__load_word(struct bpf_sock *sk)
 {
 	__u32 *word = (__u32 *)&sk->dst_port;
-	return word[0] == bpf_htonl(0xcafe0000);
+	return word[0] == bpf_htons(0xcafe);
 }
 
 static __noinline bool sk_dst_port__load_half(struct bpf_sock *sk)
@@ -281,6 +287,10 @@ int read_sk_dst_port(struct __sk_buff *skb)
 	if (!sk)
 		RET_LOG();
 
+	/* Ignore everything but the SYN from the client socket */
+	if (sk->state != BPF_TCP_SYN_SENT)
+		return CG_OK;
+
 	if (!sk_dst_port__load_word(sk))
 		RET_LOG();
 	if (!sk_dst_port__load_half(sk))
diff --git a/tools/testing/selftests/bpf/progs/test_subskeleton.c b/tools/testing/selftests/bpf/progs/test_subskeleton.c
new file mode 100644
index 000000000000..006417974372
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_subskeleton.c
@@ -0,0 +1,28 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) Meta Platforms, Inc. and affiliates. */
+
+#include <stdbool.h>
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+/* volatile to force a read, compiler may assume 0 otherwise */
+const volatile int rovar1;
+int out1;
+
+/* Override weak symbol in test_subskeleton_lib */
+int var5 = 5;
+
+extern volatile bool CONFIG_BPF_SYSCALL __kconfig;
+
+extern int lib_routine(void);
+
+SEC("raw_tp/sys_enter")
+int handler1(const void *ctx)
+{
+	(void) CONFIG_BPF_SYSCALL;
+
+	out1 = lib_routine() * rovar1;
+	return 0;
+}
+
+char LICENSE[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/test_subskeleton_lib.c b/tools/testing/selftests/bpf/progs/test_subskeleton_lib.c
new file mode 100644
index 000000000000..ecfafe812c36
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_subskeleton_lib.c
@@ -0,0 +1,61 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) Meta Platforms, Inc. and affiliates. */
+
+#include <stdbool.h>
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+/* volatile to force a read */
+const volatile int var1;
+volatile int var2 = 1;
+struct {
+	int var3_1;
+	__s64 var3_2;
+} var3;
+int libout1;
+
+extern volatile bool CONFIG_BPF_SYSCALL __kconfig;
+
+int var4[4];
+
+__weak int var5 SEC(".data");
+
+/* Fully contained within library extern-and-definition */
+extern int var6;
+
+int var7 SEC(".data.custom");
+
+int (*fn_ptr)(void);
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__type(key, __u32);
+	__type(value, __u32);
+	__uint(max_entries, 16);
+} map1 SEC(".maps");
+
+extern struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__type(key, __u32);
+	__type(value, __u32);
+	__uint(max_entries, 16);
+} map2 SEC(".maps");
+
+int lib_routine(void)
+{
+	__u32 key = 1, value = 2;
+
+	(void) CONFIG_BPF_SYSCALL;
+	bpf_map_update_elem(&map2, &key, &value, BPF_ANY);
+
+	libout1 = var1 + var2 + var3.var3_1 + var3.var3_2 + var5 + var6;
+	return libout1;
+}
+
+SEC("perf_event")
+int lib_perf_handler(struct pt_regs *ctx)
+{
+	return 0;
+}
+
+char LICENSE[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/test_subskeleton_lib2.c b/tools/testing/selftests/bpf/progs/test_subskeleton_lib2.c
new file mode 100644
index 000000000000..80238486b7ce
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_subskeleton_lib2.c
@@ -0,0 +1,16 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) Meta Platforms, Inc. and affiliates. */
+
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+int var6 = 6;
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__type(key, __u32);
+	__type(value, __u32);
+	__uint(max_entries, 16);
+} map2 SEC(".maps");
+
+char LICENSE[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/test_tc_dtime.c b/tools/testing/selftests/bpf/progs/test_tc_dtime.c
index 9d9e8e17b8a0..06f300d06dbd 100644
--- a/tools/testing/selftests/bpf/progs/test_tc_dtime.c
+++ b/tools/testing/selftests/bpf/progs/test_tc_dtime.c
@@ -174,13 +174,13 @@ int egress_host(struct __sk_buff *skb)
 		return TC_ACT_OK;
 
 	if (skb_proto(skb_type) == IPPROTO_TCP) {
-		if (skb->delivery_time_type == BPF_SKB_DELIVERY_TIME_MONO &&
+		if (skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_MONO &&
 		    skb->tstamp)
 			inc_dtimes(EGRESS_ENDHOST);
 		else
 			inc_errs(EGRESS_ENDHOST);
 	} else {
-		if (skb->delivery_time_type == BPF_SKB_DELIVERY_TIME_UNSPEC &&
+		if (skb->tstamp_type == BPF_SKB_TSTAMP_UNSPEC &&
 		    skb->tstamp)
 			inc_dtimes(EGRESS_ENDHOST);
 		else
@@ -204,7 +204,7 @@ int ingress_host(struct __sk_buff *skb)
 	if (!skb_type)
 		return TC_ACT_OK;
 
-	if (skb->delivery_time_type == BPF_SKB_DELIVERY_TIME_MONO &&
+	if (skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_MONO &&
 	    skb->tstamp == EGRESS_FWDNS_MAGIC)
 		inc_dtimes(INGRESS_ENDHOST);
 	else
@@ -226,7 +226,7 @@ int ingress_fwdns_prio100(struct __sk_buff *skb)
 		return TC_ACT_OK;
 
 	/* delivery_time is only available to the ingress
-	 * if the tc-bpf checks the skb->delivery_time_type.
+	 * if the tc-bpf checks the skb->tstamp_type.
 	 */
 	if (skb->tstamp == EGRESS_ENDHOST_MAGIC)
 		inc_errs(INGRESS_FWDNS_P100);
@@ -250,7 +250,7 @@ int egress_fwdns_prio100(struct __sk_buff *skb)
 		return TC_ACT_OK;
 
 	/* delivery_time is always available to egress even
-	 * the tc-bpf did not use the delivery_time_type.
+	 * the tc-bpf did not use the tstamp_type.
 	 */
 	if (skb->tstamp == INGRESS_FWDNS_MAGIC)
 		inc_dtimes(EGRESS_FWDNS_P100);
@@ -278,9 +278,9 @@ int ingress_fwdns_prio101(struct __sk_buff *skb)
 	if (skb_proto(skb_type) == IPPROTO_UDP)
 		expected_dtime = 0;
 
-	if (skb->delivery_time_type) {
+	if (skb->tstamp_type) {
 		if (fwdns_clear_dtime() ||
-		    skb->delivery_time_type != BPF_SKB_DELIVERY_TIME_MONO ||
+		    skb->tstamp_type != BPF_SKB_TSTAMP_DELIVERY_MONO ||
 		    skb->tstamp != expected_dtime)
 			inc_errs(INGRESS_FWDNS_P101);
 		else
@@ -290,14 +290,14 @@ int ingress_fwdns_prio101(struct __sk_buff *skb)
 		inc_errs(INGRESS_FWDNS_P101);
 	}
 
-	if (skb->delivery_time_type == BPF_SKB_DELIVERY_TIME_MONO) {
+	if (skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_MONO) {
 		skb->tstamp = INGRESS_FWDNS_MAGIC;
 	} else {
-		if (bpf_skb_set_delivery_time(skb, INGRESS_FWDNS_MAGIC,
-					      BPF_SKB_DELIVERY_TIME_MONO))
+		if (bpf_skb_set_tstamp(skb, INGRESS_FWDNS_MAGIC,
+				       BPF_SKB_TSTAMP_DELIVERY_MONO))
 			inc_errs(SET_DTIME);
-		if (!bpf_skb_set_delivery_time(skb, INGRESS_FWDNS_MAGIC,
-					       BPF_SKB_DELIVERY_TIME_UNSPEC))
+		if (!bpf_skb_set_tstamp(skb, INGRESS_FWDNS_MAGIC,
+					BPF_SKB_TSTAMP_UNSPEC))
 			inc_errs(SET_DTIME);
 	}
 
@@ -320,9 +320,9 @@ int egress_fwdns_prio101(struct __sk_buff *skb)
 		/* Should have handled in prio100 */
 		return TC_ACT_SHOT;
 
-	if (skb->delivery_time_type) {
+	if (skb->tstamp_type) {
 		if (fwdns_clear_dtime() ||
-		    skb->delivery_time_type != BPF_SKB_DELIVERY_TIME_MONO ||
+		    skb->tstamp_type != BPF_SKB_TSTAMP_DELIVERY_MONO ||
 		    skb->tstamp != INGRESS_FWDNS_MAGIC)
 			inc_errs(EGRESS_FWDNS_P101);
 		else
@@ -332,14 +332,14 @@ int egress_fwdns_prio101(struct __sk_buff *skb)
 		inc_errs(EGRESS_FWDNS_P101);
 	}
 
-	if (skb->delivery_time_type == BPF_SKB_DELIVERY_TIME_MONO) {
+	if (skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_MONO) {
 		skb->tstamp = EGRESS_FWDNS_MAGIC;
 	} else {
-		if (bpf_skb_set_delivery_time(skb, EGRESS_FWDNS_MAGIC,
-					      BPF_SKB_DELIVERY_TIME_MONO))
+		if (bpf_skb_set_tstamp(skb, EGRESS_FWDNS_MAGIC,
+				       BPF_SKB_TSTAMP_DELIVERY_MONO))
 			inc_errs(SET_DTIME);
-		if (!bpf_skb_set_delivery_time(skb, EGRESS_FWDNS_MAGIC,
-					       BPF_SKB_DELIVERY_TIME_UNSPEC))
+		if (!bpf_skb_set_tstamp(skb, INGRESS_FWDNS_MAGIC,
+					BPF_SKB_TSTAMP_UNSPEC))
 			inc_errs(SET_DTIME);
 	}
 
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
new file mode 100644
index 000000000000..77a123071940
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
@@ -0,0 +1,100 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+
+#define ETH_ALEN 6
+#define HDR_SZ (sizeof(struct ethhdr) + sizeof(struct ipv6hdr) + sizeof(struct udphdr))
+const volatile int ifindex_out;
+const volatile int ifindex_in;
+const volatile __u8 expect_dst[ETH_ALEN];
+volatile int pkts_seen_xdp = 0;
+volatile int pkts_seen_zero = 0;
+volatile int pkts_seen_tc = 0;
+volatile int retcode = XDP_REDIRECT;
+
+SEC("xdp")
+int xdp_redirect(struct xdp_md *xdp)
+{
+	__u32 *metadata = (void *)(long)xdp->data_meta;
+	void *data_end = (void *)(long)xdp->data_end;
+	void *data = (void *)(long)xdp->data;
+
+	__u8 *payload = data + HDR_SZ;
+	int ret = retcode;
+
+	if (payload + 1 > data_end)
+		return XDP_ABORTED;
+
+	if (xdp->ingress_ifindex != ifindex_in)
+		return XDP_ABORTED;
+
+	if (metadata + 1 > data)
+		return XDP_ABORTED;
+
+	if (*metadata != 0x42)
+		return XDP_ABORTED;
+
+	if (*payload == 0) {
+		*payload = 0x42;
+		pkts_seen_zero++;
+	}
+
+	if (bpf_xdp_adjust_meta(xdp, 4))
+		return XDP_ABORTED;
+
+	if (retcode > XDP_PASS)
+		retcode--;
+
+	if (ret == XDP_REDIRECT)
+		return bpf_redirect(ifindex_out, 0);
+
+	return ret;
+}
+
+static bool check_pkt(void *data, void *data_end)
+{
+	struct ipv6hdr *iph = data + sizeof(struct ethhdr);
+	__u8 *payload = data + HDR_SZ;
+
+	if (payload + 1 > data_end)
+		return false;
+
+	if (iph->nexthdr != IPPROTO_UDP || *payload != 0x42)
+		return false;
+
+	/* reset the payload so the same packet doesn't get counted twice when
+	 * it cycles back through
the kernel path and out the dst veth + */ + *payload = 0; + return true; +} + +SEC("xdp") +int xdp_count_pkts(struct xdp_md *xdp) +{ + void *data = (void *)(long)xdp->data; + void *data_end = (void *)(long)xdp->data_end; + + if (check_pkt(data, data_end)) + pkts_seen_xdp++; + + /* Return XDP_DROP to make sure the data page is recycled, like when it + * exits a physical NIC. Recycled pages will be counted in the + * pkts_seen_zero counter above. + */ + return XDP_DROP; +} + +SEC("tc") +int tc_count_pkts(struct __sk_buff *skb) +{ + void *data = (void *)(long)skb->data; + void *data_end = (void *)(long)skb->data_end; + + if (check_pkt(data, data_end)) + pkts_seen_tc++; + + return 0; +} + +char _license[] SEC("license") = "GPL"; diff --git a/tools/testing/selftests/bpf/test_cgroup_storage.c b/tools/testing/selftests/bpf/test_cgroup_storage.c index 5b8314cd77fd..d6a1be4d8020 100644 --- a/tools/testing/selftests/bpf/test_cgroup_storage.c +++ b/tools/testing/selftests/bpf/test_cgroup_storage.c @@ -36,7 +36,7 @@ int main(int argc, char **argv) BPF_MOV64_REG(BPF_REG_0, BPF_REG_1), BPF_EXIT_INSN(), }; - size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn); + size_t insns_cnt = ARRAY_SIZE(prog); int error = EXIT_FAILURE; int map_fd, percpu_map_fd, prog_fd, cgroup_fd; struct bpf_cgroup_storage_key key; diff --git a/tools/testing/selftests/bpf/test_lirc_mode2.sh b/tools/testing/selftests/bpf/test_lirc_mode2.sh index ec4e15948e40..5252b91f48a1 100755 --- a/tools/testing/selftests/bpf/test_lirc_mode2.sh +++ b/tools/testing/selftests/bpf/test_lirc_mode2.sh @@ -3,6 +3,7 @@ # Kselftest framework requirement - SKIP code is 4. 
ksft_skip=4 +ret=$ksft_skip msg="skip all tests:" if [ $UID != 0 ]; then @@ -25,7 +26,7 @@ do fi done -if [ -n $LIRCDEV ]; +if [ -n "$LIRCDEV" ]; then TYPE=lirc_mode2 ./test_lirc_mode2_user $LIRCDEV $INPUTDEV @@ -36,3 +37,5 @@ then echo -e ${GREEN}"PASS: $TYPE"${NC} fi fi + +exit $ret diff --git a/tools/testing/selftests/bpf/test_lru_map.c b/tools/testing/selftests/bpf/test_lru_map.c index 6e6235185a86..563bbe18c172 100644 --- a/tools/testing/selftests/bpf/test_lru_map.c +++ b/tools/testing/selftests/bpf/test_lru_map.c @@ -878,11 +878,11 @@ int main(int argc, char **argv) assert(nr_cpus != -1); printf("nr_cpus:%d\n\n", nr_cpus); - for (f = 0; f < sizeof(map_flags) / sizeof(*map_flags); f++) { + for (f = 0; f < ARRAY_SIZE(map_flags); f++) { unsigned int tgt_free = (map_flags[f] & BPF_F_NO_COMMON_LRU) ? PERCPU_FREE_TARGET : LOCAL_FREE_TARGET; - for (t = 0; t < sizeof(map_types) / sizeof(*map_types); t++) { + for (t = 0; t < ARRAY_SIZE(map_types); t++) { test_lru_sanity0(map_types[t], map_flags[f]); test_lru_sanity1(map_types[t], map_flags[f], tgt_free); test_lru_sanity2(map_types[t], map_flags[f], tgt_free); diff --git a/tools/testing/selftests/bpf/test_lwt_ip_encap.sh b/tools/testing/selftests/bpf/test_lwt_ip_encap.sh index b497bb85b667..6c69c42b1d60 100755 --- a/tools/testing/selftests/bpf/test_lwt_ip_encap.sh +++ b/tools/testing/selftests/bpf/test_lwt_ip_encap.sh @@ -120,6 +120,14 @@ setup() ip netns exec ${NS2} sysctl -wq net.ipv4.conf.default.rp_filter=0 ip netns exec ${NS3} sysctl -wq net.ipv4.conf.default.rp_filter=0 + # disable IPv6 DAD because it sometimes takes too long and fails tests + ip netns exec ${NS1} sysctl -wq net.ipv6.conf.all.accept_dad=0 + ip netns exec ${NS2} sysctl -wq net.ipv6.conf.all.accept_dad=0 + ip netns exec ${NS3} sysctl -wq net.ipv6.conf.all.accept_dad=0 + ip netns exec ${NS1} sysctl -wq net.ipv6.conf.default.accept_dad=0 + ip netns exec ${NS2} sysctl -wq net.ipv6.conf.default.accept_dad=0 + ip netns exec ${NS3} sysctl -wq 
net.ipv6.conf.default.accept_dad=0 + ip link add veth1 type veth peer name veth2 ip link add veth3 type veth peer name veth4 ip link add veth5 type veth peer name veth6 @@ -289,7 +297,7 @@ test_ping() ip netns exec ${NS1} ping -c 1 -W 1 -I veth1 ${IPv4_DST} 2>&1 > /dev/null RET=$? elif [ "${PROTO}" == "IPv6" ] ; then - ip netns exec ${NS1} ping6 -c 1 -W 6 -I veth1 ${IPv6_DST} 2>&1 > /dev/null + ip netns exec ${NS1} ping6 -c 1 -W 1 -I veth1 ${IPv6_DST} 2>&1 > /dev/null RET=$? else echo " test_ping: unknown PROTO: ${PROTO}" diff --git a/tools/testing/selftests/bpf/test_sock_addr.c b/tools/testing/selftests/bpf/test_sock_addr.c index f0c8d05ba6d1..f3d5d7ac6505 100644 --- a/tools/testing/selftests/bpf/test_sock_addr.c +++ b/tools/testing/selftests/bpf/test_sock_addr.c @@ -723,7 +723,7 @@ static int xmsg_ret_only_prog_load(const struct sock_addr_test *test, BPF_MOV64_IMM(BPF_REG_0, rc), BPF_EXIT_INSN(), }; - return load_insns(test, insns, sizeof(insns) / sizeof(struct bpf_insn)); + return load_insns(test, insns, ARRAY_SIZE(insns)); } static int sendmsg_allow_prog_load(const struct sock_addr_test *test) @@ -795,7 +795,7 @@ static int sendmsg4_rw_asm_prog_load(const struct sock_addr_test *test) BPF_EXIT_INSN(), }; - return load_insns(test, insns, sizeof(insns) / sizeof(struct bpf_insn)); + return load_insns(test, insns, ARRAY_SIZE(insns)); } static int recvmsg4_rw_c_prog_load(const struct sock_addr_test *test) @@ -858,7 +858,7 @@ static int sendmsg6_rw_dst_asm_prog_load(const struct sock_addr_test *test, BPF_EXIT_INSN(), }; - return load_insns(test, insns, sizeof(insns) / sizeof(struct bpf_insn)); + return load_insns(test, insns, ARRAY_SIZE(insns)); } static int sendmsg6_rw_asm_prog_load(const struct sock_addr_test *test) diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c index 1ba7e7346afb..dfb4f5c0fcb9 100644 --- a/tools/testing/selftests/bpf/test_sockmap.c +++ b/tools/testing/selftests/bpf/test_sockmap.c @@ -1786,7 
+1786,7 @@ static int populate_progs(char *bpf_file) i++; } - for (i = 0; i < sizeof(map_fd)/sizeof(int); i++) { + for (i = 0; i < ARRAY_SIZE(map_fd); i++) { maps[i] = bpf_object__find_map_by_name(obj, map_names[i]); map_fd[i] = bpf_map__fd(maps[i]); if (map_fd[i] < 0) { @@ -1867,7 +1867,7 @@ static int __test_selftests(int cg_fd, struct sockmap_options *opt) } /* Tests basic commands and APIs */ - for (i = 0; i < sizeof(test)/sizeof(struct _test); i++) { + for (i = 0; i < ARRAY_SIZE(test); i++) { struct _test t = test[i]; if (check_whitelist(&t, opt) != 0) diff --git a/tools/testing/selftests/bpf/test_tunnel.sh b/tools/testing/selftests/bpf/test_tunnel.sh index ca1372924023..2817d9948d59 100755 --- a/tools/testing/selftests/bpf/test_tunnel.sh +++ b/tools/testing/selftests/bpf/test_tunnel.sh @@ -39,7 +39,7 @@ # from root namespace, the following operations happen: # 1) Route lookup shows 10.1.1.100/24 belongs to tnl dev, fwd to tnl dev. # 2) Tnl device's egress BPF program is triggered and set the tunnel metadata, -# with remote_ip=172.16.1.200 and others. +# with remote_ip=172.16.1.100 and others. 
# 3) Outer tunnel header is prepended and route the packet to veth1's egress # 4) veth0's ingress queue receive the tunneled packet at namespace at_ns0 # 5) Tunnel protocol handler, ex: vxlan_rcv, decap the packet diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c index 92e3465fbae8..a2cd236c32eb 100644 --- a/tools/testing/selftests/bpf/test_verifier.c +++ b/tools/testing/selftests/bpf/test_verifier.c @@ -22,8 +22,6 @@ #include <limits.h> #include <assert.h> -#include <sys/capability.h> - #include <linux/unistd.h> #include <linux/filter.h> #include <linux/bpf_perf_event.h> @@ -42,6 +40,7 @@ # define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS 1 # endif #endif +#include "cap_helpers.h" #include "bpf_rand.h" #include "bpf_util.h" #include "test_btf.h" @@ -62,6 +61,10 @@ #define F_NEEDS_EFFICIENT_UNALIGNED_ACCESS (1 << 0) #define F_LOAD_WITH_STRICT_ALIGNMENT (1 << 1) +/* need CAP_BPF, CAP_NET_ADMIN, CAP_PERFMON to load progs */ +#define ADMIN_CAPS (1ULL << CAP_NET_ADMIN | \ + 1ULL << CAP_PERFMON | \ + 1ULL << CAP_BPF) #define UNPRIV_SYSCTL "kernel/unprivileged_bpf_disabled" static bool unpriv_disabled = false; static int skips; @@ -973,47 +976,19 @@ struct libcap { static int set_admin(bool admin) { - cap_t caps; - /* need CAP_BPF, CAP_NET_ADMIN, CAP_PERFMON to load progs */ - const cap_value_t cap_net_admin = CAP_NET_ADMIN; - const cap_value_t cap_sys_admin = CAP_SYS_ADMIN; - struct libcap *cap; - int ret = -1; - - caps = cap_get_proc(); - if (!caps) { - perror("cap_get_proc"); - return -1; - } - cap = (struct libcap *)caps; - if (cap_set_flag(caps, CAP_EFFECTIVE, 1, &cap_sys_admin, CAP_CLEAR)) { - perror("cap_set_flag clear admin"); - goto out; - } - if (cap_set_flag(caps, CAP_EFFECTIVE, 1, &cap_net_admin, - admin ? 
CAP_SET : CAP_CLEAR)) { - perror("cap_set_flag set_or_clear net"); - goto out; - } - /* libcap is likely old and simply ignores CAP_BPF and CAP_PERFMON, - * so update effective bits manually - */ + int err; + if (admin) { - cap->data[1].effective |= 1 << (38 /* CAP_PERFMON */ - 32); - cap->data[1].effective |= 1 << (39 /* CAP_BPF */ - 32); + err = cap_enable_effective(ADMIN_CAPS, NULL); + if (err) + perror("cap_enable_effective(ADMIN_CAPS)"); } else { - cap->data[1].effective &= ~(1 << (38 - 32)); - cap->data[1].effective &= ~(1 << (39 - 32)); - } - if (cap_set_proc(caps)) { - perror("cap_set_proc"); - goto out; + err = cap_disable_effective(ADMIN_CAPS, NULL); + if (err) + perror("cap_disable_effective(ADMIN_CAPS)"); } - ret = 0; -out: - if (cap_free(caps)) - perror("cap_free"); - return ret; + + return err; } static int do_prog_test_run(int fd_prog, bool unpriv, uint32_t expected_val, @@ -1291,31 +1266,18 @@ fail_log: static bool is_admin(void) { - cap_flag_value_t net_priv = CAP_CLEAR; - bool perfmon_priv = false; - bool bpf_priv = false; - struct libcap *cap; - cap_t caps; - -#ifdef CAP_IS_SUPPORTED - if (!CAP_IS_SUPPORTED(CAP_SETFCAP)) { - perror("cap_get_flag"); - return false; - } -#endif - caps = cap_get_proc(); - if (!caps) { - perror("cap_get_proc"); + __u64 caps; + + /* The test checks for finer cap as CAP_NET_ADMIN, + * CAP_PERFMON, and CAP_BPF instead of CAP_SYS_ADMIN. + * Thus, disable CAP_SYS_ADMIN at the beginning. 
+ */ + if (cap_disable_effective(1ULL << CAP_SYS_ADMIN, &caps)) { + perror("cap_disable_effective(CAP_SYS_ADMIN)"); return false; } - cap = (struct libcap *)caps; - bpf_priv = cap->data[1].effective & (1 << (39/* CAP_BPF */ - 32)); - perfmon_priv = cap->data[1].effective & (1 << (38/* CAP_PERFMON */ - 32)); - if (cap_get_flag(caps, CAP_NET_ADMIN, CAP_EFFECTIVE, &net_priv)) - perror("cap_get_flag NET"); - if (cap_free(caps)) - perror("cap_free"); - return bpf_priv && perfmon_priv && net_priv == CAP_SET; + + return (caps & ADMIN_CAPS) == ADMIN_CAPS; } static void get_unpriv_disabled() diff --git a/tools/testing/selftests/bpf/trace_helpers.c b/tools/testing/selftests/bpf/trace_helpers.c index ca6abae9b09c..3d6217e3aff7 100644 --- a/tools/testing/selftests/bpf/trace_helpers.c +++ b/tools/testing/selftests/bpf/trace_helpers.c @@ -34,6 +34,13 @@ int load_kallsyms(void) if (!f) return -ENOENT; + /* + * This is called/used from multiplace places, + * load symbols just once. + */ + if (sym_cnt) + return 0; + while (fgets(buf, sizeof(buf), f)) { if (sscanf(buf, "%p %c %s", &addr, &symbol, func) != 3) break; diff --git a/tools/testing/selftests/bpf/verifier/bounds_deduction.c b/tools/testing/selftests/bpf/verifier/bounds_deduction.c index 91869aea6d64..3931c481e30c 100644 --- a/tools/testing/selftests/bpf/verifier/bounds_deduction.c +++ b/tools/testing/selftests/bpf/verifier/bounds_deduction.c @@ -105,7 +105,7 @@ BPF_EXIT_INSN(), }, .errstr_unpriv = "R1 has pointer with unsupported alu operation", - .errstr = "dereference of modified ctx ptr", + .errstr = "negative offset ctx ptr R1 off=-1 disallowed", .result = REJECT, .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, }, diff --git a/tools/testing/selftests/bpf/verifier/calls.c b/tools/testing/selftests/bpf/verifier/calls.c index f890333259ad..2e03decb11b6 100644 --- a/tools/testing/selftests/bpf/verifier/calls.c +++ b/tools/testing/selftests/bpf/verifier/calls.c @@ -116,6 +116,89 @@ }, }, { + "calls: invalid kfunc call: reg->off 
must be zero when passed to release kfunc", + .insns = { + BPF_MOV64_REG(BPF_REG_1, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8), + BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0), + BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1), + BPF_EXIT_INSN(), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 8), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .result = REJECT, + .errstr = "R1 must have zero offset when passed to release func", + .fixup_kfunc_btf_id = { + { "bpf_kfunc_call_test_acquire", 3 }, + { "bpf_kfunc_call_memb_release", 8 }, + }, +}, +{ + "calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset", + .insns = { + BPF_MOV64_REG(BPF_REG_1, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8), + BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0), + BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1), + BPF_EXIT_INSN(), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), + BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, 16), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -4), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .fixup_kfunc_btf_id = { + { "bpf_kfunc_call_test_acquire", 3 }, + { "bpf_kfunc_call_test_release", 9 }, + }, + .result_unpriv = REJECT, + .result = REJECT, + .errstr = "negative offset ptr_ ptr R1 off=-4 disallowed", +}, +{ + "calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset", + .insns = { + BPF_MOV64_REG(BPF_REG_1, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8), + BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0), + BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1), + BPF_EXIT_INSN(), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), + BPF_LDX_MEM(BPF_W, BPF_REG_2, 
BPF_REG_0, 4), + BPF_JMP_IMM(BPF_JLE, BPF_REG_2, 4, 3), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + BPF_JMP_IMM(BPF_JGE, BPF_REG_2, 0, 3), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + BPF_ALU64_REG(BPF_ADD, BPF_REG_1, BPF_REG_2), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + .fixup_kfunc_btf_id = { + { "bpf_kfunc_call_test_acquire", 3 }, + { "bpf_kfunc_call_test_release", 9 }, + { "bpf_kfunc_call_test_release", 13 }, + { "bpf_kfunc_call_test_release", 17 }, + }, + .result_unpriv = REJECT, + .result = REJECT, + .errstr = "variable ptr_ access var_off=(0x0; 0x7) disallowed", +}, +{ "calls: basic sanity", .insns = { BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 2), diff --git a/tools/testing/selftests/bpf/verifier/ctx.c b/tools/testing/selftests/bpf/verifier/ctx.c index 60f6fbe03f19..c8eaf0536c24 100644 --- a/tools/testing/selftests/bpf/verifier/ctx.c +++ b/tools/testing/selftests/bpf/verifier/ctx.c @@ -58,7 +58,7 @@ }, .prog_type = BPF_PROG_TYPE_SCHED_CLS, .result = REJECT, - .errstr = "dereference of modified ctx ptr", + .errstr = "negative offset ctx ptr R1 off=-612 disallowed", }, { "pass modified ctx pointer to helper, 2", @@ -71,8 +71,8 @@ }, .result_unpriv = REJECT, .result = REJECT, - .errstr_unpriv = "dereference of modified ctx ptr", - .errstr = "dereference of modified ctx ptr", + .errstr_unpriv = "negative offset ctx ptr R1 off=-612 disallowed", + .errstr = "negative offset ctx ptr R1 off=-612 disallowed", }, { "pass modified ctx pointer to helper, 3", @@ -141,7 +141,7 @@ .prog_type = BPF_PROG_TYPE_CGROUP_SOCK_ADDR, .expected_attach_type = BPF_CGROUP_UDP6_SENDMSG, .result = REJECT, - .errstr = "dereference of modified ctx ptr", + .errstr = "negative offset ctx ptr R1 off=-612 disallowed", }, { 
"pass ctx or null check, 5: null (connect)", |