| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a generic sk_msg layer, and convert current sockmap and later
kTLS over to make use of it. While sk_buff handles network packet
representation from netdevice up to socket, sk_msg handles data
representation from application to socket layer.
This means that sk_msg framework spans across ULP users in the
kernel, and enables features such as introspection or filtering
of data with the help of BPF programs that operate on this data
structure.
Latter becomes in particular useful for kTLS where data encryption
is deferred into the kernel, and as such enabling the kernel to
perform L7 introspection and policy based on BPF for TLS connections
where the record is being encrypted after BPF has run and came to
a verdict. In order to get there, first step is to transform open
coding of scatter-gather list handling into a common core framework
that subsystems can use.
The code itself has been split and refactored into three bigger
pieces: i) the generic sk_msg API which deals with managing the
scatter gather ring, providing helpers for walking and mangling,
transferring application data from user space into it, and preparing
it for BPF pre/post-processing, ii) the plain sock map itself
where sockets can be attached to or detached from; these bits
are independent of i) which can now be used also without sock
map, and iii) the integration with plain TCP as one protocol
to be used for processing L7 application data (later this could
e.g. also be extended to other protocols like UDP). The semantics
are the same with the old sock map code and therefore no change
of user facing behavior or APIs. While pursuing this work it
also helped finding a number of bugs in the old sockmap code
that we've fixed already in earlier commits. The test_sockmap
kselftest suite passes through fine as well.
Joint work with John.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to prepare sockmap logic to be used in combination with kTLS
we need to detangle it from ULP, and further split it in later commits
into a generic API.
Joint work with John.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Whenever the ULP data on the socket is mangled, enforce that the
caller has the socket lock held as otherwise things may race with
initialization and cleanup callbacks from ulp ops as both would
mangle internal socket state.
Joint work with John.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Dan Carpenter reports:
The patch 6acc9b432e67: "bpf: Add helper to retrieve socket in BPF"
from Oct 2, 2018, leads to the following Smatch complaint:
net/core/filter.c:4893 bpf_sk_lookup()
error: we previously assumed 'skb->dev' could be null (see line 4885)
Fix this issue by checking skb->dev before using it.
Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The map-in-map frequently serves as a mechanism for atomic
snapshotting of state that a BPF program might record. The current
implementation is dangerous to use in this way, however, since
userspace has no way of knowing when all programs that might have
retrieved the "old" value of the map may have completed.
This change ensures that map update operations on map-in-map map types
always wait for all references to the old map to drop before returning
to userspace.
Signed-off-by: Daniel Colascione <dancol@google.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When test_flow_dissector.sh runs it complains that it can't find script
with_addr.sh:
./test_flow_dissector.sh: line 81: ./with_addr.sh: No such file or
directory
Rework so that with_addr.sh gets installed, add it to
TEST_PROGS_EXTENDED variable.
Fixes: 50b3ed57dee9 ("selftests/bpf: test bpf flow dissection")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
|
|
|
|
|
|
|
|
|
|
| |
When test_lwt_seg6local.sh was added commit c99a84eac026
("selftests/bpf: test for seg6local End.BPF action") config fragment
wasn't added, and without CONFIG_LWTUNNEL enabled we see this:
Error: CONFIG_LWTUNNEL is not enabled in this kernel.
selftests: test_lwt_seg6local [FAILED]
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
|
|
|
|
|
|
|
|
| |
Adding EXTRA_LDFLAGS allowing user to specify extra flags
for LD_FLAGS variable. Also adding LDFLAGS to build command
line.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
|
|
|
|
|
|
|
| |
Adding EXTRA_CFLAGS allowing user to specify extra flags
for CFLAGS variable.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
|
|
|
|
|
|
|
| |
Use __u32 instead u32 in libbpf.c and also use
uapi perf_event.h instead of tools/perf/perf-sys.h.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Jesper Dangaard Brouer says:
====================
While implementing PoC building blocks for eBPF code XDP+TC that can
manipulate VLANs headers, I discovered a bug in generic-XDP.
The fix should be backported to stable kernels. Even-though
generic-XDP was introduced in v4.12, I think the bug is not exposed
until v4.14 in the mentined fixes commit.
====================
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This XDP selftest also contain a small TC-bpf component. It provoke
the generic-XDP bug fixed in previous commit.
The selftest itself shows how to do VLAN manipulation from XDP and TC.
The test demonstrate how XDP ingress can remove a VLAN tag, and how TC
egress can add back a VLAN tag.
This use-case originates from a production need by ISP (kviknet.dk),
who gets DSL-lines terminated as VLAN Q-in-Q tagged packets, and want
to avoid having an net_device for every end-customer on the box doing
the L2 to L3 termination.
The test-setup is done via a veth-pair and creating two network
namespaces (ns1 and ns2). The 'ns1' simulate the ISP network that are
loading the BPF-progs stripping and adding VLAN IDs. The 'ns2'
simulate the DSL-customer that are using VLAN tagged packets.
Running the script with --interactive, will simply not call the
cleanup function. This gives the effect of creating a testlab, that
the users can inspect and play with. The --verbose option will simply
request that the shell will print input lines as they are read, this
include comments, which in effect make the comments visible docs.
Reported-by: Yoel Caspersen <yoel@kviknet.dk>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| | |
The helper bpf_skb_vlan_push is needed by next patch, and the helper
bpf_skb_vlan_pop is added for completeness, regarding VLAN helpers.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
XDP can modify (and resize) the Ethernet header in the packet.
There is a bug in generic-XDP, because skb->protocol and skb->pkt_type
are setup before reaching (netif_receive_)generic_xdp.
This bug was hit when XDP were popping VLAN headers (changing
eth->h_proto), as skb->protocol still contains VLAN-indication
(ETH_P_8021Q) causing invocation of skb_vlan_untag(skb), which corrupt
the packet (basically popping the VLAN again).
This patch catch if XDP changed eth header in such a way, that SKB
fields needs to be updated.
V2: on request from Song Liu, use ETH_HLEN instead of mac_len,
in __skb_push() as eth_type_trans() use ETH_HLEN in paired skb_pull_inline().
Fixes: d445516966dc ("net: xdp: support xdp generic on virtual devices")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Prashant Bhole says:
====================
Currently when map a lookup fails, user space API can not make any
distinction whether given key was not found or lookup is not supported
by particular map.
In this series we modify return value of maps which do not support
lookup. Lookup on such map implementation will return -EOPNOTSUPP.
bpf() syscall with BPF_MAP_LOOKUP_ELEM command will set EOPNOTSUPP
errno. We also handle this error in bpftool to print appropriate
message.
Patch 1: adds handling of BPF_MAP_LOOKUP ELEM command of bpf syscall
such that errno will set to EOPNOTSUPP when map doesn't support lookup
Patch 2: Modifies the return value of map_lookup_elem() to EOPNOTSUPP
for maps which do not support lookup
Patch 3: Splits do_dump() in bpftool/map.c. Element printing code is
moved out into new function dump_map_elem(). This was done in order to
reduce deep indentation and accomodate further changes.
Patch 4: Changes in bpftool to print strerror() message when lookup
error is occured. This will result in appropriate message like
"Operation not supported" when map doesn't support lookup.
Patch 5: test_verifier: change fixup map naming convention as
suggested by Alexei
Patch 6: Added verifier tests to check whether verifier rejects call
to bpf_map_lookup_elem from bpf program. For all map types those
do not support map lookup.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
map_lookup_elem isn't supported by certain map types like:
- BPF_MAP_TYPE_PROG_ARRAY
- BPF_MAP_TYPE_STACK_TRACE
- BPF_MAP_TYPE_XSKMAP
- BPF_MAP_TYPE_SOCKMAP/BPF_MAP_TYPE_SOCKHASH
Let's add verfier tests to check whether verifier prevents
bpf_map_lookup_elem call on above programs from bpf program.
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Currently fixup map are named like fixup_map1, fixup_map2, and so on.
As suggested by Alexei let's change change map names such that we can
identify map type by looking at the name.
This patch is basically a find and replace change:
fixup_map1 -> fixup_map_hash_8b
fixup_map2 -> fixup_map_hash_48b
fixup_map3 -> fixup_map_hash_16b
fixup_map4 -> fixup_map_array_48b
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Since map lookup error can be ENOENT or EOPNOTSUPP, let's print
strerror() as error message in normal and JSON output.
This patch adds helper function print_entry_error() to print
entry from lookup error occurs
Example: Following example dumps a map which does not support lookup.
Output before:
root# bpftool map -jp dump id 40
[
"key": ["0x0a","0x00","0x00","0x00"
],
"value": {
"error": "can\'t lookup element"
},
"key": ["0x0b","0x00","0x00","0x00"
],
"value": {
"error": "can\'t lookup element"
}
]
root# bpftool map dump id 40
can't lookup element with key:
0a 00 00 00
can't lookup element with key:
0b 00 00 00
Found 0 elements
Output after changes:
root# bpftool map dump -jp id 45
[
"key": ["0x0a","0x00","0x00","0x00"
],
"value": {
"error": "Operation not supported"
},
"key": ["0x0b","0x00","0x00","0x00"
],
"value": {
"error": "Operation not supported"
}
]
root# bpftool map dump id 45
key:
0a 00 00 00
value:
Operation not supported
key:
0b 00 00 00
value:
Operation not supported
Found 0 elements
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
do_dump() function in bpftool/map.c has deep indentations. In order
to reduce deep indent, let's move element printing code out of
do_dump() into dump_map_elem() function.
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Return ERR_PTR(-EOPNOTSUPP) from map_lookup_elem() methods of below
map types:
- BPF_MAP_TYPE_PROG_ARRAY
- BPF_MAP_TYPE_STACK_TRACE
- BPF_MAP_TYPE_XSKMAP
- BPF_MAP_TYPE_SOCKMAP/BPF_MAP_TYPE_SOCKHASH
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The error value returned by map_lookup_elem doesn't differentiate
whether lookup was failed because of invalid key or lookup is not
supported.
Lets add handling for -EOPNOTSUPP return value of map_lookup_elem()
method of map, with expectation from map's implementation that it
should return -EOPNOTSUPP if lookup is not supported.
The errno for bpf syscall for BPF_MAP_LOOKUP_ELEM command will be set
to EOPNOTSUPP if map lookup is not supported.
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In btf_parse_hdr(), the length of the btf data header is firstly copied
from the user space to 'hdr_len' and checked to see whether it is larger
than 'btf_data_size'. If yes, an error code EINVAL is returned. Otherwise,
the whole header is copied again from the user space to 'btf->hdr'.
However, after the second copy, there is no check between
'btf->hdr->hdr_len' and 'hdr_len' to confirm that the two copies get the
same value. Given that the btf data is in the user space, a malicious user
can race to change the data between the two copies. By doing so, the user
can provide malicious data to the kernel and cause undefined behavior.
This patch adds a necessary check after the second copy, to make sure
'btf->hdr->hdr_len' has the same value as 'hdr_len'. Otherwise, an error
code EINVAL will be returned.
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Alexei Starovoitov says:
====================
pull-request: bpf-next 2018-10-08
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) sk_lookup_[tcp|udp] and sk_release helpers from Joe Stringer which allow
BPF programs to perform lookups for sockets in a network namespace. This would
allow programs to determine early on in processing whether the stack is
expecting to receive the packet, and perform some action (eg drop,
forward somewhere) based on this information.
2) per-cpu cgroup local storage from Roman Gushchin.
Per-cpu cgroup local storage is very similar to simple cgroup storage
except all the data is per-cpu. The main goal of per-cpu variant is to
implement super fast counters (e.g. packet counters), which don't require
neither lookups, neither atomic operations in a fast path.
The example of these hybrid counters is in selftests/bpf/netcnt_prog.c
3) allow HW offload of programs with BPF-to-BPF function calls from Quentin Monnet
4) support more than 64-byte key/value in HW offloaded BPF maps from Jakub Kicinski
5) rename of libbpf interfaces from Andrey Ignatov.
libbpf is maturing as a library and should follow good practices in
library design and implementation to play well with other libraries.
This patch set brings consistent naming convention to global symbols.
6) relicense libbpf as LGPL-2.1 OR BSD-2-Clause from Alexei Starovoitov
to let Apache2 projects use libbpf
7) various AF_XDP fixes from Björn and Magnus
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The newly added TCP and UDP handling fails to link when CONFIG_INET
is disabled:
net/core/filter.o: In function `sk_lookup':
filter.c:(.text+0x7ff8): undefined reference to `tcp_hashinfo'
filter.c:(.text+0x7ffc): undefined reference to `tcp_hashinfo'
filter.c:(.text+0x8020): undefined reference to `__inet_lookup_established'
filter.c:(.text+0x8058): undefined reference to `__inet_lookup_listener'
filter.c:(.text+0x8068): undefined reference to `udp_table'
filter.c:(.text+0x8070): undefined reference to `udp_table'
filter.c:(.text+0x808c): undefined reference to `__udp4_lib_lookup'
net/core/filter.o: In function `bpf_sk_release':
filter.c:(.text+0x82e8): undefined reference to `sock_gen_put'
Wrap the related sections of code in #ifdefs for the config option.
Furthermore, sk_lookup() should always have been marked 'static', this
also avoids a warning about a missing prototype when building with
'make W=1'.
Fixes: 6acc9b432e67 ("bpf: Add helper to retrieve socket in BPF")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Joe Stringer <joe@wand.net.nz>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
| |\
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Quentin Monnet says:
====================
This patch series adds support for hardware offload of programs containing
BPF-to-BPF function calls. First, a new callback is added to the kernel
verifier, to collect information after the main part of the verification
has been performed. Then support for BPF-to-BPF calls is incrementally
added to the nfp driver, before offloading programs containing such calls
is eventually allowed by lifting the restriction in the kernel verifier, in
the last patch. Please refer to individual patches for details.
Many thanks to Jiong and Jakub for their precious help and contribution on
the main patches for the JIT-compiler, and everything related to stack
accesses.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Now that there is at least one driver supporting BPF-to-BPF function
calls, lift the restriction, in the verifier, on hardware offload of
eBPF programs containing such calls. But prevent jit_subprogs(), still
in the verifier, from being run for offloaded programs.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Mark instructions that use pointers to areas in the stack outside of the
current stack frame, and process them accordingly in mem_op_stack().
This way, we also support BPF-to-BPF calls where the caller passes a
pointer to data in its own stack frame to the callee (typically, when
the caller passes an address to one of its local variables located in
the stack, as an argument).
Thanks to Jakub and Jiong for figuring out how to deal with this case,
I just had to turn their email discussion into this patch.
Suggested-by: Jiong Wang <jiong.wang@netronome.com>
Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
When pre-processing the instructions, it is trivial to detect what
subprograms are using R6, R7, R8 or R9 as destination registers. If a
subprogram uses none of those, then we do not need to jump to the
subroutines dedicated to saving and restoring callee-saved registers in
its prologue and epilogue.
This patch introduces detection of callee-saved registers in subprograms
and prevents the JIT from adding calls to those subroutines whenever we
can: we save some instructions in the translated program, and some time
at runtime on BPF-to-BPF calls and returns.
If no subprogram needs to save those registers, we can avoid appending
the subroutines at the end of the program.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
On performing a BPF-to-BPF call, we first jump to a subroutine that
pushes callee-saved registers (R6~R9) to the stack, and from there we
goes to the start of the callee next. In order to do so, the caller must
pass to the subroutine the address of the NFP instruction to jump to at
the end of that subroutine. This cannot be reliably implemented when
translated the caller, as we do not always know the start offset of the
callee yet.
This patch implement the required fixup step for passing the start
offset in the callee via the register used by the subroutine to hold its
return address.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Relocation for targets of BPF-to-BPF calls are required at the end of
translation. Update the nfp_fixup_branches() function in that regard.
When checking that the last instruction of each bloc is a branch, we
must account for the length of the instructions required to pop the
return address from the stack.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Offloaded programs using BPF-to-BPF calls use the stack to store the
return address when calling into a subprogram. Callees also need some
space to save eBPF registers R6 to R9. And contrarily to kernel
verifier, we align stack frames on 64 bytes (and not 32). Account for
all this when checking the stack size limit before JIT-ing the program.
This means we have to recompute maximum stack usage for the program, we
cannot get the value from the kernel.
In addition to adapting the checks on stack usage, move them to the
finalize() callback, now that we have it and because such checks are
part of the verification step rather than translation.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This is the main patch for the logics of BPF-to-BPF calls in the nfp
driver.
The functions called on BPF_JUMP | BPF_CALL and BPF_JUMP | BPF_EXIT were
used to call helpers and exit from the program, respectively; make them
usable for calling into, or returning from, a BPF subprogram as well.
For all calls, push the return address as well as the callee-saved
registers (R6 to R9) to the stack, and pop them upon returning from the
calls. In order to limit the overhead in terms of instruction number,
this is done through dedicated subroutines. Jumping to the callee
actually consists in jumping to the subroutine, that "returns" to the
callee: this will require some fixup for passing the address in a later
patch. Similarly, returning consists in jumping to the subroutine, which
pops registers and then return directly to the caller (but no fixup is
needed here).
Return to the caller is performed with the RTN instruction newly added
to the JIT.
For the few steps where we need to know what subprogram an instruction
belongs to, the struct nfp_insn_meta is extended with a new subprog_idx
field.
Note that checks on the available stack size, to take into account the
additional requirements associated to BPF-to-BPF calls (storing R6-R9
and return addresses), are added in a later patch.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Similarly to "exit" or "helper call" instructions, BPF-to-BPF calls will
require additional processing before translation starts, in order to
record and mark jump destinations.
We also mark the instructions where each subprogram begins. This will be
used in a following commit to determine where to add prologues for
subprograms.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The checks related to eBPF helper calls are performed each time the nfp
driver meets a BPF_JUMP | BPF_CALL instruction. However, these checks
are not relevant for BPF-to-BPF call (same instruction code, different
value in source register), so just skip the checks for such calls.
While at it, rename the function that runs those checks to make it clear
they apply to _helper_ calls only.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
In order to support BPF-to-BPF calls in offloaded programs, the nfp
driver must collect information about the distinct subprograms: namely,
the number of subprograms composing the complete program and the stack
depth of those subprograms. The latter in particular is non-trivial to
collect, so we copy those elements from the kernel verifier via the
newly added post-verification hook. The struct nfp_prog is extended to
store this information. Stack depths are stored in an array of dedicated
structs.
Subprogram start indexes are not collected. Instead, meta instructions
associated to the start of a subprogram will be marked with a flag in a
later patch.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
In preparation for support for BPF to BPF calls in offloaded programs,
rename the "stack_depth" field of the struct nfp_prog as
"stack_frame_depth". This is to make it clear that the field refers to
the maximum size of the current stack frame (as opposed to the maximum
size of the whole stack memory).
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In preparation for BPF-to-BPF calls in offloaded programs, add a new
function attribute to the struct bpf_prog_offload_ops so that drivers
supporting eBPF offload can hook at the end of program verification, and
potentially extract information collected by the verifier.
Implement a minimal callback (returning 0) in the drivers providing the
structs, namely netdevsim and nfp.
This will be useful in the nfp driver, in later commits, to extract the
number of subprograms as well as the stack depth for those subprograms.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
bpf_asm and the other classic BPF tools support jump conditions
comparing register A to register X, in addition to comparing
register A with constant K.
Only the latter was documented in filter.txt, add two new addressing
modes that describe the former.
Signed-off-by: Arthur Fabre <arthur@arthurfabre.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
libbpf is maturing as a library and gaining features that no other bpf libraries support
(BPF Type Format, bpf to bpf calls, etc)
Many Apache2 licensed projects (like bcc, bpftrace, gobpf, cilium, etc)
would like to use libbpf, but cannot do this yet, since Apache Foundation explicitly
states that LGPL is incompatible with Apache2.
Hence let's relicense libbpf as dual license LGPL-2.1 or BSD-2-Clause,
since BSD-2 is compatible with Apache2.
Dual LGPL or Apache2 is invalid combination.
Fix license mistake in Makefile as well.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David Beckett <david.beckett@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The AF_XDP socket struct can exist in three different, implicit
states: setup, bound and released. Setup is prior the socket has been
bound to a device. Bound is when the socket is active for receive and
send. Released is when the process/userspace side of the socket is
released, but the sock object is still lingering, e.g. when there is a
reference to the socket in an XSKMAP after process termination.
The Rx fast-path code uses the "dev" member of struct xdp_sock to
check whether a socket is bound or relased, and the Tx code uses the
struct xdp_umem "xsk_list" member in conjunction with "dev" to
determine the state of a socket.
However, the transition from bound to released did not tear the socket
down in correct order.
On the Rx side "dev" was cleared after synchronize_net() making the
synchronization useless. On the Tx side, the internal queues were
destroyed prior removing them from the "xsk_list".
This commit corrects the cleanup order, and by doing so
xdp_del_sk_umem() can be simplified and one synchronize_net() can be
removed.
Fixes: 965a99098443 ("xsk: add support for bind for Rx")
Fixes: ac98d8aab61b ("xsk: wire upp Tx zero-copy functions")
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
| |\
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Magnus Karlsson says:
====================
Previously, the xsk code did not record which umem was bound to a
specific queue id. This was not required if all drivers were zero-copy
enabled as this had to be recorded in the driver anyway. So if a user
tried to bind two umems to the same queue, the driver would say
no. But if copy-mode was first enabled and then zero-copy mode (or the
reverse order), we mistakenly enabled both of them on the same umem
leading to buggy behavior. The main culprit for this is that we did
not store the association of umem to queue id in the copy case and
only relied on the driver reporting this. As this relation was not
stored in the driver for copy mode (it does not rely on the AF_XDP
NDOs), this obviously could not work.
This patch fixes the problem by always recording the umem to queue id
relationship in the netdev_queue and netdev_rx_queue structs. This way
we always know what kind of umem has been bound to a queue id and can
act appropriately at bind time. To make the bind semantics consistent
with ethtool queue manipulations and to facilitate the implementation
of drivers, we also forbid decreasing the number of queues/channels
with ethtool if there is an active AF_XDP socket in the set of queues
that are disabled.
Jakub, please take a look at your patches. The last one I had to
change slightly to make it fit with the new interface
xdp_get_umem_from_qid(). An added bonus with this function is that we,
in the future, can also use it from the driver to get a umem, thus
simplifying driver implementations (and later remove the umem from the
NDO completely). Björn will mail patches, at a later point in time,
using this in the i40e and ixgbe drivers, that removes a good chunk of
code from the ZC implementations. I also made your code aware of Tx
queues. If we create a socket that only has a Tx queue, then the queue
id will refer to a Tx queue id only and could be larger than the
available amount of Rx queues. Please take a look at it.
Differences against v1:
* Included patches from Jakub that forbids decreasing the number of active
queues if a queue to be deactivated has an AF_XDP socket. These have
been adapted somewhat to the new interfaces in patch 2.
* Removed redundant check against real_num_[rt]x_queue in xsk_bind
* Only need to test against real_num_[rt]x_queues in
xdp_clear_umem_at_qid.
Patch 1: Introduces a umem reference in the netdev_rx_queue and
netdev_queue structs.
Patch 2: Records which queue_id is bound to which umem and make sure
that you cannot bind two different umems to the same queue_id.
Patch 3: Pre patch to ethtool_set_channels.
Patch 4: Forbid decreasing the number of active queues if a deactivated
queue has an AF_XDP socket.
Patch 5: Simplify xdp_clear_umem_at_qid now when ethtool cannot deactivate
the queue id we are running on.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
As we now do not allow ethtool to deactivate the queue id we are
running an AF_XDP socket on, we can simplify the implementation of
xdp_clear_umem_at_qid().
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
We already check the RSS indirection table does not use queues which
would be disabled by channel reconfiguration. Make sure user does not
try to disable queues which have a UMEM and zero-copy AF_XDP socket
installed.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
ethtool_set_channels() validates the config against driver's max
settings. It retrieves the current config and stores it in a
variable called max. This was okay when only max settings were
accessed but we will soon want to access current settings as
well, so calling the entire structure max makes the code less
readable.
While at it drop unnecessary parenthesis.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Previously, the xsk code did not record which umem was bound to a
specific queue id. This was not required if all drivers were zero-copy
enabled as this had to be recorded in the driver anyway. So if a user
tried to bind two umems to the same queue, the driver would say
no. But if copy-mode was first enabled and then zero-copy mode (or the
reverse order), we mistakenly enabled both of them on the same umem
leading to buggy behavior. The main culprit for this is that we did
not store the association of umem to queue id in the copy case and
only relied on the driver reporting this. As this relation was not
stored in the driver for copy mode (it does not rely on the AF_XDP
NDOs), this obviously could not work.
This patch fixes the problem by always recording the umem to queue id
relationship in the netdev_queue and netdev_rx_queue structs. This way
we always know what kind of umem has been bound to a queue id and can
act appropriately at bind time.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
| |/
| |
| |
| |
| |
| |
| |
| | |
These references to the umem will be used to store information
on what kind of AF_XDP umem that is bound to a queue id, if any.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Fix a simple typo: Completetion -> Completion
Signed-off-by: Konrad Djimeli <kdjimeli@igalia.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
There is a warning when compiling bpf sample programs in sample/bpf:
make -C /home/foo/bpf/samples/bpf/../../tools/lib/bpf/ RM='rm -rf' LDFLAGS= srctree=/home/foo/bpf/samples/bpf/../../ O=
HOSTCC /home/foo/bpf/samples/bpf/tracex3_user.o
/home/foo/bpf/samples/bpf/tracex3_user.c:20:0: warning: "ARRAY_SIZE" redefined
#define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x)))
In file included from /home/foo/bpf/samples/bpf/tracex3_user.c:18:0:
./tools/testing/selftests/bpf/bpf_util.h:48:0: note: this is the location of the previous definition
# define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
Signed-off-by: Bo YU <tsu.yubo@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
| |\
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Andrey Ignatov says:
====================
This patch set renames a few interfaces in libbpf, mostly netlink related,
so that all symbols provided by the library have only three possible
prefixes:
% nm -D tools/lib/bpf/libbpf.so | \
awk '$2 == "T" {sub(/[_\(].*/, "", $3); if ($3) print $3}' | \
sort | \
uniq -c
91 bpf
8 btf
14 libbpf
libbpf is used more and more outside kernel tree. That means the library
should follow good practices in library design and implementation to
play well with third party code that uses it.
One of such practices is to have a common prefix (or a few) for every
interface, function or data structure, library provides. It helps to
avoid name conflicts with other libraries and keeps API/ABI consistent.
Inconsistent names in libbpf already cause problems in real life. E.g.
an application can't use both libbpf and libnl due to conflicting
symbols (specifically nla_parse, nla_parse_nested and a few others).
Some of problematic global symbols are not part of ABI and can be
restricted from export with either visibility attribute/pragma or export
map (what is useful by itself and can be done in addition). That won't
solve the problem for those that are part of ABI though. Also export
restrictions would help only in DSO case. If third party application links
libbpf statically it won't help, and people do it (e.g. Facebook links
most of libraries statically, including libbpf).
libbpf already uses the following prefixes for its interfaces:
* bpf_ for bpf system call wrappers, program/map/elf-object
abstractions and a few other things;
* btf_ for BTF related API;
* libbpf_ for everything else.
The patch adds libbpf_ prefix to interfaces that use none of mentioned
above prefixes and don't fit well into the first two categories.
Long term benefits of having common prefix should outweigh possible
inconvenience of changing API for those functions now.
Patches 2-4 add libbpf_ prefix to libbpf interfaces: separate patch per
header. Other patches are simple improvements in API.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Make bpf_program__load consistent with other interfaces: use __u32
instead of u32. That in turn fixes build of samples:
In file included from ./samples/bpf/trace_output_user.c:21:0:
./tools/lib/bpf/libbpf.h:132:9: error: unknown type name ‘u32’
u32 kern_version);
^
Fixes: commit 29cd77f41620d ("libbpf: Support loading individual progs")
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|