summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* bpf: sockmap, sock_map_delete needs to use xchgJohn Fastabend2019-07-221-5/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __sock_map_delete() may be called from a tcp event such as unhash or close from the following trace, tcp_bpf_close() tcp_bpf_remove() sk_psock_unlink() sock_map_delete_from_link() __sock_map_delete() In this case the sock lock is held but this only protects against duplicate removals on the TCP side. If the map is free'd then we have this trace, sock_map_free xchg() <- replaces map entry sock_map_unref() sk_psock_put() sock_map_del_link() The __sock_map_delete() call however uses a read, test, null over the map entry which can result in both paths trying to free the map entry. To fix use xchg in TCP paths as well so we avoid having two references to the same map entry. Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* net/tls: fix transition through disconnect with closeJohn Fastabend2019-07-223-1/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is possible (via shutdown()) for TCP socks to go through TCP_CLOSE state via tcp_disconnect() without actually calling tcp_close which would then call the tls close callback. Because of this a user could disconnect a socket then put it in a LISTEN state which would break our assumptions about sockets always being ESTABLISHED state. More directly because close() can call unhash() and unhash is implemented by sockmap if a sockmap socket has TLS enabled we can incorrectly destroy the psock from unhash() and then call its close handler again. But because the psock (sockmap socket representation) is already destroyed we call close handler in sk->prot. However, in some cases (TLS BASE/BASE case) this will still point at the sockmap close handler resulting in a circular call and crash reported by syzbot. To fix both above issues implement the unhash() routine for TLS. v4: - add note about tls offload still needing the fix; - move sk_proto to the cold cache line; - split TX context free into "release" and "free", otherwise the GC work itself is in already freed memory; - more TX before RX for consistency; - reuse tls_ctx_free(); - schedule the GC work after we're done with context to avoid UAF; - don't set the unhash in all modes, all modes "inherit" TLS_BASE's callbacks anyway; - disable the unhash hook for TLS_HW. Fixes: 3c4d7559159bf ("tls: kernel TLS support") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* net/tls: remove sock unlock/lock around strp_done()John Fastabend2019-07-224-45/+64
| | | | | | | | | | | | | | | | The tls close() callback currently drops the sock lock to call strp_done(). Split up the RX cleanup into stopping the strparser and releasing most resources, syncing strparser and finally freeing the context. To avoid the need for a strp_done() call on the cleanup path of device offload make sure we don't arm the strparser until we are sure init will be successful. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* net/tls: remove close callback sock unlock/lock around TX work flushJohn Fastabend2019-07-223-7/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The tls close() callback currently drops the sock lock, makes a cancel_delayed_work_sync() call, and then relocks the sock. By restructuring the code we can avoid droping lock and then reclaiming it. To simplify this we do the following, tls_sk_proto_close set_bit(CLOSING) set_bit(SCHEDULE) cancel_delay_work_sync() <- cancel workqueue lock_sock(sk) ... release_sock(sk) strp_done() Setting the CLOSING bit prevents the SCHEDULE bit from being cleared by any workqueue items e.g. if one happens to be scheduled and run between when we set SCHEDULE bit and cancel work. Then because SCHEDULE bit is set now no new work will be scheduled. Tested with net selftests and bpf selftests. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* net/tls: don't call tls_sk_proto_close for hw record offloadJakub Kicinski2019-07-221-4/+0
| | | | | | | | | | | The deprecated TOE offload doesn't actually do anything in tls_sk_proto_close() - all TLS code is skipped and context not freed. Remove the callback to make it easier to refactor tls_sk_proto_close(). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* net/tls: don't arm strparser immediately in tls_set_sw_offload()Jakub Kicinski2019-07-224-10/+19
| | | | | | | | | | | | | | | | In tls_set_device_offload_rx() we prepare the software context for RX fallback and proceed to add the connection to the device. Unfortunately, software context prep includes arming strparser so in case of a later error we have to release the socket lock to call strp_done(). In preparation for not releasing the socket lock half way through callbacks move arming strparser into a separate function. Following patches will make use of that. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* libbpf: sanitize VAR to conservative 1-byte INTAndrii Nakryiko2019-07-191-2/+7
| | | | | | | | | | If VAR in non-sanitized BTF was size less than 4, converting such VAR into an INT with size=4 will cause BTF validation failure due to violationg of STRUCT (into which DATASEC was converted) member size. Fix by conservatively using size=1. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
* libbpf: fix SIGSEGV when BTF loading fails, but .BTF.ext existsAndrii Nakryiko2019-07-191-0/+6
| | | | | | | | | | | | In case when BTF loading fails despite sanitization, but BPF object has .BTF.ext loaded as well, we free and null obj->btf, but not obj->btf_ext. This leads to an attempt to relocate .BTF.ext later on during bpf_object__load(), which assumes obj->btf is present. This leads to SIGSEGV on null pointer access. Fix bug by freeing and nulling obj->btf_ext as well. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
* tcp: fix tcp_set_congestion_control() use from bpf hookEric Dumazet2019-07-194-6/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Neal reported incorrect use of ns_capable() from bpf hook. bpf_setsockopt(...TCP_CONGESTION...) -> tcp_set_congestion_control() -> ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN) -> ns_capable_common() -> current_cred() -> rcu_dereference_protected(current->cred, 1) Accessing 'current' in bpf context makes no sense, since packets are processed from softirq context. As Neal stated : The capability check in tcp_set_congestion_control() was written assuming a system call context, and then was reused from a BPF call site. The fix is to add a new parameter to tcp_set_congestion_control(), so that the ns_capable() call is only performed under the right context. Fixes: 91b5b21c7c16 ("bpf: Add support for changing congestion control") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Lawrence Brakmo <brakmo@fb.com> Reported-by: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ag71xx: fix return value check in ag71xx_probe()Wei Yongjun2019-07-191-2/+2
| | | | | | | | | | | In case of error, the function of_get_mac_address() returns ERR_PTR() and never returns NULL. The NULL test in the return value check should be replaced with IS_ERR(). Fixes: d51b6ce441d3 ("net: ethernet: add ag71xx driver") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* ag71xx: fix error return code in ag71xx_probe()Wei Yongjun2019-07-191-1/+3
| | | | | | | | | | Fix to return error code -ENOMEM from the dmam_alloc_coherent() error handling case instead of 0, as done elsewhere in this function. Fixes: d51b6ce441d3 ("net: ethernet: add ag71xx driver") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* usb: qmi_wwan: add D-Link DWM-222 A2 device IDRogan Dawes2019-07-191-0/+1
| | | | | Signed-off-by: Rogan Dawes <rogan@dawes.za.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* bnxt_en: Fix VNIC accounting when enabling aRFS on 57500 chips.Michael Chan2019-07-191-2/+5
| | | | | | | | | | | | Unlike legacy chips, 57500 chips don't need additional VNIC resources for aRFS/ntuple. Fix the code accordingly so that we don't reserve and allocate additional VNICs on 57500 chips. Without this patch, the driver is failing to initialize when it tries to allocate extra VNICs. Fixes: ac33906c67e2 ("bnxt_en: Add support for aRFS on 57500 chips.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: dsa: sja1105: Fix missing unlock on error in sk_buff()Wei Yongjun2019-07-191-0/+1
| | | | | | | | | | | | Add the missing unlock before return from function sk_buff() in the error handling case. Fixes: f3097be21bf1 ("net: dsa: sja1105: Add a state machine for RX timestamping") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* gve: replace kfree with kvfreeChuhong Yuan2019-07-192-13/+13
| | | | | | | | | Variables allocated by kvzalloc should not be freed by kfree. Because they may be allocated by vmalloc. So we replace kfree with kvfree here. Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller2019-07-1832-195/+391
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Alexei Starovoitov says: ==================== pull-request: bpf 2019-07-18 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) verifier precision propagation fix, from Andrii. 2) BTF size fix for typedefs, from Andrii. 3) a bunch of big endian fixes, from Ilya. 4) wide load from bpf_sock_addr fixes, from Stanislav. 5) a bunch of misc fixes from a number of developers. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * selftests/bpf: fix test_xdp_noinline on s390Ilya Leoshkevich2019-07-181-8/+9
| | | | | | | | | | | | | | | | | | | | test_xdp_noinline fails on s390 due to a handful of endianness issues. Use ntohs for parsing eth_proto. Replace bswaps with ntohs/htons. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * selftests/bpf: fix "valid read map access into a read-only array 1" on s390Ilya Leoshkevich2019-07-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | This test looks up a 32-bit map element and then loads it using a 64-bit load. This does not work on s390, which is a big-endian machine. Since the point of this test doesn't seem to be loading a smaller value using a larger load, simply use a 32-bit load. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * selftests/bpf: fix perf_buffer on s390Ilya Leoshkevich2019-07-173-17/+11
| | | | | | | | | | | | | | | | | | | | | | | | perf_buffer test fails for exactly the same reason test_attach_probe used to fail: different nanosleep syscall kprobe name. Reuse the test_attach_probe fix. Fixes: ee5cf82ce04a ("selftests/bpf: test perf buffer API") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * selftests/bpf: structure test_{progs, maps, verifier} test runners uniformlyAndrii Nakryiko2019-07-171-14/+10
| | | | | | | | | | | | | | | | | | | | | | | | It's easier to follow the logic if it's structured the same. There is just slight difference between test_progs/test_maps and test_verifier. test_verifier's verifier/*.c files are not really compilable C files (they are more of include headers), so they can't be specified as explicit dependencies of test_verifier. Cc: Alexei Starovoitov <ast@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * selftests/bpf: fix test_verifier/test_maps make dependenciesAndrii Nakryiko2019-07-171-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | e46fc22e60a4 ("selftests/bpf: make directory prerequisites order-only") exposed existing problem in Makefile for test_verifier and test_maps tests: their dependency on auto-generated header file with a list of all tests wasn't recorded explicitly. This patch fixes these issues. Fixes: 51a0e301a563 ("bpf: Add BPF_MAP_TYPE_SK_STORAGE test to test_maps") Fixes: 6b7b6995c43e ("selftests: bpf: tests.h should depend on .c files, not the output") Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Stanislav Fomichev <sdf@google.com> Cc: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * libbpf: fix another GCC8 warning for strncpyAndrii Nakryiko2019-07-161-1/+2
| | | | | | | | | | | | | | | | | | | | Similar issue was fixed in cdfc7f888c2a ("libbpf: fix GCC8 warning for strncpy") already. This one was missed. Fixing now. Cc: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * selftests/bpf: skip nmi test when perf hw events are disabledIlya Leoshkevich2019-07-161-1/+32
| | | | | | | | | | | | | | | | | | | | | | | | Some setups (e.g. virtual machines) might run with hardware perf events disabled. If this is the case, skip the test_send_signal_nmi test. Add a separate test involving a software perf event. This allows testing the perf event path regardless of hardware perf event support. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * selftests/bpf: fix "alu with different scalars 1" on s390Ilya Leoshkevich2019-07-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | BPF_LDX_MEM is used to load the least significant byte of the retrieved test_val.index, however, on big-endian machines it ends up retrieving the most significant byte. Change the test to load the whole int in order to make it endianness-independent. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * selftests/bpf: remove logic duplication in test_verifierAndrii Nakryiko2019-07-161-20/+15
| | | | | | | | | | | | | | | | | | | | test_verifier tests can specify single- and multi-runs tests. Internally logic of handling them is duplicated. Get rid of it by making single run retval/data specification to be a first run spec. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Cc: Krzesimir Nowak <krzesimir@kinvolk.io> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * Merge branch 'bpf-fix-wide-loads-sockaddr'Daniel Borkmann2019-07-156-48/+95
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Stanislav Fomichev says: ==================== When fixing selftests by adding support for wide stores, Yonghong reported that he had seen some examples where clang generates single u64 loads for two adjacent u32s as well: http://lore.kernel.org/netdev/a66c937f-94c0-eaf8-5b37-8587d66c0c62@fb.com Fix this to support aligned u64 reads for some bpf_sock_addr fields as well. ==================== Acked-by: Andrii Narkyiko <andriin@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| | * bpf: sync bpf.h to tools/Stanislav Fomichev2019-07-151-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | Update bpf_sock_addr comments to indicate support for 8-byte reads from user_ip6 and msg_src_ip6. Cc: Yonghong Song <yhs@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| | * selftests/bpf: add selftests for wide loadsStanislav Fomichev2019-07-151-0/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | Mirror existing wide store tests with wide loads. The only significant difference is expected error string. Cc: Yonghong Song <yhs@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| | * selftests/bpf: rename verifier/wide_store.c to verifier/wide_access.cStanislav Fomichev2019-07-152-36/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the file and rename internal BPF_SOCK_ADDR define to BPF_SOCK_ADDR_STORE. This selftest will be extended in the next commit with the wide loads. Cc: Yonghong Song <yhs@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| | * bpf: allow wide aligned loads for bpf_sock_addr user_ip6 and msg_src_ip6Stanislav Fomichev2019-07-152-3/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | Add explicit check for u64 loads of user_ip6 and msg_src_ip6 and update the comment. Cc: Yonghong Song <yhs@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| | * bpf: rename bpf_ctx_wide_store_ok to bpf_ctx_wide_access_okStanislav Fomichev2019-07-152-7/+7
| |/ | | | | | | | | | | | | | | | | Rename bpf_ctx_wide_store_ok to bpf_ctx_wide_access_ok to indicate that it can be used for both loads and stores. Cc: Yonghong Song <yhs@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * samples/bpf: build with -D__TARGET_ARCH_$(SRCARCH)Ilya Leoshkevich2019-07-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | While $ARCH can be relatively flexible (see Makefile and tools/scripts/Makefile.arch), $SRCARCH always corresponds to a directory name under arch/. Therefore, build samples with -D__TARGET_ARCH_$(SRCARCH), since that matches the expectations of bpf_helpers.h. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Vasily Gorbik <gor@linux.ibm.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * selftests/bpf: put test_stub.o into $(OUTPUT)Ilya Leoshkevich2019-07-151-2/+5
| | | | | | | | | | | | | | | | | | | | Add a rule to put test_stub.o in $(OUTPUT) and change the references to it accordingly. This prevents test_stub.o from being created in the source directory. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * selftests/bpf: make directory prerequisites order-onlyIlya Leoshkevich2019-07-151-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When directories are used as prerequisites in Makefiles, they can cause a lot of unnecessary rebuilds, because a directory is considered changed whenever a file in this directory is added, removed or modified. If the only thing a target is interested in is the existence of the directory it depends on, which is the case for selftests/bpf, this directory should be specified as an order-only prerequisite: it would still be created in case it does not exist, but it would not trigger a rebuild of a target in case it's considered changed. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * selftests/bpf: fix attach_probe on s390Ilya Leoshkevich2019-07-151-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | attach_probe test fails, because it cannot install a kprobe on a non-existent sys_nanosleep symbol. Use the correct symbol name for the nanosleep syscall on 64-bit s390. Don't bother adding one for 31-bit mode, since tests are compiled only in 64-bit mode. Fixes: 1e8611bbdfc9 ("selftests/bpf: add kprobe/uprobe selftests") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Vasily Gorbik <gor@linux.ibm.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * Merge branch 'bpf-btf-size-verification-fix'Daniel Borkmann2019-07-155-11/+104
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Andrii Nakryiko says: ==================== BTF size resolution logic isn't always resolving type size correctly, leading to erroneous map creation failures due to value size mismatch. This patch set: 1. fixes the issue (patch #1); 2. adds tests for trickier cases (patch #2); 3. and converts few test cases utilizing BTF-defined maps, that previously couldn't use typedef'ed arrays due to kernel bug (patch #3). ==================== Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| | * selftests/bpf: use typedef'ed arrays as map valuesAndrii Nakryiko2019-07-153-4/+4
| | | | | | | | | | | | | | | | | | | | | Convert few tests that couldn't use typedef'ed arrays due to kernel bug. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| | * selftests/bpf: add trickier size resolution testsAndrii Nakryiko2019-07-151-0/+88
| | | | | | | | | | | | | | | | | | | | | | | | Add more BTF tests, validating that size resolution logic is correct in few trickier cases. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| | * bpf: fix BTF verifier size resolution logicAndrii Nakryiko2019-07-151-7/+12
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BTF verifier has a size resolution bug which in some circumstances leads to invalid size resolution for, e.g., TYPEDEF modifier. This happens if we have [1] PTR -> [2] TYPEDEF -> [3] ARRAY, in which case due to being in pointer context ARRAY size won't be resolved (because for pointer it doesn't matter, so it's a sink in pointer context), but it will be permanently remembered as zero for TYPEDEF and TYPEDEF will be marked as RESOLVED. Eventually ARRAY size will be resolved correctly, but TYPEDEF resolved_size won't be updated anymore. This, subsequently, will lead to erroneous map creation failure, if that TYPEDEF is specified as either key or value, as key_size/value_size won't correspond to resolved size of TYPEDEF (kernel will believe it's zero). Note, that if BTF was ordered as [1] ARRAY <- [2] TYPEDEF <- [3] PTR, this won't be a problem, as by the time we get to TYPEDEF, ARRAY's size is already calculated and stored. This bug manifests itself in rejecting BTF-defined maps that use array typedef as a value type: typedef int array_t[16]; struct { __uint(type, BPF_MAP_TYPE_ARRAY); __type(value, array_t); /* i.e., array_t *value; */ } test_map SEC(".maps"); The fix consists on not relying on modifier's resolved_size and instead using modifier's resolved_id (type ID for "concrete" type to which modifier eventually resolves) and doing size determination for that resolved type. This allow to preserve existing "early DFS termination" logic for PTR or STRUCT_OR_ARRAY contexts, but still do correct size determination for modifier types. Fixes: eb3f595dab40 ("bpf: btf: Validate type reference") Cc: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * selftests/bpf: fix compiling loop{1, 2, 3}.c on s390Ilya Leoshkevich2019-07-123-3/+3
| | | | | | | | | | | | | | | | | | Use PT_REGS_RC(ctx) instead of ctx->rax, which is not present on s390. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Stanislav Fomichev <sdf@google.com> Tested-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * selftests/bpf: make PT_REGS_* work in userspaceIlya Leoshkevich2019-07-121-20/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Right now, on certain architectures, these macros are usable only with kernel headers. This patch makes it possible to use them with userspace headers and, as a consequence, not only in BPF samples, but also in BPF selftests. On s390, provide the forward declaration of struct pt_regs and cast it to user_pt_regs in PT_REGS_* macros. This is necessary, because instead of the full struct pt_regs, s390 exposes only its first member user_pt_regs to userspace, and bpf_helpers.h is used with both userspace (in selftests) and kernel (in samples) headers. It was added in commit 466698e654e8 ("s390/bpf: correct broken uapi for BPF_PROG_TYPE_PERF_EVENT program type"). Ditto on arm64. On x86, provide userspace versions of PT_REGS_* macros. Unlike s390 and arm64, x86 provides struct pt_regs to both userspace and kernel, however, with different member names. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Reviewed-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * selftests/bpf: fix s930 -> s390 typoIlya Leoshkevich2019-07-121-5/+5
| | | | | | | | | | | | | | | | | | | | Also check for __s390__ instead of __s390x__, just in case bpf_helpers.h is ever used by 32-bit userspace. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Reviewed-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * selftests/bpf: compile progs with -D__TARGET_ARCH_$(SRCARCH)Ilya Leoshkevich2019-07-121-1/+3
| | | | | | | | | | | | | | | | | | This opens up the possibility of accessing registers in an arch-independent way. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * selftests/bpf: do not ignore clang failuresIlya Leoshkevich2019-07-121-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When compiling an eBPF prog fails, make still returns 0, because failing clang command's output is piped to llc and therefore its exit status is ignored. When clang fails, pipe the string "clang failed" to llc. This will make llc fail with an informative error message. This solution was chosen over using pipefail, having separate targets or getting rid of llc invocation due to its simplicity. In addition, pull Kbuild.include in order to get .DELETE_ON_ERROR target, which would cause partial .o files to be removed. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * tools: bpftool: add raw_tracepoint_writable prog type to headerDaniel T. Lee2019-07-121-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | From commit 9df1c28bb752 ("bpf: add writable context for raw tracepoints"), a new type of BPF_PROG, RAW_TRACEPOINT_WRITABLE has been added. Since this BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE is not listed at bpftool's header, it causes a segfault when executing 'bpftool feature'. This commit adds BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE entry to prog_type_name enum, and will eventually fixes the segfault issue. Fixes: 9df1c28bb752 ("bpf: add writable context for raw tracepoints") Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * bpf: verifier: avoid fall-through warningsGustavo A. R. Silva2019-07-121-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation to enabling -Wimplicit-fallthrough, this patch silences the following warning: kernel/bpf/verifier.c: In function ‘check_return_code’: kernel/bpf/verifier.c:6106:6: warning: this statement may fall through [-Wimplicit-fallthrough=] if (env->prog->expected_attach_type == BPF_CGROUP_UDP4_RECVMSG || ^ kernel/bpf/verifier.c:6109:2: note: here case BPF_PROG_TYPE_CGROUP_SKB: ^~~~ Warning level 3 was used: -Wimplicit-fallthrough=3 Notice that is much clearer to explicitly add breaks in each case statement (that actually contains some code), rather than letting the code to fall through. This patch is part of the ongoing efforts to enable -Wimplicit-fallthrough. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * selftests/bpf: fix bpf_target_sparc checkIlya Leoshkevich2019-07-121-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | bpf_helpers.h fails to compile on sparc: the code should be checking for defined(bpf_target_sparc), but checks simply for bpf_target_sparc. Also change #ifdef bpf_target_powerpc to #if defined() for consistency. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * xdp: fix potential deadlock on socket mutexIlya Maximets2019-07-122-10/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are 2 call chains: a) xsk_bind --> xdp_umem_assign_dev b) unregister_netdevice_queue --> xsk_notifier with the following locking order: a) xs->mutex --> rtnl_lock b) rtnl_lock --> xdp.lock --> xs->mutex Different order of taking 'xs->mutex' and 'rtnl_lock' could produce a deadlock here. Fix that by moving the 'rtnl_lock' before 'xs->lock' in the bind call chain (a). Reported-by: syzbot+bf64ec93de836d7f4c2c@syzkaller.appspotmail.com Fixes: 455302d1c9ae ("xdp: fix hang while unregistering device bound to xdp socket") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * xdp: fix possible cq entry leakIlya Maximets2019-07-121-7/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Completion queue address reservation could not be undone. In case of bad 'queue_id' or skb allocation failure, reserved entry will be leaked reducing the total capacity of completion queue. Fix that by moving reservation to the point where failure is not possible. Additionally, 'queue_id' checking moved out from the loop since there is no point to check it there. Fixes: 35fcde7f8deb ("xsk: support for Tx") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Tested-by: William Tu <u9012063@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * libbpf: fix ptr to u64 conversion warning on 32-bit platformsAndrii Nakryiko2019-07-121-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On 32-bit platforms compiler complains about conversion: libbpf.c: In function ‘perf_event_open_probe’: libbpf.c:4112:17: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast] attr.config1 = (uint64_t)(void *)name; /* kprobe_func or uprobe_path */ ^ Reported-by: Matt Hart <matthew.hart@linaro.org> Fixes: b26500274767 ("libbpf: add kprobe/uprobe attach API") Tested-by: Matt Hart <matthew.hart@linaro.org> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>