summaryrefslogtreecommitdiffstats
path: root/samples/bpf (follow)
Commit message (Collapse)AuthorAgeFilesLines
* samples/bpf: build with -D__TARGET_ARCH_$(SRCARCH)Ilya Leoshkevich2019-07-151-1/+1
| | | | | | | | | | | | | | While $ARCH can be relatively flexible (see Makefile and tools/scripts/Makefile.arch), $SRCARCH always corresponds to a directory name under arch/. Therefore, build samples with -D__TARGET_ARCH_$(SRCARCH), since that matches the expectations of bpf_helpers.h. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Vasily Gorbik <gor@linux.ibm.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2019-07-091-1/+1
|\ | | | | | | | | | | Two cases of overlapping changes, nothing fancy. Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller2019-07-031-1/+1
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Daniel Borkmann says: ==================== pull-request: bpf 2019-07-03 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix the interpreter to properly handle BPF_ALU32 | BPF_ARSH on BE architectures, from Jiong. 2) Fix several bugs in the x32 BPF JIT for handling shifts by 0, from Luke and Xi. 3) Fix NULL pointer deref in btf_type_is_resolve_source_only(), from Stanislav. 4) Properly handle the check that forwarding is enabled on the device in bpf_ipv6_fib_lookup() helper code, from Anton. 5) Fix UAPI bpf_prog_info fields alignment for archs that have 16 bit alignment such as m68k, from Baruch. 6) Fix kernel hanging in unregister_netdevice loop while unregistering device bound to XDP socket, from Ilya. 7) Properly terminate tail update in xskq_produce_flush_desc(), from Nathan. 8) Fix broken always_inline handling in test_lwt_seg6local, from Jiri. 9) Fix bpftool to use correct argument in cgroup errors, from Jakub. 10) Fix detaching dummy prog in XDP redirect sample code, from Prashant. 11) Add Jonathan to AF_XDP reviewers, from Björn. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * samples/bpf: xdp_redirect, correctly get dummy program idPrashant Bhole2019-06-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we terminate xdp_redirect, it ends up with following message: "Program on iface OUT changed, not removing" This results in dummy prog still attached to OUT interface. It is because signal handler checks if the programs are the same that we had attached. But while fetching dummy_prog_id, current code uses prog_fd instead of dummy_prog_fd. This patch passes the correct fd. Fixes: 3b7a8ec2dec3 ("samples/bpf: Check the prog id before exiting") Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* | | samples/bpf: fix tcp_bpf.readme detach commandStanislav Fomichev2019-07-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Copy-paste, should be detach, not attach. Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* | | samples/bpf: add sample program that periodically dumps TCP statsStanislav Fomichev2019-07-032-0/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Uses new RTT callback to dump stats every second. $ mkdir -p /tmp/cgroupv2 $ mount -t cgroup2 none /tmp/cgroupv2 $ mkdir -p /tmp/cgroupv2/foo $ echo $$ >> /tmp/cgroupv2/foo/cgroup.procs $ bpftool prog load ./tcp_dumpstats_kern.o /sys/fs/bpf/tcp_prog $ bpftool cgroup attach /tmp/cgroupv2/foo sock_ops pinned /sys/fs/bpf/tcp_prog $ bpftool prog tracelog $ # run neper/netperf/etc Used neper to compare performance with and without this program attached and didn't see any noticeable performance impact. Sample output: <idle>-0 [015] ..s. 2074.128800: 0: dsack_dups=0 delivered=242526 <idle>-0 [015] ..s. 2074.128808: 0: delivered_ce=0 icsk_retransmits=0 <idle>-0 [015] ..s. 2075.130133: 0: dsack_dups=0 delivered=323599 <idle>-0 [015] ..s. 2075.130138: 0: delivered_ce=0 icsk_retransmits=0 <idle>-0 [005] .Ns. 2076.131440: 0: dsack_dups=0 delivered=404648 <idle>-0 [005] .Ns. 2076.131447: 0: delivered_ce=0 icsk_retransmits=0 Cc: Eric Dumazet <edumazet@google.com> Cc: Priyaranjan Jha <priyarjha@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* | | bpf: Add support for fq's EDT to HBMbrakmo2019-07-035-19/+231
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adds support for fq's Earliest Departure Time to HBM (Host Bandwidth Manager). Includes a new BPF program supporting EDT, and also updates corresponding programs. It will drop packets with an EDT of more than 500us in the future unless the packet belongs to a flow with less than 2 packets in flight. This is done so each flow has at least 2 packets in flight, so they will not starve, and also to help prevent delayed ACK timeouts. It will also work with ECN enabled traffic, where the packets will be CE marked if their EDT is more than 50us in the future. The table below shows some performance numbers. The flows are back to back RPCS. One server sending to another, either 2 or 4 flows. One flow is a 10KB RPC, the rest are 1MB RPCs. When there are more than one flow of a given RPC size, the numbers represent averages. The rate limit applies to all flows (they are in the same cgroup). Tests ending with "-edt" ran with the new BPF program supporting EDT. Tests ending with "-hbt" ran on top HBT qdisc with the specified rate (i.e. no HBM). The other tests ran with the HBM BPF program included in the HBM patch-set. EDT has limited value when using DCTCP, but it helps in many cases when using Cubic. It usually achieves larger link utilization and lower 99% latencies for the 1MB RPCs. HBM ends up queueing a lot of packets with its default parameter values, reducing the goodput of the 10KB RPCs and increasing their latency. Also, the RTTs seen by the flows are quite large. Aggr 10K 10K 10K 1MB 1MB 1MB Limit rate drops RTT rate P90 P99 rate P90 P99 Test rate Flows Mbps % us Mbps us us Mbps ms ms -------- ---- ----- ---- ----- --- ---- ---- ---- ---- ---- ---- cubic 1G 2 904 0.02 108 257 511 539 647 13.4 24.5 cubic-edt 1G 2 982 0.01 156 239 656 967 743 14.0 17.2 dctcp 1G 2 977 0.00 105 324 408 744 653 14.5 15.9 dctcp-edt 1G 2 981 0.01 142 321 417 811 660 15.7 17.0 cubic-htb 1G 2 919 0.00 1825 40 2822 4140 879 9.7 9.9 cubic 200M 2 155 0.30 220 81 532 655 74 283 450 cubic-edt 200M 2 188 0.02 222 87 1035 1095 101 84 85 dctcp 200M 2 188 0.03 111 77 912 939 111 76 325 dctcp-edt 200M 2 188 0.03 217 74 1416 1738 114 76 79 cubic-htb 200M 2 188 0.00 5015 8 14ms 15ms 180 48 50 cubic 1G 4 952 0.03 110 165 516 546 262 38 154 cubic-edt 1G 4 973 0.01 190 111 1034 1314 287 65 79 dctcp 1G 4 951 0.00 103 180 617 905 257 37 38 dctcp-edt 1G 4 967 0.00 163 151 732 1126 272 43 55 cubic-htb 1G 4 914 0.00 3249 13 7ms 8ms 300 29 34 cubic 5G 4 4236 0.00 134 305 490 624 1310 10 17 cubic-edt 5G 4 4865 0.00 156 306 425 759 1520 10 16 dctcp 5G 4 4936 0.00 128 485 221 409 1484 7 9 dctcp-edt 5G 4 4924 0.00 148 390 392 623 1508 11 26 v1 -> v2: Incorporated Andrii's suggestions v2 -> v3: Incorporated Yonghong's suggestions v3 -> v4: Removed credit update that is not needed Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* | | xsk: Change the default frame size to 4096 and allow controlling itMaxim Mikityanskiy2019-06-271-16/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The typical XDP memory scheme is one packet per page. Change the AF_XDP frame size in libbpf to 4096, which is the page size on x86, to allow libbpf to be used with the drivers with the packet-per-page scheme. Add a command line option -f to xdpsock to allow to specify a custom frame size. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* | | samples: bpf: make the use of xdp samples consistentDaniel T. Lee2019-06-264-12/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, each xdp samples are inconsistent in the use. Most of the samples fetch the interface with it's name. (ex. xdp1, xdp2skb, xdp_redirect_cpu, xdp_sample_pkts, etc.) But some of the xdp samples are fetching the interface with ifindex by command argument. This commit enables xdp samples to fetch interface with it's name without changing the original index interface fetching. (<ifname|ifindex> fetching in the same way as xdp_sample_pkts_user.c does.) Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* | | samples: bpf: Remove bpf_debug macro in favor of bpf_printkMichal Rostecki2019-06-251-12/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ibumad example was implementing the bpf_debug macro which is exactly the same as the bpf_printk macro available in bpf_helpers.h. This change makes use of bpf_printk instead of bpf_debug. Signed-off-by: Michal Rostecki <mrostecki@opensuse.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
* | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2019-06-221-1/+2
|\| | | | | | | | | | | | | | | | | Minor SPDX change conflict. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Merge tag 'spdx-5.2-rc6' of ↵Linus Torvalds2019-06-211-1/+2
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx Pull still more SPDX updates from Greg KH: "Another round of SPDX updates for 5.2-rc6 Here is what I am guessing is going to be the last "big" SPDX update for 5.2. It contains all of the remaining GPLv2 and GPLv2+ updates that were "easy" to determine by pattern matching. The ones after this are going to be a bit more difficult and the people on the spdx list will be discussing them on a case-by-case basis now. Another 5000+ files are fixed up, so our overall totals are: Files checked: 64545 Files with SPDX: 45529 Compared to the 5.1 kernel which was: Files checked: 63848 Files with SPDX: 22576 This is a huge improvement. Also, we deleted another 20000 lines of boilerplate license crud, always nice to see in a diffstat" * tag 'spdx-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx: (65 commits) treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 507 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 506 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 505 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 504 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 503 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 502 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 501 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 499 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 498 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 497 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 496 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 495 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 491 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 490 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 489 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 488 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 487 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 486 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 485 ...
| | * | treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 505Thomas Gleixner2019-06-191-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Based on 1 normalized pattern(s): gplv2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 58 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081207.556988620@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller2019-06-2016-31/+34
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Alexei Starovoitov says: ==================== pull-request: bpf-next 2019-06-19 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) new SO_REUSEPORT_DETACH_BPF setsocktopt, from Martin. 2) BTF based map definition, from Andrii. 3) support bpf_map_lookup_elem for xskmap, from Jonathan. 4) bounded loops and scalar precision logic in the verifier, from Alexei. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | samples: bpf: refactor header include pathDaniel T. Lee2019-06-1815-20/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, header inclusion in each file is inconsistent. For example, "libbpf.h" header is included as multiple ways. #include "bpf/libbpf.h" #include "libbpf.h" Due to commit b552d33c80a9 ("samples/bpf: fix include path in Makefile"), $(srctree)/tools/lib/bpf/ path had been included during build, path "bpf/" in header isn't necessary anymore. This commit removes path "bpf/" in header inclusion. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * | | | samples: bpf: remove unnecessary include options in MakefileDaniel T. Lee2019-06-181-9/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Due to recent change of include path at commit b552d33c80a9 ("samples/bpf: fix include path in Makefile"), some of the previous include options became unnecessary. This commit removes duplicated include options in Makefile. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * | | | samples/bpf: fix include path in MakefilePrashant Bhole2019-06-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recent commit included libbpf.h in selftests/bpf/bpf_util.h. Since some samples use bpf_util.h and samples/bpf/Makefile doesn't have libbpf.h path included, build was failing. Let's add the path in samples/bpf/Makefile. Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * | | | samples: bpf: don't run probes at the local make stageJakub Kicinski2019-06-111-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Quentin reports that commit 07c3bbdb1a9b ("samples: bpf: print a warning about headers_install") is producing the false positive when make is invoked locally, from the samples/bpf/ directory. When make is run locally it hits the "all" target, which will recursively invoke make through the full build system. Speed up the "local" run which doesn't actually build anything, and avoid false positives by skipping all the probes if not in kbuild environment (cover both the new warning and the BTF probes). Reported-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * | | | samples: bpf: print a warning about headers_installJakub Kicinski2019-06-061-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It seems like periodically someone posts patches to "fix" header includes. The issue is that samples expect the include path to have the uAPI headers (from usr/) first, and then tools/ headers, so that locally installed uAPI headers take precedence. This means that if users didn't run headers_install they will see all sort of strange compilation errors, e.g.: HOSTCC samples/bpf/test_lru_dist samples/bpf/test_lru_dist.c:39:8: error: redefinition of ‘struct list_head’ struct list_head { ^~~~~~~~~ In file included from samples/bpf/test_lru_dist.c:9:0: ../tools/include/linux/types.h:69:8: note: originally defined here struct list_head { ^~~~~~~~~ Try to detect this situation, and print a helpful warning. v2: just use HOSTCC (Jiong). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * | | | bpf: hbm: fix spelling mistake "notifcations" -> "notificiations"Colin Ian King2019-06-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a spelling mistake in the help information, fix this. Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2019-06-184-20/+4
|\ \ \ \ \ | | |/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | Honestly all the conflicts were simple overlapping changes, nothing really interesting to report. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2019-06-182-2/+2
| |\ \ \ \ | | |_|/ / | |/| | / | | | |/ | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: "Lots of bug fixes here: 1) Out of bounds access in __bpf_skc_lookup, from Lorenz Bauer. 2) Fix rate reporting in cfg80211_calculate_bitrate_he(), from John Crispin. 3) Use after free in psock backlog workqueue, from John Fastabend. 4) Fix source port matching in fdb peer flow rule of mlx5, from Raed Salem. 5) Use atomic_inc_not_zero() in fl6_sock_lookup(), from Eric Dumazet. 6) Network header needs to be set for packet redirect in nfp, from John Hurley. 7) Fix udp zerocopy refcnt, from Willem de Bruijn. 8) Don't assume linear buffers in vxlan and geneve error handlers, from Stefano Brivio. 9) Fix TOS matching in mlxsw, from Jiri Pirko. 10) More SCTP cookie memory leak fixes, from Neil Horman. 11) Fix VLAN filtering in rtl8366, from Linus Walluij. 12) Various TCP SACK payload size and fragmentation memory limit fixes from Eric Dumazet. 13) Use after free in pneigh_get_next(), also from Eric Dumazet. 14) LAPB control block leak fix from Jeremy Sowden" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (145 commits) lapb: fixed leak of control-blocks. tipc: purge deferredq list for each grp member in tipc_group_delete ax25: fix inconsistent lock state in ax25_destroy_timer neigh: fix use-after-free read in pneigh_get_next tcp: fix compile error if !CONFIG_SYSCTL hv_sock: Suppress bogus "may be used uninitialized" warnings be2net: Fix number of Rx queues used for flow hashing net: handle 802.1P vlan 0 packets properly tcp: enforce tcp_min_snd_mss in tcp_mtu_probing() tcp: add tcp_min_snd_mss sysctl tcp: tcp_fragment() should apply sane memory limits tcp: limit payload size of sacked skbs Revert "net: phylink: set the autoneg state in phylink_phy_change" bpf: fix nested bpf tracepoints with per-cpu data bpf: Fix out of bounds memory access in bpf_sk_storage vsock/virtio: set SOCK_DONE on peer shutdown net: dsa: rtl8366: Fix up VLAN filtering net: phylink: set the autoneg state in phylink_phy_change net: add high_order_alloc_disable sysctl/static key tcp: add tcp_tx_skb_cache sysctl ...
| | * | Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller2019-06-072-2/+2
| | |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Daniel Borkmann says: ==================== pull-request: bpf 2019-06-07 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix several bugs in riscv64 JIT code emission which forgot to clear high 32-bits for alu32 ops, from Björn and Luke with selftests covering all relevant BPF alu ops from Björn and Jiong. 2) Two fixes for UDP BPF reuseport that avoid calling the program in case of __udp6_lib_err and UDP GRO which broke reuseport_select_sock() assumption that skb->data is pointing to transport header, from Martin. 3) Two fixes for BPF sockmap: a use-after-free from sleep in psock's backlog workqueue, and a missing restore of sk_write_space when psock gets dropped, from Jakub and John. 4) Fix unconnected UDP sendmsg hook API which is insufficient as-is since it breaks standard applications like DNS if reverse NAT is not performed upon receive, from Daniel. 5) Fix an out-of-bounds read in __bpf_skc_lookup which in case of AF_INET6 fails to verify that the length of the tuple is long enough, from Lorenz. 6) Fix libbpf's libbpf__probe_raw_btf to return an fd instead of 0/1 (for {un,}successful probe) as that is expected to be propagated as an fd to load_sk_storage_btf() and thus closing the wrong descriptor otherwise, from Michal. 7) Fix bpftool's JSON output for the case when a lookup fails, from Krzesimir. 8) Minor misc fixes in docs, samples and selftests, from various others. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * | samples, bpf: suppress compiler warningMatteo Croce2019-05-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GCC 9 fails to calculate the size of local constant strings and produces a false positive: samples/bpf/task_fd_query_user.c: In function ‘test_debug_fs_uprobe’: samples/bpf/task_fd_query_user.c:242:67: warning: ‘%s’ directive output may be truncated writing up to 255 bytes into a region of size 215 [-Wformat-truncation=] 242 | snprintf(buf, sizeof(buf), "/sys/kernel/debug/tracing/events/%ss/%s/id", | ^~ 243 | event_type, event_alias); | ~~~~~~~~~~~ samples/bpf/task_fd_query_user.c:242:2: note: ‘snprintf’ output between 45 and 300 bytes into a destination of size 256 242 | snprintf(buf, sizeof(buf), "/sys/kernel/debug/tracing/events/%ss/%s/id", | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 243 | event_type, event_alias); | ~~~~~~~~~~~~~~~~~~~~~~~~ Workaround this by lowering the buffer size to a reasonable value. Related GCC Bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83431 Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| | | * | samples, bpf: fix to change the buffer size for read()Chang-Hsien Tsai2019-05-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the trace for read is larger than 4096, the return value sz will be 4096. This results in off-by-one error on buf: static char buf[4096]; ssize_t sz; sz = read(trace_fd, buf, sizeof(buf)); if (sz > 0) { buf[sz] = 0; puts(buf); } Signed-off-by: Chang-Hsien Tsai <luke.tw@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * | | | treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 295Thomas Gleixner2019-06-052-18/+2
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of version 2 of the gnu general public license as published by the free software foundation this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 64 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190529141901.894819585@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | | / Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2019-06-0720-80/+20
|\| | | | |_|/ |/| | | | | | | | | | | | | | Some ISDN files that got removed in net-next had some changes done in mainline, take the removals. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 206Thomas Gleixner2019-05-3020-80/+20
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of version 2 of the gnu general public license as published by the free software foundation extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 107 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Steve Winslow <swinslow@gmail.com> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190528171438.615055994@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | bpf: Add more stats to HBMbrakmo2019-06-014-10/+117
| | | | | | | | | | | | | | | | | | Adds more stats to HBM, including average cwnd and rtt of all TCP flows, percents of packets that are ecn ce marked and distribution of return values. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
* | bpf: Add cn support to hbm_out_kern.cbrakmo2019-06-014-12/+45
| | | | | | | | | | | | | | | | Update hbm_out_kern.c to support returning cn notifications. Also updates relevant files to allow disabling cn notifications. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
* | selftests/bpf: convert test_cgrp2_attach2 example into kselftestRoman Gushchin2019-05-282-461/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert test_cgrp2_attach2 example into a proper test_cgroup_attach kselftest. It's better because we do run kselftest on a constant basis, so there are better chances to spot a potential regression. Also make it slightly less verbose to conform kselftests output style. Output example: $ ./test_cgroup_attach #override:PASS #multi:PASS test_cgroup_attach:PASS Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
* | samples/bpf: fix a couple of style issues in bpf_loadDaniel T. Lee2019-05-281-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | This commit fixes a few style problems in samples/bpf/bpf_load.c: - Magic string use of 'DEBUGFS' - Useless zero initialization of a global variable - Minor style fix with whitespace Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* | samples: bpf: add ibumad sample to .gitignoreMatteo Croce2019-05-251-0/+1
| | | | | | | | | | | | | | | | This commit adds ibumad to .gitignore which is currently ommited from the ignore file. Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
* | samples: bpf: Do not define bpf_printk macroMichal Rostecki2019-05-2410-72/+2
|/ | | | | | | | The bpf_printk macro was moved to bpf_helpers.h which is included in all example programs. Signed-off-by: Michal Rostecki <mrostecki@opensuse.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds2019-05-093-0/+269
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull rdma updates from Jason Gunthorpe: "This has been a smaller cycle than normal. One new driver was accepted, which is unusual, and at least one more driver remains in review on the list. Summary: - Driver fixes for hns, hfi1, nes, rxe, i40iw, mlx5, cxgb4, vmw_pvrdma - Many patches from MatthewW converting radix tree and IDR users to use xarray - Introduction of tracepoints to the MAD layer - Build large SGLs at the start for DMA mapping and get the driver to split them - Generally clean SGL handling code throughout the subsystem - Support for restricting RDMA devices to net namespaces for containers - Progress to remove object allocation boilerplate code from drivers - Change in how the mlx5 driver shows representor ports linked to VFs - mlx5 uapi feature to access the on chip SW ICM memory - Add a new driver for 'EFA'. This is HW that supports user space packet processing through QPs in Amazon's cloud" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (186 commits) RDMA/ipoib: Allow user space differentiate between valid dev_port IB/core, ipoib: Do not overreact to SM LID change event RDMA/device: Don't fire uevent before device is fully initialized lib/scatterlist: Remove leftover from sg_page_iter comment RDMA/efa: Add driver to Kconfig/Makefile RDMA/efa: Add the efa module RDMA/efa: Add EFA verbs implementation RDMA/efa: Add common command handlers RDMA/efa: Implement functions that submit and complete admin commands RDMA/efa: Add the ABI definitions RDMA/efa: Add the com service API definitions RDMA/efa: Add the efa_com.h file RDMA/efa: Add the efa.h header file RDMA/efa: Add EFA device definitions RDMA: Add EFA related definitions RDMA/umem: Remove hugetlb flag RDMA/bnxt_re: Use core helpers to get aligned DMA address RDMA/i40iw: Use core helpers to get aligned DMA address within a supported page size RDMA/verbs: Add a DMA iterator to return aligned contiguous memory blocks RDMA/umem: Add API to find best driver supported page size in an MR ...
| * BPF: Add sample code for new ib_umad tracepointIra Weiny2019-03-273-0/+269
| | | | | | | | | | | | | | | | Provide a count of class types for a summary of MAD packets. The example shows one way to filter the trace data based on management class. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | samples: bpf: add hbm sample to .gitignoreDaniel T. Lee2019-04-251-0/+1
| | | | | | | | | | | | | | | | This commit adds hbm to .gitignore which is currently ommited from the ignore file. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* | samples/bpf: fix build with new clangAlexei Starovoitov2019-04-051-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | clang started to error on invalid asm clobber usage in x86 headers and many bpf program samples failed to build with the message: CLANG-bpf /data/users/ast/bpf-next/samples/bpf/xdp_redirect_kern.o In file included from /data/users/ast/bpf-next/samples/bpf/xdp_redirect_kern.c:14: In file included from ../include/linux/in.h:23: In file included from ../include/uapi/linux/in.h:24: In file included from ../include/linux/socket.h:8: In file included from ../include/linux/uio.h:14: In file included from ../include/crypto/hash.h:16: In file included from ../include/linux/crypto.h:26: In file included from ../include/linux/uaccess.h:5: In file included from ../include/linux/sched.h:15: In file included from ../include/linux/sem.h:5: In file included from ../include/uapi/linux/sem.h:5: In file included from ../include/linux/ipc.h:9: In file included from ../include/linux/refcount.h:72: ../arch/x86/include/asm/refcount.h:72:36: error: asm-specifier for input or output variable conflicts with asm clobber list r->refs.counter, e, "er", i, "cx"); ^ ../arch/x86/include/asm/refcount.h:86:27: error: asm-specifier for input or output variable conflicts with asm clobber list r->refs.counter, e, "cx"); ^ 2 errors generated. Override volatile() to workaround the problem. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* | samples, selftests/bpf: add NULL check for ksym_searchDaniel T. Lee2019-04-044-1/+21
| | | | | | | | | | | | | | | | | | | | Since, ksym_search added with verification logic for symbols existence, it could return NULL when the kernel symbols are not loaded. This commit will add NULL check logic after ksym_search. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* | samples: bpf: add xdp_sample_pkts to .gitignoreDaniel T. Lee2019-03-221-0/+1
|/ | | | | | | | This commit adds xdp_sample_pkts to .gitignore which is currently ommited from the ignore file. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
* bpf: hbm: fix spelling mistake "deault" -> "default"Colin Ian King2019-03-071-2/+2
| | | | | | | | There are a couple of typos, fix these. Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* bpf: HBM test scriptbrakmo2019-03-021-0/+436
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Script for testing HBM (Host Bandwidth Manager) framework. It creates a cgroup to use for testing and load a BPF program to limit egress bandwidht. It then uses iperf3 or netperf to create loads. The output is the goodput in Mbps (unless -D is used). It can work on a single host using loopback or among two hosts (with netperf). When using loopback, it is recommended to also introduce a delay of at least 1ms (-d=1), otherwise the assigned bandwidth is likely to be underutilized. USAGE: $name [out] [-b=<prog>|--bpf=<prog>] [-c=<cc>|--cc=<cc>] [-D] [-d=<delay>|--delay=<delay>] [--debug] [-E] [-f=<#flows>|--flows=<#flows>] [-h] [-i=<id>|--id=<id >] [-l] [-N] [-p=<port>|--port=<port>] [-P] [-q=<qdisc>] [-R] [-s=<server>|--server=<server] [--stats] [-t=<time>|--time=<time>] [-w] [cubic|dctcp] Where: out Egress (default egress) -b or --bpf BPF program filename to load and attach. Default is nrm_out_kern.o for egress, -c or -cc TCP congestion control (cubic or dctcp) -d or --delay Add a delay in ms using netem -D In addition to the goodput in Mbps, it also outputs other detailed information. This information is test dependent (i.e. iperf3 or netperf). --debug Print BPF trace buffer -E Enable ECN (not required for dctcp) -f or --flows Number of concurrent flows (default=1) -i or --id cgroup id (an integer, default is 1) -l Do not limit flows using loopback -N Use netperf instead of iperf3 -h Help -p or --port iperf3 port (default is 5201) -P Use an iperf3 instance for each flow -q Use the specified qdisc. -r or --rate Rate in Mbps (default 1s 1Gbps) -R Use TCP_RR for netperf. 1st flow has req size of 10KB, rest of 1MB. Reply in all cases is 1 byte. More detailed output for each flow can be found in the files netperf.<cg>.<flow>, where <cg> is the cgroup id as specified with the -i flag, and <flow> is the flow id starting at 1 and increasing by 1 for flow (as specified by -f). -s or --server hostname of netperf server. Used to create netperf test traffic between to hosts (default is within host) netserver must be running on the host. --stats Get HBM stats (marked, dropped, etc.) -t or --time duration of iperf3 in seconds (default=5) -w Work conserving flag. cgroup can increase its bandwidth beyond the rate limit specified while there is available bandwidth. Current implementation assumes there is only one NIC (eth0), but can be extended to support multiple NICs. This is just a proof of concept. cubic or dctcp specify TCP CC to use Examples: ./do_hbm_test.sh -l -d=1 -D --stats Runs a 5 second test, using a single iperf3 flow and with the default rate limit of 1Gbps and a delay of 1ms (using netem) using the default TCP congestion control on the loopback device (hence we use "-l" to enforce bandwidth limit on loopback device). Since no direction is specified, it defaults to egress. Since no TCP CC algorithm is specified it uses the system default (Cubic for this test). With no -D flag, only the value of the AGGREGATE OUTPUT would show. id refers to the cgroup id and is useful when running multi cgroup tests (supported by a future patch). This patchset does not support calling TCP's congesion window reduction, even when packets are dropped by the BPF program, resulting in a large number of packets dropped. It is recommended that the current HBM implemenation only be used with ECN enabled flows. A future patch will add support for reducing TCP's cwnd and will increase the performance of non-ECN enabled flows. Output: Details for HBM in cgroup 1 id:1 rate_mbps:493 duration:4.8 secs packets:11355 bytes_MB:590 pkts_dropped:4497 bytes_dropped_MB:292 pkts_marked_percent: 39.60 bytes_marked_percent: 49.49 pkts_dropped_percent: 39.60 bytes_dropped_percent: 49.49 PING AVG DELAY:2.075 AGGREGATE_GOODPUT:505 ./do_nrm_test.sh -l -d=1 -D --stats dctcp Same as above but using dctcp. Note that fewer bytes are dropped (0.01% vs. 49%). Output: Details for HBM in cgroup 1 id:1 rate_mbps:945 duration:4.9 secs packets:16859 bytes_MB:578 pkts_dropped:1 bytes_dropped_MB:0 pkts_marked_percent: 28.74 bytes_marked_percent: 45.15 pkts_dropped_percent: 0.01 bytes_dropped_percent: 0.01 PING AVG DELAY:2.083 AGGREGATE_GOODPUT:965 ./do_nrm_test.sh -d=1 -D --stats As first example, but without limiting loopback device (i.e. no "-l" flag). Since there is no bandwidth limiting, no details for HBM are printed out. Output: Details for HBM in cgroup 1 PING AVG DELAY:2.019 AGGREGATE_GOODPUT:42655 ./do_hbm.sh -l -d=1 -D --stats -f=2 Uses iper3 and does 2 flows ./do_hbm.sh -l -d=1 -D --stats -f=4 -P Uses iperf3 and does 4 flows, each flow as a separate process. ./do_hbm.sh -l -d=1 -D --stats -f=4 -N Uses netperf, 4 flows ./do_hbm.sh -f=1 -r=2000 -t=5 -N -D --stats dctcp -s=<server-name> Uses netperf between two hosts. The remote host name is specified with -s= and you need to start the program netserver manually on the remote host. It will use 1 flow, a rate limit of 2Gbps and dctcp. ./do_hbm.sh -f=1 -r=2000 -t=5 -N -D --stats -w dctcp \ -s=<server-name> As previous, but allows use of extra bandwidth. For this test the rate is 8Gbps vs. 1Gbps of the previous test. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
* bpf: User program for testing HBMbrakmo2019-03-022-0/+444
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The program nrm creates a cgroup and attaches a BPF program to the cgroup for testing HBM (Host Bandwidth Manager) for egress traffic. One still needs to create network traffic. This can be done through netesto, netperf or iperf3. A follow-up patch contains a script to create traffic. USAGE: hbm [-d] [-l] [-n <id>] [-r <rate>] [-s] [-t <secs>] [-w] [-h] [prog] Where: -d Print BPF trace debug buffer -l Also limit flows doing loopback -n <#> To create cgroup "/hbm#" and attach prog. Default is /nrm1 This is convenient when testing HBM in more than 1 cgroup -r <rate> Rate limit in Mbps -s Get HBM stats (marked, dropped, etc.) -t <time> Exit after specified seconds (deault is 0) -w Work conserving flag. cgroup can increase its bandwidth beyond the rate limit specified while there is available bandwidth. Current implementation assumes there is only NIC (eth0), but can be extended to support multiple NICs. Currrently only supported for egress. Note, this is just a proof of concept. -h Print this info prog BPF program file name. Name defaults to hbm_out_kern.o More information about HBM can be found in the paper "BPF Host Resource Management" presented at the 2018 Linux Plumbers Conference, Networking Track (http://vger.kernel.org/lpc_net2018_talks/LPC%20BPF%20Network%20Resource%20Paper.pdf) Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
* bpf: Sample HBM BPF program to limit egress bwbrakmo2019-03-024-0/+327
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A cgroup skb BPF program to limit cgroup output bandwidth. It uses a modified virtual token bucket queue to limit average egress bandwidth. The implementation uses credits instead of tokens. Negative credits imply that queueing would have happened (this is a virtual queue, so no queueing is done by it. However, queueing may occur at the actual qdisc (which is not used for rate limiting). This implementation uses 3 thresholds, one to start marking packets and the other two to drop packets: CREDIT - <--------------------------|------------------------> + | | | 0 | Large pkt | | drop thresh | Small pkt drop Mark threshold thresh The effect of marking depends on the type of packet: a) If the packet is ECN enabled, then the packet is ECN ce marked. The current mark threshold is tuned for DCTCP. c) Else, it is dropped if it is a large packet. If the credit is below the drop threshold, the packet is dropped. Note that dropping a packet through the BPF program does not trigger CWR (Congestion Window Reduction) in TCP packets. A future patch will add support for triggering CWR. This BPF program actually uses 2 drop thresholds, one threshold for larger packets (>= 120 bytes) and another for smaller packets. This protects smaller packets such as SYNs, ACKs, etc. The default bandwidth limit is set at 1Gbps but this can be changed by a user program through a shared BPF map. In addition, by default this BPF program does not limit connections using loopback. This behavior can be overwritten by the user program. There is also an option to calculate some statistics, such as percent of packets marked or dropped, which the user program can access. A latter patch provides such a program (hbm.c) Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
* samples/bpf: silence compiler warning for xdpsock_user.cYonghong Song2019-03-021-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Compiling xdpsock_user.c with 4.8.5, I hit the following compilation warning: HOSTCC samples/bpf/xdpsock_user.o /data/users/yhs/work/net-next/samples/bpf/xdpsock_user.c: In function ‘main’: /data/users/yhs/work/net-next/samples/bpf/xdpsock_user.c:449:6: warning: ‘idx_cq’ may be used unini tialized in this function [-Wmaybe-uninitialized] u32 idx_cq, idx_fq; ^ /data/users/yhs/work/net-next/samples/bpf/xdpsock_user.c:606:7: warning: ‘idx_rx’ may be used unini tialized in this function [-Wmaybe-uninitialized] u32 idx_rx, idx_tx = 0; ^ /data/users/yhs/work/net-next/samples/bpf/xdpsock_user.c:506:6: warning: ‘idx_rx’ may be used unini tialized in this function [-Wmaybe-uninitialized] u32 idx_rx, idx_fq = 0; As an example, the code pattern looks like: u32 idx_cq; ... ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq); if (ret) { ... } ... idx_fq ... The compiler warns since it does not know whether &idx_fq is assigned or not inside the library function xsk_ring_prod__reserve(). Let us assign an initial value 0 to such auto variables to silence compiler warning. Fixes: 248c7f9c0e21 ("samples/bpf: convert xdpsock to use libbpf for AF_XDP access") Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* samples: bpf: use libbpf where easyJakub Kicinski2019-03-014-25/+35
| | | | | | | | | | | | Some samples don't really need the magic of bpf_load, switch them to libbpf. v2: - specify program types. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* samples: bpf: remove load_sock_ops in favour of bpftoolJakub Kicinski2019-03-0112-114/+16
| | | | | | | | | | bpftool can do all the things load_sock_ops used to do, and more. Point users to bpftool instead of maintaining this sample utility. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* samples: bpf: force IPv4 in pingJakub Kicinski2019-03-015-5/+5
| | | | | | | | | | | | | ping localhost may default of IPv6 on modern systems, but samples are trying to only parse IPv4. Force IPv4. samples/bpf/tracex1_user.c doesn't interpret the packet so we don't care which IP version will be used there. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* samples: bpf: fix: broken sample regarding removed functionDaniel T. Lee2019-02-273-3/+3
| | | | | | | | | | | | | | | | | Currently, running sample "task_fd_query" and "tracex3" occurs the following error. On kernel v5.0-rc* this sample will be unavailable due to the removal of function 'blk_start_request' at commit "a1ce35f". (function removed, as "Single Queue IO scheduler" no longer exists) $ sudo ./task_fd_query failed to create kprobe 'blk_start_request' error 'No such file or directory' This commit will change the function 'blk_start_request' to 'blk_mq_start_request' to fix the broken sample. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* samples/bpf: convert xdpsock to use libbpf for AF_XDP accessMagnus Karlsson2019-02-254-648/+261
| | | | | | | | | | | | | | | | This commit converts the xdpsock sample application to use the AF_XDP functions present in libbpf. This cuts down the size of it by nearly 300 lines of code. The default ring sizes plus the batch size has been increased and the size of the umem area has decreased. This so that the sample application will provide higher throughput. Note also that the shared umem code has been removed from the sample as this is not supported by libbpf at this point in time. Tested-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>