linux - linux

	Commit message (Collapse)	Author	Age	Files	Lines
*	arm: eBPF JIT compiler	Shubham Bansal	2017-08-22	2	-810/+1746
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The JIT compiler emits ARM 32 bit instructions. Currently, It supports eBPF only. Classic BPF is supported because of the conversion by BPF core. This patch is essentially changing the current implementation of JIT compiler of Berkeley Packet Filter from classic to internal with almost all instructions from eBPF ISA supported except the following BPF_ALU64 \| BPF_DIV \| BPF_K BPF_ALU64 \| BPF_DIV \| BPF_X BPF_ALU64 \| BPF_MOD \| BPF_K BPF_ALU64 \| BPF_MOD \| BPF_X BPF_STX \| BPF_XADD \| BPF_W BPF_STX \| BPF_XADD \| BPF_DW Implementation is using scratch space to emulate 64 bit eBPF ISA on 32 bit ARM because of deficiency of general purpose registers on ARM. Currently, only LITTLE ENDIAN machines are supported in this eBPF JIT Compiler. Tested on ARMv7 with QEMU by me (Shubham Bansal). Testing results on ARMv7: 1) test_bpf: Summary: 341 PASSED, 0 FAILED, [312/333 JIT'ed] 2) test_tag: OK (40945 tests) 3) test_progs: Summary: 30 PASSED, 0 FAILED 4) test_lpm: OK 5) test_lru_map: OK Above tests are all done with following flags enabled discreatly. 1) bpf_jit_enable=1 a) CONFIG_FRAME_POINTER enabled b) CONFIG_FRAME_POINTER disabled 2) bpf_jit_enable=1 and bpf_jit_harden=2 a) CONFIG_FRAME_POINTER enabled b) CONFIG_FRAME_POINTER disabled See Documentation/networking/filter.txt for more information. Signed-off-by: Shubham Bansal <illusionist.neo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	arm: use set_memory.h header	Laura Abbott	2017-05-09	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	set_memory_* functions have moved to set_memory.h. Switch to this explicitly Link: http://lkml.kernel.org/r/1488920133-27229-3-git-send-email-labbott@redhat.com Signed-off-by: Laura Abbott <labbott@redhat.com> Acked-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	ARM: net: bpf: fix zero right shift	Rabin Vincent	2016-01-06	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The LSR instruction cannot be used to perform a zero right shift since a 0 as the immediate value (imm5) in the LSR instruction encoding means that a shift of 32 is perfomed. See DecodeIMMShift() in the ARM ARM. Make the JIT skip generation of the LSR if a zero-shift is requested. This was found using american fuzzy lop. Signed-off-by: Rabin Vincent <rabin@rab.in> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: filter: make JITs zero A for SKF_AD_ALU_XOR_X	Rabin Vincent	2016-01-06	1	-15/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The SKF_AD_ALU_XOR_X ancillary is not like the other ancillary data instructions since it XORs A with X while all the others replace A with some loaded value. All the BPF JITs fail to clear A if this is used as the first instruction in a filter. This was found using american fuzzy lop. Add a helper to determine if A needs to be cleared given the first instruction in a filter, and use this in the JITs. Except for ARM, the rest have only been compile-tested. Fixes: 3480593131e0 ("net: filter: get rid of BPF_S_* enum") Signed-off-by: Rabin Vincent <rabin@rab.in> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	bpf, arm: start flushing icache range from header	Daniel Borkmann	2015-11-16	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	During review I noticed that the icache range we're flushing should start at header already and not at ctx.image. Reason is that after 55309dd3d4cd ("net: bpf: arm: address randomize and write protect JIT code"), we also want to make sure to flush the random-sized trap in front of the start of the actual program (analogous to x86). No operational differences from user side. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Nicolas Schichan <nschichan@freebox.fr> Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	David S. Miller	2015-10-20	1	-0/+1
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Conflicts: drivers/net/usb/asix_common.c net/ipv4/inet_connection_sock.c net/switchdev/switchdev.c In the inet_connection_sock.c case the request socket hashing scheme is completely different in net-next. The other two conflicts were overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ARM: net: make BPF_LD \| BPF_IND instruction trigger r_X initialisation to 0.	Nicolas Schichan	2015-10-05	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Without this patch, if the only instructions using r_X are of the BPF_LD \| BPF_IND type, r_X would not be reset to 0, using whatever value was there when entering the jited code. With this patch, r_X will be correctly marked as used so it will be reset to 0 in the prologue code. This fix also makes the test "LD_IND byte default X" pass in the test_bpf module when the ARM JIT is enabled. Signed-off-by: Nicolas Schichan <nschichan@freebox.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	ARM: net: support BPF_ALU \| BPF_MOD instructions in the BPF JIT.	Nicolas Schichan	2015-10-05	2	-6/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For ARMv7 with UDIV instruction support, generate an UDIV instruction followed by an MLS instruction. For other ARM variants, generate code calling a C wrapper similar to the jit_udiv() function used for BPF_ALU \| BPF_DIV instructions. Some performance numbers reported by the test_bpf module (the duration per filter run is reported in nanoseconds, between "jitted:<x>" and "PASS": ARMv7 QEMU nojit: test_bpf: #3 DIV_MOD_KX jited:0 2196 PASS ARMv7 QEMU jit: test_bpf: #3 DIV_MOD_KX jited:1 104 PASS ARMv5 QEMU nojit: test_bpf: #3 DIV_MOD_KX jited:0 2176 PASS ARMv5 QEMU jit: test_bpf: #3 DIV_MOD_KX jited:1 1104 PASS ARMv5 kirkwood nojit: test_bpf: #3 DIV_MOD_KX jited:0 1103 PASS ARMv5 kirkwood jit: test_bpf: #3 DIV_MOD_KX jited:1 311 PASS Signed-off-by: Nicolas Schichan <nschichan@freebox.fr> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	ebpf: migrate bpf_prog's flags to bitfield	Daniel Borkmann	2015-10-03	1	-1/+1
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \|	As we need to add further flags to the bpf_prog structure, lets migrate both bools to a bitfield representation. The size of the base structure (excluding insns) remains unchanged at 40 bytes. Add also tags for the kmemchecker, so that it doesn't throw false positives. Even in case gcc would generate suboptimal code, it's not being accessed in performance critical paths. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ARM: net: add support for BPF_ANC \| SKF_AD_HATYPE in ARM JIT.	Nicolas Schichan	2015-07-27	2	-2/+23
\| \| \| \| \|	Signed-off-by: Nicolas Schichan <nschichan@freebox.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ARM: net: add support for BPF_ANC \| SKF_AD_PAY_OFFSET in ARM JIT.	Nicolas Schichan	2015-07-27	1	-0/+8
\| \| \| \| \|	Signed-off-by: Nicolas Schichan <nschichan@freebox.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ARM: net: add support for BPF_ANC \| SKF_AD_PKTTYPE in ARM JIT.	Nicolas Schichan	2015-07-27	1	-0/+11
\| \| \| \| \|	Signed-off-by: Nicolas Schichan <nschichan@freebox.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ARM: net: fix vlan access instructions in ARM JIT.	Nicolas Schichan	2015-07-22	1	-3/+5
\| \| \| \| \| \| \| \| \|	This makes BPF_ANC \| SKF_AD_VLAN_TAG and BPF_ANC \| SKF_AD_VLAN_TAG_PRESENT have the same behaviour as the in kernel VM and makes the test_bpf LD_VLAN_TAG and LD_VLAN_TAG_PRESENT tests pass. Signed-off-by: Nicolas Schichan <nschichan@freebox.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ARM: net: handle negative offsets in BPF JIT.	Nicolas Schichan	2015-07-22	1	-9/+38
\| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, the JIT would reject negative offsets known during code generation and mishandle negative offsets provided at runtime. Fix that by calling bpf_internal_load_pointer_neg_helper() appropriately in the jit_get_skb_{b,h,w} slow path helpers and by forcing the execution flow to the slow path helpers when the offset is negative. Signed-off-by: Nicolas Schichan <nschichan@freebox.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ARM: net: fix condition for load_order > 0 when translating load instructions.	Nicolas Schichan	2015-07-22	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	To check whether the load should take the fast path or not, the code would check that (r_skb_hlen - load_order) is greater than the offset of the access using an "Unsigned higher or same" condition. For halfword accesses and an skb length of 1 at offset 0, that test is valid, as we end up comparing 0xffffffff(-1) and 0, so the fast path is taken and the filter allows the load to wrongly succeed. A similar issue exists for word loads at offset 0 and an skb length of less than 4. Fix that by using the condition "Signed greater than or equal" condition for the fast path code for load orders greater than 0. Signed-off-by: Nicolas Schichan <nschichan@freebox.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	David S. Miller	2015-05-13	1	-3/+39
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Four minor merge conflicts: 1) qca_spi.c renamed the local variable used for the SPI device from spi_device to spi, meanwhile the spi_set_drvdata() call got moved further up in the probe function. 2) Two changes were both adding new members to codel params structure, and thus we had overlapping changes to the initializer function. 3) 'net' was making a fix to sk_release_kernel() which is completely removed in 'net-next'. 4) In net_namespace.c, the rtnl_net_fill() call for GET operations had the command value fixed, meanwhile 'net-next' adjusted the argument signature a bit. This also matches example merge resolutions posted by Stephen Rothwell over the past two days. Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ARM: net: delegate filter to kernel interpreter when imm_offset() return ↵	Nicolas Schichan	2015-05-11	1	-1/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	value can't fit into 12bits. The ARM JIT code emits "ldr rX, [pc, #offset]" to access the literal pool. #offset maximum value is 4095 and if the generated code is too large, the #offset value can overflow and not point to the expected slot in the literal pool. Additionally, when overflow occurs, bits of the overflow can end up changing the destination register of the ldr instruction. Fix that by detecting the overflow in imm_offset() and setting a flag that is checked for each BPF instructions converted in build_body(). As of now it can only be detected in the second pass. As a result the second build_body() call can now fail, so add the corresponding cleanup code in that case. Using multiple literal pools in the JITed code is going to require lots of intrusive changes to the JIT code (which would better be done as a feature instead of fix), just delegating to the kernel BPF interpreter in that case is a more straight forward, minimal fix and easy to backport. Fixes: ddecdfcea0ae ("ARM: 7259/3: net: JIT compiler for packet filters") Signed-off-by: Nicolas Schichan <nschichan@freebox.fr> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ARM: net fix emit_udiv() for BPF_ALU \| BPF_DIV \| BPF_K intruction.	Nicolas Schichan	2015-05-11	1	-2/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In that case, emit_udiv() will be called with rn == ARM_R0 (r_scratch) and loading rm first into ARM_R0 will result in jit_udiv() function being called the same dividend and divisor. Fix that by loading rn first into ARM_R1 and then rm into ARM_R0. Signed-off-by: Nicolas Schichan <nschichan@freebox.fr> Cc: <stable@vger.kernel.org> # v3.13+ Fixes: aee636c4809f (bpf: do not use reciprocal divide) Acked-by: Mircea Gherzan <mgherzan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	ARM: net: add JIT support for loads from struct seccomp_data.	Nicolas Schichan	2015-05-13	1	-0/+10
\|/ \| \| \| \|	Signed-off-by: Nicolas Schichan <nschichan@freebox.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: bpf: arm: make hole-faulting more robust	Daniel Borkmann	2014-09-23	2	-3/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Will Deacon pointed out, that the currently used opcode for filling holes, that is 0xe7ffffff, seems not robust enough ... $ echo 0xffffffe7 \| xxd -r > test.bin $ arm-linux-gnueabihf-objdump -m arm -D -b binary test.bin ... 0: e7ffffff udf #65535 ; 0xffff ... while for Thumb, it ends up as ... 0: ffff e7ff vqshl.u64 q15, <illegal reg q15.5>, #63 ... which is a bit fragile. The ARM specification defines some permanently guaranteed undefined instruction (UDF) space, for example for ARM in ARMv7-AR, section A5.4 and for Thumb in ARMv7-M, section A5.2.6. Similarly, ptrace, kprobes, kgdb, bug and uprobes make use of such instruction as well to trap. Given mentioned section from the specification, we can find such a universe as (where 'x' denotes 'don't care'): ARM: xxxx 0111 1111 xxxx xxxx xxxx 1111 xxxx Thumb: 1101 1110 xxxx xxxx We therefore should use a more robust opcode that fits both. Russell King suggested that we can even reuse a single 32-bit word, that is, 0xe7fddef1 which will fault if executed in ARM or Thumb mode as done in f928d4f2a86f ("ARM: poison the vectors page"). That will still hold our requirements: $ echo 0xf1defde7 \| xxd -r > test.bin $ arm-unknown-linux-gnueabi-objdump -m arm -D -b binary test.bin ... 0: e7fddef1 udf #56801 ; 0xdde1 $ echo 0xf1defde7f1defde7f1defde7 \| xxd -r > test.bin $ arm-unknown-linux-gnueabi-objdump -marm -Mforce-thumb -D -b binary test.bin ... 0: def1 udf #241 ; 0xf1 2: e7fd b.n 0x0 4: def1 udf #241 ; 0xf1 6: e7fd b.n 0x4 8: def1 udf #241 ; 0xf1 a: e7fd b.n 0x8 So on ARM 0xe7fddef1 conforms to the above UDF pattern, and the low 16 bit likewise correspond to UDF in Thumb case. The 0xe7fd part is an unconditional branch back to the UDF instruction. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Mircea Gherzan <mgherzan@gmail.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: bpf: be friendly to kmemcheck	Daniel Borkmann	2014-09-10	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Reported by Mikulas Patocka, kmemcheck currently barks out a false positive since we don't have special kmemcheck annotation for bitfields used in bpf_prog structure. We currently have jited:1, len:31 and thus when accessing len while CONFIG_KMEMCHECK enabled, kmemcheck throws a warning that we're reading uninitialized memory. As we don't need the whole bit universe for pages member, we can just split it to u16 and use a bool flag for jited instead of a bitfield. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: bpf: arm: address randomize and write protect JIT code	Daniel Borkmann	2014-09-10	1	-6/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is the ARM variant for 314beb9bcab ("x86: bpf_jit_comp: secure bpf jit against spraying attacks"). It is now possible to implement it due to commits 75374ad47c64 ("ARM: mm: Define set_memory_* functions for ARM") and dca9aa92fc7c ("ARM: add DEBUG_SET_MODULE_RONX option to Kconfig") which added infrastructure for this facility. Thus, this patch makes sure the BPF generated JIT code is marked RO, as other kernel text sections, and also lets the generated JIT code start at a pseudo random offset instead on a page boundary. The holes are filled with illegal instructions. JIT tested on armv7hl with BPF test suite. Reference: http://mainisusuallyafunction.blogspot.com/2012/11/attacking-hardened-linux-systems-with.html Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Mircea Gherzan <mgherzan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: bpf: make eBPF interpreter images read-only	Daniel Borkmann	2014-09-05	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With eBPF getting more extended and exposure to user space is on it's way, hardening the memory range the interpreter uses to steer its command flow seems appropriate. This patch moves the to be interpreted bytecode to read-only pages. In case we execute a corrupted BPF interpreter image for some reason e.g. caused by an attacker which got past a verifier stage, it would not only provide arbitrary read/write memory access but arbitrary function calls as well. After setting up the BPF interpreter image, its contents do not change until destruction time, thus we can setup the image on immutable made pages in order to mitigate modifications to that code. The idea is derived from commit 314beb9bcabf ("x86: bpf_jit_comp: secure bpf jit against spraying attacks"). This is possible because bpf_prog is not part of sk_filter anymore. After setup bpf_prog cannot be altered during its life-time. This prevents any modifications to the entire bpf_prog structure (incl. function/JIT image pointer). Every eBPF program (including classic BPF that are migrated) have to call bpf_prog_select_runtime() to select either interpreter or a JIT image as a last setup step, and they all are being freed via bpf_prog_free(), including non-JIT. Therefore, we can easily integrate this into the eBPF life-time, plus since we directly allocate a bpf_prog, we have no performance penalty. Tested with seccomp and test_bpf testsuite in JIT/non-JIT mode and manual inspection of kernel_page_tables. Brad Spengler proposed the same idea via Twitter during development of this patch. Joint work with Hannes Frederic Sowa. Suggested-by: Brad Spengler <spender@grsecurity.net> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Kees Cook <keescook@chromium.org> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: filter: split 'struct sk_filter' into socket and bpf parts	Alexei Starovoitov	2014-08-03	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	clean up names related to socket filtering and bpf in the following way: - everything that deals with sockets keeps 'sk_' prefix - everything that is pure BPF is changed to 'bpf_' prefix split 'struct sk_filter' into struct sk_filter { atomic_t refcnt; struct rcu_head rcu; struct bpf_prog prog; }; and struct bpf_prog { u32 jited:1, len:31; struct sock_fprog_kern orig_prog; unsigned int (bpf_func)(const struct sk_buff skb, const struct bpf_insn filter); union { struct sock_filter insns[0]; struct bpf_insn insnsi[0]; struct work_struct work; }; }; so that 'struct bpf_prog' can be used independent of sockets and cleans up 'unattached' bpf use cases split SK_RUN_FILTER macro into: SK_RUN_FILTER to be used with 'struct sk_filter ' and BPF_PROG_RUN to be used with 'struct bpf_prog ' __sk_filter_release(struct sk_filter ) gains __bpf_prog_release(struct bpf_prog ) helper function also perform related renames for the functions that work with 'struct bpf_prog ', since they're on the same lines: sk_filter_size -> bpf_prog_size sk_filter_select_runtime -> bpf_prog_select_runtime sk_filter_free -> bpf_prog_free sk_unattached_filter_create -> bpf_prog_create sk_unattached_filter_destroy -> bpf_prog_destroy sk_store_orig_filter -> bpf_prog_store_orig_filter sk_release_orig_filter -> bpf_release_orig_filter __sk_migrate_filter -> bpf_migrate_filter __sk_prepare_filter -> bpf_prepare_filter API for attaching classic BPF to a socket stays the same: sk_attach_filter(prog, struct sock )/sk_detach_filter(struct sock ) and SK_RUN_FILTER(struct sk_filter , ctx) to execute a program which is used by sockets, tun, af_packet API for 'unattached' BPF programs becomes: bpf_prog_create(struct bpf_prog )/bpf_prog_destroy(struct bpf_prog ) and BPF_PROG_RUN(struct bpf_prog *, ctx) to execute a program which is used by isdn, ppp, team, seccomp, ptp, xt_bpf, cls_bpf, test_bpf Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: filter: get rid of BPF_S_* enum	Daniel Borkmann	2014-06-02	1	-72/+67
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch finally allows us to get rid of the BPF_S_* enum. Currently, the code performs unnecessary encode and decode workarounds in seccomp and filter migration itself when a filter is being attached in order to overcome BPF_S_* encoding which is not used anymore by the new interpreter resp. JIT compilers. Keeping it around would mean that also in future we would need to extend and maintain this enum and related encoders/decoders. We can get rid of all that and save us these operations during filter attaching. Naturally, also JIT compilers need to be updated by this. Before JIT conversion is being done, each compiler checks if A is being loaded at startup to obtain information if it needs to emit instructions to clear A first. Since BPF extensions are a subset of BPF_LD \| BPF_{W,H,B} \| BPF_ABS variants, case statements for extensions can be removed at that point. To ease and minimalize code changes in the classic JITs, we have introduced bpf_anc_helper(). Tested with test_bpf on x86_64 (JIT, int), s390x (JIT, int), arm (JIT, int), i368 (int), ppc64 (JIT, int); for sparc we unfortunately didn't have access, but changes are analogous to the rest. Joint work with Alexei Starovoitov. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mircea Gherzan <mgherzan@gmail.com> Cc: Kees Cook <keescook@chromium.org> Acked-by: Chema Gonzalez <chemag@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: filter: add jited flag to indicate jit compiled filters	Daniel Borkmann	2014-03-31	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds a jited flag into sk_filter struct in order to indicate whether a filter is currently jited or not. The size of sk_filter is not being expanded as the 32 bit 'len' member allows upper bits to be reused since a filter can currently only grow as large as BPF_MAXINSNS. Therefore, there's enough room also for other in future needed flags to reuse 'len' field if necessary. The jited flag also allows for having alternative interpreter functions running as currently, we can only detect jit compiled filters by testing fp->bpf_func to not equal the address of sk_run_filter(). Joint work with Alexei Starovoitov. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: Rename skb->rxhash to skb->hash	Tom Herbert	2014-03-26	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The packet hash can be considered a property of the packet, not just on RX path. This patch changes name of rxhash and l4_rxhash skbuff fields to be hash and l4_hash respectively. This includes changing uses of the field in the code which don't call the access functions. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	bpf: do not use reciprocal divide	Eric Dumazet	2014-01-16	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	At first Jakub Zawadzki noticed that some divisions by reciprocal_divide were not correct. (off by one in some cases) http://www.wireshark.org/~darkjames/reciprocal-buggy.c He could also show this with BPF: http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c The reciprocal divide in linux kernel is not generic enough, lets remove its use in BPF, as it is not worth the pain with current cpus. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jakub Zawadzki <darkjames-ws@darkjames.pl> Cc: Mircea Gherzan <mgherzan@gmail.com> Cc: Daniel Borkmann <dxchgb@gmail.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Matt Evans <matt@ozlabs.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'devel-stable' into for-next	Russell King	2013-11-12	1	-1/+5
\|\ \| \| \| \| \| \| \| \| \| \| \| \|	Conflicts: arch/arm/include/asm/atomic.h arch/arm/include/asm/hardirq.h arch/arm/kernel/smp.c
\| *	ARM: net: fix arm instruction endian-ness in bpf_jit_32.c	Ben Dooks	2013-10-19	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use <asm/opcodes.h> to correctly transform instruction byte ordering into in-memory ordering. Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Reviewed-by: Dave Martin <Dave.Martin@arm.com>
* \|	net: fix unsafe set_memory_rw from softirq	Alexei Starovoitov	2013-10-07	1	-0/+1
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	on x86 system with net.core.bpf_jit_enable = 1 sudo tcpdump -i eth1 'tcp port 22' causes the warning: [ 56.766097] Possible unsafe locking scenario: [ 56.766097] [ 56.780146] CPU0 [ 56.786807] ---- [ 56.793188] lock(&(&vb->lock)->rlock); [ 56.799593] <Interrupt> [ 56.805889] lock(&(&vb->lock)->rlock); [ 56.812266] [ 56.812266] * DEADLOCK * [ 56.812266] [ 56.830670] 1 lock held by ksoftirqd/1/13: [ 56.836838] #0: (rcu_read_lock){.+.+..}, at: [<ffffffff8118f44c>] vm_unmap_aliases+0x8c/0x380 [ 56.849757] [ 56.849757] stack backtrace: [ 56.862194] CPU: 1 PID: 13 Comm: ksoftirqd/1 Not tainted 3.12.0-rc3+ #45 [ 56.868721] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012 [ 56.882004] ffffffff821944c0 ffff88080bbdb8c8 ffffffff8175a145 0000000000000007 [ 56.895630] ffff88080bbd5f40 ffff88080bbdb928 ffffffff81755b14 0000000000000001 [ 56.909313] ffff880800000001 ffff880800000000 ffffffff8101178f 0000000000000001 [ 56.923006] Call Trace: [ 56.929532] [<ffffffff8175a145>] dump_stack+0x55/0x76 [ 56.936067] [<ffffffff81755b14>] print_usage_bug+0x1f7/0x208 [ 56.942445] [<ffffffff8101178f>] ? save_stack_trace+0x2f/0x50 [ 56.948932] [<ffffffff810cc0a0>] ? check_usage_backwards+0x150/0x150 [ 56.955470] [<ffffffff810ccb52>] mark_lock+0x282/0x2c0 [ 56.961945] [<ffffffff810ccfed>] __lock_acquire+0x45d/0x1d50 [ 56.968474] [<ffffffff810cce6e>] ? __lock_acquire+0x2de/0x1d50 [ 56.975140] [<ffffffff81393bf5>] ? cpumask_next_and+0x55/0x90 [ 56.981942] [<ffffffff810cef72>] lock_acquire+0x92/0x1d0 [ 56.988745] [<ffffffff8118f52a>] ? vm_unmap_aliases+0x16a/0x380 [ 56.995619] [<ffffffff817628f1>] _raw_spin_lock+0x41/0x50 [ 57.002493] [<ffffffff8118f52a>] ? vm_unmap_aliases+0x16a/0x380 [ 57.009447] [<ffffffff8118f52a>] vm_unmap_aliases+0x16a/0x380 [ 57.016477] [<ffffffff8118f44c>] ? vm_unmap_aliases+0x8c/0x380 [ 57.023607] [<ffffffff810436b0>] change_page_attr_set_clr+0xc0/0x460 [ 57.030818] [<ffffffff810cfb8d>] ? trace_hardirqs_on+0xd/0x10 [ 57.037896] [<ffffffff811a8330>] ? kmem_cache_free+0xb0/0x2b0 [ 57.044789] [<ffffffff811b59c3>] ? free_object_rcu+0x93/0xa0 [ 57.051720] [<ffffffff81043d9f>] set_memory_rw+0x2f/0x40 [ 57.058727] [<ffffffff8104e17c>] bpf_jit_free+0x2c/0x40 [ 57.065577] [<ffffffff81642cba>] sk_filter_release_rcu+0x1a/0x30 [ 57.072338] [<ffffffff811108e2>] rcu_process_callbacks+0x202/0x7c0 [ 57.078962] [<ffffffff81057f17>] __do_softirq+0xf7/0x3f0 [ 57.085373] [<ffffffff81058245>] run_ksoftirqd+0x35/0x70 cannot reuse jited filter memory, since it's readonly, so use original bpf insns memory to hold work_struct defer kfree of sk_filter until jit completed freeing tested on x86_64 and i386 Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	arm: bpf_jit: can call module_free() from any context	Daniel Borkmann	2013-05-20	1	-15/+3
\| \| \| \| \| \| \| \| \| \|	Follow-up on module_free()/vfree() that takes care of the rest, so no longer this workaround with work_struct needed. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Mircea Gherzan <mgherzan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	filter: bpf_jit_comp: refactor and unify BPF JIT image dump output	Daniel Borkmann	2013-03-21	1	-3/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If bpf_jit_enable > 1, then we dump the emitted JIT compiled image after creation. Currently, only SPARC and PowerPC has similar output as in the reference implementation on x86_64. Make a small helper function in order to reduce duplicated code and make the dump output uniform across architectures x86_64, SPARC, PPC, ARM (e.g. on ARM flen, pass and proglen are currently not shown, but would be interesting to know as well), also for future BPF JIT implementations on other archs. Cc: Mircea Gherzan <mgherzan@gmail.com> Cc: Matt Evans <matt@ozlabs.org> Cc: Eric Dumazet <eric.dumazet@google.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ARM:net: an issue for k which is u32, never < 0	Chen Gang	2013-03-12	1	-1/+1
\| \| \| \| \| \| \| \| \|	k is u32 which never < 0, need type cast, or cause issue. Signed-off-by: Chen Gang <gang.chen@asianux.com> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Acked-by: Mircea Gherzan <mgherzan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ARM: net: bpf_jit: fix emit_swap16() for non ARMv6+.	Nicolas Schichan	2013-02-14	1	-4/+11
\| \| \| \| \| \| \| \| \| \| \|	The original code was generating an lsl instructions using the value of ARM_R8 (skb_headlen, possibly uninitialized if no skb_headlen access was required) as a shift amount. Signed-off-by: Nicolas Schichan <nschichan@freebox.fr> Acked-by: Mircea Gherzan <mgherzan@gmail.com> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next	Linus Torvalds	2012-12-13	2	-5/+26
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull networking changes from David Miller: 1) Allow to dump, monitor, and change the bridge multicast database using netlink. From Cong Wang. 2) RFC 5961 TCP blind data injection attack mitigation, from Eric Dumazet. 3) Networking user namespace support from Eric W. Biederman. 4) tuntap/virtio-net multiqueue support by Jason Wang. 5) Support for checksum offload of encapsulated packets (basically, tunneled traffic can still be checksummed by HW). From Joseph Gasparakis. 6) Allow BPF filter access to VLAN tags, from Eric Dumazet and Daniel Borkmann. 7) Bridge port parameters over netlink and BPDU blocking support from Stephen Hemminger. 8) Improve data access patterns during inet socket demux by rearranging socket layout, from Eric Dumazet. 9) TIPC protocol updates and cleanups from Ying Xue, Paul Gortmaker, and Jon Maloy. 10) Update TCP socket hash sizing to be more in line with current day realities. The existing heurstics were choosen a decade ago. From Eric Dumazet. 11) Fix races, queue bloat, and excessive wakeups in ATM and associated drivers, from Krzysztof Mazur and David Woodhouse. 12) Support DOVE (Distributed Overlay Virtual Ethernet) extensions in VXLAN driver, from David Stevens. 13) Add "oops_only" mode to netconsole, from Amerigo Wang. 14) Support set and query of VEB/VEPA bridge mode via PF_BRIDGE, also allow DCB netlink to work on namespaces other than the initial namespace. From John Fastabend. 15) Support PTP in the Tigon3 driver, from Matt Carlson. 16) tun/vhost zero copy fixes and improvements, plus turn it on by default, from Michael S. Tsirkin. 17) Support per-association statistics in SCTP, from Michele Baldessari. And many, many, driver updates, cleanups, and improvements. Too numerous to mention individually. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits) net/mlx4_en: Add support for destination MAC in steering rules net/mlx4_en: Use generic etherdevice.h functions. net: ethtool: Add destination MAC address to flow steering API bridge: add support of adding and deleting mdb entries bridge: notify mdb changes via netlink ndisc: Unexport ndisc_{build,send}_skb(). uapi: add missing netconf.h to export list pkt_sched: avoid requeues if possible solos-pci: fix double-free of TX skb in DMA mode bnx2: Fix accidental reversions. bna: Driver Version Updated to 3.1.2.1 bna: Firmware update bna: Add RX State bna: Rx Page Based Allocation bna: TX Intr Coalescing Fix bna: Tx and Rx Optimizations bna: Code Cleanup and Enhancements ath9k: check pdata variable before dereferencing it ath5k: RX timestamp is reported at end of frame ath9k_htc: RX timestamp is reported at end of frame ...
\| *	ARM: net: bpf_jit_32: add VLAN instructions for BPF JIT	Daniel Borkmann	2012-11-14	1	-0/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch is a follow-up for patch "net: filter: add vlan tag access" to support the new VLAN_TAG/VLAN_TAG_PRESENT accessors in BPF JIT. Signed-off-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch> Cc: Mircea Gherzan <mgherzan@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Acked-by: Mircea Gherzan <mgherzan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ARM: net: bpf_jit_32: add XOR instruction for BPF JIT	Daniel Borkmann	2012-11-14	2	-5/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch is a follow-up for patch "filter: add XOR instruction for use with X/K" that implements BPF ARM JIT parts for the BPF XOR operation. Signed-off-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch> Cc: Mircea Gherzan <mgherzan@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Acked-by: Mircea Gherzan <mgherzan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	ARM: 7598/1: net: bpf_jit_32: fix sp-relative load/stores offsets.	Schichan Nicolas	2012-12-11	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The offset must be multiplied by 4 to be sure to access the correct 32bit word in the stack scratch space. For instance, a store at scratch memory cell #1 was generating the following: st r4, [sp, #1] While the correct code for this is: st r4, [sp, #4] To reproduce the bug (assuming your system has a NIC with the mac address 52:54:00:12:34:56): echo 0 > /proc/sys/net/core/bpf_jit_enable tcpdump -ni eth0 "ether[1] + ether[2] - ether[3] * ether[4] - ether[5] \ == -0x3AA" # this will capture packets as expected echo 1 > /proc/sys/net/core/bpf_jit_enable tcpdump -ni eth0 "ether[1] + ether[2] - ether[3] * ether[4] - ether[5] \ == -0x3AA" # this will not. This bug was present since the original inclusion of bpf_jit for ARM (ddecdfce: ARM: 7259/3: net: JIT compiler for packet filters). Signed-off-by: Nicolas Schichan <nschichan@freebox.fr> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
* \|	ARM: 7597/1: net: bpf_jit_32: fix kzalloc gfp/size mismatch.	Schichan Nicolas	2012-12-11	1	-2/+2
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Official prototype for kzalloc is: void kzalloc(size_t, gfp_t); The ARM bpf_jit code was having the assumption that it was: void kzalloc(gfp_t, size); This was resulting the use of some random GFP flags depending on the size requested and some random overflows once the really needed size was more than the value of GFP_KERNEL. This bug was present since the original inclusion of bpf_jit for ARM (ddecdfce: ARM: 7259/3: net: JIT compiler for packet filters). Signed-off-by: Nicolas Schichan <nschichan@freebox.fr> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
*	ARM: 7421/1: bpf_jit: BPF_S_ANC_ALU_XOR_X support	Mircea Gherzan	2012-06-14	2	-0/+9
\| \| \| \| \| \| \| \|	JIT support for the XOR operation introduced by the commit ffe06c17afbb. Signed-off-by: Mircea Gherzan <mgherzan@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
*	ARM: 7259/3: net: JIT compiler for packet filters	Mircea Gherzan	2012-03-24	3	-0/+1108
	Based of Matt Evans's PPC64 implementation. The compiler generates ARM instructions but interworking is supported for Thumb2 kernels. Supports both little and big endian. Unaligned loads are emitted for ARMv6+. Not all the BPF opcodes that deal with ancillary data are supported. The scratch memory of the filter lives on the stack. Hardware integer division is used if it is available. Enabled in the same way as for x86-64 and PPC64: echo 1 > /proc/sys/net/core/bpf_jit_enable A value greater than 1 enables opcode output. Signed-off-by: Mircea Gherzan <mgherzan@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>